6.824 Notes on Distributed Systems


Class 1.

Google File System (GFS)

Map Reduce

Usually Map reads its input from the local file system (the input split is stored locally)
shuffle: intermediate data moves between Map and Reduce, some of it in streams of data
10 Terabytes
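
As a reminder of the programming model, here is a minimal word-count sketch (the Map/Reduce signatures, the package name mr, and the KeyValue type are assumptions in the spirit of the MapReduce paper and lab 1, not the lab's exact interface):

package mr

import (
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is an assumed intermediate key/value pair type.
type KeyValue struct {
	Key   string
	Value string
}

// Map reads one input split and emits a ("word", "1") pair per word.
func Map(filename string, contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := []KeyValue{}
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// Reduce sees all values for one key after the shuffle and returns the count.
func Reduce(key string, values []string) string {
	return strconv.Itoa(len(values))
}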

Class 2. RPC and Threads

  • The labs use Go: it has a good RPC library and strong support for threads, locks, and concurrency. Reference book: Effective Go.
  • Reasons to use threads (goroutines):
    • I/O concurrency
    • Multi-core parallelism
    • Convenience compared to event-driven code (channels in Go)

Coordination

Channels
sync.Cond
sync.WaitGroup
// Serial Crawler
func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
	if fetched[url] {
		return
	}
	fetched[url] = true
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	for _, u := range urls {
		Serial(u, fetcher, fetched)
	}
	return
}

// Concurrent crawler
type fetchState struct {
	mu      sync.Mutex
	fetched map[string]bool
}

func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
	f.mu.Lock()
	already := f.fetched[url]
	f.fetched[url] = true
	f.mu.Unlock()
	if already {
		return
	}
	urls, err := fetcher.Fetch(url)
	if err != nil {
		return
	}
	var done sync.WaitGroup
	for _, u := range urls {
		done.Add(1)
		go func(u string) {
			defer done.Done()
			ConcurrentMutex(u, fetcher, f)
		}(u)
	}
	done.Wait()
	return
}


func makeState() *fetchState {
	return &fetchState{fetched: make(map[string]bool)}
}

func worker(url string, ch chan []string, fetcher Fetcher) {
	urls, err := fetcher.Fetch(url)
	if err != nil {
		ch <- []string{}
	} else {
		ch <- urls
	}
}

func master(ch chan []string, fetcher Fetcher) {
	n := 1
	fetched := make(map[string]bool)
	for urls := range ch {
		for _, u := range urls {
			if !fetched[u] {
				fetched[u] = true
				n += 1
				go worker(u, ch, fetcher)
			}
		}
		n -= 1
		if n == 0 {
			break
		}
	}
}

func ConcurrentChannel(url string, fetcher Fetcher) {
	ch := make(chan []string)
	go func() {
		ch <- []string{url}
	}()
	master(ch, fetcher)
}
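
The crawler code above assumes a Fetcher; here is a minimal sketch of that interface plus a fake in-memory fetcher to drive it (fakeFetcher and the URLs are made up for illustration, in the spirit of the Go tour crawler; it also needs "fmt" among the imports):

type Fetcher interface {
	// Fetch returns the URLs found on the page at url.
	Fetch(url string) (urls []string, err error)
}

// fakeFetcher serves a tiny in-memory "web" so the crawlers can be run locally.
type fakeFetcher map[string][]string

func (f fakeFetcher) Fetch(url string) ([]string, error) {
	if urls, ok := f[url]; ok {
		return urls, nil
	}
	return nil, fmt.Errorf("not found: %s", url)
}

func main() {
	fetcher := fakeFetcher{
		"http://golang.org/":     {"http://golang.org/pkg/", "http://golang.org/cmd/"},
		"http://golang.org/pkg/": {"http://golang.org/", "http://golang.org/cmd/"},
	}
	ConcurrentMutex("http://golang.org/", fetcher, makeState())
	ConcurrentChannel("http://golang.org/", fetcher)
}

Note that ConcurrentChannel needs no mutex: only master touches the fetched map, and the workers communicate results back solely over the channel.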

Shell command:
go run -race crawler.go

Variables captured by a closure are allocated on the heap.

go: race detector

Class 3. Storage

Why is it hard: performance, fault tolerance, consistency (especially strong consistency)

  • Bad replication design: many clients write to different servers, so the replicas diverge

GFS

Big, fast
Global
Sharding
Automatic recovery
Failure handling: checkpoint
  • Master Data (see the struct sketch at the end of this GFS section)

    • filename -> list of chunk handles
    • chunk handle -> list of chunk servers, version # (e.g. 17), primary, lease expiration
    • chunks
    • buffer
    • send
    • recv and merge
  • Reads and Writes involve primary and secondary nodes

  • Read Operation

    • the client sends the file name and offset to the master
    • the master finds the chunk handle for that name and offset and returns the list of chunk servers
    • a chunk server responds with the data
  • Write Operation

    • if there is no primary, the master:
    • finds the up-to-date replicas
    • picks a primary and secondaries
    • increments the version #
    • tells the primary and secondaries the new version #
  • Split Brain

  • Concurrent Record Appends

    • if the request breaks down partway, the primary returns an error and the client retries
    • order of records in the file
    • what if the primary crashes?
    • what if a secondary also crashes?
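
A rough Go sketch of the master's metadata tables described under "Master Data" above (the field names are mine, not the paper's; it imports "time"):

// chunkInfo is the master's per-chunk state. The server list, primary and
// lease expiration are volatile; the version # is persisted.
type chunkInfo struct {
	servers []string  // chunk servers holding replicas of this chunk
	version int       // version #, bumped when a new primary is designated
	primary string    // current primary chunk server (holds the lease)
	lease   time.Time // when the primary's lease expires
}

// masterState maps file names to chunk handles, and handles to chunk state.
type masterState struct {
	files  map[string][]uint64   // filename -> list of chunk handles
	chunks map[uint64]*chunkInfo // chunk handle -> per-chunk state
}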

Class 4. Primary-Backup Replication

Profile: fault tolerance in distributed systems, achieved with a backup.

Replicated State Machine. Q: What kind of state is supposed to be replicated? A: Application-level state should be replicated.

Non-deterministic events

  • log entry
  • inputs: packets - data - interrupts
  • weird instructions
  • multicore behavior

An interrupt may arrive at any point within a run of thousands of instructions executing on the VMM; the backup must deliver it at exactly the same instruction. (How does the machine eliminate this source of non-determinism?)

Example

Primary server -> DB server.
Step 1: the client sends an increment request.
Step 2: the primary dies before the backup receives the corresponding log entry.
Result: the client may see inconsistent data.
The backup takes over with the same connection state and TCP sequence numbers as the primary.
TEST-AND-SET
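
TEST-AND-SET refers to an external atomic test-and-set service that both machines consult when they lose contact with each other: whoever wins the flag is allowed to go live as primary. A minimal sketch, assuming a single shared server (names are mine; it imports "sync"):

// tasServer is a toy test-and-set service: the first caller gets true and
// may go live as primary; every later caller gets false and must stand down.
type tasServer struct {
	mu  sync.Mutex
	set bool
}

func (t *tasServer) TestAndSet() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.set {
		return false
	}
	t.set = true
	return true
}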

Lecture 5. (The Lab Guide)

Go, Threads, and Raft

package main
import "sync"
func main() {
    var a string
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        a = "hello world"
        wg.Done()
    }()
    wg.Wait()
    println(a)
}

package main
import "sync"
import "time"
func main() {
    counter := 0
    var mu sync.Mutex
    for i := 0; i < 1000; i++ {
        go func() {
            mu.Lock()
            defer mu.Unlock()
            counter = counter + 1
        }()
    }
    time.Sleep(1 * time.Second)
    mu.Lock()
    println(counter)
    mu.Unlock()
}

Condition Variable

  • aim: improve efficiency and simplify how the code is written (see the vote-counting sketch below).
  • problem: the lost wake-up problem.
  • difference between Signal and Broadcast in Go.
  • The TA usually uses the sync primitives rather than channels.
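
The lecture's vote-counting example illustrates the pattern (a minimal sketch; requestVote is a hypothetical stand-in for a real RPC, and it imports "sync"):

func countVotes(peers int) {
	count := 0    // "yes" votes received so far
	finished := 0 // total replies received so far
	var mu sync.Mutex
	cond := sync.NewCond(&mu)

	for i := 0; i < peers; i++ {
		go func() {
			vote := requestVote() // hypothetical RPC to one peer
			mu.Lock()
			defer mu.Unlock()
			if vote {
				count++
			}
			finished++
			cond.Broadcast() // wake the waiter so it re-checks its condition
		}()
	}

	mu.Lock()
	// Wait until we either have a majority or all peers have answered.
	for count < peers/2+1 && finished != peers {
		cond.Wait() // atomically releases mu, reacquires it on wake-up
	}
	if count >= peers/2+1 {
		println("received a majority")
	} else {
		println("lost the election")
	}
	mu.Unlock()
}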

Part III

Lecture 6. Fault Tolerance: Raft I

Aim: rule out the split-brain phenomenon.

Background

Majority (quorum): more than half of the machines are alive.
    Out of 2f + 1 machines, at most f machine failures can be tolerated.

Contents of the Raft protocol

  • Data structures: the k/v store
    log entries storing the operations
    election timer
    term: a sequence number, monotonically increasing
    RequestVote RPCs
    (see the Go struct sketch after the numbered list below)

  • Majority Vote (Leader Election)

    Raft

  • Log Entry

  1. The function of Raft:

    maintain replication

  2. The k/v storage is layered above Raft.

  3. Progress of a request: request -> replicate data to a majority of machines -> update the k/v storage (committed).

  4. The meaning of the log in Raft:

    • to record the order of client request operations

    • to help resend operations to worker servers when they missed some

    • to help the leader server rejoin the cluster when it reboots
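
A rough Go sketch of the per-server state described in the bullets above (the field names follow the Raft paper, but the layout is my own, not the lab skeleton; it imports "sync" and "time"):

// logEntry is one slot of the log: a client operation plus the term in
// which the leader first received it.
type logEntry struct {
	Term    int
	Command interface{}
}

// raftState is one server's state. log, currentTerm and votedFor must
// survive crashes; commitIndex and the election timer are volatile.
type raftState struct {
	mu            sync.Mutex
	currentTerm   int        // latest term this server has seen
	votedFor      int        // candidate voted for in currentTerm (-1 = none)
	log           []logEntry // client operations in order
	commitIndex   int        // highest index known to be committed
	electionReset time.Time  // last time the election timer was reset
}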

Q: what if the workers are unable to process requests from the leader in time?

Resolving log divergence

Scenario: the leader crashes after sending a log entry to only some of the servers; the newly elected leader resends the log.

Comparison: Paxos has no leader, Raft does; moreover, Raft is more efficient.

What's the concrete behavior of split brain? One of two entities (e.g. a message queue, a machine) becomes unavailable to the other.

Q: Will it be slow to send so many requests during this process in the common case? A: The instructor doesn't think so.

Lecture 7. Fault Tolerance: Raft II

Q: Why not pick the server with the longest log as leader? A: Because the longest log may contain entries from an old term that were never committed; the election restriction below compares the term of the last entry first.

To become the leader, a node must get votes from a majority of the nodes.

Election Restriction

Log inconsistency caused by majority elections under network partitions

log backup acceleration

Fast Backup

one follower:

three cases (the numbers are terms)
case1:
S1 4 5 5
S2 4 6 6 6

case2:
S1 4 4 4
S2 4 6 6 6

case3:
S1 4
S2 4 6 6 6
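
The idea of fast backup is that a follower's rejected AppendEntries reply carries extra information, so the leader can back up nextIndex by more than one entry per round trip. A sketch of the reply fields with the names used in the lecture (this is the lecture's optimization, not part of the paper's minimal protocol):

// appendEntriesReply with the lecture's fast-backup fields.
type appendEntriesReply struct {
	Term    int
	Success bool
	XTerm   int // term of the follower's conflicting entry (-1 if that slot is empty)
	XIndex  int // index of the follower's first entry with term XTerm
	XLen    int // length of the follower's log
}

Mapping to the three cases above: if the leader has no entry with XTerm (case 1), it sets nextIndex to XIndex; if it does have XTerm (case 2), it backs up to just past its own last entry for that term; if the follower's log is simply too short (case 3), it sets nextIndex from XLen.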

State Machine Safety

Persistent

For Each Node:

log  
current term  
voted for  
currentTerm must be persisted; otherwise, after a leader with a high term crashes, the other followers might restart elections from a lower term.

For Log:

commitIndex: the highest log index known to be committed
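
A minimal sketch of persisting that state with encoding/gob (the lab provides its own labgob/Persister helpers; saveToDisk below is a hypothetical stand-in for an atomic write to stable storage; imports "bytes" and "encoding/gob"):

// persist serializes the state that must survive a crash: currentTerm,
// votedFor and the log. Concrete Command types must be gob.Register'ed.
func (rf *raftState) persist() {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	enc.Encode(rf.currentTerm)
	enc.Encode(rf.votedFor)
	enc.Encode(rf.log)
	saveToDisk(buf.Bytes()) // hypothetical atomic write to stable storage
}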

Log compaction and snapshot

The aim: reduce the waste of replaying the whole log when a server has crashed for a long time or has missed millions of entries.

Snapshot: discard the portion of the log that has already been acknowledged to the client (applied to the state).

The problem with snapshots: if a follower's log ends before the snapshot point, it cannot catch up from log entries alone. Solution: the InstallSnapshot RPC.

Correct system

Linearizability

If the precedence graph of an execution history contains a cycle, the history is not linearizable.

Chapter 8 Zookeeper