Google File System
GFS
Usually Map reads its input from the local file system
shuffle
some inputs arrive as streams of data
10 Terabytes
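To make Map, shuffle, and Reduce concrete, here is a minimal word-count sketch in Go; the KeyValue type, the function signatures, and the in-memory shuffle are simplifications for illustration, not the lab's exact interface.
package main

import (
    "fmt"
    "strings"
    "unicode"
)

// KeyValue is the intermediate pair emitted by Map and consumed by Reduce.
type KeyValue struct {
    Key   string
    Value string
}

// Map reads one input split (here just a string) and emits a count of 1 per
// word. In a real system the split usually lives on the worker's local disk,
// which is why "Map usually reads from the local file system".
func Map(contents string) []KeyValue {
    words := strings.FieldsFunc(contents, func(r rune) bool { return !unicode.IsLetter(r) })
    kva := []KeyValue{}
    for _, w := range words {
        kva = append(kva, KeyValue{Key: w, Value: "1"})
    }
    return kva
}

// Reduce receives all values for one key after the shuffle has grouped the
// intermediate pairs by key, and emits the total count.
func Reduce(key string, values []string) string {
    return fmt.Sprintf("%d", len(values))
}

func main() {
    // A toy "shuffle": group intermediate pairs by key in memory.
    groups := map[string][]string{}
    for _, kv := range Map("the quick brown fox jumps over the lazy dog") {
        groups[kv.Key] = append(groups[kv.Key], kv.Value)
    }
    for k, vs := range groups {
        fmt.Println(k, Reduce(k, vs))
    }
}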
- Use Go in the labs: Go has good RPC support and is strong on threads, locks, and concurrency. Reference book: Effective Go.
- Reasons to use goroutines (threads):
- I/O concurrency
- Multi-core parallelism
- Convenience compared to event-driven programming
Concurrency primitives in Go:
Channels
sync.Cond
sync.WaitGroup
// Serial Crawler
func Serial(url string, fetcher Fetcher, fetched map[string]bool) {
    if fetched[url] {
        return
    }
    fetched[url] = true
    urls, err := fetcher.Fetch(url)
    if err != nil {
        return
    }
    for _, u := range urls {
        Serial(u, fetcher, fetched)
    }
    return
}
// Concurrent crawler
type fetchState struct {
    mu      sync.Mutex
    fetched map[string]bool
}

func ConcurrentMutex(url string, fetcher Fetcher, f *fetchState) {
    f.mu.Lock()
    already := f.fetched[url]
    f.fetched[url] = true
    f.mu.Unlock()
    if already {
        return
    }
    urls, err := fetcher.Fetch(url)
    if err != nil {
        return
    }
    var done sync.WaitGroup
    for _, u := range urls {
        done.Add(1)
        go func(u string) {
            defer done.Done()
            ConcurrentMutex(u, fetcher, f)
        }(u)
    }
    done.Wait()
    return
}
func makeState() *fetchState {
    f := &fetchState{}
    f.fetched = make(map[string]bool)
    return f
}
func worker(url string, ch chan []string, fetcher Fetcher) {
    urls, err := fetcher.Fetch(url)
    if err != nil {
        ch <- []string{}
    } else {
        ch <- urls
    }
}

func master(ch chan []string, fetcher Fetcher) {
    n := 1
    fetched := make(map[string]bool)
    for urls := range ch {
        for _, u := range urls {
            if fetched[u] == false {
                fetched[u] = true
                n += 1
                go worker(u, ch, fetcher)
            }
        }
        n -= 1
        if n == 0 {
            break
        }
    }
}

func ConcurrentChannel(url string, fetcher Fetcher) {
    ch := make(chan []string)
    go func() {
        ch <- []string{url}
    }()
    master(ch, fetcher)
}
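The crawlers above assume a Fetcher interface. Below is one possible sketch of that interface plus an in-memory fake fetcher so all three versions can be run without the network; the fake URLs are made up, and this goes in the same package main file (it also needs import "fmt").
type Fetcher interface {
    // Fetch returns the URLs found on the page at url, or an error.
    Fetch(url string) (urls []string, err error)
}

// fakeFetcher maps a URL to the list of links on that page.
type fakeFetcher map[string][]string

func (f fakeFetcher) Fetch(url string) ([]string, error) {
    if links, ok := f[url]; ok {
        return links, nil
    }
    return nil, fmt.Errorf("not found: %s", url)
}

func main() {
    fetcher := fakeFetcher{
        "http://example.com/":  {"http://example.com/a", "http://example.com/b"},
        "http://example.com/a": {"http://example.com/"},
        "http://example.com/b": {},
    }
    Serial("http://example.com/", fetcher, make(map[string]bool))
    ConcurrentMutex("http://example.com/", fetcher, makeState())
    ConcurrentChannel("http://example.com/", fetcher)
}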
Shell code:
go run -race crawler.go
Variables captured by a closure are allocated on the heap.
go: race detector
Why it's hard: high performance requires sharding across many servers; many servers mean constant faults; fault tolerance requires replication; replication raises consistency questions; strong consistency costs performance.
- Bad replication design: many clients write to different servers, so replicas can receive the writes in different orders and diverge.
Big Fast
Global
Sharding
Automatic Recovery
crash handling: the master recovers from its log plus checkpoints on disk
-
Master Data
- filename
- handle -> list of chunk servers (volatile), version # (non-volatile, e.g. #17), primary chunk server (volatile), lease expiration (volatile)
- chunks
- buffer
- send
- recv and merge
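A rough Go sketch of the master's tables; every name and type is invented for illustration, and the real master also persists the filename table and version numbers via its log and checkpoints.
package gfs

import "time"

// chunkInfo is what the master keeps per chunk handle.
type chunkInfo struct {
    servers   []string  // chunkservers believed to hold an up-to-date replica (volatile)
    version   int       // chunk version number (non-volatile)
    primary   string    // current primary chunkserver, "" if none (volatile)
    leaseTime time.Time // when the primary's lease expires (volatile)
}

// master holds the two in-memory tables described above.
type master struct {
    files  map[string][]int64   // filename -> ordered list of chunk handles
    chunks map[int64]*chunkInfo // handle -> chunk metadata
}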
-
Reads and writes (primary and secondary nodes)
-
Read Operation
- the client sends the filename and offset to the master
- the master finds the chunk handle and the chunk servers holding that chunk
- the client then reads the data from one of those chunk servers
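A toy sketch of that read path, reusing the hypothetical master types above; chunkSize and readFromChunkServer are illustrative stand-ins.
const chunkSize = 64 << 20 // 64 MB chunks, as in the paper

// clientRead sketches the read path: ask the master which servers hold the
// chunk covering (filename, offset), then read directly from a chunkserver.
func clientRead(m *master, filename string, offset int64) []byte {
    // 1. client sends the filename and offset to the master
    handle := m.files[filename][offset/chunkSize]
    c := m.chunks[handle]
    // 2. master returns the handle and the chunk servers (the client caches this)
    server := c.servers[0]
    // 3. client reads the data from a chunk server, not from the master
    return readFromChunkServer(server, handle, offset%chunkSize)
}

// readFromChunkServer stands in for the RPC to a chunkserver.
func readFromChunkServer(server string, handle int64, off int64) []byte {
    return nil // placeholder
}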
-
Write Operation
- if there is no primary, then on the master:
- find the up-to-date replicas
- pick a primary and secondaries
- increment the version #
- tell the primary and secondaries the new version #
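Continuing the same hypothetical types, a sketch of the "no primary" branch of the write path; the 60-second lease and the exact error handling are assumptions.
// assignPrimary handles a write to a chunk with no live primary: require an
// up-to-date replica, increment the version, pick a primary and grant it a
// lease, and (not shown) tell primary and secondaries the new version number.
func (m *master) assignPrimary(handle int64) *chunkInfo {
    c := m.chunks[handle]
    if c.primary != "" && time.Now().Before(c.leaseTime) {
        return c // a primary already holds the lease
    }
    if len(c.servers) == 0 {
        return nil // no up-to-date replica known: wait rather than hand out stale data
    }
    c.version++              // increment the version #
    c.primary = c.servers[0] // pick the primary; the remaining servers act as secondaries
    c.leaseTime = time.Now().Add(60 * time.Second)
    return c
}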
-
Split Brain
-
Concurrent Record Appends
- if the request breaks down partway (e.g. a replica fails), the primary returns an error
- file order
- what if the primary crashes?
- what if a secondary also crashes?
Profile:
Distributed systems fault tolerance.
Backup (primary/backup replication).
Replicated State Machine. Q: What kind of state is supposed to be replicated? A: state at the application level.
Non-deterministic events:
- log entry
- inputs: arriving packets (the data plus the interrupt)
- weird instructions
- multi-core behavior (hard to predict)
An interrupt that arrives while, say, a thousand instructions are executing must be delivered by the VMM to the backup at exactly the same instruction boundary as on the primary. (How does the virtual machine monitor hide this source of non-determinism?)
Primary DB server -> backup DB server example:
Step 1: the client sends an increment request to the primary.
Step 2: the primary dies before the backup receives the corresponding log entry.
Result: the client cannot get consistent data.
When the backup takes over, it has the same connection state and TCP sequence numbers as the primary, so clients don't notice the failover.
TEST-AND-SET
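To make TEST-AND-SET concrete: when the primary and backup lose contact with each other, each asks a third party to atomically flip a flag, and only the replica that flips it from false to true goes live, which rules out split brain. The sketch below is a toy in-process version; in the paper the flag lives on shared storage.
package main

import (
    "fmt"
    "sync"
)

type testAndSetServer struct {
    mu    sync.Mutex
    taken bool
}

// TestAndSet atomically reads and sets the flag. Exactly one caller ever
// sees "false" back, so exactly one replica is allowed to go live.
func (s *testAndSetServer) TestAndSet() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    prev := s.taken
    s.taken = true
    return prev
}

func main() {
    s := &testAndSetServer{}
    if !s.TestAndSet() {
        fmt.Println("primary: I won, going live")
    }
    if s.TestAndSet() {
        fmt.Println("backup: flag already set, I stay passive")
    }
}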
package main

import "sync"

func main() {
    var a string
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        a = "hello world"
        wg.Done()
    }()
    // Wait() guarantees the write to a happens before the read below.
    wg.Wait()
    println(a)
}
package main

import (
    "sync"
    "time"
)

func main() {
    counter := 0
    var mu sync.Mutex
    for i := 0; i < 1000; i++ {
        go func() {
            mu.Lock()
            defer mu.Unlock()
            counter = counter + 1
        }()
    }
    time.Sleep(1 * time.Second)
    mu.Lock()
    println(counter)
    mu.Unlock()
}
Condition Variable
- aim: improve efficiency and make the code easier to write.
- problem: the lost-wakeup problem.
- the difference between Signal and Broadcast in Go.
- The TA usually uses the locking primitives rather than channels.
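A sketch in the style of the lecture's vote-counting example, showing sync.Cond with Broadcast and the wait-in-a-loop pattern that avoids lost wakeups; the 10 peers and the 5-vote threshold are arbitrary.
package main

import (
    "math/rand"
    "sync"
)

func main() {
    count := 0    // votes received so far
    finished := 0 // peers that have replied
    var mu sync.Mutex
    cond := sync.NewCond(&mu)

    for i := 0; i < 10; i++ {
        go func() {
            vote := rand.Intn(2) == 0 // pretend RPC to a peer
            mu.Lock()
            defer mu.Unlock()
            if vote {
                count++
            }
            finished++
            // Broadcast wakes every waiter; Signal would wake only one.
            cond.Broadcast()
        }()
    }

    mu.Lock()
    // Check the condition in a loop while holding the lock: cond.Wait
    // atomically releases the mutex and re-acquires it on wakeup, which
    // closes the window that causes lost wakeups.
    for count < 5 && finished != 10 {
        cond.Wait()
    }
    if count >= 5 {
        println("received 5+ votes!")
    } else {
        println("lost")
    }
    mu.Unlock()
}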
Part III
- Test Script
- https://gist.github.com/jonhoo/f686cacb4b9fe716d5aa go-tests-many.sh
Aim: rule out split brain
A majority (quorum) means more than half of the machines are alive.
With 2f + 1 machines, at most f failures can be tolerated (e.g. with 5 machines, 3 survivors still form a majority).
-
Data structures: a k/v store
log entries that store the operations
election timer
term: election epoch number, monotonically increasing
RequestVote RPCs
Majority vote (leader election)
Raft
-
Log Entry
-
What Raft does:
maintain replication
-
layer k/v storage above Raft
-
flow: client request -> replicate the entry to a majority of machines -> once committed, apply it to the k/v storage
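A minimal sketch of that flow from the k/v service's side: the service applies an operation to its map only after the Raft layer delivers it as committed. The Op struct, the channel, and all names are assumptions, not the lab's actual interface.
package kvraft

import "sync"

// Op is one client operation, stored as the command inside a Raft log entry.
type Op struct {
    Key, Value string
}

type KVServer struct {
    mu      sync.Mutex
    data    map[string]string
    applyCh chan Op // the Raft layer delivers committed operations here, in log order
}

// applyLoop consumes committed operations in log order and applies them to
// the map, so every replica's k/v state follows the same sequence.
func (kv *KVServer) applyLoop() {
    for op := range kv.applyCh {
        kv.mu.Lock()
        kv.data[op.Key] = op.Value
        kv.mu.Unlock()
    }
}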
-
The meaning of the log in Raft
-
to record the order of client operations
-
so the leader can resend entries to followers that missed some operations
-
to help a server replay its state and rejoin the cluster after a reboot
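A minimal sketch of what one log entry carries, matching the three roles above; the field names follow the Raft paper, the rest is illustrative.
// One Raft log entry: the term it was appended in, plus the client command.
type LogEntry struct {
    Term    int         // leader's term when the entry was appended
    Command interface{} // the client operation (e.g. a k/v Put)
}

// A server's log is an ordered slice of entries; the slice index is the log
// index that fixes the order in which commands are applied.
type RaftLog struct {
    entries []LogEntry
}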
Q: what if the workers are unable to process requests from the leader in time?
Scenario: the leader crashes after sending log entries to only some of the servers; the newly elected leader resends the log.
Comparison: Paxos has no leader while Raft does, and Raft is more efficient.
What's the concrete behavior of split brain? One of two entities (a message queue, a machine) becomes unavailable.
Q: Will it be slow to send so many requests during this process in the common case? A: The instructor doesn't think so.
Q: Why not pick the server with the longest log as leader? A: Because the longest log can contain uncommitted entries from a stale term; Raft compares the last entry's term first and uses log length only as a tie-breaker.
To become leader, a node must get votes from a majority of the other nodes.
Log divergence caused by majority elections under network partitions.
log backup acceleration
one follower:
three cases (numbers are term numbers):
case1:
S1 4 5 5
S2 4 6 6 6
case2:
S1 4 4 4
S2 4 6 6 6
case3:
S1 4
S2 4 6 6 6
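One way the leader can use the follower's rejection info to jump nextIndex back in a single step for the three cases above, reusing the LogEntry sketch from earlier. The XTerm/XIndex/XLen names follow the lecture's fast-backup scheme, and the exact rules here are one reasonable reading of it.
// nextIndexAfterReject: the follower rejects AppendEntries with XTerm (the
// term of its conflicting entry), XIndex (the first index it holds for that
// term), and XLen (its log length); the leader jumps nextIndex back at once
// instead of decrementing it one entry per RPC.
func nextIndexAfterReject(leaderLog []LogEntry, xTerm, xIndex, xLen int) int {
    if xTerm == -1 {
        // Case 3: the follower's log is too short to contain the entry.
        return xLen
    }
    last := -1
    for i, e := range leaderLog {
        if e.Term == xTerm {
            last = i
        }
    }
    if last == -1 {
        // Case 1: the leader has no entry at all from XTerm.
        return xIndex
    }
    // Case 2: the leader does have XTerm; retry from just past its last
    // entry of that term (backing up further would also be safe).
    return last + 1
}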
State Machine Safety
For Each Node:
log
current term
voted for
currentTerm must be persisted; otherwise, when a leader with a high term crashes, the other followers might start elections again from a lower term.
For Log:
commitIndex: the highest log index known to be committed
the aim: to reduce the waste of replaying every step when a server has been down for a long time or has missed millions of entries.
snapshot: discard the portion of the log that has already been applied and acknowledged to clients.
Problem with snapshots: if a follower's log ends before the snapshot point, it cannot catch up from the log alone. Solution: the InstallSnapshot RPC.
linearizability
If the ordering graph of the operations contains a cycle, the history is not linearizable.
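For example, with two concurrent writes Wx1 and Wx2: if one client reads x=1 and then x=2 while another client reads x=2 and then x=1, the constraints require 1 before 2 and 2 before 1 at the same time; that cycle means the history is not linearizable.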