Distributed Algorithms - Syllabus
From basic problems to research problems, and on to systems and tools
Model of computation
- Different dimensions in defining a computation model
Basic problems
- Basic message-passing algorithms
- Leader election in rings
- Mutual exclusion in shared memory
- Fault-tolerant consensus
Research problems
- Distributed algorithms
- Consensus algorithms: Paxos, Raft, Zab
- Specification, verification, testing
From algorithms to systems
- Data replication: TPaxos, ParallelRaft (Ali PolarDB)
- Coordination service: Zookeeper (Apache)
- …
Outline
Distributed Algorithms (DAs)
- Fundamental concepts
Model of computation
- Basic idea
- RAM model for sequential algorithms
Model of computation for DAs
- Communication medium, Timing model
- Failure model, progress condition
- Correctness condition: specification of the problem
- Simulation among models
Algorithms
- Abstract: designed over an abstract machine (the model of computation)
- Formal: rigid mathematical treatments
Distributed algorithms
- Model of distributed computation
- Mathematical proofs, analysis, …
- In relation to other concepts
- Distributed computing theory
- Distributed systems
Computing and Computer
The computer seems able to do anything
But some things cannot be done efficiently by a computer
Computing
- Encoding everything into ‘0’s and ‘1’s
- Operations over ‘1’s and ‘0’s
- Decoding the ‘1’s and ‘0’s
Computer
- A set of operations it can do
- Limited operations, unlimited combinations
- Perform specified operations
- Quickly, inexhaustibly, with no intelligence
A computer can only do simple, repetitive things.
Algorithm
Algorithm is the spirit of computing
- To solve a specific problem (a so-called algorithmic problem)
- Combination of basic operations
- in a precise and elegant way
Essential issues
- Model of computation
- Algorithm design
- Algorithm analysis
Model of Computation
Problem 1
- Why can the algorithms we learn run almost everywhere?
Problem 2
- Why can the algorithms we learn be implemented in any language?
Machine-independent algorithms run on an abstract machine
- Turing machine: over-qualified
- RAM model: simple but powerful
Turing Machine
Tape in
- Encoding the input
Tape out
- Recording the output
Control of the read-write head
- Operations the machine can do
RAM Model
Each simple operation takes one time step
- E.g., key comparison, +/-, memory access, …
Complex operations should be decomposed
- Loop, Subroutine
Memory
- Memory access is a simple operation
- Unlimited memory
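A minimal sketch of RAM-model cost accounting (illustrative, not from the lecture): linear search charged one time unit per key comparison.

```python
# RAM-model cost accounting sketch (illustrative): charge one unit
# per simple operation -- here, per key comparison -- in linear search.
def linear_search(arr, key):
    comparisons = 0              # RAM-model "time": one unit per comparison
    for i, x in enumerate(arr):
        comparisons += 1         # each key comparison is one simple operation
        if x == key:
            return i, comparisons
    return -1, comparisons

print(linear_search([7, 3, 9, 1], 9))   # (2, 3): found at index 2 after 3 comparisons
```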
Model of Computation for DAs
Communication medium: message passing & shared memory
Timing model: synchronous & asynchronous
Why can't one model cover all DAs?
- Sequential computing: a single serialized machine, low complexity to cover
- Distributed systems vary widely (communication, timing, failures)
Message-passing Model
Processors
- p_0, p_1, …, p_{n-1} are nodes of the graph
- Each modeled as a state machine
Channel from p_i to p_j
- outbuf variable of p_i (physical channel)
- inbuf variable of p_j (incoming message queue)
Configuration of the System
A snapshot of the entire system
- Processor states
- Channels states
Formally, a vector of
- Local variables
- Incoming message queues
- outbufs
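A toy Python sketch of such a configuration (the Processor/Configuration names are illustrative, not part of the model): a system snapshot is just the vector of local states plus inbuf/outbuf contents.

```python
# Illustrative data layout for a message-passing configuration.
from dataclasses import dataclass, field

@dataclass
class Processor:
    pid: int
    state: dict = field(default_factory=dict)    # local variables
    inbuf: dict = field(default_factory=dict)    # sender id -> queue of received messages
    outbuf: dict = field(default_factory=dict)   # receiver id -> queue of messages to deliver

@dataclass
class Configuration:
    processors: list   # snapshot of the entire system: all local states + channel contents

config = Configuration(processors=[Processor(pid=i) for i in range(3)])
```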
Event: Delivery
- Moves a message from sender’s outbuf to receiver’s inbuf
- The message will be available the next time the receiver takes a step
Event: Computation
Start with old accessible state
- Local vars + incoming messages
Apply the state machine transition function
- Handle all incoming messages
End with new accessible state
- Empty inbufs
- New outgoing messages
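A sketch of the two event types over a dictionary-based configuration (the data layout is an assumption for illustration only):

```python
def deliver(config, sender, receiver):
    """Delivery event: move one message from sender's outbuf to receiver's inbuf."""
    msg = config[sender]["outbuf"][receiver].pop(0)
    config[receiver]["inbuf"][sender].append(msg)

def compute(config, pid, transition):
    """Computation event: consume all incoming messages, apply the
    transition function, and place new messages into the outbufs."""
    p = config[pid]
    incoming = {src: q[:] for src, q in p["inbuf"].items()}
    for q in p["inbuf"].values():
        q.clear()                                  # the step ends with empty inbufs
    new_state, outgoing = transition(p["state"], incoming)
    p["state"] = new_state
    for dst, msgs in outgoing.items():             # new outgoing messages
        p["outbuf"].setdefault(dst, []).extend(msgs)
```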
Asynchronous Execution
System execution
- <config, event, config, event, config, …>
Initial configuration
- Each processor is in its initial state
- All inbufs are empty
System progress
- <config_old, event, config_new>
- config_new is the same as config_old except:
- a) if delivery (msgReceive) event
- The specified msg is transferred from the sender’s outbuf to the receiver’s inbuf
- b) if computation event
- The specified processor’s state (including outbufs) changes according to the transition function
An execution is admissible in a (reliable) asynchronous model if
- Every message in an outbuf is eventually delivered
- Every processor takes an infinite number of steps
- No constraints on when these events take place
- Arbitrary message delays
- Relative processor speeds
Reliability here means that
- No message is lost
- No processor stops working
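A sketch of how an asynchronous scheduler might generate a finite prefix of an execution (the random choice is purely illustrative; the model allows any interleaving with arbitrary delays and speeds):

```python
import random

def async_schedule(pending_messages, processors, steps=20):
    """Pick an arbitrary (here random) interleaving of delivery and computation
    events; admissibility only asks that every message is eventually delivered
    and every processor keeps taking steps."""
    events = []
    for _ in range(steps):
        if pending_messages and random.random() < 0.5:
            events.append(("deliver", pending_messages.pop(0)))   # arbitrary message delay
        else:
            events.append(("compute", random.choice(processors))) # arbitrary relative speed
    return events

print(async_schedule(["m1", "m2"], ["p0", "p1", "p2"], steps=6))
```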
Complexity Measures
- Message complexity (how many messages it takes to get the job done)
- Maximum number of messages sent in any admissible execution
- Time complexity
- Maximum “time” until all processes terminate in any admissible execution
- How to measure time in an asynchronous execution?
- Produce a timed execution by assigning non-decreasing real times to events so that the time between sending and receiving any message is at most 1.
- Time complexity: maximum time until termination in any timed admissible execution.
Synchronous Executions
- Admissible execution
- An infinite sequence of rounds
- A round is a sequence of deliver events moving all messages in transit into inbufs, followed by a sequence of computation events, one for each processor
- Progress in lockstep
- Every message sent is delivered
- Every processor takes an infinite number of steps
- Time complexity
- The number of rounds until termination
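A sketch of one synchronous round (function and variable names are illustrative): deliver everything in transit, then one computation step per processor.

```python
def synchronous_round(in_transit, processors, compute_step):
    """One round: deliver every message in transit, then let each processor
    take exactly one computation step (lockstep)."""
    inboxes = {p: [] for p in processors}
    for dst, msg in in_transit:               # all in-transit messages are delivered
        inboxes[dst].append(msg)
    new_in_transit = []
    for p in processors:                      # one computation event per processor
        new_in_transit += compute_step(p, inboxes[p])   # returns (dst, msg) pairs
    return new_in_transit                     # messages sent in this round
```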
Shared Memory Model
- Processors communicate via a set of shared variables (also called shared registers)
- Each shared variable has a type, defining a set of primitive operations (performed atomically)
- Shared variables
- read, write
- compare & swap (CAS)
- read-modify-write (RMW)
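A sketch of a shared register exposing these primitives; here a lock stands in for the atomicity that the model simply postulates (the class is illustrative, not a library API).

```python
import threading

class Register:
    """A shared variable with atomic read, write, CAS, and RMW (illustrative)."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def write(self, v):
        with self._lock:
            self._value = v

    def compare_and_swap(self, expected, new):
        with self._lock:                  # CAS: write only if current value == expected
            if self._value == expected:
                self._value = new
                return True
            return False

    def read_modify_write(self, f):
        with self._lock:                  # RMW: apply f atomically, return the old value
            old = self._value
            self._value = f(old)
            return old
```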
Execution
Configuration
(q_0, q_1, …, q_{n-1}, r_0, r_1, …, r_{m-1}): q_i are the process states, r_j are the shared-variable values
Events
- Computation steps taken by the processes
- At each computation step, one shared variable is accessed
Admissible execution
- An execution is admissible if every processor takes an infinite number of steps
Shared variable access
At each computation step by p_i, the following happens atomically:
a) p_i chooses a shared variable to access with a specific operation, based on p_i’s current state;
b) The specified operation is performed on the shared variable;
c) p_i’s state changes according to p_i’s transition function, based on p_i’s current state and the value returned by the shared-memory operation.
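A sketch of one such atomic step (choose and transition are hypothetical interfaces; registers holds objects like the Register sketched earlier):

```python
def shared_memory_step(state, registers, choose, transition):
    """One atomic computation step of a process in the shared-memory model."""
    var, op, args = choose(state)                   # (a) pick variable + primitive from the current state
    result = getattr(registers[var], op)(*args)     # (b) the primitive executes atomically
    return transition(state, result)                # (c) new state from old state + returned value
```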
Complexity Measures
- Shared variables
- Number of distinct shared variables required
- Shared space
- Amount of shared space
- E.g., # of bits, # of distinct values, …
Changes from the MSG Model
- Communication medium changes
- No inbuf and outbuf state components
- Configuration includes values for shared variables
- Execution manner changes
- One event type: one computation step by a process
- p_i’s state in the old configuration specifies which shared variable is to be accessed and with which primitive
- shared variable’s value in the new configuration changes according to the primitive’s semantics
- p_i’s state in the new configuration changes according to its old state and the result of the primitive
Google: data center as a computer
Correctness Condition
- How to specify an algorithmic problem?
- E.g., GCD(Euclid), Sorting
- How to specify a distributed-algorithmic problem?
- A reactive system
- Safety
- Liveness
DAs are always running.
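A tiny illustration for mutual exclusion (hypothetical trace format): safety can be refuted on a finite prefix, while liveness (“every request is eventually granted”) refers to infinite executions, so only safety is checked here.

```python
def mutex_safety(trace):
    """Safety: in every reachable state, at most one process is in the critical section."""
    return all(len(in_cs) <= 1 for in_cs in trace)

# Hypothetical trace: each element is the set of processes in the CS at that step.
trace = [set(), {"p0"}, set(), {"p1"}, set()]
print(mutex_safety(trace))   # True: no state has two processes in the CS
```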
Different models can simulate each other
Simulation among MoCs
- Example 1: Synchronizer
- Illusion of synchronous rounds
- Over asynchronous communication
- Example 2: Distributed Shared Memory (DSM)
- Illusion of a shared memory
- Over message passing
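A sketch of the synchronizer idea (illustrative helper, not a full algorithm): tag messages with round numbers and enter round r+1 only after hearing from every neighbor for round r.

```python
def may_enter_next_round(received, round_no, neighbors):
    """Enter round r+1 only after a round-r message (or ack) arrived from every neighbor."""
    return all((nbr, round_no) in received for nbr in neighbors)

received = {("p1", 0), ("p2", 0)}                        # round-0 messages seen so far
print(may_enter_next_round(received, 0, ["p1", "p2"]))   # True: round 1 may start
```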