Syllabus

From basic problems, to research problems, to systems and tools

  • Model of computation

    • Different dimensions in defining a computation model
  • Basic problems

    • Basic message-passing algorithms
    • Leader election in rings
    • Mutual exclusion in shared memory
    • Fault-tolerant consensus
  • Research problems

    • Distributed algorithms
      • Consensus algorithms: Paxos, Raft, Zab
      • Specification, verification, testing
  • From algorithms to systems

    • Data replication: TPaxos, ParallelRaft (Ali PolarDB)
    • Coordination service: Zookeeper (Apache)

Outline

  • Distributed Algorithms (DAs)

    • Fundamental concepts
  • Model of computation

    • Basic idea
    • RAM model for sequential algorithms
  • Models of computation for DAs

    • Communication medium, Timing model
    • Failure model, progress condition
    • Correctness condition: specification of the problem
    • Simulation among models
  • Algorithms

    • Abstract: designed over an abstract machine (the model of computation)
    • Formal: rigid mathematical treatments
  • Distributed algorithms

    • Model of distributed computation
    • Mathematical proofs, analysis, …
    • In relation to other concepts
      • Distributed computing theory
      • Distributed systems

Computing and Computer

  • The computer seems to be able to do anything

  • But some things cannot be done efficiently by a computer

  • Computing

    • Encoding everything into ‘0’s and ‘1’s
    • Operations over ‘1’s and ‘0’s
    • Decoding the ‘1’s and ‘0’s
  • Computer

    • A set of operations it can do
      • Limited operations, unlimited combinations
    • Perform specified operations
      • Quickly, inexhaustibly, with no intelligence

A computer can only do simple, repetitive things.

Algorithm

  • Algorithm is the spirit of computing

    • To solve a specific problem (a so-called algorithmic problem)
    • Combination of basic operations
      • in a precise and elegant way
  • Essential issues

    • Model of computation
    • Algorithm design
    • Algorithm analysis

Model of Computation

  • Problem 1

    • Why can the algorithms we learn run almost everywhere?
  • Problem 2

    • Why can the algorithms we learn be implemented in any language?
  • Machine-independent algorithms run on an abstract machine

    • Turing machine: overqualified
    • RAM model: simple but powerful

Turing Machine

  • Tape in

    • Encoding the input
  • Tape out

    • Recording the output
  • Control of the read-write head

    • Operations the machine can do
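
As a toy illustration of these three components (a hypothetical example, not part of the course material): the input is encoded on the tape, a small transition table plays the role of the control of the read-write head, and the tape contents at halting record the output. The specific machine (binary increment) and all names are assumptions for illustration.

    // tm.go: a toy Turing machine that increments a binary number.
    // The transition table and all names are illustrative assumptions.
    package main

    import (
    	"fmt"
    	"strings"
    )

    type rule struct {
    	write byte   // symbol to write
    	move  int    // -1 = left, +1 = right
    	next  string // next control state
    }

    // transition table: (state, read symbol) -> rule
    var delta = map[string]map[byte]rule{
    	"right": { // scan right to the end of the input
    		'0': {'0', +1, "right"},
    		'1': {'1', +1, "right"},
    		'_': {'_', -1, "carry"},
    	},
    	"carry": { // ripple the carry back to the left
    		'1': {'0', -1, "carry"},
    		'0': {'1', +1, "halt"},
    		'_': {'1', +1, "halt"},
    	},
    }

    func run(input string) string {
    	tape := []byte(input) // tape in: the encoded input
    	head, state := 0, "right"
    	for state != "halt" {
    		// extend the tape with blanks when the head walks off either end
    		if head < 0 {
    			tape = append([]byte{'_'}, tape...)
    			head = 0
    		}
    		if head >= len(tape) {
    			tape = append(tape, '_')
    		}
    		r := delta[state][tape[head]]
    		tape[head] = r.write
    		head += r.move
    		state = r.next
    	}
    	return strings.Trim(string(tape), "_") // tape out: the recorded output
    }

    func main() {
    	fmt.Println(run("1011")) // 11 + 1 = 12, prints "1100"
    }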

RAM Model

  • Each simple operation takes one time step

    • E.g., key comparison, +/-, memory access, …
  • Complex operations should be decomposed

    • Loop, Subroutine
  • Memory

    • Memory access is a simple operation
    • Unlimited memory
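
A small, hypothetical illustration of counting RAM-model steps (the function and its name are assumptions, not course material): each loop iteration costs a constant number of simple operations, so the whole computation takes O(n) time steps.

    // ramsum.go: counting RAM-model steps for a trivial loop.
    package main

    import "fmt"

    // sum adds up the elements of a. Per iteration: one comparison,
    // one memory access, one addition, and one index increment -- each
    // a single RAM-model time step -- so roughly 4n + O(1) steps total.
    func sum(a []int) int {
    	s := 0
    	for i := 0; i < len(a); i++ {
    		s += a[i]
    	}
    	return s
    }

    func main() {
    	fmt.Println(sum([]int{3, 1, 4, 1, 5})) // prints 14
    }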

Model of Computation for DAs

Communication medium: message passing & shared memory

Timing model: synchronous & asynchronous

Why can’t a single model cover all DAs?

  • A single sequential machine: simple enough for one model to cover

  • Distributed systems vary widely

Message-passing Model

  • Processors

    • p_0, p_1, …, p_n-1 are nodes of the graph
    • Each modeled as a state machine
  • Channel from p_i to p_j

    • outbuf variable of p_i (physical channel)
    • inbuf variable of p_j (incoming message queue)

Configuration of the System

  • A snapshot of the entire system

    • Processor states
    • Channels states
  • Formally, a vector of

    • Local variables
    • Incoming message queues
    • outbufs
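
A minimal sketch of the message-passing model and its configurations in code; all type, field, and function names below (Message, Processor, Configuration, NewProcessor) are assumptions made for illustration, not a fixed API.

    // mp.go: sketch of the message-passing model (illustrative names).
    package mp

    // Message is what travels on the channel from p_i to p_j.
    type Message struct {
    	From, To int
    	Payload  string
    }

    // Processor models p_i as a state machine. For each neighbor j it
    // keeps an inbuf (incoming message queue) and an outbuf (messages
    // handed to the physical channel but not yet delivered).
    type Processor struct {
    	ID     int
    	Locals map[string]int    // local variables
    	Inbuf  map[int][]Message // keyed by sender j
    	Outbuf map[int][]Message // keyed by receiver j
    }

    // Configuration is a snapshot of the entire system: the vector of
    // processor states, including local variables, inbufs, and outbufs.
    type Configuration struct {
    	Procs []*Processor // Procs[i] is the state of p_i
    }

    // NewProcessor returns p_i in an initial state with empty buffers.
    func NewProcessor(id int) *Processor {
    	return &Processor{
    		ID:     id,
    		Locals: map[string]int{},
    		Inbuf:  map[int][]Message{},
    		Outbuf: map[int][]Message{},
    	}
    }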

Event: Delivery

  • Moves a message from sender’s outbuf to receiver’s inbuf
    • Message will be available next time receiver takes a step

Event: Computation

  • Start with old accessible state

    • Local vars + incoming messages
  • Apply the state machine transition function

    • Handle all incoming messages
  • End with new accessible state

    • Empty inbufs
    • New outgoing messages
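
Continuing the sketch from the previous section (same assumed mp package and types; the transition function passed in is also an assumption), the two event kinds could look like this:

    // events.go: delivery and computation events for the mp sketch.
    package mp

    // Deliver moves one message from the sender's outbuf to the
    // receiver's inbuf, so it is available the next time the receiver
    // takes a step.
    func Deliver(c *Configuration, from, to int) {
    	sender, receiver := c.Procs[from], c.Procs[to]
    	if len(sender.Outbuf[to]) == 0 {
    		return // nothing in transit on this channel
    	}
    	m := sender.Outbuf[to][0]
    	sender.Outbuf[to] = sender.Outbuf[to][1:]
    	receiver.Inbuf[from] = append(receiver.Inbuf[from], m)
    }

    // Compute applies p_i's transition function to its accessible state
    // (local variables + incoming messages): it handles all messages in
    // the inbufs, empties them, and appends any new outgoing messages
    // to the outbufs.
    func Compute(c *Configuration, i int,
    	transition func(p *Processor, inbox []Message) []Message) {
    	p := c.Procs[i]
    	var inbox []Message
    	for from, msgs := range p.Inbuf {
    		inbox = append(inbox, msgs...)
    		p.Inbuf[from] = nil // inbufs are empty afterwards
    	}
    	for _, m := range transition(p, inbox) {
    		p.Outbuf[m.To] = append(p.Outbuf[m.To], m)
    	}
    }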

Asynchronous Execution

  • System execution

    • <config, event, config, event, config, …>
  • Initial configuration

    • Each processor is in its initial state
    • All inbufs are empty
  • System progress

    • <config_old, event, config_new>
      • config_new is the same as config_old except:
        • a) if delivery (msgReceive) event
          • Specified msg is transferred from sender’s outbuf to receiver’s inbuf
        • b) if computation event
          • Specified processor’s state (including outbufs) change according to transition function
  • An execution is admissible in a (reliable) asynchronous model if

    • Every message in an outbuf is eventually delivered
    • Every processor takes an infinite number of steps
    • No constraints on when these events take place
      • Arbitrary message delays
      • Relative processor speeds
  • Reliability lies in that

    • No message is lost
    • No processor stops working
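
Under the same assumed mp sketch, an execution is an alternating sequence of configurations and events. The toy scheduler below uses random choices to stand in for arbitrary message delays and relative processor speeds; a real admissible execution would in addition be fair (every message eventually delivered, every processor taking infinitely many steps).

    // async.go: a toy asynchronous scheduler for the mp sketch.
    package mp

    import "math/rand"

    // RunAsync applies `steps` events in an arbitrary (here random)
    // order, producing <config, event, config, event, ...>.
    func RunAsync(c *Configuration, steps int,
    	transition func(p *Processor, inbox []Message) []Message) {
    	n := len(c.Procs)
    	for s := 0; s < steps; s++ {
    		if rand.Intn(2) == 0 {
    			// delivery event on a randomly chosen channel
    			Deliver(c, rand.Intn(n), rand.Intn(n))
    		} else {
    			// computation event by a randomly chosen processor
    			Compute(c, rand.Intn(n), transition)
    		}
    	}
    }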

Complexity Measures

  • Message complexity (how many messages are needed to get the job done)
    • Maximum number of messages sent in any admissible execution
  • Time complexity
    • Maximum “time” until all processes terminate in any admissible execution
    • How to measure time in an asynchronous execution?
      • Produce a timed execution by assigning non-decreasing real times to events so that the time between sending and receiving any message is at most 1.
      • Time complexity: maximum time until termination in any timed admissible execution.
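
Restating the time-complexity definition above as a formula (the notation is an assumption: α ranges over timed admissible executions, T(α) is the time at which the last processor terminates in α, and t_send(m), t_recv(m) are the real times assigned to sending and delivering message m):

    \[
      \mathrm{TimeComplexity} \;=\; \max_{\alpha}\, T(\alpha)
      \quad\text{over all timed admissible } \alpha \text{ with }
      t_{\mathrm{recv}}(m) - t_{\mathrm{send}}(m) \le 1 \text{ for every message } m.
    \]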

Synchronous Executions

  • Admissible execution
    • An infinite sequence of rounds
      • A round is a sequence of deliver events moving all messages in transit into inbufs, followed by a sequence of computation events, one for each processor
    • Progress in lockstep
      • Every message sent is delivered
      • Every processor takes an infinite number of steps
  • Time complexity
    • The number of rounds until termination
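
Continuing the assumed mp sketch, one synchronous round can be simulated directly: first all messages in transit are delivered, then every processor takes exactly one computation step. The function name is an illustrative assumption.

    // sync.go: one lockstep round for the mp sketch.
    package mp

    // RunSyncRound delivers every message in transit and then lets each
    // processor take one computation step; time complexity is simply
    // the number of such rounds until termination.
    func RunSyncRound(c *Configuration,
    	transition func(p *Processor, inbox []Message) []Message) {
    	n := len(c.Procs)
    	// deliver events: flush every outbuf into the matching inbuf
    	for from := 0; from < n; from++ {
    		for to := 0; to < n; to++ {
    			for len(c.Procs[from].Outbuf[to]) > 0 {
    				Deliver(c, from, to)
    			}
    		}
    	}
    	// computation events: one step per processor
    	for i := 0; i < n; i++ {
    		Compute(c, i, transition)
    	}
    }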

Shared Memory Model

  • Processors communicate via a set of shared variables (also called shared registers)
    • Each shared variable has a type, defining a set of primitive operations (performed atomically)
  • Shared variables
    • read, write
    • compare & swap (CAS)
    • read-modify-write (RMW)
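
The primitives listed above map directly onto Go’s sync/atomic package; the snippet below (a hypothetical example, not course material) uses an atomic add as a read-modify-write and CompareAndSwap as CAS.

    // atomics.go: shared-variable primitives via sync/atomic.
    package main

    import (
    	"fmt"
    	"sync"
    	"sync/atomic"
    )

    func main() {
    	var counter int64 // shared variable accessed with RMW
    	var flag int32    // shared variable accessed with CAS

    	var wg sync.WaitGroup
    	for i := 0; i < 4; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			atomic.AddInt64(&counter, 1)            // read-modify-write (fetch-and-add)
    			atomic.CompareAndSwapInt32(&flag, 0, 1) // CAS: set flag only if still 0
    		}()
    	}
    	wg.Wait()
    	// atomic reads (loads) of both shared variables
    	fmt.Println(atomic.LoadInt64(&counter), atomic.LoadInt32(&flag)) // 4 1
    }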

Execution

  • Configuration

    (q_0, q_1, …, q_n-1, r_0, r_1, …, r_m-1)

    where q_0, …, q_n-1 are the process states and r_0, …, r_m-1 are the shared variables

  • Events

    • Computation steps taken by the processes
    • At each computation step, one shared variable is accessed
  • Admissible execution

    • An execution is admissible if every processor takes an infinite number of steps
  • Shared variable access

    • At each computation step by p_i, the following happen atomically:

      a) p_i chooses a shared variable to access with a specific operation, based on p_i’s current state;

      b) The specified operation is performed on the shared variable;

      c) p_i’s state changes according to p_i’s transition function, based on p_i’s current state and the value returned by the shared-memory operation.
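
A sketch of one such atomic step in code; every name here (Op, Process, Choose, Transition, Step) is an assumption for illustration.

    // step.go: one atomic shared-memory computation step (a)-(b)-(c).
    package sm

    // Op is a primitive operation applied atomically to a shared
    // variable; it may update the variable and returns the value the
    // process observes.
    type Op func(v *int) int

    // Process is a state machine over the shared variables.
    type Process struct {
    	State int
    	// Choose picks, from the current state, which shared variable to
    	// access and with which primitive operation.
    	Choose func(state int) (index int, op Op)
    	// Transition computes the next state from the current state and
    	// the value returned by the shared-memory operation.
    	Transition func(state, observed int) int
    }

    // Step performs one computation event by p; in the model the three
    // sub-steps below happen atomically.
    func Step(p *Process, shared []int) {
    	i, op := p.Choose(p.State)                // (a) choose variable and primitive
    	observed := op(&shared[i])                // (b) perform the primitive
    	p.State = p.Transition(p.State, observed) // (c) local state transition
    }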

Complexity Measures

  • Shared variables
    • Number of distinct shared variables required
  • Shared space
    • Amount of shared space
    • E.g., # of bits, # of distinct values, …

Changes from the MSG Model

  • Communication medium changes
    • No inbuf and outbuf state components
    • Configuration includes values for shared variables
  • Execution manner changes
    • One event type: one computation step by a process
      • p_i’s state in old configuration specifies which shared variable is to be accessed and with which primitive
      • shared variable’s value in the new configuration changes according to the primitive’s semantics
      • p_i’s state in the new configuration changes according to its old state and the result of the primitive

Google: data center as a computer

Correctness Condition

  • How to specify an algorithmic problem?
    • E.g., GCD(Euclid), Sorting
  • How to specify a distributed-algorithmic problem?
    • A reactive system
    • Safety
    • Liveness

DAs are always running.
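
As a hedged illustration using mutual exclusion (one of the basic problems in the syllabus): writing cs_i for “p_i is in the critical section” and req_i for “p_i has requested entry”, the two kinds of conditions can be stated as temporal properties:

    \[
      \text{Safety (nothing bad ever happens):}\quad \Box\,\neg(\mathit{cs}_i \wedge \mathit{cs}_j)\ \text{ for all } i \neq j
    \]
    \[
      \text{Liveness (something good eventually happens):}\quad \Box\,(\mathit{req}_i \rightarrow \Diamond\,\mathit{cs}_i)
    \]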

Different models can simulate each other.

Simulation among MoCs

  • Example 1: Synchronizer
    • Illusion of synchronous rounds
    • Over asynchronous communication
  • Example 2: Distributed Shared Memory (DSM)
    • Illusion of a shared memory
    • Over message passing
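
A minimal sketch of the synchronizer idea over asynchronous, FIFO, reliable channels: in round r each process first sends its round-r message to every neighbor and then waits for a round-r message from every neighbor before moving on, which gives the illusion of lockstep rounds. All names and the Go-channel setup are assumptions for illustration, not a complete synchronizer algorithm.

    // synchronizer.go: illusion of synchronous rounds over asynchronous
    // (FIFO, reliable) channels -- a toy sketch, not a full algorithm.
    package main

    import (
    	"fmt"
    	"sync"
    )

    type tagged struct {
    	round, value int
    }

    func process(id, rounds int, in []<-chan tagged, out []chan<- tagged, wg *sync.WaitGroup) {
    	defer wg.Done()
    	for r := 0; r < rounds; r++ {
    		for _, ch := range out { // send this round's message to every neighbor
    			ch <- tagged{round: r, value: id}
    		}
    		for _, ch := range in { // wait for each neighbor's round-r message
    			m := <-ch // FIFO channels guarantee m.round == r here
    			_ = m     // a real algorithm would use m.value
    		}
    	}
    	fmt.Printf("p_%d completed %d simulated rounds\n", id, rounds)
    }

    func main() {
    	const rounds = 3
    	// two processes connected by a pair of directed, buffered channels
    	a2b := make(chan tagged, rounds)
    	b2a := make(chan tagged, rounds)
    	var wg sync.WaitGroup
    	wg.Add(2)
    	go process(0, rounds, []<-chan tagged{b2a}, []chan<- tagged{a2b}, &wg)
    	go process(1, rounds, []<-chan tagged{a2b}, []chan<- tagged{b2a}, &wg)
    	wg.Wait()
    }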