Aug 15, 2024

Synchrony, Asynchrony, and Partial Synchrony in Distributed Systems

Distributed systems live and die by their timing assumptions. Whether you can bound message delays and clock drift changes what you can prove about safety and liveness, and it shapes which consensus protocol you pick.

This post contrasts synchronous, asynchronous, and partially synchronous models, then maps popular protocols (Paxos, Raft, HotStuff, Tendermint, Bitcoin, Ethereum) onto those assumptions.

The three timing models

1) Synchronous systems

A system is synchronous if there are known upper bounds on:

  • Message delay (every message arrives within D time)
  • Processing time (each step finishes within P time)
  • Clock drift (each local clock stays within E of real time)

In a synchronous model you can set timeouts that are guaranteed to be long enough: if a reply has not arrived by then, the peer has crashed. That makes failure detection exact and liveness easier to prove.

Example:

  • A tightly controlled cluster in one data center with bounded network latency, controlled hardware, and NTP/PTP clock sync.
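Here is how those bounds turn into a timeout that never fires early. This is a minimal sketch. It assumes a ping travels each way within D, the peer handles it within P, and the local timer is off by at most E. The names `SyncBounds`, `safe_timeout`, and `is_crashed` are hypothetical, and the units are arbitrary (milliseconds here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncBounds:
    """Known upper bounds of a synchronous system (hypothetical units: ms)."""
    max_delay: float        # D: one-way message delay
    max_processing: float   # P: time for the peer to handle one message
    max_clock_error: float  # E: how far our local timer may be off


def safe_timeout(b: SyncBounds) -> float:
    # Request travels (D), peer processes it (P), reply travels back (D);
    # pad by the clock error so our own timer cannot fire too early.
    return 2 * b.max_delay + b.max_processing + b.max_clock_error


def is_crashed(elapsed: float, got_reply: bool, b: SyncBounds) -> bool:
    # Under the synchronous assumptions this detector is perfect: a correct
    # peer always replies within safe_timeout, so silence past it means a crash.
    return not got_reply and elapsed > safe_timeout(b)


bounds = SyncBounds(max_delay=5.0, max_processing=1.0, max_clock_error=0.5)
print(safe_timeout(bounds))             # 11.5
print(is_crashed(12.0, False, bounds))  # True
```

The guarantee only holds if the bounds are real. If the network ever exceeds D, this detector will wrongly declare a live peer dead.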

2) Asynchronous systems

A system is asynchronous if there are no bounds at all on message delay, processing time, or clock drift. Messages are eventually delivered but can be delayed arbitrarily long, so you cannot distinguish a slow node from a failed one.

Key implication:

  • The FLP impossibility result says no deterministic consensus protocol can guarantee both safety and liveness in a fully asynchronous system if even one process may crash.

Example:

  • The public internet during congestion or routing incidents. You cannot assume any fixed delay bound.
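The "slow vs. failed" problem can be made concrete with a fixed-timeout check. This is a minimal sketch, and the `observe` function is hypothetical. At the moment the timer fires, a crashed peer and a correct peer whose reply is merely late produce the same observation. Since an asynchronous network may always be slower than any chosen timeout, every timeout can be wrong.

```python
from typing import Optional


def observe(reply_delay: Optional[float], timeout: float) -> str:
    """What a node can tell about a peer when its timer fires.

    reply_delay=None models a crashed peer (no reply ever arrives);
    a number models a correct peer whose reply takes that long.
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "silence"  # crashed and merely slow peers look identical here


# Whatever timeout is chosen, an asynchronous network may deliver the
# correct peer's reply later than that, producing the same observation
# as a crash.
for t in (10.0, 1_000.0, 1e9):
    assert observe(None, t) == observe(2 * t, t) == "silence"

print(observe(None, 100.0), observe(250.0, 100.0))  # silence silence
```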

3) Partially synchronous systems

Partial synchrony sits between the two extremes. There are two common formulations:

  • A known bound on message delay holds, but only after some unknown Global Stabilization Time (GST).
  • Or a bound always holds, but its value is unknown to the protocol.

This model fits many real networks: behavior can be unpredictable for a while, but eventually the network behaves “well enough.”

Example:

  • A geo-distributed service with occasional congestion, but stable periods where latency is bounded.
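A protocol that cannot know the bound usually keeps growing its timeout until it is large enough. This is a minimal simulation of that idea under the GST formulation, and `first_timely_round` is a hypothetical name. Before round `gst`, the network is modeled as always slower than the current timeout. From that round on, every delay is at most `delta`. Doubling the timeout after each failed round guarantees that some round after GST succeeds.

```python
def first_timely_round(gst: int, delta: float, initial_timeout: float) -> int:
    """Simulate rounds with a doubling timeout; return the first round that succeeds.

    Model (an assumption for illustration): before round `gst` the network
    delays messages past whatever timeout is in use; from round `gst` on,
    every delay is at most `delta`. The timeout schedule itself never looks
    at `delta`, mirroring a protocol that does not know the bound.
    Requires initial_timeout > 0 so the timeout actually grows.
    """
    timeout = initial_timeout
    r = 0
    while True:
        delay = delta if r >= gst else 2 * timeout  # adversarial before GST
        if delay <= timeout:
            return r          # messages arrived in time: progress
        timeout *= 2          # round failed: back off and retry
        r += 1


print(first_timely_round(gst=3, delta=50.0, initial_timeout=1.0))  # 6
```

Doubling is only one choice. Any timeout schedule that grows without bound eventually exceeds whatever `delta` turns out to be, and the schedule just determines how many wasted rounds that takes.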

Why this matters: safety vs liveness

  • Safety: “nothing bad happens.” Protocols often preserve safety even under severe delays.
  • Liveness: “something good eventually happens.” Liveness usually needs timing assumptions; by FLP, a deterministic consensus protocol cannot guarantee progress without them.

A lot of consensus protocols are built to be safe under asynchrony, and live under partial synchrony.
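One reason safety can survive asynchrony is that it rests on counting, not on clocks. Decisions require a majority quorum, and any two majorities of the same n nodes share at least one node, no matter how late their messages arrive. The exhaustive check below illustrates that property. n = 5 is an arbitrary choice, and the helper names are hypothetical.

```python
from itertools import combinations


def majority_quorums(n: int):
    """Minimal majority quorums of nodes {0..n-1}: all subsets of size n // 2 + 1.

    Larger majorities contain one of these, so checking them is enough.
    """
    size = n // 2 + 1
    return [set(q) for q in combinations(range(n), size)]


def all_intersect(quorums) -> bool:
    """True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


n = 5
qs = majority_quorums(n)
print(len(qs), all_intersect(qs))  # 10 True

# Contrast: "quorums" of only n // 2 nodes can be disjoint, which breaks safety.
halves = [set(q) for q in combinations(range(n), n // 2)]
print(all_intersect(halves))       # False
```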

Protocols and their timing assumptions

Paxos

  • Safety: holds under asynchrony.
  • Liveness: needs partial synchrony (eventual stable leader + message delivery). Timeouts are used to detect failures and trigger leader changes.

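Why: Paxos safety never depends on timing, because acceptors only compare ballot numbers. Progress needs a period in which one leader can finish both phases without being preempted by competing proposers, and partial synchrony eventually provides such a period.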
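The timing-independence of Paxos safety shows up in the acceptor's rules, which compare ballot numbers and never consult a clock. This is a minimal single-decree acceptor sketch. The class and method names are hypothetical, and networking, persistence, and the proposer side are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1b: promise to ignore lower ballots; report any prior accept."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", ballot, self.accepted)
        return ("nack", self.promised, None)

    def accept(self, ballot: int, value: str):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return ("accepted", ballot, value)
        return ("nack", self.promised, None)


a = Acceptor()
print(a.prepare(1))      # ('promise', 1, None)
print(a.prepare(2))      # ('promise', 2, None)
print(a.accept(1, "x"))  # ('nack', 2, None): a stale leader is rejected
print(a.accept(2, "y"))  # ('accepted', 2, 'y')
```

A message from an old leader can only be rejected when it arrives late. It cannot corrupt state. Timing only decides whether some ballot collects a majority of accepts before a higher ballot preempts it.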

Raft

  • Safety: holds under asynchrony.
  • Liveness: partial synchrony (stable leader, bounded delays so heartbeats work).

Raft is explicit about timeouts and leader election. If messages are persistently delayed beyond the election timeout, followers keep timing out and starting new elections, and the cluster may never settle on a leader.
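The Raft paper states its timing requirement as broadcastTime ≪ electionTimeout ≪ MTBF (mean time between failures). It also draws each node's election timeout at random from a fixed interval (150 to 300 ms in its examples) so that split votes are rare. Below is a minimal sketch of both ideas. The helper names are hypothetical, and the 10× margin standing in for "≪" is an assumption.

```python
import random


def election_timeout_ms(rng: random.Random, low: int = 150, high: int = 300) -> int:
    """Randomized election timeout, drawn uniformly from [low, high] ms."""
    return rng.randint(low, high)


def timing_ok(broadcast_ms: float, election_low_ms: float, mtbf_ms: float,
              margin: float = 10.0) -> bool:
    """Check broadcastTime << electionTimeout << MTBF.

    "Much less than" is approximated by a hypothetical factor of `margin`.
    """
    return (broadcast_ms * margin <= election_low_ms
            and election_low_ms * margin <= mtbf_ms)


rng = random.Random(42)
timeouts = [election_timeout_ms(rng) for _ in range(5)]
print(all(150 <= t <= 300 for t in timeouts))  # True

print(timing_ok(broadcast_ms=5, election_low_ms=150, mtbf_ms=3.6e6))    # True
print(timing_ok(broadcast_ms=200, election_low_ms=150, mtbf_ms=3.6e6))  # False
```

When the first inequality fails, followers time out before heartbeats reach them. That produces the endless-elections behavior described above.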

HotStuff

  • Safety: holds under asynchrony.
  • Liveness: partial synchrony.

HotStuff and its descendants use quorum certificates (votes from 2f + 1 of n = 3f + 1 replicas) and a timeout-driven pacemaker for view changes; progress depends on the network eventually stabilizing.
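A quorum certificate is a set of votes large enough that two conflicting certificates cannot both form unless more than f replicas are faulty. This is a minimal sketch of the threshold arithmetic. It assumes n ≥ 3f + 1 replicas, ignores signatures entirely, and uses hypothetical function names.

```python
def max_faults(n: int) -> int:
    """Largest f with n >= 3f + 1."""
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Votes needed for a QC: n - f (equals 2f + 1 when n = 3f + 1)."""
    return n - max_faults(n)


def has_qc(votes: set, n: int) -> bool:
    """True if the distinct voters are enough to form a certificate."""
    return len(votes) >= quorum_size(n)


n = 4                                 # tolerates f = 1
print(max_faults(n), quorum_size(n))  # 1 3
print(has_qc({"r1", "r2"}, n))        # False
print(has_qc({"r1", "r2", "r3"}, n))  # True

# Any two quorums overlap in at least f + 1 replicas, so at least one honest
# replica is in both; an honest replica never votes for two conflicting
# blocks in the same view, which is why this part needs no timing assumption.
f, q = max_faults(n), quorum_size(n)
assert 2 * q - n >= f + 1
```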

Tendermint (BFT)

  • Safety: holds under asynchrony, as long as fewer than 1/3 of voting power is Byzantine.
  • Liveness: partial synchrony.

Tendermint uses rounds and timeouts; once the network delay is bounded, it can finalize blocks quickly.
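Rounds and timeouts are what turn an unknown delay bound into eventual progress, because each failed round lengthens the timeout. This is a minimal sketch of a linear base-plus-delta-per-round schedule. It follows the spirit of the per-round timeout increments that Tendermint implementations expose as configuration. The function names and the millisecond values are illustrative assumptions, not real defaults.

```python
def propose_timeout_ms(round_: int, base_ms: int = 3000, delta_ms: int = 500) -> int:
    """Propose-step timeout for a round: grows linearly with the round number."""
    return base_ms + delta_ms * round_


def first_round_that_fits(actual_delay_ms: int, base_ms: int = 3000,
                          delta_ms: int = 500) -> int:
    """First round whose timeout is long enough for the real network delay."""
    if delta_ms <= 0 and base_ms < actual_delay_ms:
        raise ValueError("timeouts never grow, so a slow network never catches up")
    r = 0
    while propose_timeout_ms(r, base_ms, delta_ms) < actual_delay_ms:
        r += 1
    return r


print([propose_timeout_ms(r) for r in range(4)])  # [3000, 3500, 4000, 4500]
print(first_round_that_fits(4200))                # 3
```

Once a round's timeouts cover the real delay, that round can collect more than 2/3 of prevotes and precommits and commit the block. The committed block is final, with no later reorg.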

Bitcoin (PoW)

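  • Safety/finality: probabilistic. The chance that a block is reverted shrinks with every block built on top of it, but the analysis assumes blocks propagate quickly relative to the block interval.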
  • Liveness: partial synchrony assumptions on propagation; if the network is too slow relative to block time, forks become frequent.

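Bitcoin does not assume strict lock-step synchrony, but it relies on blocks propagating quickly relative to the roughly 10-minute block interval. When propagation slows, honest miners waste work on competing forks and reorgs become more common.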
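A back-of-the-envelope model shows why propagation delay has to stay small relative to block time. The model is a simplification, not a full orphan-rate analysis. It assumes blocks arrive as a Poisson process with mean interval T, and that a new block takes d seconds to reach the rest of the network. Under those assumptions, the chance that someone else finds a competing block during that window is 1 - e^(-d/T). The function name is hypothetical.

```python
import math


def fork_probability(delay_s: float, block_interval_s: float = 600.0) -> float:
    """P(at least one competing block during the propagation window).

    Assumes memoryless (Poisson) block production with mean interval
    block_interval_s; a rough model, not a full orphan-rate analysis.
    """
    return 1.0 - math.exp(-delay_s / block_interval_s)


for d in (2, 10, 60, 600):
    print(f"delay {d:>3}s -> fork chance {fork_probability(d):.3f}")
```

With T = 600 s, a 10 s delay gives roughly a 1.7% chance of a competing block. A delay equal to the block interval gives about 63%.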

Ethereum

  • PoW era (historical): similar to Bitcoin’s partial synchrony with probabilistic finality.
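  • PoS era (Gasper = LMD GHOST + Casper FFG): finality (Casper FFG) is safe under asynchrony as long as less than 1/3 of stake is Byzantine; liveness relies on partial synchrony and an honest majority of stake, and finalizing new checkpoints also needs at least 2/3 of stake online and voting.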

Ethereum PoS proceeds in fixed 12-second slots grouped into 32-slot epochs. If the network is unstable, blocks may still be produced, but finality can be delayed.
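Slot and epoch timing is fixed by the protocol at 12-second slots and 32 slots per epoch. Finality comes from Casper FFG votes. A checkpoint is justified when validators holding at least two thirds of the active stake attest to it. In the common case it is finalized once the next epoch's checkpoint is also justified on top of it. Below is a minimal sketch of the supermajority check and the best-case finality delay. The function names are hypothetical, and the real finalization rules are more involved.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def is_supermajority(attesting_stake: int, total_stake: int) -> bool:
    """True if attesting stake is at least 2/3 of total (integer arithmetic)."""
    return 3 * attesting_stake >= 2 * total_stake


def best_case_finality_seconds(epochs: int = 2) -> int:
    """Rough time until a block's epoch is finalized when all goes well."""
    return epochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT


print(is_supermajority(67, 100))          # True
print(is_supermajority(66, 100))          # False
print(best_case_finality_seconds() / 60)  # 12.8
```

Suppose more than one third of stake is offline or partitioned away. Then `is_supermajority` can never return True, so blocks keep arriving every slot but no new checkpoint is finalized.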

Quick comparison

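| System         | Synchrony assumption for safety             | Synchrony assumption for liveness | Finality                    |
|----------------|---------------------------------------------|-----------------------------------|-----------------------------|
| Paxos          | Asynchronous                                | Partial synchrony                 | Deterministic               |
| Raft           | Asynchronous                                | Partial synchrony                 | Deterministic               |
| HotStuff       | Asynchronous                                | Partial synchrony                 | Deterministic               |
| Tendermint     | Asynchronous                                | Partial synchrony                 | Deterministic (BFT)         |
| Bitcoin        | Bounded delay relative to block interval    | Partial synchrony                 | Probabilistic               |
| Ethereum (PoS) | Asynchronous (for finalized checkpoints)    | Partial synchrony                 | Probabilistic + checkpoints |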

Practical intuition with examples

  • If you are building a bank-grade database in a controlled environment, you might treat your system as synchronous and use tight timeouts to get fast failover.
  • If you are building a global blockchain, you should assume asynchrony for safety and partial synchrony for liveness; expect delays, reorgs, and temporary stalls.
  • If your system must tolerate the public internet, avoid protocols that require strict bounds to be correct.

Takeaways

  • Synchrony is about bounded time. Asynchrony is about no bounds.
  • Partial synchrony is the realistic middle ground for most distributed systems.
  • Most modern consensus protocols are safe in asynchronous settings and live in partially synchronous settings.

If you are choosing a protocol, make your timing assumptions explicit. It helps you reason about failure modes, timeouts, and the trade-offs between fast finality and resilience to unpredictable networks.


Thanks for reading! If you want to see future content, you can follow me on Twitter or get connected over at LinkedIn.


Support My Content

If you find my content helpful, please consider supporting, with a donation, a humanitarian cause I am planning: building homes for elderly people in the rural Terai region of Nepal.

Ethereum (ETH)

0xB62409A5B227D2aE7D8C66fdaA5EEf4eB4E37959

Thank you for your support!