Sync Protocol
DSM replication has two related layers: the runtime control plane and the repair-oriented data plane.
End-To-End Flow
1. application calls register.put(...) / lease.acquire(...) / crdt.update(...)
2. DsmRuntime writes the change locally
3. RuntimePlatformSyncService emits a platform envelope
4. peers apply the delta if they are already in sync
5. if a peer is behind, RuntimeDataPlaneReplicationService repairs it
6. cluster returns to a converged state

Two-Layer Picture
normal path
local write -> control plane delta -> peers apply update
recovery path
peer falls behind -> digest check -> snapshot/replay -> peer catches up

Control Plane
RuntimePlatformSyncService replicates register, lease, and CRDT deltas over platform envelopes. This is the normal propagation path after the application mutates a collection handle.
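In code terms, the steady-state path can be sketched as below. The class and method names (DeltaEnvelope, Peer, and so on) are illustrative stand-ins, not the real DSM API: one local mutation becomes one delta envelope, and every in-sync peer applies it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ControlPlaneSketch {
    // One replicated delta: a register key/value plus a sequence number.
    record DeltaEnvelope(long seq, String key, String value) {}

    // A peer applies deltas only when they arrive in order (it is "in sync");
    // a gap means the peer is behind and needs the repair path instead.
    static class Peer {
        final Map<String, String> register = new HashMap<>();
        long appliedSeq = 0;

        boolean apply(DeltaEnvelope d) {
            if (d.seq() != appliedSeq + 1) return false; // behind: needs repair
            register.put(d.key(), d.value());
            appliedSeq = d.seq();
            return true;
        }
    }

    public static void main(String[] args) {
        List<Peer> peers = List.of(new Peer(), new Peer());
        // local mutation -> one envelope -> all healthy peers apply it
        DeltaEnvelope delta = new DeltaEnvelope(1, "route-hint", "edge-eu-west-1");
        for (Peer p : peers) p.apply(delta);
        System.out.println(peers.get(0).register.get("route-hint")); // prints edge-eu-west-1
    }
}
```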
ASCII view:
local mutation
|
v
[DsmRuntime]
|
v
[RuntimePlatformSyncService]
|
+--> peer-b applies delta
+--> peer-c applies delta

This path is for the healthy, steady-state case where peers are online and current.
Example mental model:
register.put(route-hint)
-> local runtime commits it
-> sync service emits one delta envelope
-> healthy peers apply the same update

Data Plane Repair
RuntimeDataPlaneReplicationService repairs lagging peers using:
- digest exchange
- snapshot transfer
- replay flows
That repair path matters when peers miss messages, rejoin after downtime, or need to reconcile state after cluster instability.
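The core repair decision can be sketched as a small function. This is a hypothetical sketch, not the real RuntimeDataPlaneReplicationService: the REPLAY_HORIZON window and the sequence/digest parameters are invented for illustration. After a digest exchange shows divergence, the healthy node replays history if the peer is close enough, and otherwise ships a full snapshot.

```java
public class RepairSketch {
    // How many deltas of history the healthy node retains for replay.
    // An invented tuning knob for this sketch.
    static final long REPLAY_HORIZON = 100;

    enum Action { NONE, REPLAY, SNAPSHOT }

    static Action decide(long peerSeq, long localSeq, long peerDigest, long localDigest) {
        if (peerDigest == localDigest) {
            return Action.NONE;     // digests agree: already converged
        }
        if (peerSeq < localSeq && localSeq - peerSeq <= REPLAY_HORIZON) {
            return Action.REPLAY;   // lagging but within history: replay deltas
        }
        return Action.SNAPSHOT;     // too far behind, or diverged: full state transfer
    }
}
```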
ASCII repair view:
peer-c falls behind
|
v
digest check -> state differs
|
+--> snapshot if peer is far behind
|
+--> replay if peer can catch up from history
v
peer-c returns to current cluster state

Repair is why DSM does not rely only on best-effort delta delivery.
Typical cases that trigger repair:
- a node restarted and missed some deltas
- a node was partitioned briefly
- a peer joined after the latest state already moved forward
Why The Split Exists
The control plane handles normal steady-state mutation flow. The data plane exists to restore correctness and convergence when normal flow is not enough.
More concretely:
- control plane optimizes for fast mutation propagation
- repair plane optimizes for correctness after loss, delay, or rejoin
Without the repair plane, a missed message could leave a node permanently stale.
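A minimal demonstration of that last claim, with invented names rather than DSM code: with best-effort delivery alone, a replica that misses one delta diverges forever, while a periodic digest comparison plus state transfer restores it.

```java
import java.util.HashMap;
import java.util.Map;

public class WhyRepairSketch {
    // Stand-in for a real digest exchange: compare full state directly.
    static boolean digestsMatch(Map<String, String> a, Map<String, String> b) {
        return a.equals(b);
    }

    // Stand-in for a snapshot transfer from a healthy source replica.
    static void repairIfStale(Map<String, String> source, Map<String, String> target) {
        if (!digestsMatch(source, target)) {
            target.clear();
            target.putAll(source);
        }
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = new HashMap<>();
        Map<String, String> nodeB = new HashMap<>();

        nodeA.put("k1", "v1"); nodeB.put("k1", "v1"); // delta delivered to both
        nodeA.put("k2", "v2");                        // delta lost before reaching nodeB

        repairIfStale(nodeA, nodeB);                  // anti-entropy pass repairs nodeB
        System.out.println(digestsMatch(nodeA, nodeB)); // prints true
    }
}
```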
Collection-Specific Behavior
- registers replicate metadata-backed entity updates
- leases replicate ownership state and lease transitions
- CRDTs replicate updates plus state repair toward convergence
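The lease transitions mentioned above can be sketched as a small state machine. This is a generic lease model with invented names and a caller-supplied clock, not the DSM lease implementation: acquisition succeeds only when the lease is free or expired, and renewal only for the current holder.

```java
public class LeaseSketch {
    String owner;          // current holder, null if free
    long expiresAtMillis;  // lease deadline

    boolean acquire(String candidate, long nowMillis, long ttlMillis) {
        if (owner != null && nowMillis < expiresAtMillis) return false; // still held
        owner = candidate;
        expiresAtMillis = nowMillis + ttlMillis;
        return true;
    }

    boolean renew(String candidate, long nowMillis, long ttlMillis) {
        if (!candidate.equals(owner) || nowMillis >= expiresAtMillis) return false;
        expiresAtMillis = nowMillis + ttlMillis;
        return true;
    }
}
```

Both transitions (owner change, renew result) are exactly the kind of state a lease delta would announce to peers.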
Examples:
- register: a route-hints delta announces a new address for edge-eu-west-1
- lease: a shard-owner delta announces an owner change or renew result for shard-17
- CRDT: a request-counter delta carries update intent while repair transfers merged state when needed
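The request-counter case can be illustrated with a textbook grow-only counter (a G-counter), which is not the DSM implementation but shows the same split: normal operation applies increment deltas, while repair merges full state toward convergence.

```java
import java.util.HashMap;
import java.util.Map;

public class GCounterSketch {
    // Per-node contribution; the counter's value is the sum.
    final Map<String, Long> perNode = new HashMap<>();

    // Control-plane style: apply one increment delta from a node.
    void applyIncrement(String nodeId, long amount) {
        perNode.merge(nodeId, amount, Long::sum);
    }

    // Repair style: merge a full remote state, taking per-node maxima,
    // which absorbs any increments this replica missed.
    void mergeState(GCounterSketch other) {
        other.perNode.forEach((node, v) -> perNode.merge(node, v, Math::max));
    }

    long value() {
        return perNode.values().stream().mapToLong(Long::longValue).sum();
    }
}
```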
Sequence Sketch
node-a node-b
| |
| put(route-hint) |
|---- delta envelope ---->|
| | apply update
| |
if node-b missed it:
node-a node-b
| |
|<--- digest mismatch ----|
|---- replay/snapshot --->|
|                        | catch up

Where To Read Executable Behavior
If you want to see the protocol working in code rather than prose, start with:
- dsm-integration-test/.../TwoNodeIntegrationTest for two-node register replication
- dsm-integration-test/.../RuntimeIntegrationTest for register, lease, and CRDT behavior together