Hiểu về thời gian trong hệ thống phân tán: clock drift, logical clocks, Lamport timestamps, vector clocks, causality, và concurrent events. Học cách order events khi không có global clock.
Chia sẻ bài học
Có một câu hỏi tưởng chừng đơn giản:
"Event A xảy ra trước hay sau event B?"
Trong single-machine application, câu trả lời dễ: check timestamp. CPU clock cho biết thứ tự chính xác.
Nhưng trong distributed system với 3 servers ở 3 locations khác nhau, câu hỏi này trở nên cực kỳ khó.
Server 1 (Singapore): Event A timestamp = 10:00:00.123
Server 2 (Tokyo): Event B timestamp = 10:00:00.120
Event B nhỏ hơn → xảy ra trước? Không chắc.
Vì clock của 2 servers không sync hoàn hảo. Clock drift có thể làm timestamp sai lệch milliseconds, thậm chí seconds.
Đây chính là Clock Problem - một trong những vấn đề khó nhất trong distributed systems.
Lesson này dạy bạn:
Trước khi đi sâu vào giải pháp, hiểu tại sao ordering events lại critical.
User update profile ở Server A
timestamp: 10:00:00.100
name: "Alice" → "Bob"
User update profile ở Server B (concurrent)
timestamp: 10:00:00.095
name: "Alice" → "Charlie"
Khi replicate, apply order nào?
Nếu dựa vào timestamp → "Charlie" wins (timestamp nhỏ hơn).
Nhưng có thể Server B's clock chậm 10ms → thực tế "Bob" mới là update cuối.
Wrong ordering = wrong final state.
Transaction 1: Transfer $100 (A → B)
Transaction 2: Check balance of A
Nếu Transaction 2 đọc trước khi Transaction 1 complete → stale data.
Phải đảm bảo causality: check balance phải thấy transfer đã xong.
Event 1: Order created
Event 2: Payment processed
Event 3: Order shipped
Events phải replay đúng thứ tự.
Nếu ship trước khi payment → logic sai.
Mental model: Distributed systems cần ordering để maintain correctness và consistency.
Physical clock = system clock của server, đo thời gian thực.
Mỗi server có quartz crystal oscillator đếm thời gian.
Server 1: 2025-02-15 10:00:00.000
Server 2: 2025-02-15 10:00:00.003 (drift +3ms)
Server 3: 2025-02-15 09:59:59.998 (drift -2ms)
Problem: Clocks không bao giờ perfect sync.
Clock drift = sự sai lệch giữa clocks của các servers.
Nguyên nhân:
Thực tế drift rate:
Ý nghĩa:
±50 ppm = ±50 microseconds mỗi giây
= ±4.3 seconds mỗi ngày
= ±26 minutes mỗi năm
Chỉ sau 1 ngày không sync, 2 servers có thể sai lệch vài giây.
Giải pháp: Sync clocks với central time source.
flowchart TB
NTP[NTP Server \n Atomic Clock]
S1[Server 1]
S2[Server 2]
S3[Server 3]
NTP -->|Sync| S1
NTP -->|Sync| S2
NTP -->|Sync| S3
NTP hoạt động:
Accuracy:
Problems vẫn còn:
1. Network jitter
2. Clock skew
3. Không perfect
Key insight: Physical clocks không reliable cho ordering trong distributed systems.
Breakthrough thinking: Chúng ta không cần biết exact time, chỉ cần biết order của events.
Logical clock = counter để track order, không phải actual time.
Leslie Lamport (1978) đưa ra logical clock algorithm đơn giản nhưng powerful.
Mỗi process có counter (Lamport clock):
1. Internal event: LC = LC + 1
2. Send message: LC = LC + 1, attach LC to message
3. Receive message: LC = max(LC, message.LC) + 1
sequenceDiagram
participant P1 as Process 1
participant P2 as Process 2
participant P3 as Process 3
Note over P1: LC=0
Note over P2: LC=0
Note over P3: LC=0
Note over P1: Event A \n LC=1
P1->>P2: Message (LC=2)
Note over P1: Send \n LC=2
Note over P2: Receive \n LC=max(0,2)+1=3
Note over P2: Event B \n LC=4
P2->>P3: Message (LC=5)
Note over P2: Send \n LC=5
Note over P3: Receive \n LC=max(0,5)+1=6
Note over P3: Event C \n LC=7
Step-by-step:
Process 1:
LC=1: Event A (internal)
LC=2: Send message to P2
Process 2:
LC=3: Receive from P1 (max(0,2)+1)
LC=4: Event B (internal)
LC=5: Send message to P3
Process 3:
LC=6: Receive from P2 (max(0,5)+1)
LC=7: Event C (internal)
Ordering:
Lamport guarantee: Nếu A → B (A happens before B), thì LC(A) < LC(B).
A → B có nghĩa là:
Nếu không có A → B và không có B → A → A và B concurrent.
Problem: Lamport chỉ đảm bảo 1 chiều.
If A → B then LC(A) < LC(B) TRUE
If LC(A) < LC(B) then A → B FALSE
Ví dụ:
Process 1: Event A (LC=5)
Process 2: Event B (LC=8)
LC(A) < LC(B), nhưng không có causal relationship → A và B concurrent.
Lamport không phân biệt được causality vs concurrency.
Vector clocks solve Lamport's limitation.
Mỗi process maintain vector of counters - 1 counter cho mỗi process.
Process 1: VC = [1, 0, 0] // [P1, P2, P3]
Process 2: VC = [0, 1, 0]
Process 3: VC = [0, 0, 1]
1. Internal event: VC[i] = VC[i] + 1 (i = own process)
2. Send message:
- VC[i] = VC[i] + 1
- Attach VC to message
3. Receive message:
- VC[i] = max(VC[i], message.VC[i]) for all i
- VC[own] = VC[own] + 1
sequenceDiagram
participant P1 as Process 1
participant P2 as Process 2
Note over P1: [0,0]
Note over P2: [0,0]
Note over P1: Event A \n [1,0]
Note over P1: Event B \n [2,0]
P1->>P2: Message \n VC=[3,0]
Note over P1: Send \n [3,0]
Note over P2: Receive \n [3,1]
Note over P2: Event C \n [3,2]
Note over P1: Event D \n [4,0]
Comparison:
Event A: [1,0]
Event B: [2,0]
Event C: [3,2]
Event D: [4,0]
A → B?
[1,0] < [2,0] (mọi component ≤) → YES, causal
B → C?
[2,0] < [3,2] → YES, causal (B happened before C)
C → D?
[3,2] vs [4,0]: 3<4 but 2>0 → NO, concurrent
D → C?
[4,0] vs [3,2]: 4>3 → NO, concurrent
C và D concurrent - không có causal relationship.
VC1 < VC2 (VC1 causally precedes VC2) nếu:
- VC1[i] ≤ VC2[i] for all i
- ∃j where VC1[j] < VC2[j]
VC1 || VC2 (concurrent) nếu:
- ∃i where VC1[i] < VC2[i]
- ∃j where VC1[j] > VC2[j]
Vector clocks detect concurrency chính xác.
| Aspect | Lamport | Vector Clocks |
|---|---|---|
| Size | O(1) - single number | O(N) - N processes |
| Causality detection | One-way only | Both ways |
| Concurrency detection | No | Yes |
| Use case | Simple ordering | Conflict detection |
Lamport: Lightweight, đủ cho ordering logs, events.
Vector clocks: Detect conflicts, cần cho replication, version control.
Causality = relationship giữa cause và effect.
Event A: User click "Submit Order"
Event B: System save order to database
Event C: System send confirmation email
A → B → C: Clear causal chain.
Order này phải maintain. Không thể send email trước khi save database.
Process 1: User A update profile
Process 2: User B update profile (different user)
No causal relationship → concurrent.
Order không quan trọng. Có thể apply updates theo bất kỳ order nào.
1. Data consistency
Write X=1 on Server A
Write X=2 on Server B (causally after)
Replicate phải maintain order: final value = 2.
2. Application logic
Order created → Payment processed → Order shipped
Violate causality = logic error.
3. Conflict detection
Concurrent writes:
User A: X=1
User B: X=2
Không có causal order → conflict → need resolution strategy.
Git commits dùng Directed Acyclic Graph (DAG) - causality graph.
commit A → commit B → commit C
↓
commit D (branch)
Merge conflicts = concurrent edits (B→C vs B→D).
Vector clocks detect conflicting writes.
Write 1: VC=[1,0], value="Alice"
Write 2: VC=[0,1], value="Bob" (concurrent)
System detect concurrent writes → application resolves conflict.
Eventual consistency với automatic conflict resolution.
Causality tracking đảm bảo operations commute (order không matter).
Events stored với causal metadata (Lamport/Vector clock).
Replay events maintain causality → consistent state.
Modern approach: Combine physical time + logical clock.
Problem với pure logical clocks:
Problem với physical clocks:
HLC = Physical time + Lamport-style counter.
HLC = (physical_time, logical_counter)
Rules:
1. Internal event:
pt = max(physical_clock, last_hlc.pt)
if pt == last_hlc.pt:
lc = last_hlc.lc + 1
else:
lc = 0
hlc = (pt, lc)
2. Send/Receive: similar logic
Benefit:
Used by: CockroachDB, MongoDB (kinda), Google Spanner (TrueTime).
Reality check: Không phải lúc nào cũng cần precise ordering.
1. Maintaining data consistency
2. Causal dependencies
3. Conflict detection
1. Independent events
2. Eventually consistent systems
3. Idempotent operations
Mental model: Ordering có cost (complexity, performance). Chỉ enforce khi cần thiết.
Lamport: Minimal - single integer.
Vector clocks: O(N) per process - high với nhiều servers.
HLC: Similar to Lamport + physical clock sync.
Trade-off: Accuracy vs overhead.
Vector clocks grow with system size.
10 servers: 10 integers per event
1000 servers: 1000 integers per event
Solution:
Logical clocks khó debug vì không có real time.
Best practice: Log both logical và physical timestamps.
{
"event": "order_created",
"lamport_clock": 12345,
"wall_clock": "2025-02-15T10:00:00Z",
"vector_clock": [5, 3, 8]
}
Core insight lesson này:
Trong distributed systems, thời gian không phải là global linear axis.
Single machine:
Event A (10:00:00) → Event B (10:00:01)
Clear timeline
Distributed system:
Server 1: Event A (10:00:00.000)
Server 2: Event B (10:00:00.001)
→ Không chắc A trước B
Thay vì rely vào physical time, dùng causal relationships:
Physical clocks: Useful cho humans, không reliable cho ordering.
Logical clocks: Reliable cho ordering, không có real time meaning.
Hybrid clocks: Best of both worlds.
Architect decision: Pick clock strategy based on requirements - consistency level, debugging needs, performance constraints.
1. Physical clocks drift - không reliable cho event ordering
±10-100ms error là normal. NTP giúp nhưng không perfect.
2. Logical clocks order events without physical time
Lamport timestamps đảm bảo causal order. Lightweight, O(1).
3. Vector clocks detect causality và concurrency
Phân biệt được causal vs concurrent events. Cost: O(N) space.
** 4. Happened-before relationship (→) define causality**
A → B hoặc A || B (concurrent). Không có middle ground.
5. HLC combine physical time + logical ordering
Best of both worlds. Used by modern databases.
6. Ordering có cost - chỉ enforce khi cần thiết
Independent events không cần strict ordering. Trade-off complexity vs correctness.
7. Mental model: Time ≠ global timeline trong distributed systems
Think in terms of causality, not wall clock.
Phase 3 hoàn thành. Bạn đã học distributed systems fundamentals:
Phase 4 tiếp theo sẽ focus vào scalability & performance - bottleneck thinking, caching strategies, CDN, database scaling patterns.
Remember: In distributed systems, "what happened first?" is often the wrong question. The right question is "did A cause B?" Understand causality, not just timestamps.