A detailed explanation of scalability in system design: vertical vs horizontal scaling, identifying bottlenecks, capacity planning, and real-world examples from startup to enterprise. Learn how to scale systems effectively.
I still remember the day the startup I worked at hit its first 10,000 users.
The team celebrated. The website still ran smoothly. Everything was good.
One week later: 50,000 users.
The website started to slow down. Response time went from 200ms to 2 seconds.
Two weeks later: 100,000 users.
At 3 AM I got the call: "Website down. Database CPU at 100%. Users can't log in."
The CTO asked: "Do you have a plan to scale?"
Me: "... What does scale even mean?"
That was how I learned about scalability the hard way.
Scalability is a system's ability to handle growth in load (users, requests, data) without a complete redesign.
It's not about making the system "fast". It's about making a system that can grow.
Scalable system:
1,000 users → Works well
10,000 users → Still works well
100,000 users → Still works well (maybe add servers)
1,000,000 users → Still works well (architecture evolved)
Non-scalable system:
1,000 users → Works well
10,000 users → Slow
100,000 users → Crashes
1,000,000 users → Impossible
Common confusion: Scalability ≠ Performance
Performance:
- How fast the system is under a fixed load
- Optimize algorithms
- Reduce latency
- Example: 100ms → 50ms response time
Scalability:
- Handle growing load
- Add capacity
- Maintain performance as load grows
- Example: 1K users → 100K users, still 100ms
Real example:
System A: Very fast (10ms response)
- 1,000 users: 10ms ✓
- 10,000 users: 10ms ✓
- 100,000 users: CRASH ✗
→ High performance, low scalability
System B: Reasonably fast (100ms response)
- 1,000 users: 100ms ✓
- 10,000 users: 100ms ✓
- 100,000 users: 100ms ✓
- 1,000,000 users: 100ms ✓
→ Good performance, high scalability
System B is better for a growing business!
Startup trajectory:
Month 1: 100 users
Month 3: 1,000 users (10x growth)
Month 6: 10,000 users (10x growth)
Month 12: 100,000 users (10x growth)
If the system isn't scalable:
→ Crashes when you go viral
→ Lose users (bad UX)
→ Lose revenue
→ Lose competitive advantage
Real stories:
Twitter 2008 "Fail Whale": the site went down constantly because it couldn't scale. It nearly killed the company.
Instagram 2010: designed for scale from day one with a simple architecture. Grew to 1M users in 2 months. Sold to Facebook for $1B.
Non-scalable approach:
- 1K users: $100/month (OK)
- 10K users: $10,000/month (100x the cost for 10x the users!)
- 100K users: $1,000,000/month (impossible!)
Scalable approach:
- 1K users: $100/month
- 10K users: $500/month (economies of scale)
- 100K users: $5,000/month (affordable!)
Company A (not scalable):
- Launch viral feature
- Website crashes
- 1 week to fix
- Users churn to competitor
Company B (scalable):
- Launch viral feature
- Auto-scales to handle load
- Zero downtime
- Users happy, growth continues
Definition: add more resources to the existing server
Current server: 8 CPU, 16GB RAM
Vertical scale: 32 CPU, 128GB RAM
Same server, more powerful
Pros:
✅ Extremely simple
- No code changes
- No architecture changes
- Just upgrade hardware
✅ No distributed complexity
- Single database
- ACID transactions work
- No data consistency issues
✅ Fast to implement
- Cloud: Click button → Upgraded
- Physical: Swap hardware
Cons:
❌ Physical limits
- Cannot buy infinite CPU/RAM
- Max out eventually
❌ Expensive at scale
- 16GB → 32GB: 2x price
- 32GB → 64GB: 3x price
- 64GB → 128GB: 5x price
- Diminishing returns
❌ Single point of failure
- Server down = Total downtime
- Maintenance = Downtime
❌ Downtime required
- Must restart to upgrade
- 5-30 minutes offline
Real numbers (AWS EC2, approximate):
t3.medium (2 CPU, 4GB RAM): $30/month
t3.xlarge (4 CPU, 16GB RAM): $120/month (4x price)
t3.2xlarge (8 CPU, 32GB RAM): $240/month (8x price)
m5.8xlarge (32 CPU, 128GB RAM): $1,100/month (36x price!)
Cost scaling faster than capacity!
When to use vertical scaling:
✓ Startup phase (< 10K users)
✓ Simple operations are critical
✓ Budget constraints
✓ Small team (no distributed-systems expertise)
✓ Database workloads (PostgreSQL scales well vertically)
✓ Tight timeline (ship fast)
Definition: add more servers
Current: 1 server
Horizontal scale: 10 servers
More machines, same power each
Pros:
✅ Theoretically unlimited scale
- Add 100 servers = 100x capacity
- No ceiling
✅ Cost-effective at scale
- Linear cost growth
- Use commodity hardware
✅ High availability
- 1 server dies → Others continue
- No single point of failure
✅ Zero-downtime scaling
- Add servers without restart
- Rolling deployments
Cons:
❌ Extreme complexity
- Load balancing needed (see the sketch below)
- Data consistency challenges
- Network becomes a bottleneck
- Distributed transactions are hard
❌ Application changes required
- Code must be stateless
- Session management gets complex
- Cannot rely on server-local memory
❌ More operational overhead
- Monitoring many machines
- More complex deployments
- Harder debugging
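To make the load-balancing requirement concrete, here is a minimal round-robin balancer sketch in Python; the server names and request strings are hypothetical stand-ins for real backends:

import itertools

# Hypothetical pool of identical, stateless app servers
servers = ["app-1:8000", "app-2:8000", "app-3:8000"]
rotation = itertools.cycle(servers)

def route(request):
    # Round-robin: each request goes to the next server in rotation.
    # This only works because the servers are stateless.
    return next(rotation), request

for i in range(4):
    print(route(f"GET /feed?page={i}"))
# Cycles app-1 → app-2 → app-3 → app-1 ...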
Comparison:
┌─────────────────┬──────────────┬──────────────┐
│ │ Vertical │ Horizontal │
├─────────────────┼──────────────┼──────────────┤
│ Complexity │ Simple │ Complex │
│ Cost (small) │ Low │ Medium │
│ Cost (large) │ Very high │ Reasonable │
│ Limit │ Physical │ Unlimited │
│ Downtime │ Yes │ No │
│ Implementation │ Immediate │ Takes time │
└─────────────────┴──────────────┴──────────────┘
Real architecture example:
Instagram early days (2010):
- Start: 1 server
- 10K users: Upgrade to bigger server (vertical)
- 100K users: Add read replicas (horizontal)
- 1M users: Multiple app servers + sharded DB (horizontal)
- 10M users: Full distributed architecture
→ Start simple, scale gradually
Scalability = finding and fixing bottlenecks
A system is only as fast as its slowest component.
Request flow:
Client (50ms)
→ Load Balancer (5ms)
→ API Server (20ms)
→ Database Query (500ms) ← BOTTLENECK
→ API Server (10ms)
→ Client (50ms)
Total: 635ms
79% time in database!
Optimizing API from 20ms → 10ms = 10ms savings
Optimizing database from 500ms → 100ms = 400ms savings
→ Always optimize bottleneck first
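Finding that 500ms starts with measuring each stage. A minimal sketch, where sleeps stand in for the hypothetical stages in the flow above:

import time

def timed(name, fn, *args):
    # Run one stage and print how long it took
    start = time.perf_counter()
    result = fn(*args)
    print(f"{name}: {(time.perf_counter() - start) * 1000:.0f}ms")
    return result

# Simulated stages; the sleeps mirror the latencies above
def call_api(req):  time.sleep(0.020); return req
def query_db(req):  time.sleep(0.500); return ["rows"]   # ← bottleneck
def render(rows):   time.sleep(0.010); return "response"

def handle_request(req):
    req = timed("api", call_api, req)
    rows = timed("db_query", query_db, req)
    return timed("render", render, rows)

handle_request("GET /feed")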
1. Database Bottleneck
Symptoms:
- Slow queries (> 1 second)
- High CPU usage (> 80%)
- Connection pool exhausted
- Disk I/O saturated
Causes:
- Missing indexes
- N+1 queries
- Full table scans
- Too many writes
Solutions:
- Add indexes (immediate)
- Query optimization
- Add read replicas (horizontal)
- Database sharding (last resort)
Example:
-- Slow query (no index)
SELECT * FROM users WHERE email = 'john@example.com';
-- 10M rows → 5 seconds
-- Add index
CREATE INDEX idx_users_email ON users(email);
-- Same query → 10ms
500x improvement!
2. Network Bottleneck
Symptoms:
- High latency between services
- Bandwidth saturation
- Packet loss
- Timeouts
Causes:
- Geographic distance
- Large payloads
- Too many round trips
- Network congestion
Solutions:
- CDN for static assets
- Data compression (see the sketch below)
- Batch requests
- Caching
- Geographic distribution
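As an example of the compression option, gzipping a large JSON response with Python's standard library; the payload is made up for illustration:

import gzip, json

# Hypothetical API response: 10,000 user records
payload = json.dumps(
    [{"id": i, "name": f"user-{i}"} for i in range(10_000)]
).encode()

compressed = gzip.compress(payload)
print(f"raw:     {len(payload) // 1024} KB")
print(f"gzipped: {len(compressed) // 1024} KB")  # repetitive JSON compresses roughly 10x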
3. CPU Bottleneck
Symptoms:
- CPU usage > 80%
- Slow processing
- Queued requests
- Timeouts
Causes:
- Inefficient algorithms
- Too much computation
- Synchronous processing
- Single-threaded bottleneck
Solutions:
- Algorithm optimization
- Async processing
- Horizontal scaling
- Caching computed results
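The last option, caching computed results, can be a one-line change in Python; a minimal sketch with a made-up expensive function:

from functools import lru_cache

@lru_cache(maxsize=10_000)
def recommendation_score(user_id: int) -> int:
    # Hypothetical CPU-heavy computation
    return sum(i * i for i in range(1_000_000)) % 97 + user_id

recommendation_score(42)  # first call pays the full CPU cost
recommendation_score(42)  # repeat calls are served from the cache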
4. Memory Bottleneck
Symptoms:
- High memory usage (> 80%)
- Swapping to disk
- Out of memory errors
- Garbage collection pauses
Causes:
- Memory leaks
- Large objects in memory
- Too much caching
- Insufficient capacity
Solutions:
- Fix memory leaks
- Optimize data structures
- Eviction policies for cache
- Vertical scaling (more RAM)
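Before choosing among these fixes, confirm where the memory actually goes. A minimal sketch with Python's built-in tracemalloc, using a deliberately leaky loop:

import tracemalloc

tracemalloc.start()

leaky = []
for i in range(100_000):
    leaky.append("x" * 100)  # allocations that are never released

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # the lines that allocated the most memory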
Step 1: Measure everything
- Response times
- CPU usage
- Memory usage
- Database query times
- Network latency
Step 2: Find the slowest part
- What takes most time?
- What uses most resources?
- What fails first under load?
Step 3: Verify it's the bottleneck
- If fixed, will it improve system?
- Is it consistently the slowest?
- Does it affect user experience?
Step 4: Optimize
- Start with quick wins (indexes)
- Then bigger changes (caching)
- Finally architecture (sharding)
Step 5: Measure again
- Did it improve?
- New bottleneck appeared?
- Keep iterating
Phase 1: Launch (0-1K users)
Architecture:
- Single server (8 CPU, 16GB RAM)
- PostgreSQL database
- No caching
- No load balancer
Cost: $100/month
Performance: 100ms average
Status: Perfect for launch
Phase 2: Growth (1K-10K users)
Problem: Database queries slowing down
Bottleneck: Missing indexes, N+1 queries
Solution:
- Add database indexes
- Optimize queries
- Add Redis cache for hot data
Cost: $200/month (added Redis)
Performance: Back to 100ms
Change: Minimal (query optimization + cache)
Phase 3: Viral (10K-50K users)
Problem: Single server maxed out
Bottleneck: CPU at 90%, occasional crashes
Solution:
- Vertical scale to bigger server (16 CPU, 64GB)
- Add database read replica
- CDN for static assets
Cost: $500/month
Performance: 120ms average
Change: Configuration only, no code changes
Phase 4: Scale (50K-100K users)
Problem: Single server again at limit
Bottleneck: Database writes, server CPU
Solution:
- Horizontal scaling: 3 app servers + load balancer
- Database with 2 read replicas
- Aggressive caching strategy
- CDN for all media
Cost: $1,500/month
Performance: 100ms average
Change: Stateless application, session in Redis
Lessons:
✓ Start simple (monolith)
✓ Optimize before scaling (indexes, queries)
✓ Vertical scale first (easy wins)
✓ Horizontal when necessary
✓ Add complexity gradually
✓ Always measure before/after
Total timeline: 12 months
Total rewrites: 0 (evolved architecture)
Initial State (100K users):
Architecture:
- 5 app servers
- PostgreSQL (master + 2 replicas)
- Redis cache cluster
- S3 for media
- CloudFront CDN
Works well at 100K users
Growth Challenge (1M users):
Problem:
- Database writes hitting limit (5K writes/second)
- PostgreSQL master maxed out
- Feed generation slow
Bottleneck: Database write capacity
Solution 1: Optimize writes
- Batch inserts
- Async processing via queue
- Reduce write amplification
Result: Handle 10K writes/second
Cost: Minimal
Timeline: 2 weeks
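A sketch of the batch-insert idea, assuming psycopg2 and a hypothetical feed_updates table; execute_values turns thousands of single-row round trips into a few multi-row INSERTs:

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

# 1,000 pending feed updates, e.g. accumulated by a queue worker
updates = [(follower_id, 42) for follower_id in range(1_000)]

with conn, conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO feed_updates (user_id, post_id) VALUES %s",
        updates,  # batched into multi-row INSERT statements
    )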
Next Challenge (5M users):
Problem:
- Feed generation overwhelming
- 1 post fans out to ~500 followers on average
- 10M posts/day × 500 = 5B feed updates/day
Bottleneck: Fanout to followers
Solution: Hybrid fanout strategy
- Normal users (< 1K followers): Pre-compute feeds
- Celebrities (> 1K followers): Compute on-demand
- Message queue for async fanout
Result: headroom to handle 50M+ users
Cost: +$2K/month (queue + workers)
Timeline: 1 month implementation
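A minimal sketch of the hybrid decision, with the 1K-follower cutoff from the strategy above and simple stand-ins for the fanout queue and the pull-only store:

FANOUT_THRESHOLD = 1_000  # follower cutoff from the strategy above

def publish_post(author_followers, post_id, fanout_queue, pull_only):
    if author_followers < FANOUT_THRESHOLD:
        # Normal user: push the post into followers' pre-computed feeds
        fanout_queue.append(("fanout.write", post_id))
    else:
        # Celebrity: store once; merged into feeds on demand at read time
        pull_only.add(post_id)

fanout_queue, pull_only = [], set()
publish_post(500, post_id=1, fanout_queue=fanout_queue, pull_only=pull_only)
publish_post(2_000_000, post_id=2, fanout_queue=fanout_queue, pull_only=pull_only)
print(fanout_queue, pull_only)  # [('fanout.write', 1)] {2}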
Scale to 10M users:
Final architecture:
- 50 app servers (auto-scaling)
- Sharded PostgreSQL (10 shards)
- Distributed Redis cluster
- Kafka for message queue
- Geographic distribution (US, EU, Asia)
- Microservices for key features
Cost: $15K/month
Performance: < 200ms for 95% requests
Availability: 99.95%
Team: 20 engineers
Total evolution: 2 years from 100K → 10M users
Key learnings:
✓ Measure everything (metrics critical)
✓ Optimize before re-architecting
✓ Scale gradually, not big bang
✓ Different scale = different challenges
✓ Trade-offs change with scale
✓ Team must grow with system
Stateful (bad for scaling):
class UserSession:
    sessions = {}  # In-memory state, lives only in this process

    def login(self, user_id):
        # The session object sits in this server's RAM: tied to this server!
        self.sessions[user_id] = Session()
Problem: the load balancer must route each user back to the same server (sticky sessions)
Stateless (good for scaling):
def login(user_id):
    session = create_session(user_id)
    redis.set(f"session:{user_id}", session.token)  # Any server can read it
    return session.token
Benefit: any server can handle any request
Without index:
SELECT * FROM users WHERE email = 'user@example.com';
→ Full table scan (10M rows)
→ 5,000ms
With index:
CREATE INDEX idx_users_email ON users(email);
→ Index lookup
→ 5ms
1,000x faster!
Rule of thumb: index the columns used in WHERE, JOIN, and ORDER BY clauses.
Read-heavy systems (90%+ reads):
→ Cache everything that can reasonably be cached
Cache strategy:
- Hot data: 100% cache hit rate
- Warm data: 80% cache hit rate
- Cold data: Query database
Example:
No cache: 10K database queries/second
With cache (90% hit rate): 1K database queries/second
10x reduction in database load
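The pattern behind those numbers is cache-aside: read from the cache first, fall back to the database on a miss. A minimal sketch assuming a local Redis instance and a hypothetical fetch function passed in by the caller:

import json
import redis  # assumes Redis running locally

cache = redis.Redis()
TTL = 300  # seconds to keep hot data

def get_user(user_id, fetch_user_from_db):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no database query
    user = fetch_user_from_db(user_id)       # cache miss: query the database
    cache.setex(key, TTL, json.dumps(user))  # populate for next time
    return user

print(get_user(1, lambda uid: {"id": uid, "name": "demo"}))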
Synchronous (blocks user):
def create_order(data):
    order = save_to_db(data)
    send_email(order)        # Waits 2 seconds
    update_inventory(order)  # Waits 1 second
    notify_warehouse(order)  # Waits 500ms
    return order             # User waits 3.5 seconds!
Asynchronous (doesn't block):
def create_order(data):
    order = save_to_db(data)
    queue.publish('order.created', order.id)  # 10ms
    return order  # User waits 100ms only!

@worker.task
def process_order(order_id):
    send_email(order_id)
    update_inventory(order_id)
    notify_warehouse(order_id)
35x faster user experience!
You cannot scale what you don't measure.
Key metrics:
- Response time (p50, p95, p99)
- Error rate
- Request rate
- CPU/Memory usage
- Database query time
- Cache hit rate
Alerts:
- Response time > 500ms
- Error rate > 1%
- CPU > 80%
- Database connections > 80%
→ Know problems before users complain
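A minimal sketch of the p95 check behind that first alert, computed over a made-up window of response times (a simple nearest-rank approximation):

def percentile(latencies_ms, p):
    ordered = sorted(latencies_ms)
    index = min(len(ordered) - 1, int(len(ordered) * p / 100))
    return ordered[index]

# Hypothetical response times (ms) from the last minute
latencies = [80, 95, 110, 120, 130, 150, 240, 450, 900, 1200]

p95 = percentile(latencies, 95)
if p95 > 500:
    print(f"ALERT: p95 response time is {p95}ms (> 500ms)")  # page the on-call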
Murphy's Law: Everything that can fail, will fail
Design for failure:
- Multiple app servers (1 dies, others continue)
- Database replicas (failover automatic)
- Message queue (retries on failure)
- Circuit breakers (stop cascading failures)
- Graceful degradation (degrade features, not crash)
Example:
Recommendations service down:
✗ Don't crash entire app
✓ Show generic recommendations
✓ Log error
✓ Alert team
✓ User still has working app
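A minimal sketch of that degradation, with a made-up fallback list and a service call that fails:

import logging

FALLBACK = ["top-seller-1", "top-seller-2"]  # generic recommendations

def get_recommendations(user_id, call_service):
    try:
        return call_service(user_id)
    except Exception:
        logging.exception("recommendations service failed")  # log + alert hook
        return FALLBACK  # degrade the feature, keep the app working

def broken_service(user_id):
    raise ConnectionError("service down")

print(get_recommendations(7, broken_service))  # ['top-seller-1', 'top-seller-2']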
Mistake 1: Premature optimization
❌ Bad:
"Let's use Kubernetes, Kafka, Cassandra from day 1
for our 100-user app!"
→ 6 months development
→ $5K/month cost
→ Team overwhelmed
→ Might never launch
✅ Good:
"Start with Heroku + PostgreSQL.
If reach 10K users, then optimize.
Ship fast > Perfect architecture."
→ 2 weeks development
→ $200/month cost
→ Iterate based on real data
Mistake 2: Scaling too late
❌ Bad:
"We have 100K users and single database.
No monitoring. No caching. No optimization.
Wait until it crashes to think about scaling."
→ Midnight emergency
→ Rushed decisions
→ Downtime
→ User churn
✅ Good:
"At 10K users, add monitoring.
At 30K users, add caching.
At 50K users, add read replica.
Plan before crisis."
→ Smooth growth
→ Informed decisions
→ No downtime
Mistake 3: Cargo cult architecture
❌ Bad:
"Netflix uses microservices, so we should too!"
Context ignored:
- Netflix: 200M users, 1000 engineers
- Your startup: 5K users, 3 engineers
Result: Over-engineered mess
✅ Good:
"Netflix's architecture fits their scale.
Our scale needs simple monolith.
Copy thinking, not architecture."
Mistake 4: Ignoring the bottleneck
❌ Bad:
"Add 10 more app servers!
Why still slow?"
→ Database is bottleneck (not app servers)
→ Wasted money
✅ Good:
"Measure first:
- App server: 20ms
- Database: 500ms ← Bottleneck
Optimize database (indexes, queries)
Then add servers if needed"
Scalability definition:
The ability to handle growing load
without a complete redesign.
Not about being fast.
About being able to grow.
Two approaches:
Vertical Scaling (Scale Up):
- Add resources to server
- Simple, expensive at scale
- Has physical limits
- Good for the startup phase
Horizontal Scaling (Scale Out):
- Add more servers
- Complex, cost-effective at scale
- Theoretically unlimited
- Required for large scale
Bottleneck thinking:
System = slowest component
Always:
1. Measure everything
2. Find bottleneck
3. Optimize bottleneck
4. Measure again
5. Repeat
Don't waste time optimizing non-bottlenecks
Scaling progression:
1K users: Single server (vertical scale if needed)
10K users: Optimize queries + caching
100K users: Horizontal scaling begins
1M users: Distributed architecture
10M+ users: Microservices, sharding, geo-distribution
Each stage = different challenges
Best practices:
✓ Design stateless applications
✓ Use database indexes
✓ Cache aggressively (read-heavy)
✓ Async processing (long tasks)
✓ Monitor everything
✓ Plan for failure
Avoid mistakes:
✗ Premature optimization (over-engineering)
✗ Scaling too late (crisis mode)
✗ Cargo cult architecture (copy blindly)
✗ Ignore bottlenecks (waste money)
Remember:
Scalability is not about technology.
Scalability is about:
- Understanding your load
- Identifying bottlenecks
- Making appropriate trade-offs
- Evolving architecture gradually
Start simple
Measure continuously
Scale intelligently
Understand your current scale.
Plan for 2x growth.
Learn systematically.
Practice estimations.
Scalability is a journey, not a destination.
Measure. Optimize. Scale. Repeat.
This article is part of System Design From Zero to Hero, a series that teaches system design from first principles.