# Horizontal vs Vertical Scaling
There are two fundamental approaches to handling more load:
Vertical Scaling (Scale Up) Horizontal Scaling (Scale Out) ┌─────────────────────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │ App │ │ App │ │ App │ │ ███████████████ │ CPU ↑ │ #1 │ │ #2 │ │ #3 │ │ ███████████████ │ RAM ↑ └──┬───┘ └──┬───┘ └──┬───┘ │ ███████████████ │ Disk ↑ │ │ │ │ BIG SERVER │ ┌──▼────────▼────────▼──┐ │ │ │ Load Balancer │ └─────────────────────┘ └────────────────────────┘ Pros: Simple, no code changes Pros: No upper limit, redundancy Cons: Has a ceiling, SPOF Cons: Complexity, distributed state
Strategy: Scale vertically first (it's simpler). When you hit limits (~$10K+ machines, or need redundancy), go horizontal. Most systems end up using both.
# Sharding
Sharding splits your data across multiple databases. Each shard holds a subset of data and operates independently.
Hash-based sharding:
shard = hash(user_id) % num_shards
user_id=42 → hash=7839 → 7839 % 4 = shard 3
user_id=99 → hash=2104 → 2104 % 4 = shard 0
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Shard 0 │ │Shard 1 │ │Shard 2 │ │Shard 3 │
│users: │ │users: │ │users: │ │users: │
│99,156 │ │23,781 │ │44,512 │ │42,301 │
└────────┘ └────────┘ └────────┘ └────────┘Challenge: Cross-shard queries (e.g., "find all orders over $100") require scatter-gather across all shards. Design your shard key to match your most common query patterns.
# Caching Layers
Multi-tier caching is the single most effective scaling technique. Each layer reduces load on the next.
Request Flow:
Browser Cache → CDN → API Gateway Cache → App Memory (L1)
│
Redis/Memcached (L2)
│
Database
Layer │ Latency │ Hit Rate │ Scope
───────────────┼──────────┼──────────┼────────────
Browser cache │ 0 ms │ ~40% │ Per user
CDN │ 5 ms │ ~60% │ Per region
L1 (in-proc) │ <1 ms │ ~80% │ Per server
L2 (Redis) │ 1-2 ms │ ~95% │ Shared
Database │ 5-50 ms │ — │ Source of truth# Capacity Planning
Good capacity planning prevents both outages (too little) and waste (too much). Start with traffic estimation, then work backwards to resources.
- Estimate QPS — DAU × actions per user ÷ 86400, then 3x for peak
- Estimate storage — Records per day × avg size × retention period
- Estimate bandwidth — QPS × avg response size
- Add headroom — 2-3x for spikes, plan 6 months ahead
⚡ Key Takeaways
- Scale vertically first, horizontally when you need it
- Sharding enables horizontal scaling for databases — choose shard keys carefully
- Multi-layer caching is the #1 scaling lever — cache aggressively
- Consistent hashing minimizes data movement when adding/removing nodes
- Always start with back-of-envelope math before building