# Why System Design?
As a Senior Engineer, you're expected to design systems, not just write code. System design is about making trade-offs — there's no perfect solution, only the best fit for your constraints.
The key question: How do you build a system that serves millions of users, stays available when things fail, and evolves as requirements change?
# Load Balancing
A load balancer sits between clients and servers, distributing traffic to prevent any single server from becoming a bottleneck.
```
       Client Requests
             │
             ▼
      ┌───────────────┐
      │ Load Balancer │
      └───────┬───────┘
       ┌──────┼──────┐
       ▼      ▼      ▼
    Server  Server  Server
      A       B       C
```
## Key Strategies
- Round Robin — rotate through servers sequentially
- Least Connections — route to the server with fewest active requests
- IP Hash — same client IP always hits the same server (sticky sessions)
- Weighted — more powerful servers get more traffic
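Two of these strategies can be sketched in a few lines of Python. This is a minimal illustration, not a production balancer; the class and method names are invented for the example:

```python
import itertools

class RoundRobinBalancer:
    """Rotate through servers sequentially."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active requests."""
    def __init__(self, servers):
        self._active = {s: 0 for s in servers}

    def pick(self):
        server = min(self._active, key=self._active.get)
        self._active[server] += 1   # request is now in flight
        return server

    def release(self, server):
        self._active[server] -= 1   # request finished
```

Round robin needs no shared state beyond a cursor, which is why it is the usual default; least connections needs the balancer to track request completion (the `release` call here), which real load balancers do by watching connection lifecycles.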
# Caching
Caching stores frequently accessed data in a faster layer (memory) to avoid expensive recomputation or database queries.
## Cache Layers
Browser Cache (~0ms) → CDN Edge (~5ms) → App Cache, e.g. Redis (~1ms) → Database (~10-100ms)
## Cache Patterns
- Cache-Aside — App checks cache first, fills on miss (most common)
- Write-Through — Writes go to cache AND database together
- Write-Behind — Writes go to cache only, async flush to DB (risky)
- Read-Through — Cache itself fetches from DB on miss
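Cache-aside, the most common of these patterns, is simple enough to sketch. A plain dict stands in for Redis and for the database here; the class name is invented for the example:

```python
class CacheAside:
    """Cache-aside: the application checks the cache first, fills it
    on a miss, and invalidates the entry on writes."""
    def __init__(self, db):
        self.cache = {}   # stands in for Redis
        self.db = db      # stands in for the database

    def get(self, key):
        if key in self.cache:        # hit: skip the expensive read
            return self.cache[key]
        value = self.db[key]         # miss: go to the database
        self.cache[key] = value      # fill for next time
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)    # invalidate, don't update
```

Invalidating on write (rather than updating the cache in place) avoids a race where a concurrent reader refills the cache with stale data between your two writes; the next `get` simply repopulates it.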
# CDN (Content Delivery Network)
CDNs distribute static content to edge servers worldwide. A user in Tokyo gets served from an Asian edge node instead of a US-based origin server — cutting latency from ~150ms to ~5ms.
## Push vs Pull CDN
- Pull — CDN fetches from origin on first request, then caches. Simple, but first request is slow.
- Push — You upload content to CDN proactively. Better for content you know will be popular.
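The pull model is essentially a read-through cache with a TTL at each edge node. A rough sketch, assuming a hypothetical `fetch_origin` callable that performs the slow origin request:

```python
import time

class PullEdgeNode:
    """Pull CDN edge: fetch from the origin on the first request,
    then serve from the local cache until the TTL expires."""
    def __init__(self, fetch_origin, ttl_seconds=300):
        self.fetch_origin = fetch_origin
        self.ttl = ttl_seconds
        self.cache = {}   # path -> (content, expires_at)

    def get(self, path):
        entry = self.cache.get(path)
        if entry and entry[1] > time.time():
            return entry[0]                    # fast edge hit
        content = self.fetch_origin(path)      # slow first request
        self.cache[path] = (content, time.time() + self.ttl)
        return content
```

The TTL is the knob that trades freshness for origin load: a long TTL means fewer slow origin fetches but staler content, which is why CDNs also expose explicit purge APIs.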
# Horizontal vs Vertical Scaling
- Vertical (Scale Up) — Add more CPU/RAM to one machine. Simple but has a ceiling and a single point of failure.
- Horizontal (Scale Out) — Add more machines. No ceiling, better fault tolerance, but adds complexity (load balancing, data consistency).
Most real-world systems use both: vertically scale each machine to a reasonable size, then horizontally scale the fleet.
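One concrete piece of the complexity horizontal scaling adds: data must be partitioned across the fleet. A naive sketch uses a stable hash to map each key to a machine (function name invented for the example):

```python
import hashlib

def pick_shard(key, num_servers):
    """Map a key to one of N servers with a stable hash, so the
    same key always routes to the same machine."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers
```

The catch is that changing `num_servers` remaps most keys, forcing a large data reshuffle; real systems mitigate this with consistent hashing, which moves only ~1/N of the keys when a node is added or removed.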
# ⚡ Key Takeaways
- System design is about trade-offs, not perfect solutions
- Load balancers distribute traffic; health checks remove unhealthy nodes
- Multi-layer caching (browser → CDN → app → DB) dramatically reduces latency
- CDNs bring content closer to users geographically
- Prefer horizontal scaling for large systems — it removes single points of failure
- Always think about: latency, throughput, availability, consistency