Introduction

System design is not just for interviews; it’s a daily skill for building scalable, reliable, and maintainable software. Whether you are building a small startup app or a massive distributed system, these 20 concepts form the foundation of modern software architecture.

In this guide, we break down 20 essential system design concepts with clear definitions, real-world examples, diagrams, and, most importantly, how to explain them in an interview.

1. Load Balancing

Definition:
The process of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed.

Real-World Example:

Nginx/HAProxy: Distributing HTTP requests to a cluster of web servers.
AWS ELB: Automatically routing user traffic to healthy EC2 instances.

Visual:

       [User]
          |
    [Load Balancer]
      /   |   \
[Server1] [Server2] [Server3]

💡 Interview Tip:
"I use load balancing to improve availability and fault tolerance. If one server goes down, the balancer's health checks detect the failure and reroute traffic to the healthy instances, so users see little or no downtime."
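The distribute-and-skip-failed-servers behavior described above can be sketched as a minimal round-robin balancer. This assumes a health checker calls mark_down/mark_up; the class and server names are hypothetical, and production balancers such as Nginx or HAProxy also offer other algorithms (e.g., least connections).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Round-robin load balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # In a real balancer, a periodic health check would call this.
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per request, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["server1", "server2", "server3"])
lb.mark_down("server2")  # simulate a failed health check
picks = [lb.next_server() for _ in range(4)]
print(picks)  # ['server1', 'server3', 'server1', 'server3']
```

Once server2 passes its health check again, calling mark_up("server2") puts it back into the rotation.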

2. Caching

Definition:
Storing copies of frequently accessed data in a temporary, high-speed storage layer (typically RAM) to speed up subsequent requests.

Real-World Example:

Redis: Storing user session data or leaderboard scores in memory.
Browser Cache: Storing static assets (CSS, images) locally on the user's device.

Visual:

[App] -> [Cache (Redis)] --(Hit)--> Return Data
                |
             (Miss)
                v
            [Database]

💡 Interview Tip:
"Caching trades memory for speed. I’d implement a 'Cache-Aside' strategy: check cache first; if missing, fetch from DB and update cache. This significantly reduces database load."
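The cache-aside flow from the tip can be sketched as follows, assuming plain dicts stand in for both the database and the cache (Redis in the example above). The CacheAside class, its TTL, and the choice to invalidate on write are illustrative assumptions, not a specific library's API.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first; on a miss, load from the DB and fill the cache."""

    def __init__(self, db, ttl_seconds=60.0):
        self.db = db            # stand-in for the database (a dict here)
        self.cache = {}         # stand-in for Redis: key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]                       # cache hit: no DB query
        self.misses += 1
        value = self.db.get(key)                  # cache miss: slow path to the DB
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def update(self, key, value):
        # Write to the DB, then invalidate so the next read reloads fresh data.
        self.db[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Alice"})
store.get("user:1")              # miss: loaded from the DB, then cached
store.get("user:1")              # hit: served from the cache
print(store.hits, store.misses)  # 1 1
```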

3. Database Sharding

Definition:
Splitting a large dataset into smaller, manageable chunks (shards) distributed across multiple servers, usually based on a "shard key" (e.g., User ID).

Real-World Example:

Instagram: Sharding photo metadata based on User ID.
Discord: Sharding message history based on Channel ID.

💡 Interview Tip:
"Sharding enables horizontal scaling for write-heavy databases. However, it complicates cross-shard joins and transactions. I would choose the shard key carefully to avoid 'hot shards' that receive a disproportionate share of traffic."
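Routing by shard key can be sketched as a stable hash of the user ID modulo the shard count. The shard count of 4 and the shard_for helper are hypothetical; MD5 is used only as a convenient deterministic hash, not for security.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database shards


def shard_for(user_id, num_shards=NUM_SHARDS):
    """Map a shard key (the user ID) to a shard number with a stable hash."""
    # Python's built-in hash() is randomized per process for strings, so use a
    # deterministic digest: every app server must route a given user the same way.
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All rows for one user land on the same shard, so per-user queries touch one server.
for uid in (42, 1001, 7):
    print(f"user {uid} -> shard {shard_for(uid)}")
```

Because the shard is computed as hash % NUM_SHARDS, changing the shard count remaps most users. This is one reason systems often pre-split data into many logical shards or use consistent hashing (concept 6).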

4. Replication

Definition:
Keeping copies of the same data on multiple machines to ensure data availability and redundancy.

Real-World Example:

Primary-Replica (Master-Slave): One primary DB handles writes; multiple read replicas serve reads (e.g., PostgreSQL).
Multi-Region: Replicating data across US-East and EU-West for disaster recovery.

💡 Interview Tip:
"Replication increases read throughput and data durability. I would use asynchronous replication for write performance, accepting that replicas can lag behind the primary and briefly serve stale reads (eventual consistency)."
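The asynchronous-replication trade-off can be made concrete with a toy primary/replica store. This is a sketch under simplifying assumptions: in-memory dicts play the primary and replicas, and replicate() stands in for the background process that ships the primary's log. The ReplicatedStore class is hypothetical.

```python
import itertools


class ReplicatedStore:
    """Toy primary/replica setup with asynchronous replication."""

    def __init__(self, num_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # replication log entries not yet applied to replicas
        self._next = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        # Writes go only to the primary and are acknowledged immediately.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Reads are spread across replicas (round robin) and may be stale.
        return self.replicas[next(self._next)].get(key)

    def replicate(self):
        # Ship the pending log to every replica (a background task in real systems).
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


db = ReplicatedStore()
db.write("post:1", "Hello")
print(db.read("post:1"))  # None: the replica has not caught up yet (replication lag)
db.replicate()
print(db.read("post:1"))  # Hello
```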

5. CAP Theorem

Definition:
A distributed system can deliver only two of three guarantees: Consistency (every read receives the most recent write), Availability (every request receives a response), and Partition Tolerance (system continues despite network failures).

Real-World Example:

CP (Consistency + Partition Tolerance): Payment and banking ledgers, where accuracy is critical.
AP (Availability + Partition Tolerance): Social media feeds, where seeing a slightly old post is acceptable.

💡 Interview Tip:
"In a distributed system, network partitions are inevitable (P). So the real choice is between C and A. For a payment system, I'd pick Consistency (CP); for a news feed, Availability (AP)."

6. Consistent Hashing

Definition:
A hashing technique used in distributed systems (like caches) to minimize data movement when nodes are added or removed. On average, only K/n keys need remapping, where K is the number of keys and n is the number of nodes.

Real-World Example:

DynamoDB / Cassandra: Distributing data across nodes in a ring topology.
Discord: Routing voice chat traffic to specific nodes.

💡 Interview Tip:
"Standard hashing (hash(key) % n) breaks when n changes: almost every key maps to a new node, causing massive cache misses. Consistent hashing places both nodes and keys on a ring, so adding or removing a node only remaps the keys in that node's segment of the ring; all other keys stay put."
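The ring and the K/n claim can be checked with a small sketch. It assumes MD5 as the ring hash and 100 virtual nodes per server (both arbitrary choices; real systems such as Cassandra use their own partitioners and token schemes), and the class and node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key):
    """Stable 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server, to even out the key distribution
        self._ring = []       # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # Walk clockwise to the first node at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (ring_position(key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["A", "B", "C"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("D")
moved = [k for k in keys if ring.get_node(k) != before[k]]
# Typically close to 25% (about K/n); with hash(key) % n, going from 3 to 4 nodes moves ~75%.
print(f"{len(moved) / len(keys):.1%} of keys moved")
```

Every moved key moves to the new node D; keys owned by A, B, and C elsewhere on the ring are untouched.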

7. Message Queues

Definition:
A buffer that enables asynchronous communication between services. Producers send messages; consumers process them at their own pace.

Real-World Example:

Email Sending: User clicks "Sign Up" → Job added to queue → Worker sends email later.
Video Processing: YouTube upload → Queue → Background transcoding service.

Visual:

[Order Svc] --> [Queue (Kafka/SQS)] --> [Shipping Svc]

💡 Interview Tip:
"Message queues decouple services. If the consumer service crashes, messages persist in the queue until it recovers, preventing data loss and allowing for load leveling."
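The sign-up example can be sketched with Python's in-process queue.Queue standing in for Kafka or SQS. Note the assumption: unlike a real broker, this queue lives in memory and is not durable, so the sketch shows decoupling and load leveling but not surviving a crash. The worker and the email step are hypothetical placeholders.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a broker such as Kafka or SQS
sent = []


def email_worker():
    # Consumer: pulls jobs at its own pace, independently of the producer.
    while True:
        address = jobs.get()
        if address is None:  # sentinel value: shut the worker down
            jobs.task_done()
            break
        sent.append(f"welcome email -> {address}")  # placeholder for the real email call
        jobs.task_done()


worker = threading.Thread(target=email_worker)
worker.start()

# Producer: the sign-up handler enqueues a job and returns to the user immediately.
for address in ("alice@example.com", "bob@example.com"):
    jobs.put(address)

jobs.put(None)
worker.join()
print(sent)  # ['welcome email -> alice@example.com', 'welcome email -> bob@example.com']
```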

8. Rate Limiting

Definition:
Controlling the number of requests a user can send to a system within a specific time frame to prevent abuse.

Real-World Example:

Twitter API: Capping how many requests a client can make in each time window.
Login: Blocking IP after 5 failed password attempts.

💡 Interview Tip:
"I would implement rate limiting (e.g., the Token Bucket algorithm) at the API Gateway level to protect backend services from abusive traffic (such as application-level DDoS floods) and noisy neighbors."
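A minimal sketch of the Token Bucket algorithm named in the tip. The capacity and refill rate are arbitrary example values, and the injectable clock exists only to make the demo deterministic; a gateway would typically keep one bucket per user, API key, or IP, often in a shared store.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second.
    Each request spends one token; with no tokens left, the request is rejected."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full, so short bursts are allowed
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]  # controllable clock so the demo is deterministic
bucket = TokenBucket(capacity=5, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(6)])  # [True, True, True, True, True, False]
fake_now[0] += 2.0                          # two seconds later: two tokens refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained limit; tuning them separately is the main reason token buckets are popular for APIs.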

9. API Gateway

Definition:
A single entry point for client requests that routes them to appropriate microservices. It often handles cross-cutting concerns like auth, logging, and rate limiting.

Real-World Example:

Netflix Zuul: Routes millions of requests to catalog, user, and playback services.
AWS API Gateway: Front door for serverless functions (e.g., AWS Lambda).

💡 Interview Tip:
"Using an API Gateway usually simplifies the client side: clients only need to know one domain (api.mysite.com). It also offloads authentication and SSL/TLS termination from individual microservices."
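The routing half of an API gateway reduces to a path-prefix lookup plus shared checks applied before forwarding. In this sketch the service hostnames and the route() function are hypothetical, authentication is reduced to checking for an Authorization header, and instead of proxying the request it returns the upstream URL it would forward to.

```python
# Hypothetical internal service addresses behind the gateway.
ROUTES = {
    "/catalog": "http://catalog-svc:8080",
    "/users": "http://user-svc:8080",
    "/playback": "http://playback-svc:8080",
}


def route(path, headers):
    """Return (status, upstream URL or error message) for an incoming request."""
    # Cross-cutting concern handled once at the edge instead of in every service.
    if "Authorization" not in headers:
        return 401, "missing credentials"
    # Longest matching prefix wins, so a more specific route can override a general one.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return 200, ROUTES[prefix] + path
    return 404, "no route"


print(route("/catalog/123", {"Authorization": "Bearer <token>"}))
# (200, 'http://catalog-svc:8080/catalog/123')
```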

10. Microservices

Definition:
An architectural style where an application is built as a collection of small, independent services that communicate over APIs.

Real-World Example:

Uber: Separate services for Passenger Management, Driver Tracking, and Payments.
Amazon: Hundreds of services for Product Page, Recommendations, Cart, etc.

See Service Mesh Explained for how to manage communication between these services.

💡 Interview Tip:
"Microservices allow teams to deploy independently and choose the best tech stack for each problem. However, they drastically increase operational complexity compared to a monolith."

11. Service Discovery

Definition:
A mechanism for services to automatically detect and find the network location (IP + Port) of other services in a dynamic environment (like Kubernetes).

Real-World Example:

Consul / Eureka: Registry where services register themselves on startup.
Kubernetes DNS: Resolving my-service.default.svc.cluster.local.

💡 Interview Tip:
"Hardcoding IPs is impractical in cloud environments where instances constantly spin up and down. Service Discovery acts as a dynamic phonebook for the cluster."
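A toy version of the "dynamic phonebook", loosely modeled on registries like Consul or Eureka where instances register and then keep sending heartbeats. The ServiceRegistry class, the 10-second TTL, and the addresses are hypothetical; real registries add health checks, replication of the registry itself, and client-side caching.

```python
import time


class ServiceRegistry:
    """In-memory registry: instances register, keep sending heartbeats, and expire after a TTL."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._last_seen = {}  # (service name, "ip:port") -> time of last heartbeat

    def register(self, service, address):
        self._last_seen[(service, address)] = self.clock()

    def heartbeat(self, service, address):
        # A heartbeat just refreshes the timestamp.
        self.register(service, address)

    def deregister(self, service, address):
        self._last_seen.pop((service, address), None)

    def lookup(self, service):
        # Only return instances whose heartbeat is recent enough to be trusted.
        now = self.clock()
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self.ttl
        )


now = [0.0]  # controllable clock for a deterministic demo
registry = ServiceRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.register("payments", "10.0.0.5:8080")
registry.register("payments", "10.0.0.6:8080")
now[0] = 8.0
registry.heartbeat("payments", "10.0.0.5:8080")  # 10.0.0.6 has stopped responding
now[0] = 15.0
print(registry.lookup("payments"))  # ['10.0.0.5:8080']
```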

12. CDNs (Content Delivery Networks)

Definition:
A geographically distributed network of proxy servers that serve static content from a location closest to the user.

Real-World Example:

Cloudflare / Akamai: Serving images, CSS, and JS files from a server in London for a user in London, even if the origin server is in New York.
Video Streaming: Netflix Open Connect.

💡 Interview Tip:
"CDNs drastically reduce latency (TTFB) and offload static traffic from your origin servers. For a global app, a CDN is practically essential for a good UX."

13. Database Indexing

Definition:
A data structure (such as a B-tree) that speeds up data retrieval on a database table at the cost of extra work on every write and additional storage space.

Real-World Example:

Indexing the email column in a users table so that SELECT * FROM users WHERE email = ? becomes a fast index lookup instead of a full table scan.

💡 Interview Tip:
"Indexes turn O(N) full table scans into O(log N) lookups. However, over-indexing slows down INSERT and UPDATE operations, so it requires balance."
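The O(N) versus O(log N) contrast, and the write-side cost, can be sketched with a sorted list searched by binary search standing in for a real index. This is a simplification: databases typically use a B-tree, and the users table, find_by_* functions, and insert_user helper are hypothetical.

```python
import bisect

# A "table" of rows; the list position plays the role of a row ID.
users = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]


def find_by_scan(email):
    # Full table scan: O(N) comparisons in the worst case.
    for row in users:
        if row["email"] == email:
            return row
    return None


# Simplified index: (email, row position) pairs sorted by email, searched in O(log N).
# Databases typically use a B-tree instead, which keeps inserts cheap as well.
email_index = sorted((row["email"], pos) for pos, row in enumerate(users))
index_keys = [email for email, _ in email_index]


def find_by_index(email):
    i = bisect.bisect_left(index_keys, email)
    if i < len(index_keys) and index_keys[i] == email:
        return users[email_index[i][1]]
    return None


def insert_user(row):
    # The write-side cost of indexing: every insert must also update the index.
    users.append(row)
    i = bisect.bisect_left(index_keys, row["email"])
    index_keys.insert(i, row["email"])
    email_index.insert(i, (row["email"], len(users) - 1))


print(find_by_index("user99999@example.com") == find_by_scan("user99999@example.com"))  # True
```

Each additional index on the table adds another structure like email_index that every INSERT and UPDATE must maintain, which is the balance the tip refers to.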

14. Partitioning

Definition:
The general concept of dividing data. Sharding (concept 3 above) is a form of horizontal partitioning: it splits rows across servers. Vertical partitioning splits tables by columns (e.g., moving large CLOB columns into a separate table).

Real-World Example:

Time-based partitioning: Storing logs in daily tables (access_logs_2025_12_27).

💡 Interview Tip:
"Partitioning improves query performance by allowing the DB engine to 'prune' unnecessary partitions. If I query for December data, the DB skips scanning January-November."
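Partition pruning can be sketched with the daily partitions from the example above and the December query from the tip. An in-memory dict of lists stands in for the partitioned tables, and insert_log and query_range are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date, timedelta

# One partition per day, like daily tables named access_logs_YYYY_MM_DD.
partitions = defaultdict(list)


def insert_log(day, message):
    partitions[day].append(message)


def query_range(start, end):
    """Return (rows, partitions_scanned) for logs with start <= day <= end."""
    # Partition pruning: decide which partitions can match using only the partition
    # key, and never read rows from the others.
    matching = sorted(d for d in partitions if start <= d <= end)
    rows = [msg for d in matching for msg in partitions[d]]
    return rows, len(matching)


day = date(2025, 1, 1)
while day.year == 2025:  # one log line per day for all of 2025
    insert_log(day, f"GET /index.html on {day.isoformat()}")
    day += timedelta(days=1)

rows, scanned = query_range(date(2025, 12, 1), date(2025, 12, 31))
print(len(rows), "rows from", scanned, "of", len(partitions), "partitions")
# 31 rows from 31 of 365 partitions
```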

15. Eventual Consistency

Definition:
A consistency model used in distributed computing where the system guarantees that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.

Real-World Example:

DNS Propagation: Changing a domain's IP address can take minutes to hours to reflect globally, depending on how long resolvers cache the old record (its TTL).
Social Media Follows: You follow someone, but your friend doesn't see it on their feed for a few seconds.

💡 Interview Tip:
"We accept eventual consistency in exchange for high availability (BASE vs ACID). It's fine for 'Likes', but usually not for 'Bank Balances'."
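Convergence can be sketched with two replicas that accept writes independently and later exchange state, resolving conflicts by last-writer-wins (LWW) on timestamps. LWW is only one possible conflict-resolution strategy; the LWWReplica class, the integer logical timestamps, and the replica names are hypothetical.

```python
class LWWReplica:
    """A replica that stores key -> (timestamp, value) and resolves conflicts last-writer-wins."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, ts):
        self._apply(key, ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key, ts, value):
        # Keep the write with the highest timestamp. (Equal timestamps would need a
        # tiebreaker such as the replica name; omitted to keep the sketch short.)
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def sync_from(self, other):
        # Anti-entropy: merge another replica's state into this one.
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)


us, eu = LWWReplica("us-east"), LWWReplica("eu-west")
us.write("bio:alice", "Hello!", ts=1)         # a write lands on one replica
eu.write("bio:alice", "Hi, I'm Alice", ts=2)  # a later, concurrent write lands on the other
print(us.read("bio:alice"), "|", eu.read("bio:alice"))  # Hello! | Hi, I'm Alice
us.sync_from(eu)
eu.sync_from(us)
print(us.read("bio:alice") == eu.read("bio:alice") == "Hi, I'm Alice")  # True
```

Between the writes and the sync, readers on different replicas see different values; once no new writes arrive and the replicas exchange state, they agree, which is exactly the guarantee in the definition above.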

16. WebSockets

Definition:
A persistent, full-duplex communication channel over a single TCP connection, allowing the server to push updates to the client in real-time.

Real-World Example:

Chat Apps: WhatsApp / Slack.
Live Sports Scores: Updating scoreboards without refreshing the page.

Visual:

[Client] <====(Persistent Connection)====> [Server]

💡 Interview Tip:
"Unlike HTTP Polling (Client asking 'Any new data?'), WebSockets allow the server to say 'Here is new data!'. This reduces latency and unnecessary network overhead."

17. Scalability

Definition:
The capability of a system to handle a growing amount of work by adding resources.

Types:

Vertical (Scale Up): Adding more RAM/CPU to a single server.
Horizontal (Scale Out): Adding more servers to the pool.

💡 Interview Tip:
"Vertical scaling hits a hardware ceiling and leaves a single point of failure. Horizontal scaling is nearly unlimited in theory but introduces software complexity: services must be stateless, and traffic must be load balanced."

18. Fault Tolerance

Definition:
The ability of a system to continue operating properly in the event of the failure of some of its components.

Real-World Example:

RAID (e.g., RAID 1 or RAID 5): A single hard drive failure doesn't result in data loss.
Kubernetes: Automatically restarting crashed pods.

💡 Interview Tip:
"To achieve fault tolerance, I assume every component will eventually fail. We use redundancy (Replication) and isolation (Bulkheads) to ensure a partial failure doesn't become a total outage."

19. Monitoring & Observability

Definition:
Collecting metrics, logs, and traces to understand the internal state of the system and detect issues before users do.

Real-World Example:

Prometheus/Grafana: Visualizing CPU usage and Request Latency (p99).
Datadog/New Relic: Full-stack monitoring.

💡 Interview Tip:
"You can't fix what you can't measure. In a distributed system, tracing (tracking a request across services) is vital to identify bottlenecks."
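Why dashboards track p99 latency rather than only the average can be shown with a nearest-rank percentile over a batch of request latencies. The latency data is synthetic and hypothetical; tools like Prometheus typically estimate percentiles from histograms rather than sorting raw samples, so this is a simplified illustration of the metric, not of any tool's implementation.

```python
import math
import random


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


random.seed(7)
# Hypothetical latencies in ms: 98% fast requests plus 2% slow outliers.
latencies = [random.uniform(20, 40) for _ in range(980)]
latencies += [random.uniform(800, 1200) for _ in range(20)]

avg = sum(latencies) / len(latencies)
print(f"avg={avg:.0f} ms  p50={percentile(latencies, 50):.0f} ms  p99={percentile(latencies, 99):.0f} ms")
```

The average lands around 50 ms and looks healthy, while p99 falls among the outliers at 800 ms or more: roughly one request in a hundred is slow, which is what tracing would then help explain.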

20. Circuit Breaker Pattern

Definition:
A design pattern that detects repeated failures when calling a dependency and temporarily stops sending it requests, preventing the failure from constantly recurring (e.g., during maintenance or a temporary external system failure).

Real-World Example:

Netflix Hystrix: If the Recommendation Service is slow/down, the Netflix app stops calling it and shows a "Popular Movies" fallback list instead of a spinning loader.

Visual:

[Closed]    -> Normal flow; failures are counted
[Open]      -> Fail fast (return error/fallback immediately)
[Half-Open] -> Let a trial request through to test if the service is back up

💡 Interview Tip:
"It prevents cascading failures. Instead of overwhelming a struggling service with retries, the circuit breaker opens to give the downstream service time to recover, and callers fail fast instead of waiting on timeouts."
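The three states in the diagram can be sketched as a small single-threaded state machine. The thresholds are arbitrary, recommendations and popular_movies are hypothetical stand-ins for the Netflix example, and this is not Hystrix's API; a concurrent version would also need to limit how many trial requests run while half-open.

```python
import time


class CircuitBreaker:
    """Closed -> Open after `failure_threshold` consecutive failures;
    Open -> Half-Open after `reset_timeout` seconds; one trial call decides the next state."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()     # fail fast without touching the struggling service
            self.state = "half_open"  # timeout elapsed: allow one trial request
        try:
            result = func()
        except Exception:
            self._record_failure()
            return fallback()
        self.state = "closed"         # a success closes the circuit again
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


fake_now = [0.0]  # controllable clock for a deterministic demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: fake_now[0])
attempts = []


def recommendations():  # hypothetical call to a failing downstream service
    attempts.append(fake_now[0])
    raise TimeoutError("recommendation service is down")


def popular_movies():  # fallback shown instead of a spinning loader
    return ["Popular Movies"]


for _ in range(4):
    breaker.call(recommendations, popular_movies)
print(breaker.state, len(attempts))  # open 2  (the last two calls failed fast)

fake_now[0] = 10.0  # reset timeout elapsed: the next call is a half-open trial
print(breaker.call(lambda: ["Top Picks"], popular_movies), breaker.state)  # ['Top Picks'] closed
```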

Final Thoughts

Mastering these 20 concepts helps you move from being a "coder" to an "architect." In system design interviews, knowing the definition gets you points, but explaining the trade-offs (the "Interview Tips" above) gets you the job.

Next Steps:

Read about Service Mesh to see how many of these concepts come together in infrastructure.
Check out the Algorithm visualizer to see consistent hashing in action.