
System Design

Scalability and system design are very large subjects with many subtopics and resources, since there is a lot to consider when designing a software/hardware system that can scale. Expect to spend quite a bit of time on this.
Considerations:
- Scalability
  - Distill large data sets to single values
  - Transform one data set to another
  - Handling obscenely large amounts of data
- System design
  - feature sets
  - interfaces
  - class hierarchies
  - designing a system under certain constraints
  - simplicity and robustness
  - tradeoffs
  - performance analysis and optimization
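
The first two scalability considerations above (transforming one data set into another, and distilling a data set to a single value) correspond roughly to a map step and a reduce step. The following is a minimal sketch of that idea, not a prescribed approach; the function names `transform` and `distill` and the sample log lines are hypothetical. Using a generator means the input is processed one item at a time, so the full data set never has to fit in memory.

```python
from functools import reduce
from typing import Iterable, Iterator


def transform(records: Iterable[str]) -> Iterator[int]:
    """Transform one data set into another: raw lines -> UTF-8 byte counts."""
    for line in records:
        # Yield lazily so a very large input is never fully loaded.
        yield len(line.encode("utf-8"))


def distill(values: Iterable[int]) -> int:
    """Distill a data set to a single value: here, the sum of all items."""
    return reduce(lambda total, value: total + value, values, 0)


# Hypothetical input; in practice this could be a file or a network stream.
log_lines = ["GET /a", "POST /b", "GET /c"]
total_bytes = distill(transform(log_lines))
print(total_bytes)
```

At real scale the same two steps would typically be spread across many machines, which is the subject of the MapReduce paper listed under Papers.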
START HERE: The System Design Primer (GitHub)
System Design (GitHub)
System Design from HiredInTech (article)
How Do I Prepare To Answer Design Questions In A Technical Interview? - Quora (web)
8 Things You Need to Know Before a System Design Interview (article)
System Design Interview (GitHub) - There are a lot of resources in this one. Look through the articles and examples. I put some of them below.
How to ace a systems design interview (article)
Numbers Everyone Should Know (article)
How long does it take to make a context switch? (article)
A plain English introduction to CAP Theorem (article)
Database Normalization - 1NF, 2NF, 3NF and 4NF - Youtube (video)
Transactions Across Datacenters - Youtube (video)
Distributed Systems, Spring 2020 - MIT 6.824 (20 videos)
Consensus Algorithms:
- Paxos - Paxos Agreement - Computerphile (video)
- Raft - An Introduction to the Raft Distributed Consensus Algorithm - Youtube (video)
  - Easy-to-read paper (article)
  - Infographic (web)
Consistent Hashing (article)
NoSQL Patterns (article)
Scalability:
- You don't need all of these. Just pick a few that interest you.
- Scalability - Harvard CS75 (video)
- Short article series:
  - 1- Clones (article)
  - 2- Database (article)
  - 3- Cache (article)
  - 4 - Asynchronism (article)
- Scalable Web Architecture and Distributed Systems (article)
- Fallacies of Distributed Computing Explained (pdf)
- Jeff Dean - Building Software Systems At Google and Lessons Learned - Youtube (video)
- Introduction to Architecting Systems for Scale (article)
- Scaling mobile games to a global audience using App Engine and Cloud Datastore - Youtube (video)
- How Google Does Planet-Scale Engineering for Planet-Scale Infra - Youtube (video)
- The Importance of Algorithms - TopCoder (article)
- Sharding (article)
- Engineering for the Long Game - Astrid Atkinson Keynote - Youtube (video)
- 7 Years Of YouTube Scalability Lessons In 30 Minutes (article)
  - Scalability at Youtube - Youtube (video)
- How PayPal Scaled To Billions Of Transactions Daily Using Just 8VMs (article)
- How to Remove Duplicates in Large Datasets (article)
- A look inside Etsy's scale and engineering culture with Jon Cowie - Youtube (video)
- What Led Amazon to its Own Microservices Architecture (article)
- To Compress Or Not To Compress, That Was Uber's Question (article)
- When Should Approximate Query Processing Be Used? (article)
- Google's Transition From Single Datacenter, To Failover, To A Native Multihomed Architecture (article)
- The Image Optimization Technology That Serves Millions Of Requests Per Day (article)
- A Patreon Architecture Short (article)
- Tinder: How Does One Of The Largest Recommendation Engines Decide Who You'll See Next? (article)
- Design Of A Modern Cache (article)
- Live Video Streaming At Facebook Scale (article)
- A Beginner's Guide To Scaling To 11 Million+ Users On Amazon's AWS (article)
- A 360 Degree View Of The Entire Netflix Stack (article)
- Latency Is Everywhere And It Costs You Sales - How To Crush It (article)
- What Powers Instagram: Hundreds of Instances, Dozens of Technologies (article)
- Salesforce Architecture - How They Handle 1.3 Billion Transactions A Day (article)
- ESPN's Architecture At Scale - Operating At 100,000 Duh Nuh Nuhs Per Second (article)
- Messaging, Serialization, and Queueing Systems are used to glue services together
  - Thrift
  - Protocol Buffers
  - gRPC
- Twitter:
  - O'Reilly MySQL CE 2011: Jeremy Cole, "Big and Small Data at @Twitter" - Youtube (video)
  - Timelines at Scale (video)
Practicing the system design process: Here are some ideas to try working through on paper, each with some documentation on how it was handled in the real world:
- Cheat Sheet
- Flow:
  1. Understand the problem and scope:
    - Define the use cases, with interviewer's help
    - Suggest additional features
    - Remove items that interviewer deems out of scope
    - Assume high availability is required, add as a use case
  2. Think about constraints:
    - Ask how many requests per month
    - Ask how many requests per second (they may volunteer it or make you do the math)
    - Estimate reads vs. writes percentage
    - Keep 80/20 rule in mind when estimating
    - How much data written per second
    - Total storage required over 5 years
    - How much data read per second
  3. Abstract design:
    - Layers (service, data, caching)
    - Infrastructure: load balancing, messaging
    - Rough overview of any key algorithm that drives the service
    - Consider bottlenecks and determine solutions
- Exercises:
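
For step 2 of the flow above (thinking about constraints), the arithmetic can be sketched as follows. This is a rough back-of-the-envelope helper under stated assumptions, not a definitive method: the `estimate` function, the `Estimate` class, the 30-day month, and all sample inputs (request volume, write fraction, bytes per write) are made up for illustration.

```python
from dataclasses import dataclass

# Assumption: a 30-day month, which gives 2,592,000 seconds.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


@dataclass
class Estimate:
    requests_per_second: float
    writes_per_second: float
    reads_per_second: float
    bytes_written_per_second: float
    storage_bytes_5_years: float


def estimate(requests_per_month: int,
             write_fraction: float,
             bytes_per_write: int) -> Estimate:
    """Convert a monthly request count into per-second and 5-year figures."""
    rps = requests_per_month / SECONDS_PER_MONTH
    writes = rps * write_fraction          # share of requests that are writes
    reads = rps - writes                   # the remainder are reads
    written = writes * bytes_per_write     # bytes written per second
    # 12 months per year over 5 years, ignoring replication and overhead.
    storage = written * SECONDS_PER_MONTH * 12 * 5
    return Estimate(rps, writes, reads, written, storage)


# Hypothetical numbers: ~259M requests/month, 20% writes, 1 KB per write.
result = estimate(259_200_000, write_fraction=0.2, bytes_per_write=1_000)
print(f"{result.requests_per_second:.0f} req/s, "
      f"{result.writes_per_second:.0f} writes/s, "
      f"{result.reads_per_second:.0f} reads/s, "
      f"{result.storage_bytes_5_years / 1e12:.2f} TB over 5 years")
```

In an interview these figures are usually rounded aggressively; the point is the order of magnitude, which drives decisions about sharding, caching, and storage.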

Papers

Love classic papers?
1978: Communicating Sequential Processes
- implemented in Go
2003: The Google File System
- replaced by Colossus in 2012
2004: MapReduce: Simplified Data Processing on Large Clusters
- mostly replaced by Cloud Dataflow?
2006: Bigtable: A Distributed Storage System for Structured Data
2006: The Chubby Lock Service for Loosely-Coupled Distributed Systems
2007: Dynamo: Amazon’s Highly Available Key-value Store
- The Dynamo paper kicked off the NoSQL revolution
2007: What Every Programmer Should Know About Memory (very long, and the author encourages skipping of some sections)
2012: AddressSanitizer: A Fast Address Sanity Checker:
- paper
- video
2013: Spanner: Google’s Globally-Distributed Database:
- paper
- video
2014: Machine Learning: The High-Interest Credit Card of Technical Debt
2015: Continuous Pipelines at Google
2015: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads
2015: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
2015: How Developers Search for Code: A Case Study
More papers: 1,000 papers
