Requirements (5 mins):
Functional Requirements
Identify core features (e.g., "Users should be able to post tweets"). Prioritize 2-3 key features.
- users can browse lists of coding problems
- users can view the problem and coding in different languages
- users can submit a solution to the coding problem and get the result
- users can view the lead board
Non-Functional Requirements
Focus on system qualities like scalability, latency, and availability. Quantify where possible (e.g., "render feeds in under 200ms").
- scale 1m DAU
- availability >> consistency
- security, isolate env to run code
- user should be able to validate the solution within 1 second
Capacity Estimation
Skip unnecessary calculations unless they directly impact the design (e.g., sharding in a TopK system).
100k users take competition, refresh lead board requests 6 per minute, QPS is 100k x 6 / 60 = 10k, heavy load read
Core Entities (2 mins)
Identify key entities (e.g., User, Tweet, Follow) to define the system's foundation.
- problem
- solution
- lead board
API/System Interface (5 mins)
Define the contract between the system and users. Prefer RESTful APIs unless GraphQL is necessary.
- GET /problems?page&company&category -> [problems]
- GET /problem/id -> {desc, category, tags ...}
- POST /solution/problem_id, body {lang, code} -> [pass/fail, timecost]
- GET /leadboard/problem_id?page -> [rankings]
[Optional] Data Flow (5 mins)
Describe high-level processes for data-heavy systems (e.g., web crawlers).
High-Level Design (10-15 mins)
Draw the system architecture, focusing on core components (e.g., servers, databases). Keep it simple and iterate based on API endpoints.
Deep Dives (10 mins)
Address non-functional requirements, edge cases, and bottlenecks. Proactively improve the design (e.g., scaling, caching, database sharding).
how to achieve isolation and security
- mount the code as read only, and write any output to /tmp directory
- set limits for CPU/memory usage for the container
- avoid infinite loop or long time run, run as subprocess and monitor timeout, kill if needed
- limited network access (VPC)
- no system calls, mock it, or restrict it
how to solve the heavy read load of query lead board
We can use cache, but it won't be up to date, we can use Redis sorted set to implement leadboard, update both Redis and DB, but just query Redis to get the top N.
how to scale to support 100k concurrent users for competition
The submission is CPU intensive, and we can auto scale the docker containers using cloud service, and in case of peek usage, we can add queue in our system, but that will change the POST solution API to async, and we need add another API to query the result.
how to write test cases efficiently for all languages
We can define the test cases using test vector, define the input, and expected output using JSON format.