Requirements (5 mins)

Functional Requirements

Identify core features (e.g., "Users should be able to post tweets"). Prioritize 2-3 key features.

Non-Functional Requirements

Focus on system qualities like scalability, latency, and availability. Quantify where possible (e.g., "render feeds in under 200ms").

Capacity Estimation

Skip unnecessary calculations unless they directly impact the design (e.g., sharding in a TopK system).

Core Entities (2 mins)

Identify key entities (e.g., User, Tweet, Follow) to define the system's foundation.
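
As a rough sketch, these entities might map to relational tables like the ones below; the columns are illustrative assumptions, not a full schema:

CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    handle TEXT UNIQUE NOT NULL
);

CREATE TABLE tweets (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    content TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Follow is a pure relationship: who follows whom.
CREATE TABLE follows (
    follower_id BIGINT NOT NULL REFERENCES users(id),
    followee_id BIGINT NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);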

API/System Interface (5 mins)

Define the contract between the system and users. Prefer RESTful APIs unless GraphQL is necessary.

[Optional] Data Flow (5 mins)

Describe high-level processes for data-heavy systems (e.g., web crawlers).

High-Level Design (10-15 mins)

Draw the system architecture, focusing on core components (e.g., servers, databases). Keep it simple and iterate based on API endpoints.


Deep Dives (10 mins)

Address non-functional requirements, edge cases, and bottlenecks. Proactively improve the design (e.g., scaling, caching, database sharding).


search within 500 ms

A SQL range query on lat/long is not efficient; we can use either a geo index or a quadtree. For keyword search, we can use an inverted index. Elasticsearch supports both of these, and we can use the CDC (Change Data Capture) feature of the DB to sync data into it via a queue/stream. However, this adds an extra service and introduces some complexity into the system. Based on our estimation above, the database size is around 1 TB, so in this case we can actually use Postgres with the PostGIS extension, which supports full-text search as well as geo indexing.
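
A minimal sketch of the Postgres/PostGIS route, assuming a businesses table with a point location and a searchable name (table and column names are illustrative):

CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE businesses (
    id BIGSERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    location GEOGRAPHY(POINT, 4326) NOT NULL,
    name_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', name)) STORED
);

-- GiST index for geo lookups, GIN index for full-text search.
CREATE INDEX idx_businesses_location ON businesses USING GIST (location);
CREATE INDEX idx_businesses_name_tsv ON businesses USING GIN (name_tsv);

-- Keyword search within a 5 km radius of a point (lon, lat).
SELECT id, name
FROM businesses
WHERE name_tsv @@ plainto_tsquery('english', 'pizza')
  AND ST_DWithin(location,
                 ST_SetSRID(ST_MakePoint(-122.42, 37.77), 4326)::geography,
                 5000);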

search by predefined names like city, etc.

Search within a radius won't work in this case because city boundaries are polygons, not circles. We can download polygon data from Geoapify and add a location table:

Both Postgres with the PostGIS extension and Elasticsearch support polygon queries. However, evaluating point-in-polygon on every query is not efficient; instead, we can precompute the location names once, when the record is created, and store them in the DB to avoid the computation on each query.
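
A sketch of that precomputation, assuming a locations table holding the downloaded polygons and a denormalized city column on businesses (names are illustrative):

CREATE TABLE locations (
    id BIGSERIAL PRIMARY KEY,
    name TEXT NOT NULL,       -- e.g., 'San Francisco'
    kind TEXT NOT NULL,       -- e.g., 'city', 'neighborhood'
    area GEOGRAPHY(MULTIPOLYGON, 4326) NOT NULL
);
CREATE INDEX idx_locations_area ON locations USING GIST (area);

ALTER TABLE businesses ADD COLUMN city TEXT;

-- Run once when a business is created: resolve the containing polygon
-- and store its name on the business row.
UPDATE businesses b
SET city = l.name
FROM locations l
WHERE b.id = $1
  AND l.kind = 'city'
  AND ST_Covers(l.area, b.location);

After that, "search by city" becomes a plain indexed equality filter on businesses.city rather than a point-in-polygon computation per query.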

how to update avg rating

Querying all review records for a business and recalculating the score each time is not optimal. Instead, we can update the average whenever a new review record is added, using an iterative formula: $avg(n) = [r(n) + r(n-1) + \dots + r(1)] / n = r(n)/n + avg(n-1) \times (n-1)/n$. However, there could be concurrent-update issues, which can be handled either via a row-level lock or via the optimistic concurrency control introduced in "Avoid double booking".
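
For example, keeping a review_count alongside avg_rating on the business row (an assumed denormalization) lets a single atomic UPDATE apply the formula, with the implicit row-level lock preventing lost updates:

-- $1 = new review's rating, $2 = business id
UPDATE businesses
SET avg_rating   = (avg_rating * review_count + $1) / (review_count + 1),
    review_count = review_count + 1
WHERE id = $2;

Because the read and write happen in one statement, two concurrent reviews cannot both compute from the same stale average: the second update blocks on the row lock and re-evaluates against the updated row.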

one review per user per biz

Application-level checking works, but it would be better to have the constraint at the DB layer:

ALTER TABLE reviews
ADD CONSTRAINT unique_user_business UNIQUE (user_id, business_id);
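
With the constraint in place, a duplicate insert raises a unique-violation error; if the desired behavior is "latest review wins", the write can be turned into an upsert (the rating and content columns are assumptions):

INSERT INTO reviews (user_id, business_id, rating, content)
VALUES ($1, $2, $3, $4)
ON CONFLICT ON CONSTRAINT unique_user_business
DO UPDATE SET rating = EXCLUDED.rating, content = EXCLUDED.content;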