om pharate

Twitter System Design


Mar 19

Let’s start by looking at the requirements.

⚙️📌 Functional Requirements

⚙️📌 Non Functional Requirements

👥🧑‍🤝‍🧑 Types Of Users

🔍👀 Capacity Estimation for Twitter System Design

Let us assume we have 1 billion total users, 200 million of whom are daily active users (DAU), and that each daily active user tweets 5 times a day on average.

200M * 5 tweets = 1B/day

Tweets can also contain media such as images or videos. We can assume that 10 percent of tweets include media files shared by users.

10% * 1B = 100M/day

1 billion tweets per day translate into roughly 12K write requests per second on average.

1B / (24 hrs * 3600 seconds) ≈ 11.6K ≈ 12K requests/second

If we assume each tweet is 100 bytes on average, we will need about 100 GB of database storage every day.

1 billion * 100 bytes = 100 GB/day 
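The arithmetic above can be double-checked with a few lines of Python. This is only a back-of-the-envelope calculator using the same assumptions (200M DAU, 5 tweets per user per day, 10% media, 100 bytes per tweet); the constant names are illustrative.

```python
# Back-of-the-envelope capacity estimates, using the assumptions stated above.
DAU = 200_000_000            # daily active users
TWEETS_PER_USER_PER_DAY = 5
MEDIA_PERCENT = 10           # percent of tweets that carry media
AVG_TWEET_BYTES = 100        # assumed average size of one tweet record
SECONDS_PER_DAY = 24 * 3600


def estimate():
    tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY
    return {
        "tweets_per_day": tweets_per_day,
        "media_tweets_per_day": tweets_per_day * MEDIA_PERCENT // 100,
        "writes_per_second": tweets_per_day / SECONDS_PER_DAY,
        "storage_gb_per_day": tweets_per_day * AVG_TWEET_BYTES / 10**9,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.0f}")
```

Running it prints about 11,574 writes per second and 100 GB of storage per day, which the text rounds to 12K and 100 GB.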

🗄️💾 Data Storage

📝 Note: Remember that Twitter is a very read-heavy system. When designing a read-heavy system, we need to precompute and cache as much as we can to keep latency as low as possible.

Twitter System Design Microservices Architecture

Twitter System Design Database Architecture


🔥📈 High Level Design

Twitter System Design High Level Architecture


🔹 Register & Login Flow 🔑👀
When a user registers or logs in, the request goes through a Load Balancer (LB) to the User Service. The service interacts with the User DB (a MySQL cluster), which stores authentication details. To speed up login verification, user session tokens are cached in Redis. After authentication, the user is issued a session token or JWT for subsequent requests.

🔹 User Follow/Unfollow Flow 🔄👥
When a user follows or unfollows someone, the request is sent via the LB to the Graph Service, which manages user relationships. This service updates the User Graph DB (a MySQL cluster) and caches frequently accessed follow data in Redis to reduce DB queries. The resulting feed updates can be published to Kafka and processed asynchronously, so the follow request does not wait on feed rebuilding and the work can be spread across consumers.
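A minimal sketch of the Graph Service's write and read paths, assuming in-memory stand-ins: a dict of sets for the User Graph DB, a plain dict for the Redis cache, and a `queue.Queue` for the Kafka topic. The class and method names are hypothetical.

```python
from collections import defaultdict
from queue import Queue


class GraphService:
    """Hypothetical follow-graph service built on in-memory stand-ins."""

    def __init__(self):
        self.db_following = defaultdict(set)  # follower -> followees (stand-in for User Graph DB)
        self.cache = {}                       # follower -> frozenset (stand-in for Redis)
        self.events = Queue()                 # stand-in for a Kafka topic of feed updates
        self.db_reads = 0

    def follow(self, follower, followee):
        if follower == followee:
            raise ValueError("users cannot follow themselves")
        self.db_following[follower].add(followee)
        self.cache.pop(follower, None)        # invalidate rather than patch the cached set
        self.events.put({"type": "follow", "follower": follower, "followee": followee})

    def unfollow(self, follower, followee):
        self.db_following[follower].discard(followee)
        self.cache.pop(follower, None)
        self.events.put({"type": "unfollow", "follower": follower, "followee": followee})

    def following(self, follower):
        """Cache-aside read: try the cache, fall back to the DB, then populate the cache."""
        cached = self.cache.get(follower)
        if cached is not None:
            return cached
        self.db_reads += 1
        result = frozenset(self.db_following.get(follower, ()))
        self.cache[follower] = result
        return result
```

Dropping the cached entry on every write is the usual cache-aside choice: it is simpler than patching the cached set, and the next read repopulates it from the DB. Feed consumers would read the `follow`/`unfollow` events from the queue on their own schedule.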

🔹 Tweet Flow 📝📢
When a user posts a new tweet, the request is handled by the Tweet Service, which stores the tweet in the Cassandra cluster (User Tweets DB) for scalability. If the tweet contains media, the media file is stored in Blob Storage and served via the CDN. The tweet event is also pushed to Kafka, where different consumers (such as the Tweet Preprocessing Service and the Analytics Service) process it for engagement tracking and recommendations.
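To illustrate how one tweet event can reach several independent consumers, here is a toy model of a Kafka topic: an append-only log where each consumer group (for example `preprocessing` and `analytics`) keeps its own read offset. This is not the Kafka client API; `Topic`, `post_tweet`, the list standing in for Cassandra, the dict standing in for Blob Storage, and the `cdn.example.com` URL are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy append-only log; each consumer group keeps its own read offset,
    loosely mirroring how every Kafka consumer group sees every event."""
    name: str
    log: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # group id -> next offset to read

    def publish(self, event):
        self.log.append(event)
        return len(self.log) - 1  # offset assigned to the event

    def poll(self, group, max_events=100):
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_events]
        self.offsets[group] = start + len(batch)  # "commit" the new offset
        return batch


def post_tweet(topic, tweets_db, blob_store, user_id, text, media=None):
    """Hypothetical Tweet Service write path: store media, persist tweet, emit event."""
    tweet = {"tweet_id": len(tweets_db), "user_id": user_id, "text": text, "media_url": None}
    if media is not None:
        key = f"media/{tweet['tweet_id']}"
        blob_store[key] = media                                # stand-in for Blob Storage
        tweet["media_url"] = f"https://cdn.example.com/{key}"  # hypothetical CDN URL
    tweets_db.append(tweet)                                    # stand-in for the Cassandra write
    topic.publish({"type": "tweet_created", "tweet": tweet})
    return tweet
```

After two calls to `post_tweet`, both `topic.poll("preprocessing")` and `topic.poll("analytics")` return the same two events, and a second poll by either group returns an empty list. Because offsets are tracked per group, adding another consumer group would not affect the existing ones.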

🔹 Search & Analytics Flow 🔍📊
The Search Service queries Elasticsearch, which is kept up to date by the Search Consumer listening to Kafka events. To improve performance, Redis caches frequently searched terms and trending topics. The Analytics Service tracks user interactions and forwards them to Apache Spark, which processes the data at scale and stores the results in a Hadoop cluster. These processed insights are then used to improve recommendations and notifications.
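The Search Consumer's job can be sketched as maintaining an inverted index (term → tweet ids) from tweet events, with a small query cache and hashtag counts for trending topics. This is a toy stand-in for Elasticsearch and Redis, assuming events shaped like `{"type": "tweet_created", "tweet": {"tweet_id": ..., "text": ...}}`, a naive regex tokenizer, and AND semantics for multi-term queries; `SearchIndex` and its methods are hypothetical.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[#\w]+")  # naive tokenizer: words and hashtags


class SearchIndex:
    """Toy inverted index fed by tweet events (stand-in for Elasticsearch),
    plus a query-result cache and hashtag counts (stand-ins for Redis)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of tweets containing it
        self.query_cache = {}             # normalized query -> sorted tweet ids
        self.hashtag_counts = Counter()

    @staticmethod
    def tokenize(text):
        return [t.lower() for t in TOKEN_RE.findall(text)]

    def consume(self, event):
        """Search Consumer: index the tweet carried by one event."""
        tweet = event["tweet"]
        for term in set(self.tokenize(tweet["text"])):
            self.postings[term].add(tweet["tweet_id"])
            if term.startswith("#"):
                self.hashtag_counts[term] += 1
        self.query_cache.clear()  # coarse invalidation: a new tweet may change any result

    def search(self, query):
        """Return ids of tweets containing every query term, cached until the next update."""
        key = " ".join(sorted(set(self.tokenize(query))))
        if key in self.query_cache:
            return self.query_cache[key]
        terms = key.split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        result = sorted(ids)
        self.query_cache[key] = result
        return result

    def trending(self, n=3):
        return [tag for tag, _ in self.hashtag_counts.most_common(n)]
```

Note that this tokenizer keeps the `#`, so a search for `#kafka` and a search for `kafka` hit different postings; a real analyzer would usually normalize both.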

💯 Conclusion

This architecture combines horizontal scalability, event-driven processing through Kafka, and Redis caching on hot read paths, which suits Twitter's read-heavy workload! 🚀