Hello Interview
ChatGPT

Published
By Evan King
hard


Understanding the Problem

💬 What is ChatGPT? Unless you've been living under a rock, you know what ChatGPT is. It's a conversational AI product where users send prompts in natural language and get responses streamed back from a large language model. Conversations are saved, so users can come back to an old chat and pick up right where they left off.
For this problem we treat the LLM as a black box that we call, not a model we train or whose internals we operate. The design lives in the serving system around it: how we stream tokens back quickly, how we schedule scarce GPUs, and how we keep cost under control as conversations grow. We'll also scope this to text in, text out only: no images, audio, or video, and no editing or branching of existing messages.

Functional Requirements

Core Requirements
  1. Users should be able to send a prompt in a chat and receive an AI-generated response.
  2. Users should be able to view past chats and resume a conversation, with the chat's prior context carried into the prompt.
Below the line (out of scope)
  • Editing or branching existing messages.
  • Image, audio, or video input and output (text only).
  • Sharing chats or collaborating on a chat with other users.
  • Custom GPTs, tool / function calling, and web browsing.
  • Full-text search across a user's chat history.
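To make the second core requirement concrete, here is a minimal sketch of carrying a chat's prior context into the next prompt. Everything in it is an illustrative assumption rather than something from the article: the `Message` type, the `build_prompt` helper, the whitespace-based `approx_tokens` counter (a real system would use the model's tokenizer), and the choice to drop the oldest turns once a token budget is exceeded.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def build_prompt(history: list[Message], new_prompt: str, budget: int) -> list[Message]:
    """Return the newest prior turns that fit in `budget`, followed by the new prompt.

    Hypothetical policy: walk the history from newest to oldest and stop at the
    first message that no longer fits, so the kept context is a contiguous tail.
    """
    latest = Message("user", new_prompt)
    remaining = budget - approx_tokens(new_prompt)
    kept: list[Message] = []
    for msg in reversed(history):
        cost = approx_tokens(msg.content)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    kept.reverse()  # restore oldest-to-newest order for the model
    return kept + [latest]


if __name__ == "__main__":
    chat = [
        Message("user", "a b c"),
        Message("assistant", "d e"),
        Message("user", "f"),
    ]
    for m in build_prompt(chat, "g h", budget=5):
        print(m.role, "|", m.content)
```

Plain truncation is only one option. It keeps the sketch short, but it is also the approach most likely to make the assistant feel forgetful on long chats, which is the tension the later deep dive on inference cost raises.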

Non-Functional Requirements

Non-functional requirements cover the properties of the system that matter to the user and the business.
ChatGPT feels broken if you stare at a blank screen for a few seconds after hitting enter, so latency to the first token matters more than total completion time. Because GPUs are the scarce, expensive resource here, the system has to be deliberate about who gets compute and when. ChatGPT serves a little over 200M daily active users at the time of writing, so that's the scale we'll design against.
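The distinction between time to first token and total completion time can be sketched as two separate measurements over the same stream. This is a hedged illustration only: `measure_stream` and `fake_token_stream` are hypothetical helpers, and the fake stream stands in for whatever transport actually delivers tokens.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def fake_token_stream(tokens: Iterable[str], delay_s: float = 0.0) -> Iterator[str]:
    # Hypothetical stand-in for a model's streamed output.
    for token in tokens:
        time.sleep(delay_s)
        yield token


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], float, int]:
    """Consume a token stream and return (time_to_first_token, total_time, token_count).

    time_to_first_token is None if the stream yields nothing.
    """
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in stream:
        if first is None:
            first = clock() - start  # latency the user perceives as a blank screen
        count += 1
    total = clock() - start
    return first, total, count


if __name__ == "__main__":
    ttft, total, n = measure_stream(fake_token_stream(["Hello", ",", " world"], delay_s=0.05))
    print(f"tokens={n} ttft={ttft:.3f}s total={total:.3f}s")
```

Tracking the two numbers separately matters because a response can have a long total time and still feel responsive, as long as the first token arrives quickly.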
With that framing, here are the requirements that actually shape the design.

The Set Up

Planning the Approach

Defining the Core Entities

API or System Interface

High-Level Design

1) Users should be able to send a prompt and receive an AI-generated response

2) Users should be able to view past chats and resume a conversation with context carried across turns

Potential Deep Dives

1) How do we stream tokens back fast, and keep the stream smooth?

2) How do we route and schedule generation requests across GPU workers?

3) How do we keep heavy users from monopolizing GPUs while giving paid tiers a better experience?

4) As conversations get longer, how do we control inference cost without making the assistant feel forgetful?

Cancelling a run and reclaiming the GPU

Some additional deep dives you might consider

What is Expected at Each Level?

Mid-level

Senior

Staff+



7511 Greenwood Ave North Unit #4238 Seattle WA 98103


© 2026 Optick Labs Inc. All rights reserved.