hard · Senior Backend Engineer · Cloud
Design a rate limiter for a distributed system — what algorithms exist and which would you choose for an API gateway?
Posted 18/04/2026
by Mehedy Hasan Ador
Question Details
At a cloud provider interview:
> "We need to implement rate limiting at our API gateway. Millions of users, distributed across 10 regions. Requirements: 100 req/s per user, with burst support up to 150. Which algorithm would you choose and why?"
Suggested Solution
Rate Limiting Algorithms
1. Token Bucket (Recommended for burst support)
Bucket holds tokens (max: 150)
Refills at: 100 tokens/sec
Each request consumes 1 token
Bucket empty → reject
Timeline:
[150 tokens] → burst of 150 requests → [0 tokens] → reject
refill at 100/sec
[100 tokens after 1s] → accept 100 more
// Redis implementation
async function tokenBucket(userId: string): Promise<boolean> {
  const key = `bucket:${userId}`;
  const [tokens, lastRefill] = await redis.hmget(key, "tokens", "lastRefill");
  const now = Date.now();
  // First request for this user: start with a full bucket
  const current = tokens === null ? MAX_TOKENS : parseFloat(tokens);
  const last = lastRefill === null ? now : parseInt(lastRefill, 10);
  const elapsed = (now - last) / 1000;
  const newTokens = Math.min(
    MAX_TOKENS, // 150
    current + elapsed * REFILL_RATE // 100/sec
  );
  if (newTokens < 1) return false; // Rate limited
  await redis.hmset(key, {
    tokens: newTokens - 1,
    lastRefill: now,
  });
  // Expire idle buckets once they would be full again anyway
  await redis.expire(key, Math.ceil(MAX_TOKENS / REFILL_RATE));
  return true; // Allowed
}
// Note: this read-modify-write races under concurrent requests for the same
// user; in production, run it as a single Lua script (EVAL) so Redis applies
// it atomically.
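The Redis version above is hard to unit-test without a live server. Below is a minimal in-memory sketch of the same refill math; the class name and the injectable clock are illustrative choices for testability, not part of the original code.

```typescript
// In-memory token bucket mirroring the Redis logic above: capacity 150,
// refill 100 tokens/sec. The injectable clock makes the refill math testable.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // max tokens (burst size, e.g. 150)
    private readonly refillRate: number, // tokens added per second (e.g. 100)
    private readonly now: () => number = Date.now // ms clock, injectable for tests
  ) {
    this.tokens = capacity; // start full: the first burst is allowed
    this.lastRefill = this.now();
  }

  tryConsume(): boolean {
    const t = this.now();
    const elapsed = (t - this.lastRefill) / 1000;
    // Refill proportionally to elapsed time, capped at capacity
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = t;
    if (this.tokens < 1) return false; // rate limited
    this.tokens -= 1;
    return true;
  }
}
```

A full bucket admits a burst of 150 requests, then exactly 100 more per second, matching the interview requirements.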
2. Sliding Window Counter
async function slidingWindow(userId: string): Promise<boolean> {
  const now = Date.now();
  // One counter per 1-second window. Strictly speaking this is a fixed-window
  // counter; a true sliding window also weights the previous window's count.
  const windowKey = `rate:${userId}:${Math.floor(now / 1000)}`;
  const count = await redis.incr(windowKey);
  if (count === 1) await redis.expire(windowKey, 2); // 2s buffer
  return count <= LIMIT; // 100
}
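A true sliding window counter blends the previous window's count, weighted by how much of it still overlaps the trailing one-second window. A minimal in-memory sketch of that weighting (class name and injectable clock are illustrative):

```typescript
// Sliding window counter: estimate the request rate over the trailing
// 1-second window by blending the previous window's count with the current one.
class SlidingWindowCounter {
  private windows = new Map<number, number>(); // window index (sec) -> count

  constructor(
    private readonly limit: number,               // e.g. 100 req per second
    private readonly now: () => number = Date.now // ms clock, injectable for tests
  ) {}

  tryAcquire(): boolean {
    const t = this.now();
    const cur = Math.floor(t / 1000);
    const prevCount = this.windows.get(cur - 1) ?? 0;
    const curCount = this.windows.get(cur) ?? 0;
    // Fraction of the previous window still inside the trailing 1s window
    const overlap = 1 - (t % 1000) / 1000;
    const estimated = prevCount * overlap + curCount;
    if (estimated >= this.limit) return false; // rate limited
    this.windows.set(cur, curCount + 1);
    this.windows.delete(cur - 2); // windows older than 2s can never matter again
    return true;
  }
}
```

The same estimate can be computed in Redis from two adjacent per-second counters, which is why a 2-second TTL on each counter is sufficient.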
Algorithm Comparison
Token Bucket: allows bursts up to capacity (150 here), two fields of state per user, but the refill math needs an atomic update
Sliding Window Counter: smoother enforcement with no burst allowance, one counter per window, just INCR + EXPIRE
Fixed Window Counter: simplest of all, but can admit up to 2x the limit around window boundaries
Leaky Bucket: drains at a constant rate, smooths traffic, but offers no burst support
Distributed Rate Limiting Challenge
Region A: user has a full, independent bucket (refills at 100/s)
Region B: user has a full, independent bucket (refills at 100/s)
Each region enforces the limit separately → the user gets 200 req/s across two regions (2x the limit!)
Solutions
Option A: Redis Cluster (shared state)
// All regions point to the same Redis cluster
const redis = new Redis.Cluster([
  { host: "redis-use1.example.com" },
  { host: "redis-euw1.example.com" },
]);
// Adds latency per check: a few ms in-region, tens of ms for cross-region hops
Option B: Regional limits + sticky sessions
// Divide the limit across regions: 100/10 = 10 req/s per region
// A user typically hits 1-2 regions → gets 10-20 req/s
// Simple but imprecise
Option C: Async reconciliation (best for scale)
// Each region allows the request locally and logs it to Kafka
// A background job checks the global rate, revokes tokens if exceeded
// Eventual consistency — slight over-limit is possible
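Option C's trade-off can be sketched in memory: regions admit optimistically against a local quota, and a reconciler redistributes quota after the fact. All names below are hypothetical; a real deployment would log to Kafka and run the reconciler as a background job, as described above.

```typescript
// Sketch of async reconciliation: each region admits against a local quota;
// a periodic reconciler compares global usage to the global limit and
// reapportions quota by observed traffic share. Over-admission is possible
// between reconciliations: that is the eventual-consistency trade-off.
class ReconcilingLimiter {
  private counts = new Map<string, number>(); // region -> requests this interval
  private quotas = new Map<string, number>(); // region -> current allowance

  constructor(private readonly globalLimit: number, regions: string[]) {
    // Start optimistic: any single region may serve the full global limit
    for (const r of regions) this.quotas.set(r, globalLimit);
  }

  tryAcquire(region: string): boolean {
    const used = this.counts.get(region) ?? 0;
    if (used >= (this.quotas.get(region) ?? 0)) return false;
    this.counts.set(region, used + 1);
    return true;
  }

  // Background job: split the global limit by each region's traffic share
  reconcile(): void {
    const total = [...this.counts.values()].reduce((a, b) => a + b, 0);
    for (const [region, used] of this.counts) {
      const share = total > 0 ? used / total : 0;
      this.quotas.set(region, Math.max(1, Math.round(this.globalLimit * share)));
    }
    this.counts.clear(); // start the next interval
  }
}
```

Before the first reconciliation every region can serve the full limit (the over-limit window); afterwards, quotas track where the traffic actually is.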
Production Choice: Token Bucket with Redis Cluster
Nginx API Gateway config
limit_req_zone $user_id zone=api:10m rate=100r/s;
limit_req zone=api burst=50 nodelay;
# $user_id is not a built-in nginx variable; it must be set upstream (e.g. via a map block or auth module)
burst=50 admits up to 50 requests beyond the 100r/s rate (≈150 in a burst); nodelay serves them immediately instead of queuing