hard | Backend Engineer | Technology
What is database sharding, and how does it differ from replication? When would you use each?
Posted 18/04/2026
by Mehedy Hasan Ador
Question Details
At a scale-up company interview:
> "Our MongoDB cluster handles 10K writes/sec. We're hitting the capacity of a single primary. What's our scaling strategy — replication, sharding, or both?"
Suggested Solution
Replication vs Sharding
Replication (Copies of same data)
Primary → Secondary 1
        → Secondary 2
        → Secondary 3
All nodes have the SAME data
Purpose: High availability, read scaling, disaster recovery
Sharding (Data split across nodes)
Shard 1: Users A-M → Primary + 2 Secondaries (replica set)
Shard 2: Users N-Z → Primary + 2 Secondaries (replica set)
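Purpose: Write scaling, storage scaling, horizontal scaling
When to Use Each
Replication: when you need high availability, more read throughput, or disaster recovery.
Sharding: when write throughput or data volume exceeds what a single primary can handle.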
MongoDB Sharding Architecture
Client → mongos (router) ←→ Config Servers (cluster metadata)
              ↓
Shard 1 (A-M)    Shard 2 (N-Z)    Shard 3 (logs)
(replica set)    (replica set)    (replica set)
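To make the router's job concrete, here is a minimal sketch of range-based routing in plain JavaScript. The `chunks` table and `routeByRange` function are hypothetical stand-ins for the metadata on the config servers and the lookup mongos performs. This is an illustration of the idea, not MongoDB's implementation.

```javascript
// Hypothetical chunk map, standing in for the metadata held by config servers.
// Each range is [min, max) on the first letter of the shard key.
const chunks = [
  { min: "A", max: "N", shard: "shard1" }, // users A-M
  { min: "N", max: "[", shard: "shard2" }, // users N-Z ("[" follows "Z" in ASCII)
];

// Route a user name to the shard whose range contains its first letter.
function routeByRange(userName) {
  const key = userName[0].toUpperCase();
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key "${key}"`);
  return chunk.shard;
}

console.log(routeByRange("alice")); // shard1
console.log(routeByRange("Nadia")); // shard2
```

A query that includes the shard key can be sent to one shard this way. A query without it has to be broadcast to every shard.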
Shard Key Selection (CRITICAL)
// Enable sharding for the database
sh.enableSharding("interviewOS");
// Shard the collection on userId (range-based shard key)
sh.shardCollection("interviewOS.applications", { userId: 1 });
// Good shard keys have:
// - High cardinality (many unique values)
// - Low frequency (no single value dominates, so data spreads evenly)
// - Non-monotonic values (auto-increment keys send all new writes to one shard)
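The first two criteria can be checked on a sample of documents before committing to a key. The sketch below is plain JavaScript. `shardKeyStats` and the sample documents are hypothetical names made up for illustration, and the `status` field is an assumed example of a poor candidate.

```javascript
// Summarise how a candidate shard key is distributed over sample documents.
function shardKeyStats(docs, field) {
  if (docs.length === 0) return { cardinality: 0, maxShare: 0 };
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field];
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  const maxCount = Math.max(...counts.values());
  return {
    cardinality: counts.size,         // number of distinct key values
    maxShare: maxCount / docs.length, // share of documents on the most common value
  };
}

// Hypothetical sample of application documents.
const sample = [
  { userId: "u1", status: "open" },
  { userId: "u2", status: "open" },
  { userId: "u3", status: "open" },
  { userId: "u4", status: "closed" },
];

console.log(shardKeyStats(sample, "userId")); // cardinality 4, maxShare 0.25
console.log(shardKeyStats(sample, "status")); // cardinality 2, maxShare 0.75
```

Here `userId` looks like the better candidate: more distinct values and no dominant one. A real check would use a much larger sample.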
Bad Shard Key: Auto-increment
Shard 1: IDs 1-1000
Shard 2: IDs 1001-2000 (all NEW writes go here = hot shard!)
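A small simulation shows the hot shard. It assumes ranges of 1000 IDs per shard, as in the example above. `shardForId` is a hypothetical helper, not a MongoDB function.

```javascript
// Range-partition auto-increment IDs in blocks of 1000:
// IDs 1-1000 -> shard 1, IDs 1001-2000 -> shard 2, and so on.
function shardForId(id, rangeSize = 1000) {
  return Math.floor((id - 1) / rangeSize) + 1;
}

// Simulate 500 new inserts arriving after 1500 documents already exist.
const writesPerShard = {};
for (let id = 1501; id <= 2000; id++) {
  const shard = shardForId(id);
  writesPerShard[shard] = (writesPerShard[shard] || 0) + 1;
}

console.log(JSON.stringify(writesPerShard)); // {"2":500}
```

Every new write lands on shard 2 while shard 1 sits idle, so adding shards does not add write capacity.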
Good Shard Key: Hash
sh.shardCollection("interviewOS.questions", { _id: "hashed" });
// Even distribution: writes spread across all shards
// Trade-off: range queries on _id must be sent to every shard
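The effect of hashing can be sketched in plain JavaScript. `mix32` is the MurmurHash3 32-bit finalizer, used here only as a stand-in and not MongoDB's actual hash function. `shardForKey` and the choice of four shards are assumptions for the demonstration.

```javascript
// Integer mixing function (MurmurHash3 32-bit finalizer), used as a stand-in hash.
function mix32(x) {
  x = Math.imul(x ^ (x >>> 16), 0x85ebca6b);
  x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
  return (x ^ (x >>> 16)) >>> 0; // force an unsigned 32-bit result
}

// Pick a shard from the hashed key instead of the raw key.
function shardForKey(id, numShards) {
  return mix32(id) % numShards;
}

// Insert 10,000 sequential (monotonic) IDs and count writes per shard.
const numShards = 4;
const counts = new Array(numShards).fill(0);
for (let id = 1; id <= 10000; id++) {
  counts[shardForKey(id, numShards)]++;
}

console.log(counts); // roughly equal counts per shard
```

Even though the IDs are sequential, consecutive values hash to unrelated shards, so no single shard absorbs all new writes.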
Scaling Decision Tree
Is your dataset > single server capacity?
→ YES: Shard (horizontal scale)
Is your write throughput > single primary capacity?
→ YES: Shard (distribute writes)
Do you need more read throughput?
→ Add secondaries (replication)
Do you need high availability?
→ Replica set (minimum 3 nodes: 1 primary + 2 secondaries)
Do you need BOTH?
→ Sharded cluster (each shard is a replica set)
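The decision tree can be written as a small function. This is a sketch: `scalingStrategy`, its flag names, and the `"single-node"` fallback are hypothetical and not part of any MongoDB API.

```javascript
// Encode the decision tree: sharding answers data-size and write limits,
// replication answers read throughput and availability.
function scalingStrategy({
  dataExceedsOneServer = false,
  writesExceedOnePrimary = false,
  needsMoreReads = false,
  needsHighAvailability = false,
} = {}) {
  const needsSharding = dataExceedsOneServer || writesExceedOnePrimary;
  const needsReplication = needsMoreReads || needsHighAvailability;
  if (needsSharding && needsReplication) return "sharded-cluster"; // each shard is a replica set
  if (needsSharding) return "shard";
  if (needsReplication) return "replica-set";
  return "single-node"; // assumed fallback, not part of the original tree
}

// The interview scenario: writes exceed one primary, and production needs availability.
console.log(
  scalingStrategy({ writesExceedOnePrimary: true, needsHighAvailability: true })
); // sharded-cluster
```

For the 10K writes/sec scenario in the question, the answer is both: shard to distribute writes, and keep each shard a replica set for availability.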