Medium · Backend Engineer · Data
Explain Python generators, iterators, and the yield keyword — when are they useful in production?
Posted 18/04/2026
by Mehedy Hasan Ador
Question Details
At a data-heavy company:
> "We need to process a 10GB CSV file but only have 2GB of RAM. Loading it all at once crashes. How do generators solve this?"
Suggested Solution
Generators — Lazy Evaluation
Regular function — loads ALL into memory:

```python
def load_all(filepath):
    rows = []
    with open(filepath) as f:
        for line in f:
            rows.append(parse(line))  # parse() stands in for your row parser
    return rows  # 10GB in memory! 💥
```
Generator — yields one row at a time:

```python
def load_lazy(filepath):
    with open(filepath) as f:
        for line in f:
            yield parse(line)  # only ONE row in memory at a time
```

Usage — same interface as a list:

```python
for row in load_lazy("huge.csv"):  # constant memory
    process(row)
```
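Under the hood, a generator is just an iterator: each `next()` call resumes the function body at the last `yield`, and a `for` loop calls `next()` until `StopIteration`. A minimal sketch (`count_up` is an illustrative helper, not part of the CSV example):

```python
def count_up(n):
    """Yield 0..n-1 one value at a time."""
    i = 0
    while i < n:
        yield i
        i += 1

gen = count_up(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2
# A fourth next(gen) raises StopIteration — for-loops catch this for you.

# Generators are single-use: once exhausted, they stay empty.
print(list(gen))  # []
```

This single-use behavior matters in production: you cannot iterate the same generator twice, so re-reading the file means calling the generator function again.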
Generator Expression (like a list comprehension, but lazy)

```python
# List comprehension — builds everything in memory
squares = [x**2 for x in range(10_000_000)]  # ~80MB for the list alone, plus the int objects

# Generator expression — constant memory
squares = (x**2 for x in range(10_000_000))  # ~200 bytes for the generator object
```
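Laziness also means a consumer that short-circuits stops the generator expression early: values after the stopping point are never computed at all. A small illustration (`observe` is a hypothetical tracing helper added for the demo):

```python
seen = []

def observe(x):
    """Hypothetical tracing helper: records every value the genexp pulls."""
    seen.append(x)
    return x

# any() short-circuits on the first truthy result, so the generator
# expression stops after x = 3; the rest of range(1_000_000) is never produced.
found = any(observe(x) > 2 for x in range(1_000_000))
print(found)  # True
print(seen)   # [0, 1, 2, 3]
```

The same expression as a list comprehension would call `observe` a million times before `any()` ever ran.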
yield from (delegating to sub-generators)

```python
def flatten(matrix):
    for row in matrix:
        yield from row  # delegate to each row's iterator

list(flatten([[1, 2], [3, 4], [5]]))  # [1, 2, 3, 4, 5]
```
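`yield from` also composes with recursion, which helps when the nesting depth isn't known in advance. A sketch under that assumption (`deep_flatten` is an illustrative name, not from the example above):

```python
def deep_flatten(items):
    """Recursively flatten nested lists of arbitrary depth."""
    for item in items:
        if isinstance(item, list):
            yield from deep_flatten(item)  # delegate to the recursive generator
        else:
            yield item

print(list(deep_flatten([1, [2, [3, [4]], 5]])))  # [1, 2, 3, 4, 5]
```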
Infinite Sequences

```python
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take the first 10
from itertools import islice
list(islice(fibonacci(), 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```
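In production, long or infinite streams are often consumed in fixed-size chunks, e.g. to batch database inserts. A sketch built on `islice` (`batched` here is a hand-rolled helper; Python 3.12+ ships a similar `itertools.batched` that yields tuples):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable, lazily."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:  # iterator exhausted
            return
        yield batch

print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because `batched` is itself a generator, only one chunk exists in memory at a time, so it composes with the streaming readers above.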
Pipeline Pattern

```python
import json

def read_lines(path):
    with open(path) as f:
        yield from f

def parse_rows(lines):
    for line in lines:
        yield json.loads(line)

def filter_active(rows):
    for row in rows:
        if row["active"]:
            yield row

# Compose generators — each step is lazy
pipeline = filter_active(parse_rows(read_lines("data.jsonl")))
for row in pipeline:
    print(row)
```
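To see the whole pipeline run end to end, here is a self-contained demo against a temporary `.jsonl` file (the sample records and the `"id"`/`"active"` schema are made up for illustration):

```python
import json
import os
import tempfile

# Hypothetical sample data matching the schema the pipeline assumes.
records = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": True},
]

with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
    path = f.name

def read_lines(path):
    with open(path) as fh:
        yield from fh

def parse_rows(lines):
    for line in lines:
        yield json.loads(line)

def filter_active(rows):
    for row in rows:
        if row["active"]:
            yield row

# Each stage pulls one line at a time — memory stays flat however big the file is.
active_ids = [row["id"] for row in filter_active(parse_rows(read_lines(path)))]
print(active_ids)  # [1, 3]

os.remove(path)
```

Each stage holds only the current row, so the same three functions handle a 10GB file in the same 2GB of RAM — exactly the scenario from the question.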