Sunday, 26 October 2025

How to handle a file that is too large to fit into memory in Python (with examples)?

When you're working with a file that's too large to fit into memory—say, multi-GB logs or datasets—Python gives you several efficient ways to process it line by line or in chunks, without loading the entire file at once.

✅ Read Line-by-Line Using a File Iterator

This is the simplest and most memory-efficient approach:

python
def process_large_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as file:
        for line in file:
            # Process each line here
            print(line.strip())  # Example: strip and print

Why it works: The file object is an iterator, so it reads one line at a time—keeping memory usage low.
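
For instance, here is a minimal sketch of the same pattern used for aggregation: it counts the lines that contain a marker string while holding only one line in memory at a time. The file path and the 'ERROR' marker below are illustrative assumptions, not part of any particular API.

python
def count_matching_lines(filepath, marker='ERROR'):
    """Count lines containing `marker` without loading the whole file."""
    count = 0
    with open(filepath, 'r', encoding='utf-8') as file:
        for line in file:  # the file iterator yields one line at a time
            if marker in line:
                count += 1
    return count

# Hypothetical usage:
# print(count_matching_lines('app.log'))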

✅ Read in Fixed-Size Chunks

Useful when lines are very long or you want more control:

python
def read_in_chunks(file_object, chunk_size=1024):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('large_file.txt', 'r', encoding='utf-8') as f:
    for chunk in read_in_chunks(f):
        # Process each chunk here
        print(len(chunk))  # Example: report how much was read
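
Note that in text mode, read(chunk_size) counts characters rather than bytes. A common chunked-reading task is hashing a large file: the sketch below (the file name is just an assumption) opens the file in binary mode and streams 64 KB chunks into hashlib, so the full file never sits in memory.

python
import hashlib

def sha256_of_file(filepath, chunk_size=64 * 1024):
    """Compute a SHA-256 digest by streaming the file in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(filepath, 'rb') as f:  # binary mode: chunk_size is in bytes
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage:
# print(sha256_of_file('large_file.bin'))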


