
[Side A] Completely Defending Python from OOM Kills: The BytesIO Trap and D-MemFS 'Hard Quota' Design Philosophy

DEV Community · by D · April 1, 2026 · 12 min read


From the Author: Recently, I introduced D-MemFS on Reddit. The response was overwhelming, confirming that memory management and file I/O performance are truly universal challenges for developers everywhere. This series is my response to that global interest.

🧭 About this Series: The Two Sides of Development

To provide a complete picture of this project, I’ve split each update into two perspectives:

  • Side A (Practical / from Qiita): Implementation details, benchmarks, and technical solutions.

  • Side B (Philosophy / from Zenn): The development war stories, AI-collaboration, and design decisions.

Introduction

If you write in-memory processing in Python, you will eventually encounter this kind of failure:

```
Killed
```

Or on Windows, the process simply vanishes without a word. It's an OOM (Out of Memory) kill. Both io.BytesIO and dict will expand limitlessly until memory runs out. The process disappears without you even knowing "where" or "why" it crashed—this is one of the most troublesome pitfalls of Python in-memory processing.

In this article, I will dig into how the Hard Quota design of D-MemFS solves this problem, right from its core design philosophy.

The Problem: BytesIO and dict Swell Limitlessly

First, let's clarify the problem.

```python
from io import BytesIO

buf = BytesIO()

# It won't stop no matter how much you write.
# It keeps succeeding until physical memory runs out.
for i in range(100_000):
    buf.write(b"x" * 10_000)

print(buf.tell())  # 1,000,000,000 (1 GiB)
```

This write does not fail. It stubbornly continues succeeding until the OS kills the process.

The same applies to dict.

```python
vfs: dict[str, bytes] = {}

# No errors until 1 GiB piles up
for i in range(100_000):
    vfs[f"file_{i}.bin"] = b"x" * 10_000
```

Soft Quotas Are Not Enough

An approach like "checking the size and warning after writing" is called a soft quota. But this has a fundamental flaw—the data has already been written.

```python
from io import BytesIO

# Pseudo-implementation of a soft quota (a bad example)
MAX_BYTES = 100 * 1024 * 1024  # 100 MiB
total = 0

def soft_write(buf: BytesIO, data: bytes) -> None:
    global total
    buf.write(data)              # <- writes first
    total += len(data)
    if total > MAX_BYTES:        # <- notices only after writing
        raise MemoryError("quota exceeded")  # too late
```

The moment the threshold is exceeded, the memory has already been consumed. Furthermore, rolling back the written data after throwing an exception is not easy.

D-MemFS's Hard Quota Design

The D-MemFS quota operates on a Central Bank model. Before a write is executed, it checks the remaining quota balance and immediately rejects the write if there isn't enough.

```
write(data) is called
  ↓
Requests a reservation of len(data) bytes from the Quota Manager
  ↓
Is the balance sufficient?
  YES -> decreases the balance and executes the write
  NO  -> raises MFSQuotaExceededError (the write never happens)
```

When a write is rejected, no data is ever written: the file is not polluted, and you can catch the exception and continue processing.
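The reservation step can be pictured as a small "central bank" object. The sketch below is illustrative only: the `QuotaManager` class and its methods are my invention, not D-MemFS's actual internals. It shows the key property that the check and the debit happen before any bytes are stored:

```python
import threading

class QuotaManager:
    """Illustrative hard-quota 'central bank' (a sketch, not D-MemFS internals)."""

    def __init__(self, max_bytes: int):
        self._lock = threading.Lock()
        self._remaining = max_bytes

    def reserve(self, n: int) -> None:
        # Check and debit atomically, BEFORE any bytes are stored.
        with self._lock:
            if n > self._remaining:
                raise MemoryError(
                    f"quota exceeded: requested {n}, remaining {self._remaining}"
                )
            self._remaining -= n

    def release(self, n: int) -> None:
        # Credit the balance back, e.g. when a file is deleted.
        with self._lock:
            self._remaining += n

qm = QuotaManager(max_bytes=100)
qm.reserve(60)      # succeeds; balance drops to 40
try:
    qm.reserve(60)  # rejected up front; nothing was written
except MemoryError as e:
    print("rejected:", e)
```

Because `reserve` raises before anything is stored, a caller never has to undo a partial write.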

Code Example: Actually Using the Quota

Basic Quota Settings and Exception Handling

```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

# 10 MiB hard quota
mfs = MemoryFileSystem(max_quota=10 * 1024 * 1024)
mfs.mkdir("/data")

def safe_write(mfs: MemoryFileSystem, path: str, data: bytes) -> bool:
    """Can continue processing even if the write fails."""
    try:
        with mfs.open(path, "wb") as f:
            f.write(data)
        return True
    except MFSQuotaExceededError as e:
        print(f"[Warning] Skipped writing to {path} due to quota excess: {e}")
        return False

# Success case
safe_write(mfs, "/data/small.bin", b"x" * (1 * 1024 * 1024))   # 1 MiB -> OK

# Failure case (exceeds the quota)
safe_write(mfs, "/data/big.bin", b"x" * (20 * 1024 * 1024))    # 20 MiB -> skipped

# The file is not polluted (opening with "wb" leaves an empty file, but no data was written)
st = mfs.stat("/data/big.bin")
print(st["size"])  # 0
```

Processing While Checking the Remaining Quota

```python
from dmemfs import MemoryFileSystem, MFSQuotaExceededError

QUOTA = 64 * 1024 * 1024  # 64 MiB
mfs = MemoryFileSystem(max_quota=QUOTA)
mfs.mkdir("/chunks")

def process_stream(stream, chunk_size: int = 4 * 1024 * 1024):
    """Reads a stream into memory chunk by chunk."""
    chunk_index = 0
    written_paths = []
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        path = f"/chunks/chunk_{chunk_index:04d}.bin"
        try:
            with mfs.open(path, "wb") as f:
                f.write(chunk)
            written_paths.append(path)
            chunk_index += 1
        except MFSQuotaExceededError:
            print(f"Quota reached: kept up to {chunk_index} chunks")
            break
    return written_paths
```

Node Count Limit: MFSNodeLimitExceededError

You can also set a limit on the number of files (nodes). This helps to quickly detect bugs that cause the file count to explode.

```python
from dmemfs import MemoryFileSystem, MFSNodeLimitExceededError

# Max 100 files
mfs = MemoryFileSystem(max_nodes=100)
mfs.mkdir("/logs")

for i in range(200):
    try:
        with mfs.open(f"/logs/entry_{i:04d}.log", "xb") as f:
            f.write(f"log entry {i}\n".encode())
    except MFSNodeLimitExceededError:
        print(f"Node limit reached: stopped at {i} files")
        break
```

Storage Backends: SequentialMemoryFile and RandomAccessMemoryFile

D-MemFS has two types of storage backends.

SequentialMemoryFile (Sequential)

Implemented internally as a chain of byte sequences (list[bytes]).

  • Fast appending and reading from the beginning

  • Slow random access (needs to traverse chunks)

  • High memory efficiency (fewer allocations)

RandomAccessMemoryFile (Random Access)

Implemented internally as a bytearray.

  • Fast seek + read/write

  • Because it pre-allocates buffers during writing, doing only sequential writing might result in wasted memory.
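To make the trade-off concrete, here is a minimal sketch of the two storage shapes. These toy classes are mine, not D-MemFS's real backends; they only illustrate why a chunk list favors appends while a contiguous `bytearray` favors random access:

```python
class SequentialStore:
    """Append-only chain of byte chunks (cheap appends, costly random access)."""

    def __init__(self):
        self.chunks: list[bytes] = []

    def append(self, data: bytes) -> None:
        self.chunks.append(data)  # no copying of existing data

    def read_all(self) -> bytes:
        return b"".join(self.chunks)  # random access would require traversing chunks

class RandomAccessStore:
    """Single contiguous bytearray (fast seek + overwrite, pre-allocated growth)."""

    def __init__(self):
        self.buf = bytearray()

    def write_at(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(b"\x00" * (end - len(self.buf)))  # grow to fit
        self.buf[offset:end] = data  # overwrite in place at any offset
```

Appending to the chunk list never copies existing data, while `write_at` on the `bytearray` can overwrite any offset in place; joining the chunk list is where the sequential shape pays for random access.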

Auto-Promotion

When default_storage="auto" (the default), it observes the file access pattern and automatically switches backends.

```
File created -> starts as SequentialMemoryFile
  ↓
Random access (seek) is detected
  ↓
Automatically promoted to RandomAccessMemoryFile
```

```python
# You can also explicitly pin the backend
from dmemfs import MemoryFileSystem

mfs_seq = MemoryFileSystem(default_storage="sequential")     # always sequential
mfs_ra = MemoryFileSystem(default_storage="random_access")   # always random access
mfs_auto = MemoryFileSystem(default_storage="auto")          # automatic (default)
```

promotion_hard_limit: Suppressing Promotion of Giant Files

Auto-promotion entails copying data into a bytearray upon random access. If this happens with an extremely large file, memory usage temporarily doubles.

By setting promotion_hard_limit, files exceeding this size will not automatically promote.

```python
from dmemfs import MemoryFileSystem

# Files of 64 MiB or larger will not be promoted automatically
mfs = MemoryFileSystem(
    max_quota=512 * 1024 * 1024,
    promotion_hard_limit=64 * 1024 * 1024,
)
```

In pipelines handling massive data, this parameter acts as a deliberately designed safety net against memory spikes. Being able to strictly cap memory usage pairs well with Kubernetes memory limits and CI resource constraints, tying directly into operational stability.

Memory Accounting: What is Included in the Quota

The quota tracks more than just pure data bytes.

```
Quota consumption = bytes of actual data + chunk overhead
```

Since SequentialMemoryFile holds data in chunks, each chunk's header information adds a small amount of overhead. As a result, with a quota of 10 MiB, actual memory usage reliably stays under 10 MiB (the usable payload is slightly less than 10 MiB because of that overhead).

This design prioritizes the guarantee that "the quota is absolutely never exceeded".
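As a back-of-the-envelope illustration, suppose each chunk carried a 16-byte header at a 64 KiB chunk size. Both figures are hypothetical, chosen for this sketch; D-MemFS's real chunk size and header size may differ:

```python
CHUNK_HEADER_BYTES = 16   # hypothetical per-chunk overhead, not D-MemFS's real figure
CHUNK_SIZE = 64 * 1024    # hypothetical chunk size

def quota_cost(data_len: int) -> int:
    """Quota consumption = actual data bytes + per-chunk overhead."""
    n_chunks = -(-data_len // CHUNK_SIZE)  # ceiling division
    return data_len + n_chunks * CHUNK_HEADER_BYTES

# A 10 MiB payload costs slightly more than 10 MiB of quota, which is
# why the usable payload under a 10 MiB quota is slightly less than 10 MiB.
print(quota_cost(10 * 1024 * 1024) - 10 * 1024 * 1024)  # → 2560 (overhead in bytes)
```

Under these assumptions a 10 MiB payload spans 160 chunks, so the quota charges 2,560 extra bytes; the exact numbers depend on the real chunk geometry, but the direction of the guarantee is the same.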

Thread-Safe Atomic Operations

Quota updates are handled atomically under locks.

```python
import threading

from dmemfs import MemoryFileSystem

mfs = MemoryFileSystem(max_quota=10 * 1024 * 1024)
mfs.mkdir("/concurrent")

errors = []

def writer(thread_id: int):
    for i in range(50):
        try:
            path = f"/concurrent/t{thread_id}f{i}.bin"
            with mfs.open(path, "xb") as f:
                f.write(b"x" * (100 * 1024))  # 100 KiB each
        except Exception as e:
            errors.append(e)

threads = [threading.Thread(target=writer, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Requests over the quota raise exceptions, but the file system is not broken
quota_errors = [e for e in errors if "quota" in str(e).lower()]
print(f"Quota exceeded: {len(quota_errors)} times (normal behavior)")
print("FS corruption: none")
```

If the two steps of "checking the quota and writing" are separated, a race condition could occur where another thread cuts in between the check and the write to exhaust the quota. In D-MemFS, this verification and reservation are executed under a single RW lock, completely eliminating this conflict.

A World With Hard Quotas vs. Without

| | No quota (BytesIO / dict) | D-MemFS hard quota |
| --- | --- | --- |
| Behavior on memory exceedance | Process is OOM killed | MFSQuotaExceededError is raised |
| Detection timing | Unnoticed until the OS kills it | Detected instantly, before the write |
| Catching the exception | Impossible (SIGKILL) | Recoverable with try/except |
| Rollback | Impossible | Unnecessary, since the write never happened |
| File integrity | May be corrupted | Guaranteed |
| Logging / monitoring | Often lost | Can be logged as an exception |

Quota Configuration Guidelines

Here are practical guidelines regarding what values to set.

```python
import psutil  # third-party: pip install psutil

def recommended_quota() -> int:
    """
    Example of using a fixed fraction of available memory as the quota.
    In production, a fixed value is more predictable.
    """
    available = psutil.virtual_memory().available
    return int(available * 0.25)  # 25% of available memory

# Rules of thumb for typical use cases
QUOTAS = {
    "unit_test": 32 * 1024 * 1024,               # 32 MiB for testing
    "ci_pipeline": 256 * 1024 * 1024,            # 256 MiB for a CI pipeline
    "batch_processing": 2 * 1024 * 1024 * 1024,  # 2 GiB for batch processing
}
```

Basic principle: estimate the worst-case input size and set the quota to 1.5 to 2 times that amount. If that exceeds the total memory budget of the process, reconsider the design.
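Applied to concrete numbers (the figures below are made up for illustration), the rule of thumb looks like this:

```python
# Sizing sketch: worst-case input -> quota, via the 1.5-2x rule of thumb
worst_case_input = 300 * 1024 * 1024   # e.g. the largest expected upload: 300 MiB
quota = int(worst_case_input * 1.5)    # 450 MiB

process_budget = 1024 * 1024 * 1024    # the process's total memory budget: 1 GiB

# If the quota doesn't fit the budget, the design needs rethinking
assert quota < process_budget, "quota exceeds the process budget; reconsider the design"
print(quota // (1024 * 1024))  # → 450
```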

Memory Guard: Detecting Physical Memory Depletion in Advance

While a Hard Quota manages the "budget within the virtual FS," there remains another problem—when the set quota exceeds the physical memory of the host machine.

For example, even if you set max_quota=4GiB, if the machine only has 2 GiB of free memory, the OS will execute an OOM kill before reaching the quota. Hard quotas alone cannot prevent this.

The Memory Guard introduced in v0.3.0 addresses these "OOMs occurring outside the quota."
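A hand-rolled approximation of the init-time check might look like the function below. This is my sketch, not the library's implementation; in real code the `available` figure would come from something like `psutil.virtual_memory().available`:

```python
def check_quota_fits(max_quota: int, available: int) -> None:
    """Rough equivalent of memory_guard='init': fail fast if the quota cannot fit in RAM."""
    if max_quota > available:
        raise MemoryError(
            f"max_quota ({max_quota} bytes) exceeds available physical memory ({available} bytes)"
        )

# 4 GiB quota on a host reporting 8 GiB available: passes silently
check_quota_fits(4 * 2**30, available=8 * 2**30)
```

Failing at construction time turns a latent OOM kill hours into the run into an immediate, attributable exception.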

3 Modes

| Mode | Behavior |
| --- | --- |
| "none" | No checks (default; backward compatible) |
| "init" | At FS initialization, checks whether max_quota exceeds available memory |
| "per_write" | Checks the physical memory balance on every write (interval configurable) |

```python
from dmemfs import MemoryFileSystem

# Detect insufficient memory at initialization (recommended)
mfs = MemoryFileSystem(
    max_quota=4 * 1024 * 1024 * 1024,  # 4 GiB
    memory_guard="init",
    memory_guard_action="raise",  # "warn" yields a ResourceWarning instead
)

# Check on every write (for stricter use cases)
mfs = MemoryFileSystem(
    max_quota=4 * 1024 * 1024 * 1024,
    memory_guard="per_write",
    memory_guard_action="warn",
    memory_guard_interval=1.0,  # check interval in seconds
)
```

The Relationship Between Hard Quotas and Memory Guard

It might be easier to understand with an analogy of a house.

  • Hard Quota = The area of a room. A limit on how much baggage you can place.

  • Memory Guard = The building's load-bearing limit check. Confirming whether the building can withstand that weight in the first place.

Only when both are present is the safety of in-memory processing truly complete.

Design Ingenuity of "per_write" Mode

Since the "per_write" mode queries the OS for physical memory balance every time, there are concerns about performance impact. To address this, the memory_guard_interval parameter can control the check interval. The default is 1 second—if 1 second hasn't passed since the last check, it uses the cached value.

```python
from dmemfs import MemoryFileSystem

# Keeps safety while maintaining performance even under high-frequency writes
mfs = MemoryFileSystem(
    max_quota=1 * 1024 * 1024 * 1024,
    memory_guard="per_write",
    memory_guard_action="raise",
    memory_guard_interval=2.0,  # checks at most every 2 seconds
)
```

The guarantee of the Hard Quota that "it absolutely never exceeds the quota", and the guarantee of the Memory Guard that "it won't keep running while physical memory is lacking". This dual defense is the complete picture of D-MemFS's OOM countermeasures.

Behavior in free-threaded Python (GIL=0)

In the free-threaded mode (python3.13t) introduced in Python 3.13, there is no GIL, so thread races surface far more readily. D-MemFS has been tested in GIL=0 environments (369 tests × 3 OSes × 3 Python versions), and quota atomicity is guaranteed by explicit locks, independent of the GIL.

```shell
# Testing in free-threaded Python
python3.13t -c "
from dmemfs import MemoryFileSystem
import threading

mfs = MemoryFileSystem(max_quota=5 * 1024 * 1024)
mfs.mkdir('/test')

def worker(n):
    for i in range(100):
        try:
            with mfs.open(f'/test/w{n}_{i}.bin', 'xb') as f:
                f.write(b'x' * 10240)
        except Exception:
            pass

threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print('Completed (no crashes)')
"
```

Conclusion

OOM is a failure that is immensely difficult to debug. Stack traces are rarely left behind, and it is hard to identify which code caused it. By proactively rejecting writes that don't fit the budget, hard quotas convert this problem into a catchable exception.

D-MemFS's quota design is based on the philosophy of "No Surprises." Memory usage will never exceed the configured limit, exceptions can be handled with try/except, and the integrity of the file system is always maintained.

If you have ever experienced an OOM failure in in-memory processing, please do give it a try.

```shell
pip install D-MemFS
```

🔗 Links & Resources

If you find this project interesting, a ⭐ on GitHub would be the best way to support my work!
