Managing Memory Usage#

Daft is a streaming execution engine where data flows through the pipeline defined by your query. Operators process data in bounded batches, but some steps can inflate data or materialize large intermediate results. When those steps outgrow available memory, workloads slow down, spill to disk, or fail with out-of-memory (OOM) errors. This page walks through the most common sources of memory pressure and the tuning levers you can use to keep pipelines stable.

Frequent Sources of Memory Pressure#

The majority of OOM incidents we observe come from a handful of patterns:

User-defined functions (UDFs): Python or library functions use a lot of memory, such as when doing model inference.
URL or object store downloads: Highly concurrent downloads queue large responses in memory before downstream operators consume them.
Inflationary operators: Operations like decoding, decompression, or exploding increase the size of each batch.
Materializing operators: Aggregations, sorts, and joins need to hold intermediate data in memory to produce a final result.

Understanding which operations your query contains can guide you to tuning the proper parameters.

Techniques to Reduce Memory Footprint#

Tune UDF Batch Size and Concurrency#

Lower the batch_size argument on the UDF so each invocation handles less data at once.
Cap UDF concurrency if each invocation is memory hungry. Fewer concurrent tasks often outperforms repeated worker restarts caused by OOM kills.

Limit Download Concurrency#

For remote reads (HTTP, S3, GCS, etc.), reduce the max_connections parameter when using download(). Lower download concurrency can keep memory bounded while allowing downstream operators to continue consuming data.

Batch Before Inflationary Operators#

Daft parallelizes work according to batch size. If the default batch size is too large, explicitly insert df.into_batches(...) ahead of decoding, exploding, or other expansion steps. Smaller batches prevent any single task from growing beyond memory limits.

Experiment with Execution Configs (Use Caution)#

Experimental Feature

Execution configuration parameters are experimental and subject to change between releases. Use with caution in production environments.

Advanced execution parameters in daft.context expose finer-grained controls (e.g. default batch sizing, target partition sizes, operator-specific buffers). These settings are subject to change between releases, so record any overrides you rely on and review them during upgrades.

Scale Resources#

If tuning alone is insufficient:

Move to machines with more memory per CPU.
On the Kubernetes or Ray, scale out to more workers or increase the number of partitions so each worker handles less data. See Kubernetes, Ray, and Partitioning and Batching for guidance.

If workloads continue to fail, we may be under-estimating required memory or you may have discovered a gap in our heuristics. Please reach out on Slack or open a GitHub issue so we can help investigate.