Examples
-
Multimodal Structured Outputs: Evaluating Image Understanding
Leverage image ablation to analyze textual bias in image understanding datasets.
-
Voice AI Analytics with Faster-Whisper and embed_text
Transcribe audio files into timestamped segments and embed the transcribed text.
-
MinHash Deduplication on Common Crawl
Clean web text at scale with MinHash, LSH Banding, and Connected Components.
-
Getting Started with Common Crawl
Daft provides a simple, performant, and responsible way to access Common Crawl data.
-
Audio Transcription with Whisper
Effortlessly transcribe audio to text at scale.
-
Build a 100% GPU Utilization Text Embedding Pipeline featuring spaCy and Turbopuffer
Generate and store millions of text embeddings in vector databases using distributed GPU processing and state-of-the-art models.
-
Generate Images with Stable Diffusion
Run an open-source image generation model on your own GPUs using Daft UDFs.
-
Daft's Four UDF Pattern Tutorial
One notebook, four UDF patterns, one dataset. Row-wise, generator, async, and stateful: learn when to use each.
-
Window Functions: The Great Chocolate Race
Explore how window functions can reduce complex joins and group-bys to just a few simple operations.
-
Running LLMs on the Red Pajamas Dataset
Perform similarity search on Stack Exchange questions using language models and embeddings.

More Examples
For more examples, check out our new daft-examples repository!