Working with Videos
There are two main ways to work with videos in Daft:
- Use daft.read_video_frames to read frames of a video into a DataFrame.
- Use the daft.VideoFile class to work with video files and metadata.
daft.VideoFile is a subclass of daft.File that provides a specialized interface for video-specific operations.
Reading Video Frames with daft.read_video_frames
This example shows reading a video's frames into a DataFrame using the daft.read_video_frames function.
Sampling frames by time interval
You can also downsample frames on the source side by specifying sample_interval_seconds.
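A minimal sketch of such a call follows. It assumes daft.read_video_frames takes a target image_height and image_width alongside sample_interval_seconds (the 480 x 640 defaults mirror the image column in the output shown on this page). The wrapper name read_sampled_frames is hypothetical, so check the API reference for your Daft version before relying on it.

```python
def read_sampled_frames(path, interval_seconds, image_height=480, image_width=640):
    """Read roughly one frame per `interval_seconds` from the video(s) at `path`."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    # Imported here so the helper can be defined without Daft installed.
    import daft

    return daft.read_video_frames(
        path,
        image_height=image_height,  # assumed parameter name
        image_width=image_width,    # assumed parameter name
        sample_interval_seconds=interval_seconds,
    )
```

For example, read_sampled_frames("/path/to/file.mp4", 1.0).show() would display roughly one frame per second of video.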
Understanding Sampling Behavior
The sample_interval_seconds parameter enables time-based frame sampling, which is particularly useful for:
- Video summarization: Extract key frames at regular intervals
- Reducing processing load: Work with fewer frames while maintaining temporal coverage
- Creating thumbnails: Generate representative frames from long videos
- Analyzing trends: Sample frames at consistent time points for comparison
How Sampling Works
The sampling algorithm:
- Target times: Calculates target sampling times at 0, interval, 2 × interval, 3 × interval, ...
- Frame selection: For each target time, selects the first frame whose timestamp is >= target time
- Timestamp-based: Uses the frame's presentation timestamp (PTS), which indicates when the frame should be displayed
- Approximate: This is an approximate sampling strategy; actual sampling times depend on available frame timestamps
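The selection rule above can be sketched in plain Python. This is an illustrative model, not Daft's implementation: the function name sample_timestamps, the epsilon value 1e-9, and the assumption that timestamps arrive in non-decreasing order are all choices made for this sketch. None stands in for a frame without a valid timestamp.

```python
from typing import Iterable, List, Optional


def sample_timestamps(
    timestamps: Iterable[Optional[float]],
    interval: float,
    eps: float = 1e-9,  # assumed tolerance; the real value is not stated here
) -> List[float]:
    """Return the timestamps an interval sampler would keep (illustrative)."""
    if interval <= 0:
        raise ValueError("interval must be positive")

    sampled: List[float] = []
    k = 0  # index of the next target time, i.e. target = k * interval
    for ts in timestamps:
        if ts is None:
            continue  # frames without a valid timestamp are skipped
        if ts + eps >= k * interval:
            sampled.append(ts)
            # Advance past every target this frame already satisfies,
            # so one frame is never emitted for two targets.
            while k * interval <= ts + eps:
                k += 1
    return sampled


print(sample_timestamps([0.0, 0.95, 1.05, 2.0, 2.95, 3.05], 1.0))  # [0.0, 1.05, 2.0, 3.05]
print(sample_timestamps([0.0, 2.5, 5.0], 1.0))                     # [0.0, 2.5, 5.0]
```

Tracking the target as k * interval, rather than repeatedly adding interval to a running total, keeps floating-point error from accumulating over long videos.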
Impact of Video Characteristics
Constant Frame Rate (CFR) Videos:
- Frame timestamps are evenly spaced
- Sampling is more predictable
- Example: 30 fps video with 1-second interval → ~30 frames between samples
Variable Frame Rate (VFR) Videos:
- Frame timestamps may be irregular
- Sampling times may vary from target times
- Common in screen recordings, animations, and optimized videos
Frame Timestamp Precision:
- Different video formats use different time bases (e.g., 1/90000 for NTSC)
- Floating-point precision is handled with a small epsilon tolerance
- Frames without valid timestamps are skipped
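A frame's time in seconds is its PTS multiplied by the stream's time base. The sketch below shows that arithmetic with Python's fractions.Fraction. The helper name pts_to_seconds is hypothetical, and the sample values are taken from the frame_pts and frame_time_base columns in the output tables on this page.

```python
from fractions import Fraction


def pts_to_seconds(pts: int, time_base: str) -> float:
    """Convert a presentation timestamp to seconds.

    `time_base` is a string such as "1/15360", the form shown in the
    frame_time_base column.
    """
    # Multiply exactly as a rational number and round to float only once.
    return float(pts * Fraction(time_base))


print(pts_to_seconds(61440, "1/15360"))   # 4.0
print(pts_to_seconds(612612, "1/90000"))  # 6.8068
```

Because the same instant can be expressed in different time bases, comparing converted times with a small tolerance is safer than testing for exact equality.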
Examples
Example 1: Uniform CFR Video
Frame timestamps: [0.0, 0.033, 0.067, 0.100, 0.133, 0.167, 0.200, ...]
sample_interval_seconds=0.1
Sampled frames: [0.0, 0.100, 0.200, ...] # Exact matches
Example 2: Non-uniform Timestamps
Frame timestamps: [0.0, 0.95, 1.05, 2.0, 2.95, 3.05]
sample_interval_seconds=1.0
Sampled frames: [0.0, 1.05, 2.0, 3.05] # First frame >= target time
Example 3: Large Frame Interval
Frame timestamps: [0.0, 2.5, 5.0]
sample_interval_seconds=1.0
Sampled frames: [0.0, 2.5, 5.0] # First available frame at or after each target time
Example 4: VFR Video
Frame timestamps: [0.0, 0.033, 0.100, 0.133, 0.233, 0.267, 1.0, 1.033]
sample_interval_seconds=1.0
Sampled frames: [0.0, 1.0] # Frames at 0.0s and 1.0s
Combining with Key Frame Filtering
You can combine time-based sampling with key frame filtering:
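A hedged sketch of the combined call is shown below. It assumes daft.read_video_frames accepts is_key_frame=True to keep only key frames, together with sample_interval_seconds, image_height, and image_width. The wrapper name read_sampled_key_frames is hypothetical.

```python
def read_sampled_key_frames(path, interval_seconds, image_height=480, image_width=640):
    """Read key frames from `path`, sampled about every `interval_seconds`."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    # Imported here so the helper can be defined without Daft installed.
    import daft

    return daft.read_video_frames(
        path,
        image_height=image_height,  # assumed parameter name
        image_width=image_width,    # assumed parameter name
        is_key_frame=True,          # assumed flag: keep key frames only
        sample_interval_seconds=interval_seconds,
    )
```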
This is useful for:
- Efficient processing: Work with fewer, more important frames
- Video indexing: Create a sparse representation using key frames
- Compression-aware sampling: Respect the video's compression structure
- Source-side filtering: Sampling happens at the data source, reducing memory and processing overhead
- Frame decoding: All frames are still decoded to check timestamps, but only sampled frames are processed
- Memory efficiency: Only sampled frames are stored in the resulting DataFrame
Limitations
- Approximate sampling: The exact sampling times may differ from target times
- Frame availability: If no frame exists at a target time, the first available frame after that time is selected
- Timestamp requirements: Frames without valid timestamps are skipped when sampling is enabled
- No interpolation: The algorithm does not interpolate between frames; it selects existing frames
Output
╭────────────────────────────────────┬─────────────┬────────────────────┬─────────────────┬───────────┬───────────┬────────────────┬──────────────┬───────────────────────╮
│ path                               ┆ frame_index ┆ frame_time         ┆ frame_time_base ┆ frame_pts ┆ frame_dts ┆ frame_duration ┆ is_key_frame ┆ data                  │
│ ---                                ┆ ---         ┆ ---                ┆ ---             ┆ ---       ┆ ---       ┆ ---            ┆ ---          ┆ ---                   │
│ Utf8                               ┆ Int64       ┆ Float64            ┆ Utf8            ┆ Int64     ┆ Int64     ┆ Int64          ┆ Boolean      ┆ Image[RGB; 480 x 640] │
╞════════════════════════════════════╪═════════════╪════════════════════╪═════════════════╪═══════════╪═══════════╪════════════════╪══════════════╪═══════════════════════╡
│ s3://daft-oss-public-data/videos/… ┆ 0           ┆ 0                  ┆ 1/15360         ┆ 0         ┆ 0         ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 1           ┆ 4                  ┆ 1/15360         ┆ 61440     ┆ 61440     ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 2           ┆ 5.333333333333333  ┆ 1/15360         ┆ 81920     ┆ 81920     ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 3           ┆ 9.333333333333334  ┆ 1/15360         ┆ 143360    ┆ 143360    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 4           ┆ 10.666666666666666 ┆ 1/15360         ┆ 163840    ┆ 163840    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 5           ┆ 14.666666666666666 ┆ 1/15360         ┆ 225280    ┆ 225280    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ s3://daft-oss-public-data/videos/… ┆ 6           ┆ 16                 ┆ 1/15360         ┆ 245760    ┆ 245760    ┆ 1024           ┆ true         ┆ <FixedShapeImage>     │
╰────────────────────────────────────┴─────────────┴────────────────────┴─────────────────┴───────────┴───────────┴────────────────┴──────────────┴───────────────────────╯
(Showing first 7 of 7 rows)
Note
You can specify multiple paths or use glob patterns, for example daft.read_video_frames("/path/to/file.mp4") or daft.read_video_frames("/path/to/files-*.mp4").
Reading from YouTube
This example reads the key frames of a YouTube video. You can also pass in a list of video URLs.
Output
╭────────────────────────────────┬─────────────┬───────────────────┬─────────────────┬───────────┬───────────┬────────────────┬──────────────┬───────────────────────╮
│ path                           ┆ frame_index ┆ frame_time        ┆ frame_time_base ┆ frame_pts ┆ frame_dts ┆ frame_duration ┆ is_key_frame ┆ data                  │
│ ---                            ┆ ---         ┆ ---               ┆ ---             ┆ ---       ┆ ---       ┆ ---            ┆ ---          ┆ ---                   │
│ Utf8                           ┆ Int64       ┆ Float64           ┆ Utf8            ┆ Int64     ┆ Int64     ┆ Int64          ┆ Boolean      ┆ Image[RGB; 480 x 640] │
╞════════════════════════════════╪═════════════╪═══════════════════╪═════════════════╪═══════════╪═══════════╪════════════════╪══════════════╪═══════════════════════╡
│ https://www.youtube.com/watch… ┆ 0           ┆ 0                 ┆ 1/90000         ┆ 0         ┆ 0         ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 1           ┆ 6.8068            ┆ 1/90000         ┆ 612612    ┆ 612612    ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 2           ┆ 13.2132           ┆ 1/90000         ┆ 1189188   ┆ 1189188   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 3           ┆ 18.018            ┆ 1/90000         ┆ 1621620   ┆ 1621620   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 4           ┆ 24.8248           ┆ 1/90000         ┆ 2234232   ┆ 2234232   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 5           ┆ 30.03             ┆ 1/90000         ┆ 2702700   ┆ 2702700   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 6           ┆ 36.36966666666667 ┆ 1/90000         ┆ 3273270   ┆ 3273270   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ https://www.youtube.com/watch… ┆ 7           ┆ 43.27656666666667 ┆ 1/90000         ┆ 3894891   ┆ 3894891   ┆ 3003           ┆ true         ┆ <FixedShapeImage>     │
╰────────────────────────────────┴─────────────┴───────────────────┴─────────────────┴───────────┴───────────┴────────────────┴──────────────┴───────────────────────╯
(Showing first 8 rows)
Working with daft.VideoFile
The following example demonstrates how to use daft.VideoFile to read a video file and extract metadata.
import daft
from daft.functions import video_file, video_metadata, video_keyframes

# Build a DataFrame with one row per video file matching the glob.
df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/videos/*.mp4")
    # Wrap each path in a VideoFile.
    .with_column("file", video_file(daft.col("path")))
    # Extract metadata and key frames from each video file.
    .with_column("metadata", video_metadata(daft.col("file")))
    .with_column("keyframes", video_keyframes(daft.col("file")))
    .select("path", "file", "size", "metadata", "keyframes")
)
df.show(3)
You can also decode frames from a VideoFile column with video_frames. This keeps one row per input video and returns the decoded frames as a list of structs. Use .explode("frames") if you want one row per frame.
import daft
from daft.functions import video_file, video_frames

df = (
    daft.from_glob_path("hf://datasets/Eventual-Inc/sample-files/videos/*.mp4")
    .with_column("video", video_file(daft.col("path"), verify=True))
    # Decode frames in the 0.0 s to 0.2 s window; the result is one list of
    # frame structs per input video.
    .with_column(
        "frames",
        video_frames(
            daft.col("video"),
            start_time=0.0,
            end_time=0.2,
        ),
    )
    .select("path", "frames")
)
df.show(3)