File Types
The File DataType provides first-class support for handling file data across local and remote storage, enabling seamless file operations in distributed environments.
File #
File(url: str, io_config: IOConfig | None = None, media_type: MediaType = unknown(), offset: int | None = None, length: int | None = None)
A file-like object for working with file contents in Daft.
This is an abstract base class that provides a standard file interface compatible with Python's file protocol.
The File object can be used with most Python libraries that accept file-like objects, and implements the standard read/seek/tell interface. Files are read-only in the current implementation.
Examples:
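Because the interface follows Python's file protocol, code written against read/seek/tell should not need to know where the bytes come from. The following is a minimal sketch rather than Daft's own example: sniff_header is a hypothetical helper, and io.BytesIO stands in for an opened file so the snippet runs without any storage backend. It assumes only the methods named above (readable, seekable, read, seek, tell).

```python
import io
from typing import BinaryIO


def sniff_header(fileobj: BinaryIO, n: int = 8) -> bytes:
    """Return the first n bytes of a stream and restore its position."""
    if not (fileobj.readable() and fileobj.seekable()):
        raise ValueError("expected a readable, seekable file-like object")
    start = fileobj.tell()    # remember where the caller left the cursor
    fileobj.seek(0)           # jump to the beginning of the stream
    header = fileobj.read(n)  # read at most n bytes
    fileobj.seek(start)       # put the cursor back
    return header


# Stand-in for an opened file: the first 8 bytes are the PNG signature.
stream = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"rest of the file")
stream.seek(5)
header = sniff_header(stream)
position_after = stream.tell()
```

Any object that honors the same protocol could be passed in place of the BytesIO stand-in; whether a particular Daft object qualifies should be checked against readable() and seekable() first, as the helper does.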
Methods:
| Name | Description |
|---|---|
| as_audio | Convert to AudioFile if this file contains audio data. |
| as_image | Convert to ImageFile if this file contains image data. |
| as_video | Convert to VideoFile if this file contains video data. |
| is_audio | |
| is_image | |
| is_video | |
| isatty | |
| mime_type | Attempts to determine the MIME type of the file. |
| open | |
| readable | |
| seekable | |
| size | |
| to_tempfile | Create a temporary file with the contents of this file. |
| writable | |
Attributes:
| Name | Type | Description |
|---|---|---|
| length | int \| None | The byte length for range reads, or None for full-file reads. |
| name | str | The filename (basename) extracted from the file path or URL. |
| offset | int \| None | The byte offset for range reads, or None for full-file reads. |
| path | str | The full path or URL of the file. |
Source code in daft/file/file.py
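The offset and length attributes describe a byte range within the file. The sketch below illustrates that idea on a generic stream. read_range is a hypothetical helper, not part of Daft, and treating a None offset as "start of file" and a None length as "read to the end" is an assumption based on the attribute descriptions above.

```python
import io
from typing import BinaryIO, Optional


def read_range(
    fileobj: BinaryIO,
    offset: Optional[int] = None,
    length: Optional[int] = None,
) -> bytes:
    """Read `length` bytes starting at `offset` (assumed semantics, see lead-in)."""
    fileobj.seek(offset if offset is not None else 0)
    if length is None:
        return fileobj.read()    # no length: read through to the end
    return fileobj.read(length)  # bounded read of at most `length` bytes


data = io.BytesIO(b"0123456789")
middle = read_range(data, offset=2, length=3)
whole = read_range(data)
```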
name #
name: str
The filename (basename) extracted from the file path or URL.
Returns:
| Type | Description |
|---|---|
| str | The filename without directory components. |
Example
    >>> import daft
    >>> f = daft.File("s3://bucket/path/to/data.csv")
    >>> f.name
    'data.csv'
path #
path: str
The full path or URL of the file.
Returns:
| Type | Description |
|---|---|
| str | The file path or URL. |
Example
    >>> import daft
    >>> f = daft.File("s3://bucket/path/to/data.csv")
    >>> f.path
    's3://bucket/path/to/data.csv'
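The relationship between path and name can be sketched with the standard library alone. basename_of is a hypothetical helper shown only to illustrate "filename without directory components"; it is not Daft's implementation and assumes forward-slash paths or URLs.

```python
import posixpath
from urllib.parse import urlparse


def basename_of(path_or_url: str) -> str:
    """Return the last path component of a local path or URL."""
    parsed = urlparse(path_or_url)           # splits scheme, bucket/host, and path
    return posixpath.basename(parsed.path)   # drop the directory components


remote_name = basename_of("s3://bucket/path/to/data.csv")
local_name = basename_of("/tmp/local/report.pdf")
```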
as_audio #
as_audio() -> AudioFile
Convert to AudioFile if this file contains audio data.
Source code in daft/file/file.py
as_image #
as_image() -> ImageFile
Convert to ImageFile if this file contains image data.
Source code in daft/file/file.py
as_video #
as_video() -> VideoFile
Convert to VideoFile if this file contains video data.
Source code in daft/file/file.py
is_audio #
is_audio() -> bool
Source code in daft/file/file.py
is_image #
is_image() -> bool
Source code in daft/file/file.py
is_video #
is_video() -> bool
Source code in daft/file/file.py
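The is_audio, is_image, and is_video checks pair naturally with the as_audio, as_image, and as_video conversions. The sketch below shows one way to dispatch on them. specialize and FakeFile are hypothetical names: FakeFile is a stand-in that mimics only the method names listed above so the snippet runs without Daft. What a conversion does when the file is of a different type is not stated here, so the sketch checks before converting.

```python
class FakeFile:
    """Hypothetical stand-in exposing the is_* / as_* method names."""

    def __init__(self, kind: str) -> None:
        self.kind = kind

    def is_audio(self) -> bool:
        return self.kind == "audio"

    def is_image(self) -> bool:
        return self.kind == "image"

    def is_video(self) -> bool:
        return self.kind == "video"

    def as_audio(self) -> str:
        return "AudioFile view"

    def as_image(self) -> str:
        return "ImageFile view"

    def as_video(self) -> str:
        return "VideoFile view"


def specialize(f):
    """Return the most specific view available, or the file itself."""
    if f.is_audio():
        return f.as_audio()
    if f.is_image():
        return f.as_image()
    if f.is_video():
        return f.as_video()
    return f  # not a recognized media type: keep the generic file


image_view = specialize(FakeFile("image"))
plain = FakeFile("text")
unchanged = specialize(plain)
```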
isatty #
isatty() -> bool
Source code in daft/file/file.py
mime_type #
mime_type() -> str
Attempts to determine the MIME type of the file.
If the MIME type is undetectable, returns 'application/octet-stream'.
Source code in daft/file/file.py
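The fallback contract of mime_type (return 'application/octet-stream' when detection fails) can be illustrated with Python's mimetypes module. This is a sketch of the contract only: guess_mime is a hypothetical helper, and guessing from the file extension is an assumption made for the example, since how Daft actually detects the type is not described here.

```python
import mimetypes


def guess_mime(name: str) -> str:
    """Guess a MIME type from a filename, with a generic binary fallback."""
    mime, _encoding = mimetypes.guess_type(name)
    # guess_type returns None for the type when the extension is unknown.
    return mime or "application/octet-stream"


known = guess_mime("photo.png")
unknown = guess_mime("archive.unknown-ext")
```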
open #
open(buffer_size: int | None = None) -> PyDaftFile
Source code in daft/file/file.py
readable #
readable() -> bool
Source code in daft/file/file.py
seekable #
seekable() -> bool
Source code in daft/file/file.py
size #
size() -> int
Source code in daft/file/file.py
to_tempfile #
to_tempfile() -> _TemporaryFileWrapper[bytes]
Create a temporary file with the contents of this file.
Returns:
| Type | Description |
|---|---|
| _TemporaryFileWrapper[bytes] | The temporary file object. |
The temporary file is deleted automatically when the returned context manager is closed.
Note that to_tempfile closes the original file object, so the original cannot be used after this method is called.
Source code in daft/file/file.py
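The behavior described for to_tempfile (copy the contents, close the original, delete the copy on close) matches a common standard-library pattern. The sketch below reproduces that pattern with tempfile.NamedTemporaryFile, which returns the same wrapper type as the signature above. copy_to_tempfile is a hypothetical helper and io.BytesIO stands in for the original file; this is not Daft's implementation.

```python
import io
import os
import shutil
import tempfile


def copy_to_tempfile(source):
    """Copy a stream into a named temporary file and close the source."""
    tmp = tempfile.NamedTemporaryFile()  # deleted automatically on close
    shutil.copyfileobj(source, tmp)      # stream the bytes across
    tmp.flush()
    tmp.seek(0)                          # rewind so callers can read from the start
    source.close()                       # the original is no longer usable
    return tmp


source = io.BytesIO(b"hello")
with copy_to_tempfile(source) as tmp:
    tmp_path = tmp.name                  # a real path other libraries can open
    existed_while_open = os.path.exists(tmp_path)
    contents = tmp.read()
exists_after_close = os.path.exists(tmp_path)
```

A temporary file of this kind is mainly useful for libraries that insist on a filesystem path rather than a file-like object.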
writable #
writable() -> bool
Source code in daft/file/file.py
AudioFile #
AudioFile(url: str, io_config: IOConfig | None = None)
An audio-specific file interface that provides audio operations.
Methods:
| Name | Description |
|---|---|
| metadata | Extract basic audio metadata from container headers. |
| resample | Resample the audio file to the given sample rate. |
| to_numpy | Convert the audio file to a numpy array. |
Source code in daft/file/audio.py
metadata #
metadata() -> AudioMetadata
Extract basic audio metadata from container headers.
Returns:
| Type | Description |
|---|---|
| AudioMetadata | Audio metadata object containing: sample_rate (int), the sample rate of the audio file; channels (int), the number of channels; frames (int), the number of frames; format (str), the format of the audio file; subtype (str \| None), the subtype of the audio file. |
Source code in daft/file/audio.py
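The metadata fields are enough to derive a few useful quantities without decoding any samples. The sketch below uses a hypothetical AudioMetadataStub dataclass that carries the documented field names, so it runs without Daft. It assumes a frame is one sample per channel, which is the usual convention but is not stated above, and the resampled length is only an estimate because resamplers differ in how they round.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioMetadataStub:
    """Hypothetical stand-in with the documented AudioMetadata fields."""

    sample_rate: int
    channels: int
    frames: int
    format: str
    subtype: Optional[str] = None


def duration_seconds(meta: AudioMetadataStub) -> float:
    """Duration, assuming one frame is one sample per channel."""
    return meta.frames / meta.sample_rate


def estimated_frames_after_resample(meta: AudioMetadataStub, new_rate: int) -> int:
    """Approximate frame count after resampling to new_rate."""
    return round(meta.frames * new_rate / meta.sample_rate)


meta = AudioMetadataStub(sample_rate=44100, channels=2, frames=88200, format="WAV")
duration = duration_seconds(meta)
resampled_frames = estimated_frames_after_resample(meta, 16000)
```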
resample #
resample(sample_rate: int) -> ndarray[Any, dtype[float64]]
Resample the audio file to the given sample rate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sample_rate | int | The new sample rate. | required |
Returns:
| Type | Description |
|---|---|
| ndarray[Any, dtype[float64]] | The resampled audio data as a NumPy array. |
Source code in daft/file/audio.py
to_numpy #
to_numpy() -> ndarray[Any, dtype[float64]]
Convert the audio file to a numpy array.
Returns:
| Type | Description |
|---|---|
| ndarray[Any, dtype[float64]] | The audio data as a NumPy array. |
Source code in daft/file/audio.py
ImageFile #
ImageFile(url: str, io_config: IOConfig | None = None)
An image-specific file interface that provides image operations.
Methods:
| Name | Description |
|---|---|
| decode | Decode the image file into a PIL Image. |
| metadata | Extract basic image metadata from file headers. |
Source code in daft/file/image.py
decode #
decode(mode: str | None = None) -> Image
Decode the image file into a PIL Image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mode | str \| None | Optional image mode to convert to (e.g. "RGB", "RGBA", "L"). | None |
Returns:
| Type | Description |
|---|---|
| Image | The decoded image, a PIL.Image.Image. |
Source code in daft/file/image.py
metadata #
metadata() -> ImageMetadata
Extract basic image metadata from file headers.
PIL's Image.open() is lazy: it reads only the file header to determine dimensions, format, and mode, without decoding pixel data.
Returns:
| Type | Description |
|---|---|
| ImageMetadata | Image metadata containing width, height, format, and mode. |
Source code in daft/file/image.py
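To see why header-only metadata extraction is cheap, it helps to look at how little of a file is needed. The sketch below parses the width and height from the first 24 bytes of a PNG using only the struct module. It illustrates the general idea and is neither PIL's nor Daft's implementation; png_dimensions is a hypothetical helper, and the header is built by hand so no image file is required.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> Tuple[int, int]:
    """Read (width, height) from the start of a PNG without decoding pixels."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then big-endian 32-bit width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


# Hand-built header for a 640x480 image (IHDR data is always 13 bytes long).
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480)
dimensions = png_dimensions(header)
```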
VideoFile #
VideoFile(url: str, io_config: IOConfig | None = None)
A video-specific file interface that provides video operations.
Methods:
| Name | Description |
|---|---|
| frames | Lazy iterator of all decoded frames with metadata within time range. |
| keyframes | Lazy iterator of keyframes as PIL Images within time range. |
| metadata | Extract basic video metadata from container headers. |
Source code in daft/file/video.py
frames #
frames(start_time: float = 0, end_time: float | None = None, width: int | None = None, height: int | None = None, is_key_frame: bool | None = None) -> Iterator[VideoFrameData]
Lazy iterator of all decoded frames with metadata within time range.
Mirrors the per-frame schema of daft.read_video_frames().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_time | float | Start of the time range in seconds. Defaults to 0. | 0 |
| end_time | float \| None | End of the time range in seconds. Defaults to None (end of video). | None |
| width | int \| None | Optional target width for resizing frames. Must be provided with height. | None |
| height | int \| None | Optional target height for resizing frames. Must be provided with width. | None |
| is_key_frame | bool \| None | If True, emit only keyframes. If False, emit only non-keyframes. If None, emit all decoded frames. | None |
Yields:
| Type | Description |
|---|---|
| VideoFrameData | VideoFrameData dicts with keys: frame_index, frame_time, frame_time_base, frame_pts, frame_dts, frame_duration, is_key_frame, data (PIL Image). |
Source code in daft/file/video.py
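The filtering semantics of frames can be sketched on plain dicts that use the documented keys. select_frames and check_resize are hypothetical helpers written for illustration, not Daft code. Treating end_time as inclusive, assuming frames arrive in time order, and requiring width and height together are all assumptions of this sketch.

```python
from typing import Dict, Iterable, Iterator, Optional


def check_resize(width: Optional[int], height: Optional[int]) -> None:
    """Reject a resize request that names only one dimension."""
    if (width is None) != (height is None):
        raise ValueError("width and height must be provided together")


def select_frames(
    frames: Iterable[Dict],
    start_time: float = 0,
    end_time: Optional[float] = None,
    is_key_frame: Optional[bool] = None,
) -> Iterator[Dict]:
    """Lazily yield frames inside the time range that match the keyframe filter."""
    for frame in frames:
        t = frame["frame_time"]
        if t < start_time:
            continue
        if end_time is not None and t > end_time:
            break  # assumes time-ordered input, so nothing later can match
        if is_key_frame is not None and frame["is_key_frame"] != is_key_frame:
            continue  # tri-state filter: None lets every frame through
        yield frame


# Five synthetic frames at 0.0, 0.5, 1.0, 1.5, 2.0 s; frames 0 and 3 are keyframes.
frames = [
    {"frame_index": i, "frame_time": i * 0.5, "is_key_frame": i % 3 == 0}
    for i in range(5)
]
in_window = [f["frame_index"] for f in select_frames(frames, 0.5, 1.5)]
keys_only = [f["frame_index"] for f in select_frames(frames, is_key_frame=True)]
non_keys = [f["frame_index"] for f in select_frames(frames, 0.5, 1.5, is_key_frame=False)]
```

Because the function is a generator, no frame is examined until the caller iterates, which mirrors the "lazy iterator" wording above.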
keyframes #
keyframes(start_time: float = 0, end_time: float | None = None) -> Iterator[Image]
Lazy iterator of keyframes as PIL Images within time range.
Source code in daft/file/video.py
metadata #
metadata() -> VideoMetadata
Extract basic video metadata from container headers.
Returns:
| Type | Description |
|---|---|
| VideoMetadata | Video metadata object containing width, height, fps, frame_count, time_base, keyframe_pts, and keyframe_indices. |
Source code in daft/file/video.py
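The listed metadata fields can be combined into timing information without decoding any frames. The sketch below uses a hypothetical VideoMetadataStub with the documented field names. The field types are assumptions, as is the interpretation of time_base as seconds per timestamp tick and of keyframe_pts as timestamps in those ticks (the convention used by FFmpeg-style containers).

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class VideoMetadataStub:
    """Hypothetical stand-in with the documented VideoMetadata field names."""

    width: int
    height: int
    fps: float
    frame_count: int
    time_base: Fraction  # assumed: seconds per timestamp tick
    keyframe_pts: List[int] = field(default_factory=list)
    keyframe_indices: List[int] = field(default_factory=list)


def duration_seconds(meta: VideoMetadataStub) -> float:
    """Approximate duration from frame count and frame rate."""
    return meta.frame_count / meta.fps


def keyframe_times(meta: VideoMetadataStub) -> List[float]:
    """Convert keyframe timestamps from ticks to seconds."""
    return [float(pts * meta.time_base) for pts in meta.keyframe_pts]


meta = VideoMetadataStub(
    width=1920,
    height=1080,
    fps=25.0,
    frame_count=150,
    time_base=Fraction(1, 90000),
    keyframe_pts=[0, 180000, 360000],
    keyframe_indices=[0, 50, 100],
)
duration = duration_seconds(meta)
times = keyframe_times(meta)
```

Keyframe times computed this way could serve as start_time values for frames or keyframes when seeking to specific points in a video.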