DataTypes#

daft.DataType#

Daft provides DataTypes ranging from the simple types found in most DataFrames, such as numbers, strings, and dates, to more complex types such as tensors and images.

DataType #

DataType()

A Daft DataType defines the type of all the values in an Expression or DataFrame column.

Methods:

Name	Description
`decimal128`	Fixed-precision decimal.
`duration`	Duration DataType.
`embedding`	Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric `dtype` and each array has a fixed length of `size`.
`extension`	Create an Extension DataType from a name, a storage DataType, and optional metadata.
`file`	Create a File DataType: a type which refers to a file object.
`fixed_size_binary`	Create a FixedSizeBinary DataType: A fixed-size string of bytes.
`fixed_size_list`	Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type `dtype` and each list has length `size`.
`from_arrow_type`	Maps a PyArrow DataType to a Daft DataType.
`from_numpy_dtype`	Maps a Numpy datatype to a Daft DataType.
`from_sql`	Construct a Daft DataType from a SQL type.
`image`	Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.
`infer_from_object`	Infer Daft DataType from a Python object.
`infer_from_type`	Infer Daft DataType from a Python type.
`is_binary`	Check if this is a binary type.
`is_boolean`	Check if this is a boolean type.
`is_date`	Check if this is a date type.
`is_decimal128`	Check if this is a decimal128 type.
`is_duration`	Check if this is a duration type.
`is_embedding`	Check if this is an embedding type.
`is_extension`	Check if this is an extension type.
`is_file`	Check if this is a file type.
`is_fixed_shape_image`	Check if this is a fixed shape image type.
`is_fixed_shape_sparse_tensor`	Check if this is a fixed shape sparse tensor type.
`is_fixed_shape_tensor`	Check if this is a fixed shape tensor type.
`is_fixed_size_binary`	Check if this is a fixed size binary type.
`is_fixed_size_list`	Check if this is a fixed size list type.
`is_float16`	Check if this is a 16-bit float type.
`is_float32`	Check if this is a 32-bit float type.
`is_float64`	Check if this is a 64-bit float type.
`is_image`	Check if this is an image type.
`is_int16`	Check if this is a 16-bit integer type.
`is_int32`	Check if this is a 32-bit integer type.
`is_int64`	Check if this is a 64-bit integer type.
`is_int8`	Check if this is an 8-bit integer type.
`is_integer`	Check if this is an integer type.
`is_interval`	Check if this is an interval type.
`is_list`	Check if this is a list type.
`is_logical`	Check if this is a logical type.
`is_map`	Check if this is a map type.
`is_null`	Check if this is a null type.
`is_numeric`	Check if this is a numeric type.
`is_python`	Check if this is a python object type.
`is_sparse_tensor`	Check if this is a sparse tensor type.
`is_string`	Check if this is a string type.
`is_struct`	Check if this is a struct type.
`is_temporal`	Check if this is a temporal type.
`is_tensor`	Check if this is a tensor type.
`is_time`	Check if this is a time type.
`is_timestamp`	Check if this is a timestamp type.
`is_uint16`	Check if this is an unsigned 16-bit integer type.
`is_uint32`	Check if this is an unsigned 32-bit integer type.
`is_uint64`	Check if this is an unsigned 64-bit integer type.
`is_uint8`	Check if this is an unsigned 8-bit integer type.
`is_union`	Check if this is a union type.
`is_uuid`	Check if this is a UUID type.
`list`	Create a List DataType: Variable-length list, where each element in the list has type `dtype`.
`map`	Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.
`sparse_tensor`	Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided `dtype` as elements, each of the provided `shape`.
`struct`	Create a Struct DataType: a nested type which has names mapped to child types.
`tensor`	Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided `dtype` as elements, each of the provided `shape`.
`time`	Time DataType. Supported timeunits are "us", "ns".
`timestamp`	Timestamp DataType.
`to_arrow_dtype`
`union`	Create a Union DataType: a union of named fields, each with its own type.

Attributes:

Name	Type	Description
`binary`	`_CallableSingletonDataType`
`bool`	`_CallableSingletonDataType`
`date`	`_CallableSingletonDataType`
`dtype`	`DataType`	If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.
`fields`	`dict[str, DataType]`	If this is a struct type, return the fields, otherwise an attribute error is raised.
`float16`	`_CallableSingletonDataType`
`float32`	`_CallableSingletonDataType`
`float64`	`_CallableSingletonDataType`
`image_mode`	`ImageMode \| None`	If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.
`int16`	`_CallableSingletonDataType`
`int32`	`_CallableSingletonDataType`
`int64`	`_CallableSingletonDataType`
`int8`	`_CallableSingletonDataType`
`interval`	`_CallableSingletonDataType`
`key_type`	`DataType`	If this is a map type, return the key type, otherwise an attribute error is raised.
`null`	`_CallableSingletonDataType`
`precision`	`int`	If this is a decimal type, return the precision, otherwise an attribute error is raised.
`python`	`_CallableSingletonDataType`
`scale`	`int`	If this is a decimal type, return the scale, otherwise an attribute error is raised.
`shape`	`tuple[int, ...]`	If this is a fixed shape type, return the shape, otherwise an attribute error is raised.
`size`	`int`	If this is a fixed size type, return the size, otherwise an attribute error is raised.
`string`	`_CallableSingletonDataType`
`timeunit`	`TimeUnit`	If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.
`timezone`	`str \| None`	If this is a timestamp type, return the timezone, otherwise an attribute error is raised.
`type_ids`	`list[int]`	If this is a union type, return the type IDs, otherwise an attribute error is raised.
`uint16`	`_CallableSingletonDataType`
`uint32`	`_CallableSingletonDataType`
`uint64`	`_CallableSingletonDataType`
`uint8`	`_CallableSingletonDataType`
`union_fields`	`dict[str, DataType]`	If this is a union type, return the fields, otherwise an attribute error is raised.
`union_mode`	`UnionMode`	If this is a union type, return the union mode, otherwise an attribute error is raised.
`use_offset_indices`	`bool`	If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.
`uuid`	`_CallableSingletonDataType`
`value_type`	`DataType`	If this is a map type, return the value type, otherwise an attribute error is raised.

Source code in daft/datatype.py

def __init__(self) -> None:
    raise NotImplementedError(
        "We do not support creating a DataType via __init__ "
        "use a creator method like DataType.int32() or use DataType.from_arrow_type(pa_type)"
    )

binary #

binary: _CallableSingletonDataType

bool #

bool: _CallableSingletonDataType

date #

date: _CallableSingletonDataType

dtype #

dtype: DataType

If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.dtype == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.dtype
... except AttributeError:
...     pass

fields #

fields: dict[str, DataType]

If this is a struct type, return the fields, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> fields = dtype.fields
>>> assert fields["a"] == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.fields
... except AttributeError:
...     pass

float16 #

float16: _CallableSingletonDataType

float32 #

float32: _CallableSingletonDataType

float64 #

float64: _CallableSingletonDataType

image_mode #

image_mode: ImageMode | None

If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB")
>>> assert dtype.image_mode == daft.ImageMode.RGB
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.image_mode
... except AttributeError:
...     pass

int16 #

int16: _CallableSingletonDataType

int32 #

int32: _CallableSingletonDataType

int64 #

int64: _CallableSingletonDataType

int8 #

int8: _CallableSingletonDataType

interval #

interval: _CallableSingletonDataType

key_type #

key_type: DataType

If this is a map type, return the key type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.key_type
... except AttributeError:
...     pass

null #

null: _CallableSingletonDataType

precision #

precision: int

If this is a decimal type, return the precision, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.precision
... except AttributeError:
...     pass

python #

python: _CallableSingletonDataType

scale #

scale: int

If this is a decimal type, return the scale, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.scale == 2
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.scale
... except AttributeError:
...     pass

shape #

shape: tuple[int, ...]

If this is a fixed shape type, return the shape, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.shape == (2, 3)
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> try:
...     dtype.shape
... except AttributeError:
...     pass

size #

size: int

If this is a fixed size type, return the size, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.size == 10
>>> dtype = daft.DataType.binary()
>>> try:
...     dtype.size
... except AttributeError:
...     pass

string #

string: _CallableSingletonDataType

timeunit #

timeunit: TimeUnit

If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> dtype.timeunit
TimeUnit(ns)
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timeunit
... except AttributeError:
...     pass

timezone #

timezone: str | None

If this is a timestamp type, return the timezone, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.timezone == "UTC"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timezone
... except AttributeError:
...     pass

type_ids #

type_ids: list[int]

If this is a union type, return the type IDs, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> assert dtype.type_ids == [0, 1]
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.type_ids
... except AttributeError:
...     pass

uint16 #

uint16: _CallableSingletonDataType

uint32 #

uint32: _CallableSingletonDataType

uint64 #

uint64: _CallableSingletonDataType

uint8 #

uint8: _CallableSingletonDataType

union_fields #

union_fields: dict[str, DataType]

If this is a union type, return the fields, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> fields = dtype.union_fields
>>> assert fields["i"] == daft.DataType.int32()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.union_fields
... except AttributeError:
...     pass

union_mode #

union_mode: UnionMode

If this is a union type, return the union mode, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32()}, type_ids=[0], mode="dense")
>>> assert str(dtype.union_mode) == "Dense"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.union_mode
... except AttributeError:
...     pass

use_offset_indices #

use_offset_indices: bool

If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert dtype.use_offset_indices
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.use_offset_indices
... except AttributeError:
...     pass

uuid #

uuid: _CallableSingletonDataType

value_type #

value_type: DataType

If this is a map type, return the value type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.value_type == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.value_type
... except AttributeError:
...     pass

decimal128 #

decimal128(precision: int, scale: int) -> DataType

Fixed-precision decimal.

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def decimal128(cls, precision: int, scale: int) -> DataType:
    """Fixed-precision decimal."""
    return cls._from_pydatatype(PyDataType.decimal128(precision, scale))

duration #

duration(timeunit: TimeUnit | str) -> DataType

Duration DataType.

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def duration(cls, timeunit: TimeUnit | str) -> DataType:
    """Duration DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.duration(timeunit._timeunit))

embedding #

embedding(dtype: DataType, size: int) -> DataType

Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType`	DataType of each element in the list (must be numeric)	required
`size`	`int`	length of each list	required

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def embedding(cls, dtype: DataType, size: int) -> DataType:
    """Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a **numeric** ``dtype`` and each array has a fixed length of ``size``.

    Args:
        dtype: DataType of each element in the list (must be numeric)
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for an embedding must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.embedding(dtype._dtype, size))

extension #

extension(name: str, storage_dtype: DataType, metadata: str | None = None) -> DataType

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def extension(cls, name: str, storage_dtype: DataType, metadata: str | None = None) -> DataType:
    return cls._from_pydatatype(PyDataType.extension(name, storage_dtype._dtype, metadata))

file #

file(media_type: MediaType = MediaType.unknown()) -> DataType

Create a File DataType: a type which refers to a file object.

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def file(cls, media_type: MediaType = MediaType.unknown()) -> DataType:
    """Create a File DataType: a type which refers to a file object."""
    return cls._from_pydatatype(PyDataType.file(media_type._media_type))

fixed_size_binary #

fixed_size_binary(size: int) -> DataType

Create a FixedSizeBinary DataType: A fixed-size string of bytes.

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def fixed_size_binary(cls, size: int) -> DataType:
    """Create a FixedSizeBinary DataType: A fixed-size string of bytes."""
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size binary must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_binary(size))

fixed_size_list #

fixed_size_list(dtype: DataType, size: int) -> DataType

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType`	DataType of each element in the list	required
`size`	`int`	length of each list	required

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def fixed_size_list(cls, dtype: DataType, size: int) -> DataType:
    """Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type ``dtype`` and each list has length ``size``.

    Args:
        dtype: DataType of each element in the list
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size list must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_list(dtype._dtype, size))
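
The `embedding`, `fixed_size_binary`, and `fixed_size_list` creators share the same size check. The standalone sketch below reproduces that rule without calling Daft, to show which values it rejects; `check_positive_size` and `is_accepted` are hypothetical helpers, not part of the library:

```python
def check_positive_size(size: object, kind: str) -> int:
    """Hypothetical helper mirroring the size check in the listings above."""
    if not isinstance(size, int) or size <= 0:
        raise ValueError(f"The size for a {kind} must be a positive integer, but got: {size!r}")
    return size


def is_accepted(size: object) -> bool:
    """Report whether the check lets a value through."""
    try:
        check_positive_size(size, "fixed-size list")
    except ValueError:
        return False
    return True


# Only a positive int passes; zero, negatives, floats, and strings are rejected.
results = {repr(value): is_accepted(value) for value in (3, 0, -1, 2.5, "3")}
print(results)
```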

from_arrow_type #

from_arrow_type(arrow_type: pa.DataType, python_fallback: bool = True) -> DataType

Maps a PyArrow DataType to a Daft DataType.

Source code in daft/datatype.py

@classmethod
def from_arrow_type(cls, arrow_type: pa.lib.DataType, python_fallback: builtins.bool = True) -> DataType:
    """Maps a PyArrow DataType to a Daft DataType."""
    if pa.types.is_int8(arrow_type):
        return cls.int8()
    elif pa.types.is_int16(arrow_type):
        return cls.int16()
    elif pa.types.is_int32(arrow_type):
        return cls.int32()
    elif pa.types.is_int64(arrow_type):
        return cls.int64()
    elif pa.types.is_uint8(arrow_type):
        return cls.uint8()
    elif pa.types.is_uint16(arrow_type):
        return cls.uint16()
    elif pa.types.is_uint32(arrow_type):
        return cls.uint32()
    elif pa.types.is_uint64(arrow_type):
        return cls.uint64()
    elif pa.types.is_float16(arrow_type):
        return cls.float16()
    elif pa.types.is_float32(arrow_type):
        return cls.float32()
    elif pa.types.is_float64(arrow_type):
        return cls.float64()
    elif pa.types.is_string(arrow_type) or pa.types.is_large_string(arrow_type):
        return cls.string()
    elif pa.types.is_binary(arrow_type) or pa.types.is_large_binary(arrow_type):
        return cls.binary()
    elif pa.types.is_fixed_size_binary(arrow_type):
        return cls.fixed_size_binary(arrow_type.byte_width)
    elif pa.types.is_boolean(arrow_type):
        return cls.bool()
    elif pa.types.is_null(arrow_type):
        return cls.null()
    elif pa.types.is_decimal128(arrow_type):
        return cls.decimal128(arrow_type.precision, arrow_type.scale)
    elif pa.types.is_date32(arrow_type):
        return cls.date()
    elif pa.types.is_date64(arrow_type):
        return cls.timestamp(TimeUnit.ms())
    elif pa.types.is_time64(arrow_type):
        timeunit = TimeUnit.from_str(pa.type_for_alias(str(arrow_type)).unit)
        return cls.time(timeunit)
    elif pa.types.is_timestamp(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.timestamp(timeunit=timeunit, timezone=arrow_type.tz)
    elif pa.types.is_duration(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.duration(timeunit=timeunit)
    elif pa.types.is_list(arrow_type) or pa.types.is_large_list(arrow_type):
        assert isinstance(arrow_type, (pa.ListType, pa.LargeListType))
        field = arrow_type.value_field
        return cls.list(cls.from_arrow_type(field.type, python_fallback))
    elif pa.types.is_fixed_size_list(arrow_type):
        assert isinstance(arrow_type, pa.FixedSizeListType)
        field = arrow_type.value_field
        return cls.fixed_size_list(cls.from_arrow_type(field.type, python_fallback), arrow_type.list_size)
    elif pa.types.is_struct(arrow_type):
        assert isinstance(arrow_type, pa.StructType)
        fields = [arrow_type[i] for i in range(arrow_type.num_fields)]
        return cls.struct({field.name: cls.from_arrow_type(field.type, python_fallback) for field in fields})
    elif pa.types.is_interval(arrow_type):
        return cls.interval()
    elif pa.types.is_map(arrow_type):
        assert isinstance(arrow_type, pa.MapType)
        return cls.map(
            key_type=cls.from_arrow_type(arrow_type.key_type, python_fallback),
            value_type=cls.from_arrow_type(arrow_type.item_type, python_fallback),
        )
    elif isinstance(arrow_type, pa.FixedShapeTensorType):
        scalar_dtype = cls.from_arrow_type(arrow_type.value_type, python_fallback)
        return cls.tensor(scalar_dtype, tuple(arrow_type.shape))
    elif pa.types.is_union(arrow_type):
        assert isinstance(arrow_type, pa.UnionType)
        mode = "dense" if arrow_type.mode == "dense" else "sparse"
        field_dict = {
            arrow_type.field(i).name: cls.from_arrow_type(arrow_type.field(i).type, python_fallback)
            for i in range(arrow_type.num_fields)
        }
        type_ids = list(arrow_type.type_codes)
        return cls.union(field_dict, type_ids, mode)
    # Only check for PyExtensionType if pyarrow version is < 21.0.0
    if hasattr(pa, "PyExtensionType") and isinstance(arrow_type, getattr(pa, "PyExtensionType")):
        # TODO(Clark): Add a native cross-lang extension type representation for PyExtensionTypes.
        raise ValueError(
            "pyarrow extension types that subclass pa.PyExtensionType can't be used in Daft, since they can't be "
            f"used in non-Python Arrow implementations and Daft uses the Rust Arrow implementation: {arrow_type}"
        )
    elif isinstance(arrow_type, pa.BaseExtensionType):
        name = arrow_type.extension_name

        if (get_or_create_runner().name == "ray") and (
            type(arrow_type).__reduce__ == pa.BaseExtensionType.__reduce__
        ):
            raise ValueError(
                f"You are attempting to use an Extension Type: {arrow_type} with the default pyarrow `__reduce__`, which breaks pickling for Extensions. "
                "To fix this, implement your own `__reduce__` on your extension type. "
                "For more details see this issue: "
                "https://github.com/apache/arrow/issues/35599"
            )
        try:
            metadata = arrow_type.__arrow_ext_serialize__().decode()
        except AttributeError:
            metadata = None

        if name == "daft.super_extension":
            assert metadata is not None
            return cls._from_pydatatype(PyDataType.from_json(metadata))
        else:
            return cls.extension(
                name,
                cls.from_arrow_type(arrow_type.storage_type, python_fallback),
                metadata,
            )
    else:
        if python_fallback:
            # Fall back to a Python object type.
            # TODO(Clark): Add native support for remaining Arrow types.
            return cls.python()
        else:
            raise TypeError(f"Unsupported Arrow type: {arrow_type}")

from_numpy_dtype #

from_numpy_dtype(np_type: np.dtype[Any]) -> DataType

Maps a Numpy datatype to a Daft DataType.

Source code in daft/datatype.py

@classmethod
def from_numpy_dtype(cls, np_type: np.dtype[Any]) -> DataType:
    """Maps a Numpy datatype to a Daft DataType."""
    arrow_type = pa.from_numpy_dtype(np_type)
    return cls.from_arrow_type(arrow_type)

from_sql #

from_sql(sql_type: str) -> DataType

Construct a Daft DataType from a SQL type.

Source code in daft/datatype.py

@classmethod
def from_sql(cls, sql_type: str) -> DataType:
    """Construct a Daft DataType from a SQL type."""
    return cls._from_pydatatype(sql_datatype(sql_type))

image #

image(mode: str | ImageMode | None = None, height: int | None = None, width: int | None = None) -> DataType

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

Each image in the array has a daft.ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).

If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.

If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

Name	Type	Description	Default
`mode`	`str \| ImageMode \| None`	The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.	`None`
`height`	`int \| None`	The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.	`None`
`width`	`int \| None`	The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.	`None`

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def image(
    cls, mode: str | ImageMode | None = None, height: int | None = None, width: int | None = None
) -> DataType:
    """Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

    Each image in the array has an :class:`~daft.ImageMode`, which describes the pixel dtype (e.g. uint8) and
    the number of image channels/bands and their logical interpretation (e.g. RGB).

    If the height, width, and mode are the same for all images in the array, specifying them when constructing
    this type is advised, since that will allow Daft to create a more optimized physical representation
    of the image array.

    If the height, width, or mode may vary across images in the array, leaving these fields unspecified when
    creating this type will cause Daft to represent this image array as a heterogeneous collection of images,
    where each image can have a different mode, height, and width. This is much more flexible, but will result
    in a less compact representation and may make some operations less efficient.

    Args:
        mode: The mode of the image. By default, this is inferred from the underlying data.
            If height and width are specified, the mode must also be specified.
        height: The height of the image. By default, this is inferred from the underlying data.
            Must be specified if the width is specified.
        width: The width of the image. By default, this is inferred from the underlying data.
            Must be specified if the height is specified.
    """
    if isinstance(mode, str):
        mode = ImageMode.from_mode_string(mode.upper())
    if mode is not None and not isinstance(mode, ImageMode):
        raise ValueError(f"mode must be a string or ImageMode variant, but got: {mode}")
    if height is not None and width is not None:
        if not isinstance(height, int) or height <= 0:
            raise ValueError("Image height must be a positive integer, but got: ", height)
        if not isinstance(width, int) or width <= 0:
            raise ValueError("Image width must be a positive integer, but got: ", width)
    elif height is not None or width is not None:
        raise ValueError(
            f"Image height and width must either both be specified, or both not be specified, but got height={height}, width={width}"
        )
    return cls._from_pydatatype(PyDataType.image(mode, height, width))
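
The height and width handling above amounts to a "both or neither" rule plus a positivity check. The standalone sketch below restates that rule without calling Daft; `check_image_dims` and its `"fixed"` / `"variable"` return labels are hypothetical and exist only for illustration:

```python
def check_image_dims(height, width):
    """Hypothetical helper reproducing the height/width rule from the listing above."""
    if height is not None and width is not None:
        # Both dimensions given: each must be a positive integer.
        for name, value in (("height", height), ("width", width)):
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"Image {name} must be a positive integer, but got: {value!r}")
        return "fixed"
    if height is not None or width is not None:
        # Exactly one dimension given: rejected.
        raise ValueError(
            f"Image height and width must both be specified or both omitted, got height={height}, width={width}"
        )
    # Neither dimension given: the shape is left to be inferred from the data.
    return "variable"


print(check_image_dims(None, None))
print(check_image_dims(32, 32))
```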

infer_from_object #

infer_from_object(obj: Any) -> DataType

Infer Daft DataType from a Python object.

Source code in daft/datatype.py

@classmethod
def infer_from_object(cls, obj: Any) -> DataType:
    """Infer Daft DataType from a Python object."""
    from daft.series import Series

    s = Series.from_pylist([obj])
    return s.datatype()

infer_from_type #

infer_from_type(t: type | GenericAlias | UnionType) -> DataType

Infer Daft DataType from a Python type.
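
The implementation listed below begins by splitting a type hint into an origin and its arguments with `typing.get_origin` and `typing.get_args`, then dispatches on the origin. The standalone sketch here shows only that splitting step, using the standard library (Python 3.9 or later); `split_hint` is a hypothetical helper, not part of Daft:

```python
import typing


def split_hint(hint):
    """Return (origin, args); a bare type such as int acts as its own origin."""
    origin = typing.get_origin(hint)
    return (origin if origin is not None else hint), typing.get_args(hint)


for hint in (int, list[int], dict[str, float], tuple[int, ...], typing.Optional[int]):
    origin, args = split_hint(hint)
    print(hint, "->", origin, args)
```

In the listing, a `list` origin with one argument leads to a List type, a `dict` origin with two arguments to a Map type, and a `tuple` whose second argument is `Ellipsis` to a List of the first argument's type.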

Source code in daft/datatype.py

@classmethod
def infer_from_type(cls, t: type | GenericAlias | UnionType) -> DataType:
    """Infer Daft DataType from a Python type."""
    # NOTE: Make sure this matches the logic in `Literal::from_pyobj` in Rust
    # NOTE: The base type for Union is hidden, so it requires special handling
    # TODO: TypeForm would cover everything: https://peps.python.org/pep-0747/

    assert isinstance(t, (type, GenericAlias, UnionType)) or typing.get_origin(t) is typing.Union, (
        f"Input to DataType.infer_from_type must be a type, found {t} (type {type(t)})"
    )

    import datetime
    import decimal
    import importlib
    from typing import is_typeddict

    import daft.file
    import daft.series

    origin_or_none = typing.get_origin(t)
    origin: type = origin_or_none if origin_or_none is not None else t  # type: ignore
    args = typing.get_args(t)

    def check_type(type_or_path: type | str) -> bool:
        """Check if `origin` is a subclass of `type_or_path`.

        Pass in a string value for `type_or_path` for types from optional dependencies.
        """
        if isinstance(type_or_path, type):
            type_obj = type_or_path
        elif isinstance(type_or_path, str):
            module_name, type_name = type_or_path.rsplit(".", 1)
            try:
                module = importlib.import_module(module_name)
                type_obj = getattr(module, type_name)
            except (ImportError, AttributeError):
                return False
        else:
            raise ValueError("`type_or_path` must be type or string")

        return issubclass(origin, type_obj)

    # NOTE: This has to be first to handle the special case of typing.Union
    if origin is typing.Union or check_type(UnionType):  # type: ignore[comparison-overlap]
        inner_types = set(DataType.infer_from_type(arg) for arg in args)
        if len(inner_types) == 1:
            return inner_types.pop()
        elif len(inner_types) == 2 and cls.null() in inner_types:
            return inner_types.difference([cls.null()]).pop()
        else:
            return cls.python()
    elif check_type(type(None)):
        return cls.null()
    elif check_type(bool):
        return cls.bool()
    elif check_type(str):
        return cls.string()
    elif check_type(bytes):
        return cls.binary()
    elif check_type(int):
        return cls.int64()
    elif check_type(float):
        return cls.float64()
    elif check_type(datetime.datetime):
        # cannot derive timezone from type
        return cls.timestamp(TimeUnit.us(), timezone=None)
    elif check_type(datetime.date):
        return cls.date()
    elif check_type(datetime.time):
        return cls.time(TimeUnit.us())
    elif check_type(datetime.timedelta):
        return cls.duration(TimeUnit.us())
    elif check_type(daft.file.VideoFile):
        return cls.file(MediaType.video())
    elif check_type(daft.file.AudioFile):
        return cls.file(MediaType.audio())
    elif check_type(daft.file.ImageFile):
        return cls.file(MediaType.image())
    elif check_type(daft.file.File):
        return cls.file(MediaType.unknown())
    elif check_type(list):
        if len(args) == 0:
            inner_dtype = cls.python()
        elif len(args) == 1:
            inner_dtype = cls.infer_from_type(args[0])
        else:
            raise TypeError(f"Python list type cannot have more than one type argument, found: {t}")

        return cls.list(inner_dtype)
    elif is_typeddict(origin):
        field_types = typing.get_type_hints(origin)
        if any(not isinstance(t, str) for t in field_types):
            warnings.warn(
                f"Expected all TypedDict keys to be strings, found: {field_types}. Defaulting to Map[Python, Python] type."
            )
            return cls.map(cls.python(), cls.python())

        field_dtypes = {k: cls.infer_from_type(v) for k, v in field_types.items()}
        return cls.struct(field_dtypes)
    elif check_type(dict):
        if len(args) == 0:
            key_dtype = cls.python()
            value_dtype = cls.python()
        elif len(args) == 2:
            key_dtype = cls.infer_from_type(args[0])
            value_dtype = cls.infer_from_type(args[1])
        else:
            raise TypeError(f"Python dict type must have exactly two type arguments, found: {t}")

        # dict type can also be turned into struct, but we are unable to derive the struct keys from the type alone
        return cls.map(key_dtype, value_dtype)
    elif check_type(tuple):
        if len(args) == 0:
            # tuple -> List[Python]
            return cls.list(cls.python())
        if len(args) == 2 and args[1] is Ellipsis:
            # tuple[inner_type, ...] -> List[inner_type]
            inner_dtype = cls.infer_from_type(args[0])
            return cls.list(inner_dtype)
        else:
            # tuple[type0, type1, ...] -> Struct[_0: type0, _1: type1, ...]
            field_dtypes = {f"_{i}": cls.infer_from_type(arg) for i, arg in enumerate(args)}
            return cls.struct(field_dtypes)
    elif check_type(decimal.Decimal):
        warnings.warn(
            "Cannot derive precision and scale from decimal.Decimal type, defaulting to DataType.python()"
        )
        return cls.python()
    elif check_type(daft.series.Series):
        warnings.warn(
            "Cannot derive inner type from daft.Series type, defaulting to DataType.python() for Series inner type"
        )
        return cls.list(cls.python())
    elif check_type("pydantic.BaseModel"):
        import pydantic

        if not (parse("2.0.0") <= parse(pydantic.__version__) < parse("3.0.0")):
            raise ValueError(
                f"Daft only supports DataType inference for Pydantic V2, found Pydantic V{pydantic.__version__}"
            )

        model: pydantic.BaseModel = origin

        serialize_by_alias = model.model_config.get("serialize_by_alias", False)

        field_dtypes = {}
        for attr_name, field_info in model.model_fields.items():
            if serialize_by_alias:
                if field_info.serialization_alias is not None:
                    serialized_name = field_info.serialization_alias
                elif field_info.alias is not None:
                    serialized_name = field_info.alias
                else:
                    serialized_name = attr_name
            else:
                serialized_name = attr_name
            field_dtypes[serialized_name] = cls.infer_from_type(field_info.annotation)

        for attr_name, field_info in model.model_computed_fields.items():
            if serialize_by_alias:
                if field_info.alias is not None:
                    serialized_name = field_info.alias
                else:
                    serialized_name = attr_name
            else:
                serialized_name = attr_name
            field_dtypes[serialized_name] = cls.infer_from_type(field_info.return_type)

        return cls.struct(field_dtypes)
    elif check_type("PIL.Image.Image"):
        return cls.image()
    elif check_type("jaxtyping.AbstractArray"):
        return cls._infer_from_jaxtyping(origin)
    elif check_type("numpy.ndarray"):
        inner_dtype = None
        if len(args) == 2:
            # https://numpy.org/doc/2.3/reference/typing.html#numpy.typing.NDArray
            numpy_dtype_args = typing.get_args(args[1])
            if len(numpy_dtype_args) == 1:
                inner_dtype = cls.infer_from_type(numpy_dtype_args[0])

        if inner_dtype is None:
            warnings.warn(
                f"Cannot derive inner type from {t}, defaulting to DataType.python() for ndarray inner type"
            )
            inner_dtype = cls.python()

        return cls.tensor(inner_dtype)

    elif check_type("torch.FloatTensor"):
        return cls.tensor(cls.float32())
    elif check_type("torch.DoubleTensor"):
        return cls.tensor(cls.float64())
    elif check_type("torch.ByteTensor"):
        return cls.tensor(cls.uint8())
    elif check_type("torch.CharTensor"):
        return cls.tensor(cls.int8())
    elif check_type("torch.ShortTensor"):
        return cls.tensor(cls.int16())
    elif check_type("torch.IntTensor"):
        return cls.tensor(cls.int32())
    elif check_type("torch.LongTensor"):
        return cls.tensor(cls.int64())
    elif check_type("torch.BoolTensor"):
        return cls.tensor(cls.bool())
    elif (
        check_type("torch.Tensor")
        or check_type("tensorflow.Tensor")
        or check_type("jax.Array")
        or check_type("cupy.ndarray")
    ):
        return cls.tensor(cls.python())
    elif check_type("numpy.generic"):
        return cls._infer_from_numpy_scalar_dtype(origin)
    elif check_type("pandas.Series"):
        warnings.warn(
            "Cannot derive inner type from pandas.Series type, defaulting to DataType.python() for Series inner type"
        )
        return cls.list(cls.python())
    else:
        return cls.python()

is_binary #

is_binary() -> bool

Check if this is a binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.binary  # or daft.DataType.binary()
>>> assert dtype.is_binary()

Source code in daft/datatype.py

def is_binary(self) -> builtins.bool:
    """Check if this is a binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.binary  # or daft.DataType.binary()
        >>> assert dtype.is_binary()
    """
    return self._dtype.is_binary()

is_boolean #

is_boolean() -> bool

Check if this is a boolean type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool  # or daft.DataType.bool()
>>> assert dtype.is_boolean()

Source code in daft/datatype.py

def is_boolean(self) -> builtins.bool:
    """Check if this is a boolean type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool  # or daft.DataType.bool()
        >>> assert dtype.is_boolean()
    """
    return self._dtype.is_boolean()

is_date #

is_date() -> bool

Check if this is a date type.

Examples:

>>> import daft
>>> dtype = daft.DataType.date  # or daft.DataType.date()
>>> assert dtype.is_date()

Source code in daft/datatype.py

def is_date(self) -> builtins.bool:
    """Check if this is a date type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.date  # or daft.DataType.date()
        >>> assert dtype.is_date()
    """
    return self._dtype.is_date()

is_decimal128 #

is_decimal128() -> bool

Check if this is a decimal128 type.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.is_decimal128()

Source code in daft/datatype.py

def is_decimal128(self) -> builtins.bool:
    """Check if this is a decimal128 type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.decimal128(precision=10, scale=2)
        >>> assert dtype.is_decimal128()
    """
    return self._dtype.is_decimal128()

is_duration #

is_duration() -> bool

Check if this is a duration type.

Examples:

>>> import daft
>>> dtype = daft.DataType.duration(timeunit="ns")
>>> assert dtype.is_duration()

Source code in daft/datatype.py

def is_duration(self) -> builtins.bool:
    """Check if this is a duration type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.duration(timeunit="ns")
        >>> assert dtype.is_duration()
    """
    return self._dtype.is_duration()

is_embedding #

is_embedding() -> bool

Check if this is an embedding type.

Examples:

>>> import daft
>>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
>>> assert dtype.is_embedding()

Source code in daft/datatype.py

def is_embedding(self) -> builtins.bool:
    """Check if this is an embedding type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
        >>> assert dtype.is_embedding()
    """
    return self._dtype.is_embedding()

is_extension #

is_extension() -> bool

Check if this is an extension type.

Examples:

>>> import daft
>>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
>>> assert dtype.is_extension()

Source code in daft/datatype.py

def is_extension(self) -> builtins.bool:
    """Check if this is an extension type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
        >>> assert dtype.is_extension()
    """
    return self._dtype.is_extension()

is_file #

is_file() -> bool

Check if this is a file type.

Examples:

>>> import daft
>>> dtype = daft.DataType.file()
>>> assert dtype.is_file()

Source code in daft/datatype.py

def is_file(self) -> builtins.bool:
    """Check if this is a file type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.file()
        >>> assert dtype.is_file()
    """
    return self._dtype.is_file()

is_fixed_shape_image #

is_fixed_shape_image() -> bool

Check if this is a fixed shape image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert dtype.is_fixed_shape_image()

Source code in daft/datatype.py

def is_fixed_shape_image(self) -> builtins.bool:
    """Check if this is a fixed shape image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
        >>> assert dtype.is_fixed_shape_image()
    """
    return self._dtype.is_fixed_shape_image()

is_fixed_shape_sparse_tensor #

is_fixed_shape_sparse_tensor() -> bool

Check if this is a fixed shape sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_sparse_tensor()

Source code in daft/datatype.py

def is_fixed_shape_sparse_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_sparse_tensor()
    """
    return self._dtype.is_fixed_shape_sparse_tensor()

is_fixed_shape_tensor #

is_fixed_shape_tensor() -> bool

Check if this is a fixed shape tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()

Source code in daft/datatype.py

def is_fixed_shape_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_tensor()
    """
    return self._dtype.is_fixed_shape_tensor()

is_fixed_size_binary #

is_fixed_size_binary() -> bool

Check if this is a fixed size binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.is_fixed_size_binary()

Source code in daft/datatype.py

def is_fixed_size_binary(self) -> builtins.bool:
    """Check if this is a fixed size binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_binary(size=10)
        >>> assert dtype.is_fixed_size_binary()
    """
    return self._dtype.is_fixed_size_binary()

is_fixed_size_list #

is_fixed_size_list() -> bool

Check if this is a fixed size list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
>>> assert dtype.is_fixed_size_list()

Source code in daft/datatype.py

def is_fixed_size_list(self) -> builtins.bool:
    """Check if this is a fixed size list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
        >>> assert dtype.is_fixed_size_list()
    """
    return self._dtype.is_fixed_size_list()

is_float16 #

is_float16() -> bool

Check if this is a 16-bit float type.

Source code in daft/datatype.py

def is_float16(self) -> builtins.bool:
    """Check if this is a 16-bit float type."""
    return self._dtype.is_float16()

is_float32 #

is_float32() -> bool

Check if this is a 32-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float32  # or daft.DataType.float32()
>>> assert dtype.is_float32()

Source code in daft/datatype.py

def is_float32(self) -> builtins.bool:
    """Check if this is a 32-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float32  # or daft.DataType.float32()
        >>> assert dtype.is_float32()
    """
    return self._dtype.is_float32()

is_float64 #

is_float64() -> bool

Check if this is a 64-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64  # or daft.DataType.float64()
>>> assert dtype.is_float64()

Source code in daft/datatype.py

def is_float64(self) -> builtins.bool:
    """Check if this is a 64-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64  # or daft.DataType.float64()
        >>> assert dtype.is_float64()
    """
    return self._dtype.is_float64()

is_image #

is_image() -> bool

Check if this is an image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image()
>>> assert dtype.is_image()

Source code in daft/datatype.py

def is_image(self) -> builtins.bool:
    """Check if this is an image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image()
        >>> assert dtype.is_image()
    """
    return self._dtype.is_image()

is_int16 #

is_int16() -> bool

Check if this is a 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int16  # or daft.DataType.int16()
>>> assert dtype.is_int16()

Source code in daft/datatype.py

def is_int16(self) -> builtins.bool:
    """Check if this is a 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int16  # or daft.DataType.int16()
        >>> assert dtype.is_int16()
    """
    return self._dtype.is_int16()

is_int32 #

is_int32() -> bool

Check if this is a 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int32  # or daft.DataType.int32()
>>> assert dtype.is_int32()

Source code in daft/datatype.py

def is_int32(self) -> builtins.bool:
    """Check if this is a 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int32  # or daft.DataType.int32()
        >>> assert dtype.is_int32()
    """
    return self._dtype.is_int32()

is_int64 #

is_int64() -> bool

Check if this is a 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64  # or daft.DataType.int64()
>>> assert dtype.is_int64()

Source code in daft/datatype.py

def is_int64(self) -> builtins.bool:
    """Check if this is a 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64  # or daft.DataType.int64()
        >>> assert dtype.is_int64()
    """
    return self._dtype.is_int64()

is_int8 #

is_int8() -> bool

Check if this is an 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int8  # or daft.DataType.int8()
>>> assert dtype.is_int8()

Source code in daft/datatype.py

def is_int8(self) -> builtins.bool:
    """Check if this is an 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int8  # or daft.DataType.int8()
        >>> assert dtype.is_int8()
    """
    return self._dtype.is_int8()

is_integer #

is_integer() -> bool

Check if this is an integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64  # or daft.DataType.int64()
>>> assert dtype.is_integer()

Source code in daft/datatype.py

def is_integer(self) -> builtins.bool:
    """Check if this is an integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64  # or daft.DataType.int64()
        >>> assert dtype.is_integer()
    """
    return self._dtype.is_integer()

is_interval #

is_interval() -> bool

Check if this is an interval type.

Examples:

>>> import daft
>>> dtype = daft.DataType.interval  # or daft.DataType.interval()
>>> assert dtype.is_interval()

Source code in daft/datatype.py

def is_interval(self) -> builtins.bool:
    """Check if this is an interval type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.interval  # or daft.DataType.interval()
        >>> assert dtype.is_interval()
    """
    return self._dtype.is_interval()

is_list #

is_list() -> bool

Check if this is a list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.is_list()

Source code in daft/datatype.py

def is_list(self) -> builtins.bool:
    """Check if this is a list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.list(daft.DataType.int64())
        >>> assert dtype.is_list()
    """
    return self._dtype.is_list()

is_logical #

is_logical() -> bool

Check if this is a logical type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool  # or daft.DataType.bool()
>>> assert not dtype.is_logical()

Source code in daft/datatype.py

def is_logical(self) -> builtins.bool:
    """Check if this is a logical type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool  # or daft.DataType.bool()
        >>> assert not dtype.is_logical()
    """
    return self._dtype.is_logical()

is_map #

is_map() -> bool

Check if this is a map type.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.is_map()

Source code in daft/datatype.py

def is_map(self) -> builtins.bool:
    """Check if this is a map type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
        >>> assert dtype.is_map()
    """
    return self._dtype.is_map()

is_null #

is_null() -> bool

Check if this is a null type.

Examples:

>>> import daft
>>> dtype = daft.DataType.null  # or daft.DataType.null()
>>> dtype.is_null()
True

Source code in daft/datatype.py

def is_null(self) -> builtins.bool:
    """Check if this is a null type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.null  # or daft.DataType.null()
        >>> dtype.is_null()
        True
    """
    return self._dtype.is_null()

is_numeric #

is_numeric() -> bool

Check if this is a numeric type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64  # or daft.DataType.float64()
>>> assert dtype.is_numeric()

Source code in daft/datatype.py

def is_numeric(self) -> builtins.bool:
    """Check if this is a numeric type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64  # or daft.DataType.float64()
        >>> assert dtype.is_numeric()
    """
    return self._dtype.is_numeric()

is_python #

is_python() -> bool

Check if this is a python object type.

Examples:

>>> import daft
>>> dtype = daft.DataType.python  # or daft.DataType.python()
>>> assert dtype.is_python()

Source code in daft/datatype.py

def is_python(self) -> builtins.bool:
    """Check if this is a python object type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.python  # or daft.DataType.python()
        >>> assert dtype.is_python()
    """
    return self._dtype.is_python()

is_sparse_tensor #

is_sparse_tensor() -> bool

Check if this is a sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
>>> assert dtype.is_sparse_tensor()

Source code in daft/datatype.py

def is_sparse_tensor(self) -> builtins.bool:
    """Check if this is a sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
        >>> assert dtype.is_sparse_tensor()
    """
    return self._dtype.is_sparse_tensor()

is_string #

is_string() -> bool

Check if this is a string type.

Examples:

>>> import daft
>>> dtype = daft.DataType.string  # or daft.DataType.string()
>>> assert dtype.is_string()

Source code in daft/datatype.py

def is_string(self) -> builtins.bool:
    """Check if this is a string type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.string  # or daft.DataType.string()
        >>> assert dtype.is_string()
    """
    return self._dtype.is_string()

is_struct #

is_struct() -> bool

Check if this is a struct type.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> assert dtype.is_struct()

Source code in daft/datatype.py

def is_struct(self) -> builtins.bool:
    """Check if this is a struct type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
        >>> assert dtype.is_struct()
    """
    return self._dtype.is_struct()

is_temporal #

is_temporal() -> bool

Check if this is a temporal type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_temporal()

Source code in daft/datatype.py

def is_temporal(self) -> builtins.bool:
    """Check if this is a temporal type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_temporal()
    """
    return self._dtype.is_temporal()

is_tensor #

is_tensor() -> bool

Check if this is a tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> assert dtype.is_tensor()

Source code in daft/datatype.py

def is_tensor(self) -> builtins.bool:
    """Check if this is a tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32())
        >>> assert dtype.is_tensor()
    """
    return self._dtype.is_tensor()

is_time #

is_time() -> bool

Check if this is a time type.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> assert dtype.is_time()

Source code in daft/datatype.py

def is_time(self) -> builtins.bool:
    """Check if this is a time type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.time(timeunit="ns")
        >>> assert dtype.is_time()
    """
    return self._dtype.is_time()

is_timestamp #

is_timestamp() -> bool

Check if this is a timestamp type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_timestamp()

Source code in daft/datatype.py

def is_timestamp(self) -> builtins.bool:
    """Check if this is a timestamp type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_timestamp()
    """
    return self._dtype.is_timestamp()

is_uint16 #

is_uint16() -> bool

Check if this is an unsigned 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint16  # or daft.DataType.uint16()
>>> assert dtype.is_uint16()

Source code in daft/datatype.py

def is_uint16(self) -> builtins.bool:
    """Check if this is an unsigned 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint16  # or daft.DataType.uint16()
        >>> assert dtype.is_uint16()
    """
    return self._dtype.is_uint16()

is_uint32 #

is_uint32() -> bool

Check if this is an unsigned 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint32  # or daft.DataType.uint32()
>>> assert dtype.is_uint32()

Source code in daft/datatype.py

def is_uint32(self) -> builtins.bool:
    """Check if this is an unsigned 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint32  # or daft.DataType.uint32()
        >>> assert dtype.is_uint32()
    """
    return self._dtype.is_uint32()

is_uint64 #

is_uint64() -> bool

Check if this is an unsigned 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint64  # or daft.DataType.uint64()
>>> assert dtype.is_uint64()

Source code in daft/datatype.py

def is_uint64(self) -> builtins.bool:
    """Check if this is an unsigned 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint64  # or daft.DataType.uint64()
        >>> assert dtype.is_uint64()
    """
    return self._dtype.is_uint64()

is_uint8 #

is_uint8() -> bool

Check if this is an unsigned 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint8  # or daft.DataType.uint8()
>>> assert dtype.is_uint8()

Source code in daft/datatype.py

def is_uint8(self) -> builtins.bool:
    """Check if this is an unsigned 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint8  # or daft.DataType.uint8()
        >>> assert dtype.is_uint8()
    """
    return self._dtype.is_uint8()

is_union #

is_union() -> bool

Check if this is a union type.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> assert dtype.is_union()

Source code in daft/datatype.py

def is_union(self) -> builtins.bool:
    """Check if this is a union type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
        >>> assert dtype.is_union()
    """
    return self._dtype.is_union()

is_uuid #

is_uuid() -> bool

Check if this is a UUID type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uuid()
>>> assert dtype.is_uuid()

Source code in daft/datatype.py

def is_uuid(self) -> builtins.bool:
    """Check if this is a UUID type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uuid()
        >>> assert dtype.is_uuid()
    """
    return self._dtype.is_uuid()

list #

list(dtype: DataType) -> DataType

Create a List DataType: Variable-length list, where each element in the list has type dtype.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType`	DataType of each element in the list	required

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def list(cls, dtype: DataType) -> DataType:
    """Create a List DataType: Variable-length list, where each element in the list has type ``dtype``.

    Args:
        dtype: DataType of each element in the list
    """
    return cls._from_pydatatype(PyDataType.list(dtype._dtype))

map #

map(key_type: DataType, value_type: DataType) -> DataType

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

Parameters:

Name	Type	Description	Default
`key_type`	`DataType`	DataType of the keys in the map	required
`value_type`	`DataType`	DataType of the values in the map	required

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def map(cls, key_type: DataType, value_type: DataType) -> DataType:
    """Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

    Args:
        key_type: DataType of the keys in the map
        value_type: DataType of the values in the map
    """
    return cls._from_pydatatype(PyDataType.map(key_type._dtype, value_type._dtype))
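
To make the "list of structs with two fields" description concrete, here is a minimal pure-Python sketch of that logical layout. It does not call Daft; `dict_to_map_entries` and `map_entries_to_dict` are hypothetical helper names used only for illustration.

```python
def dict_to_map_entries(d: dict) -> list[dict]:
    """Lay a Python dict out as a list of {"key", "value"} records."""
    return [{"key": k, "value": v} for k, v in d.items()]


def map_entries_to_dict(entries: list[dict]) -> dict:
    """Rebuild a dict from the list-of-structs layout."""
    return {e["key"]: e["value"] for e in entries}


entries = dict_to_map_entries({"a": 1, "b": 2})
print(entries)
```

Every record has the same two fields, so all keys share one type and all values share another. This matches the `key_type` and `value_type` parameters of `map`.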

sparse_tensor #

sparse_tensor(dtype: DataType, shape: tuple[int, ...] | None = None, use_offset_indices: bool = False) -> DataType

Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

The use_offset_indices parameter determines how the indices of the SparseTensor are stored:

- False (default): Indices represent the actual positions of nonzero values.
- True: Indices represent the offsets between consecutive nonzero values. This can improve compression efficiency, especially when nonzero values are clustered together, as offsets between them are often zero, making them easier to compress.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType`	The type of the data contained within the tensor elements.	required
`shape`	`tuple[int, ...] \| None`	The shape of each SparseTensor in the column. This is `None` by default, which allows the shapes of each tensor element to vary.	`None`
`use_offset_indices`	`bool`	Determines how indices are represented. Defaults to `False` (storing actual indices). If `True`, stores offsets between nonzero indices.	`False`
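
The difference between the two index modes can be illustrated with plain delta encoding. This is a minimal sketch under an assumption: that "offsets between consecutive nonzero values" means each stored value is the difference from the previous position. The source does not spell out the exact encoding, and `to_offsets` and `from_offsets` are hypothetical helpers, not Daft functions.

```python
from itertools import accumulate


def to_offsets(indices: list[int]) -> list[int]:
    """Delta-encode sorted positions: keep the first, then successive differences."""
    return [cur - prev for prev, cur in zip([0] + indices[:-1], indices)]


def from_offsets(offsets: list[int]) -> list[int]:
    """Invert the delta encoding with a running sum."""
    return list(accumulate(offsets))


positions = [100, 101, 102, 250, 251]  # actual positions of nonzero values
offsets = to_offsets(positions)
print(offsets)
print(from_offsets(offsets) == positions)
```

With clustered positions, the encoded values are small and repetitive, which is the property that makes them compress well.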

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def sparse_tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
    use_offset_indices: builtins.bool = False,
) -> DataType:
    """Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    The ``use_offset_indices`` parameter determines how the indices of the SparseTensor are stored:
    - ``False`` (default): Indices represent the actual positions of nonzero values.
    - ``True``: Indices represent the offsets between consecutive nonzero values.
    This can improve compression efficiency, especially when nonzero values are clustered together,
    as offsets between them are often zero, making them easier to compress.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each SparseTensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
        use_offset_indices: Determines how indices are represented.
            Defaults to `False` (storing actual indices). If `True`, stores offsets between nonzero indices.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or not shape or any(not isinstance(n, int) for n in shape):
            raise ValueError("SparseTensor shape must be a non-empty tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.sparse_tensor(dtype._dtype, shape, use_offset_indices))

struct #

struct(fields: dict[str, DataType]) -> DataType

Create a Struct DataType: a nested type which has names mapped to child types.

Examples:

>>> from daft import DataType
>>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

Parameters:

Name	Type	Description	Default
`fields`	`dict[str, DataType]`	Nested fields of the Struct	required

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def struct(cls, fields: dict[str, DataType]) -> DataType:
    """Create a Struct DataType: a nested type which has names mapped to child types.

    Examples:
        >>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

    Args:
        fields: Nested fields of the Struct
    """
    return cls._from_pydatatype(PyDataType.struct({name: datatype._dtype for name, datatype in fields.items()}))

tensor #

tensor(dtype: DataType, shape: tuple[int, ...] | None = None) -> DataType

Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType`	The type of the data contained within the tensor elements.	required
`shape`	`tuple[int, ...] \| None`	The shape of each tensor in the column. This is `None` by default, which allows the shapes of each tensor element to vary.	`None`

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
) -> DataType:
    """Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each tensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or any(not isinstance(n, int) for n in shape):
            raise ValueError("Tensor shape must be a tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.tensor(dtype._dtype, shape))

time #

time(timeunit: TimeUnit | str) -> DataType

Time DataType. Supported timeunits are "us", "ns".

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def time(cls, timeunit: TimeUnit | str) -> DataType:
    """Time DataType. Supported timeunits are "us", "ns"."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.time(timeunit._timeunit))

timestamp #

timestamp(timeunit: TimeUnit | str, timezone: str | None = None) -> DataType

Timestamp DataType.

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def timestamp(cls, timeunit: TimeUnit | str, timezone: str | None = None) -> DataType:
    """Timestamp DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.timestamp(timeunit._timeunit, timezone))

to_arrow_dtype #

to_arrow_dtype() -> pa.DataType

Source code in daft/datatype.py

def to_arrow_dtype(self) -> pa.DataType:
    _ensure_registered_super_ext_type()
    return self._dtype.to_arrow()

union #

union(fields: dict[str, DataType], type_ids: list[int], mode: str | UnionMode = 'sparse') -> DataType

Create a Union DataType: a union of named fields, each with its own type.

Parameters:

Name	Type	Description	Default
`fields`	`dict[str, DataType]`	Mapping of field names to their DataTypes	required
`type_ids`	`list[int]`	Type IDs (one per field) used to identify which variant is stored	required
`mode`	`str \| UnionMode`	Union mode, either `"sparse"` or `"dense"` (default: `"sparse"`)	`'sparse'`

Examples:

>>> import daft
>>> union_type = daft.DataType.union(
...     {"i": daft.DataType.int32(), "f": daft.DataType.float64()},
...     type_ids=[0, 1],
...     mode="sparse",
... )

Source code in daft/datatype.py

@datatype_constructor
@classmethod
def union(
    cls,
    fields: dict[str, DataType],
    type_ids: builtins.list[int],
    mode: str | UnionMode = "sparse",
) -> DataType:
    """Create a Union DataType: a union of named fields, each with its own type.

    Args:
        fields: Mapping of field names to their DataTypes
        type_ids: Type IDs (one per field) used to identify which variant is stored
        mode: Union mode, either ``"sparse"`` or ``"dense"`` (default: ``"sparse"``)

    Examples:
        >>> import daft
        >>> union_type = daft.DataType.union(
        ...     {"i": daft.DataType.int32(), "f": daft.DataType.float64()},
        ...     type_ids=[0, 1],
        ...     mode="sparse",
        ... )
    """
    if isinstance(mode, str):
        mode = UnionMode.from_mode_string(mode)
    return cls._from_pydatatype(PyDataType.union({name: dt._dtype for name, dt in fields.items()}, type_ids, mode))