DataTypes#

daft.DataType#

Daft provides the simple DataTypes found in most DataFrames, such as numbers, strings and dates, as well as more complex types like tensors and images.

DataType #

DataType()

A Daft DataType defines the type of all the values in an Expression or DataFrame column.

Methods:

decimal128: Fixed-precision decimal.
duration: Duration DataType.
embedding: Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.
extension
file: Create a File DataType: a type which refers to a file object.
fixed_size_binary: Create a FixedSizeBinary DataType: A fixed-size string of bytes.
fixed_size_list: Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.
from_arrow_type: Maps a PyArrow DataType to a Daft DataType.
from_numpy_dtype: Maps a Numpy datatype to a Daft DataType.
from_sql: Construct a Daft DataType from a SQL type.
image: Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.
infer_from_object: Infer Daft DataType from a Python object.
infer_from_type: Infer Daft DataType from a Python type.
is_binary: Check if this is a binary type.
is_boolean: Check if this is a boolean type.
is_date: Check if this is a date type.
is_decimal128: Check if this is a decimal128 type.
is_duration: Check if this is a duration type.
is_embedding: Check if this is an embedding type.
is_extension: Check if this is an extension type.
is_file: Check if this is a file type.
is_fixed_shape_image: Check if this is a fixed shape image type.
is_fixed_shape_sparse_tensor: Check if this is a fixed shape sparse tensor type.
is_fixed_shape_tensor: Check if this is a fixed shape tensor type.
is_fixed_size_binary: Check if this is a fixed size binary type.
is_fixed_size_list: Check if this is a fixed size list type.
is_float16: Check if this is a 16-bit float type.
is_float32: Check if this is a 32-bit float type.
is_float64: Check if this is a 64-bit float type.
is_image: Check if this is an image type.
is_int16: Check if this is a 16-bit integer type.
is_int32: Check if this is a 32-bit integer type.
is_int64: Check if this is a 64-bit integer type.
is_int8: Check if this is an 8-bit integer type.
is_integer: Check if this is an integer type.
is_interval: Check if this is an interval type.
is_list: Check if this is a list type.
is_logical: Check if this is a logical type.
is_map: Check if this is a map type.
is_null: Check if this is a null type.
is_numeric: Check if this is a numeric type.
is_python: Check if this is a python object type.
is_sparse_tensor: Check if this is a sparse tensor type.
is_string: Check if this is a string type.
is_struct: Check if this is a struct type.
is_temporal: Check if this is a temporal type.
is_tensor: Check if this is a tensor type.
is_time: Check if this is a time type.
is_timestamp: Check if this is a timestamp type.
is_uint16: Check if this is an unsigned 16-bit integer type.
is_uint32: Check if this is an unsigned 32-bit integer type.
is_uint64: Check if this is an unsigned 64-bit integer type.
is_uint8: Check if this is an unsigned 8-bit integer type.
is_union: Check if this is a union type.
is_uuid: Check if this is a UUID type.
list: Create a List DataType: Variable-length list, where each element in the list has type dtype.
map: Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.
sparse_tensor: Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
struct: Create a Struct DataType: a nested type which has names mapped to child types.
tensor: Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.
time: Time DataType. Supported timeunits are "us", "ns".
timestamp: Timestamp DataType.
to_arrow_dtype
union: Create a Union DataType: a union of named fields, each with its own type.

Attributes:

binary (_CallableSingletonDataType)
bool (_CallableSingletonDataType)
date (_CallableSingletonDataType)
dtype (DataType): If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.
fields (dict[str, DataType]): If this is a struct type, return the fields, otherwise an attribute error is raised.
float16 (_CallableSingletonDataType)
float32 (_CallableSingletonDataType)
float64 (_CallableSingletonDataType)
image_mode (ImageMode | None): If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.
int16 (_CallableSingletonDataType)
int32 (_CallableSingletonDataType)
int64 (_CallableSingletonDataType)
int8 (_CallableSingletonDataType)
interval (_CallableSingletonDataType)
key_type (DataType): If this is a map type, return the key type, otherwise an attribute error is raised.
null (_CallableSingletonDataType)
precision (int): If this is a decimal type, return the precision, otherwise an attribute error is raised.
python (_CallableSingletonDataType)
scale (int): If this is a decimal type, return the scale, otherwise an attribute error is raised.
shape (tuple[int, ...]): If this is a fixed shape type, return the shape, otherwise an attribute error is raised.
size (int): If this is a fixed size type, return the size, otherwise an attribute error is raised.
string (_CallableSingletonDataType)
timeunit (TimeUnit): If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.
timezone (str | None): If this is a timestamp type, return the timezone, otherwise an attribute error is raised.
type_ids (list[int]): If this is a union type, return the type IDs, otherwise an attribute error is raised.
uint16 (_CallableSingletonDataType)
uint32 (_CallableSingletonDataType)
uint64 (_CallableSingletonDataType)
uint8 (_CallableSingletonDataType)
union_fields (dict[str, DataType]): If this is a union type, return the fields, otherwise an attribute error is raised.
union_mode (UnionMode): If this is a union type, return the union mode, otherwise an attribute error is raised.
use_offset_indices (bool): If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.
uuid (_CallableSingletonDataType)
value_type (DataType): If this is a map type, return the value type, otherwise an attribute error is raised.

Source code in daft/datatype.py
def __init__(self) -> None:
    raise NotImplementedError(
        "We do not support creating a DataType via __init__ "
        "use a creator method like DataType.int32() or use DataType.from_arrow_type(pa_type)"
    )

binary #

binary: _CallableSingletonDataType

bool #

bool: _CallableSingletonDataType

date #

date: _CallableSingletonDataType

dtype #

dtype: DataType

If the datatype contains an inner type, return the inner type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.dtype == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.dtype
... except AttributeError:
...     pass

fields #

fields: dict[str, DataType]

If this is a struct type, return the fields, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> fields = dtype.fields
>>> assert fields["a"] == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.fields
... except AttributeError:
...     pass

float16 #

float16: _CallableSingletonDataType

float32 #

float32: _CallableSingletonDataType

float64 #

float64: _CallableSingletonDataType

image_mode #

image_mode: ImageMode | None

If this is an image type, return the (optional) image mode, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB")
>>> assert dtype.image_mode == daft.ImageMode.RGB
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.image_mode
... except AttributeError:
...     pass

int16 #

int16: _CallableSingletonDataType

int32 #

int32: _CallableSingletonDataType

int64 #

int64: _CallableSingletonDataType

int8 #

int8: _CallableSingletonDataType

interval #

interval: _CallableSingletonDataType

key_type #

key_type: DataType

If this is a map type, return the key type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.key_type == daft.DataType.string()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.key_type
... except AttributeError:
...     pass

null #

null: _CallableSingletonDataType

precision #

precision: int

If this is a decimal type, return the precision, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.precision == 10
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.precision
... except AttributeError:
...     pass

python #

python: _CallableSingletonDataType

scale #

scale: int

If this is a decimal type, return the scale, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.scale == 2
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.scale
... except AttributeError:
...     pass

shape #

shape: tuple[int, ...]

If this is a fixed shape type, return the shape, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.shape == (2, 3)
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> try:
...     dtype.shape
... except AttributeError:
...     pass

size #

size: int

If this is a fixed size type, return the size, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.size == 10
>>> dtype = daft.DataType.binary()
>>> try:
...     dtype.size
... except AttributeError:
...     pass

string #

string: _CallableSingletonDataType

timeunit #

timeunit: TimeUnit

If this is a time or timestamp type, return the timeunit, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> dtype.timeunit
TimeUnit(ns)
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timeunit
... except AttributeError:
...     pass

timezone #

timezone: str | None

If this is a timestamp type, return the timezone, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns", timezone="UTC")
>>> assert dtype.timezone == "UTC"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.timezone
... except AttributeError:
...     pass

type_ids #

type_ids: list[int]

If this is a union type, return the type IDs, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> assert dtype.type_ids == [0, 1]
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.type_ids
... except AttributeError:
...     pass

uint16 #

uint16: _CallableSingletonDataType

uint32 #

uint32: _CallableSingletonDataType

uint64 #

uint64: _CallableSingletonDataType

uint8 #

uint8: _CallableSingletonDataType

union_fields #

union_fields: dict[str, DataType]

If this is a union type, return the fields, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> fields = dtype.union_fields
>>> assert fields["i"] == daft.DataType.int32()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.union_fields
... except AttributeError:
...     pass

union_mode #

union_mode: UnionMode

If this is a union type, return the union mode, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32()}, type_ids=[0], mode="dense")
>>> assert str(dtype.union_mode) == "Dense"
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.union_mode
... except AttributeError:
...     pass

use_offset_indices #

use_offset_indices: bool

If this is a sparse tensor type, return whether the indices are stored as offsets, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), use_offset_indices=True)
>>> assert dtype.use_offset_indices
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.use_offset_indices
... except AttributeError:
...     pass

uuid #

uuid: _CallableSingletonDataType

value_type #

value_type: DataType

If this is a map type, return the value type, otherwise an attribute error is raised.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.value_type == daft.DataType.int64()
>>> dtype = daft.DataType.int64()
>>> try:
...     dtype.value_type
... except AttributeError:
...     pass

decimal128 #

decimal128(precision: int, scale: int) -> DataType

Fixed-precision decimal.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def decimal128(cls, precision: int, scale: int) -> DataType:
    """Fixed-precision decimal."""
    return cls._from_pydatatype(PyDataType.decimal128(precision, scale))

duration #

duration(timeunit: TimeUnit | str) -> DataType

Duration DataType.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def duration(cls, timeunit: TimeUnit | str) -> DataType:
    """Duration DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.duration(timeunit._timeunit))

embedding #

embedding(dtype: DataType, size: int) -> DataType

Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a numeric dtype and each array has a fixed length of size.

Parameters:

dtype (DataType, required): DataType of each element in the list (must be numeric).
size (int, required): Length of each list.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def embedding(cls, dtype: DataType, size: int) -> DataType:
    """Create an Embedding DataType: embeddings are fixed size arrays, where each element in the array has a **numeric** ``dtype`` and each array has a fixed length of ``size``.

    Args:
        dtype: DataType of each element in the list (must be numeric)
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a embedding must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.embedding(dtype._dtype, size))

extension #

extension(name: str, storage_dtype: DataType, metadata: str | None = None) -> DataType
Source code in daft/datatype.py
@datatype_constructor
@classmethod
def extension(cls, name: str, storage_dtype: DataType, metadata: str | None = None) -> DataType:
    return cls._from_pydatatype(PyDataType.extension(name, storage_dtype._dtype, metadata))

file #

file(media_type: MediaType = MediaType.unknown()) -> DataType

Create a File DataType: a type which refers to a file object.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def file(cls, media_type: MediaType = MediaType.unknown()) -> DataType:
    """Create a File DataType: a type which refers to a file object."""
    return cls._from_pydatatype(PyDataType.file(media_type._media_type))

fixed_size_binary #

fixed_size_binary(size: int) -> DataType

Create a FixedSizeBinary DataType: A fixed-size string of bytes.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def fixed_size_binary(cls, size: int) -> DataType:
    """Create a FixedSizeBinary DataType: A fixed-size string of bytes."""
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size binary must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_binary(size))

fixed_size_list #

fixed_size_list(dtype: DataType, size: int) -> DataType

Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type dtype and each list has length size.

Parameters:

dtype (DataType, required): DataType of each element in the list.
size (int, required): Length of each list.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def fixed_size_list(cls, dtype: DataType, size: int) -> DataType:
    """Create a FixedSizeList DataType: Fixed-size list, where each element in the list has type ``dtype`` and each list has length ``size``.

    Args:
        dtype: DataType of each element in the list
        size: length of each list
    """
    if not isinstance(size, int) or size <= 0:
        raise ValueError("The size for a fixed-size list must be a positive integer, but got: ", size)
    return cls._from_pydatatype(PyDataType.fixed_size_list(dtype._dtype, size))

from_arrow_type #

from_arrow_type(arrow_type: pa.DataType, python_fallback: bool = True) -> DataType

Maps a PyArrow DataType to a Daft DataType.

Source code in daft/datatype.py
@classmethod
def from_arrow_type(cls, arrow_type: pa.lib.DataType, python_fallback: builtins.bool = True) -> DataType:
    """Maps a PyArrow DataType to a Daft DataType."""
    if pa.types.is_int8(arrow_type):
        return cls.int8()
    elif pa.types.is_int16(arrow_type):
        return cls.int16()
    elif pa.types.is_int32(arrow_type):
        return cls.int32()
    elif pa.types.is_int64(arrow_type):
        return cls.int64()
    elif pa.types.is_uint8(arrow_type):
        return cls.uint8()
    elif pa.types.is_uint16(arrow_type):
        return cls.uint16()
    elif pa.types.is_uint32(arrow_type):
        return cls.uint32()
    elif pa.types.is_uint64(arrow_type):
        return cls.uint64()
    elif pa.types.is_float16(arrow_type):
        return cls.float16()
    elif pa.types.is_float32(arrow_type):
        return cls.float32()
    elif pa.types.is_float64(arrow_type):
        return cls.float64()
    elif pa.types.is_string(arrow_type) or pa.types.is_large_string(arrow_type):
        return cls.string()
    elif pa.types.is_binary(arrow_type) or pa.types.is_large_binary(arrow_type):
        return cls.binary()
    elif pa.types.is_fixed_size_binary(arrow_type):
        return cls.fixed_size_binary(arrow_type.byte_width)
    elif pa.types.is_boolean(arrow_type):
        return cls.bool()
    elif pa.types.is_null(arrow_type):
        return cls.null()
    elif pa.types.is_decimal128(arrow_type):
        return cls.decimal128(arrow_type.precision, arrow_type.scale)
    elif pa.types.is_date32(arrow_type):
        return cls.date()
    elif pa.types.is_date64(arrow_type):
        return cls.timestamp(TimeUnit.ms())
    elif pa.types.is_time64(arrow_type):
        timeunit = TimeUnit.from_str(pa.type_for_alias(str(arrow_type)).unit)
        return cls.time(timeunit)
    elif pa.types.is_timestamp(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.timestamp(timeunit=timeunit, timezone=arrow_type.tz)
    elif pa.types.is_duration(arrow_type):
        timeunit = TimeUnit.from_str(arrow_type.unit)
        return cls.duration(timeunit=timeunit)
    elif pa.types.is_list(arrow_type) or pa.types.is_large_list(arrow_type):
        assert isinstance(arrow_type, (pa.ListType, pa.LargeListType))
        field = arrow_type.value_field
        return cls.list(cls.from_arrow_type(field.type, python_fallback))
    elif pa.types.is_fixed_size_list(arrow_type):
        assert isinstance(arrow_type, pa.FixedSizeListType)
        field = arrow_type.value_field
        return cls.fixed_size_list(cls.from_arrow_type(field.type, python_fallback), arrow_type.list_size)
    elif pa.types.is_struct(arrow_type):
        assert isinstance(arrow_type, pa.StructType)
        fields = [arrow_type[i] for i in range(arrow_type.num_fields)]
        return cls.struct({field.name: cls.from_arrow_type(field.type, python_fallback) for field in fields})
    elif pa.types.is_interval(arrow_type):
        return cls.interval()
    elif pa.types.is_map(arrow_type):
        assert isinstance(arrow_type, pa.MapType)
        return cls.map(
            key_type=cls.from_arrow_type(arrow_type.key_type, python_fallback),
            value_type=cls.from_arrow_type(arrow_type.item_type, python_fallback),
        )
    elif isinstance(arrow_type, pa.FixedShapeTensorType):
        scalar_dtype = cls.from_arrow_type(arrow_type.value_type, python_fallback)
        return cls.tensor(scalar_dtype, tuple(arrow_type.shape))
    elif pa.types.is_union(arrow_type):
        assert isinstance(arrow_type, pa.UnionType)
        mode = "dense" if arrow_type.mode == "dense" else "sparse"
        field_dict = {
            arrow_type.field(i).name: cls.from_arrow_type(arrow_type.field(i).type, python_fallback)
            for i in range(arrow_type.num_fields)
        }
        type_ids = list(arrow_type.type_codes)
        return cls.union(field_dict, type_ids, mode)
    # Only check for PyExtensionType if pyarrow version is < 21.0.0
    if hasattr(pa, "PyExtensionType") and isinstance(arrow_type, getattr(pa, "PyExtensionType")):
        # TODO(Clark): Add a native cross-lang extension type representation for PyExtensionTypes.
        raise ValueError(
            "pyarrow extension types that subclass pa.PyExtensionType can't be used in Daft, since they can't be "
            f"used in non-Python Arrow implementations and Daft uses the Rust Arrow implementation: {arrow_type}"
        )
    elif isinstance(arrow_type, pa.BaseExtensionType):
        name = arrow_type.extension_name

        if (get_or_create_runner().name == "ray") and (
            type(arrow_type).__reduce__ == pa.BaseExtensionType.__reduce__
        ):
            raise ValueError(
                f"You are attempting to use a Extension Type: {arrow_type} with the default pyarrow `__reduce__` which breaks pickling for Extensions"
                "To fix this, implement your own `__reduce__` on your extension type"
                "For more details see this issue: "
                "https://github.com/apache/arrow/issues/35599"
            )
        try:
            metadata = arrow_type.__arrow_ext_serialize__().decode()
        except AttributeError:
            metadata = None

        if name == "daft.super_extension":
            assert metadata is not None
            return cls._from_pydatatype(PyDataType.from_json(metadata))
        else:
            return cls.extension(
                name,
                cls.from_arrow_type(arrow_type.storage_type, python_fallback),
                metadata,
            )
    else:
        if python_fallback:
            # Fall back to a Python object type.
            # TODO(Clark): Add native support for remaining Arrow types.
            return cls.python()
        else:
            raise TypeError(f"Unsupported Arrow type: {arrow_type}")

from_numpy_dtype #

from_numpy_dtype(np_type: np.dtype[Any]) -> DataType

Maps a Numpy datatype to a Daft DataType.

Source code in daft/datatype.py
@classmethod
def from_numpy_dtype(cls, np_type: np.dtype[Any]) -> DataType:
    """Maps a Numpy datatype to a Daft DataType."""
    arrow_type = pa.from_numpy_dtype(np_type)
    return cls.from_arrow_type(arrow_type)

from_sql #

from_sql(sql_type: str) -> DataType

Construct a Daft DataType from a SQL type.

Source code in daft/datatype.py
@classmethod
def from_sql(cls, sql_type: str) -> DataType:
    """Construct a Daft DataType from a SQL type."""
    return cls._from_pydatatype(sql_datatype(sql_type))

image #

image(mode: str | ImageMode | None = None, height: int | None = None, width: int | None = None) -> DataType

Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

Each image in the array has a daft.ImageMode, which describes the pixel dtype (e.g. uint8) and the number of image channels/bands and their logical interpretation (e.g. RGB).

If the height, width, and mode are the same for all images in the array, specifying them when constructing this type is advised, since that will allow Daft to create a more optimized physical representation of the image array.

If the height, width, or mode may vary across images in the array, leaving these fields unspecified when creating this type will cause Daft to represent this image array as a heterogeneous collection of images, where each image can have a different mode, height, and width. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

mode (str | ImageMode | None, default None): The mode of the image. By default, this is inferred from the underlying data. If height and width are specified, the mode must also be specified.
height (int | None, default None): The height of the image. By default, this is inferred from the underlying data. Must be specified if the width is specified.
width (int | None, default None): The width of the image. By default, this is inferred from the underlying data. Must be specified if the height is specified.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def image(
    cls, mode: str | ImageMode | None = None, height: int | None = None, width: int | None = None
) -> DataType:
    """Create an Image DataType: image arrays contain (height, width, channel) ndarrays of pixel values.

    Each image in the array has an :class:`~daft.ImageMode`, which describes the pixel dtype (e.g. uint8) and
    the number of image channels/bands and their logical interpretation (e.g. RGB).

    If the height, width, and mode are the same for all images in the array, specifying them when constructing
    this type is advised, since that will allow Daft to create a more optimized physical representation
    of the image array.

    If the height, width, or mode may vary across images in the array, leaving these fields unspecified when
    creating this type will cause Daft to represent this image array as a heterogeneous collection of images,
    where each image can have a different mode, height, and width. This is much more flexible, but will result
    in a less compact representation and may be make some operations less efficient.

    Args:
        mode: The mode of the image. By default, this is inferred from the underlying data.
            If height and width are specified, the mode must also be specified.
        height: The height of the image. By default, this is inferred from the underlying data.
            Must be specified if the width is specified.
        width: The width of the image. By default, this is inferred from the underlying data.
            Must be specified if the width is specified.
    """
    if isinstance(mode, str):
        mode = ImageMode.from_mode_string(mode.upper())
    if mode is not None and not isinstance(mode, ImageMode):
        raise ValueError(f"mode must be a string or ImageMode variant, but got: {mode}")
    if height is not None and width is not None:
        if not isinstance(height, int) or height <= 0:
            raise ValueError("Image height must be a positive integer, but got: ", height)
        if not isinstance(width, int) or width <= 0:
            raise ValueError("Image width must be a positive integer, but got: ", width)
    elif height is not None or width is not None:
        raise ValueError(
            f"Image height and width must either both be specified, or both not be specified, but got height={height}, width={width}"
        )
    return cls._from_pydatatype(PyDataType.image(mode, height, width))

infer_from_object #

infer_from_object(obj: Any) -> DataType

Infer Daft DataType from a Python object.

Source code in daft/datatype.py
@classmethod
def infer_from_object(cls, obj: Any) -> DataType:
    """Infer Daft DataType from a Python object."""
    from daft.series import Series

    s = Series.from_pylist([obj])
    return s.datatype()

infer_from_type #

infer_from_type(t: type | GenericAlias | UnionType) -> DataType

Infer Daft DataType from a Python type.

Source code in daft/datatype.py
@classmethod
def infer_from_type(cls, t: type | GenericAlias | UnionType) -> DataType:
    """Infer Daft DataType from a Python type."""
    # NOTE: Make sure this matches the logic in `Literal::from_pyobj` in Rust
    # NOTE: The base type for Union is hidden, so it requires special handling
    # TODO: TypeForm would cover everything: https://peps.python.org/pep-0747/

    assert isinstance(t, (type, GenericAlias, UnionType)) or typing.get_origin(t) is typing.Union, (
        f"Input to DataType.infer_from_type must be a type, found {t} (type {type(t)})"
    )

    import datetime
    import decimal
    import importlib
    from typing import is_typeddict

    import daft.file
    import daft.series

    origin_or_none = typing.get_origin(t)
    origin: type = origin_or_none if origin_or_none is not None else t  # type: ignore
    args = typing.get_args(t)

    def check_type(type_or_path: type | str) -> bool:
        """Check if `origin` is a subclass of `type_or_path`.

        Pass in a string value for `type_or_path` for types from optional dependencies.
        """
        if isinstance(type_or_path, type):
            type_obj = type_or_path
        elif isinstance(type_or_path, str):
            module_name, type_name = type_or_path.rsplit(".", 1)
            try:
                module = importlib.import_module(module_name)
                type_obj = getattr(module, type_name)
            except (ImportError, AttributeError):
                return False
        else:
            raise ValueError("`type_or_path` must be type or string")

        return issubclass(origin, type_obj)

    # NOTE: This has to be first to handle the special case of typing.Union
    if origin is typing.Union or check_type(UnionType):  # type: ignore[comparison-overlap]
        inner_types = set(DataType.infer_from_type(arg) for arg in args)
        if len(inner_types) == 1:
            return inner_types.pop()
        elif len(inner_types) == 2 and cls.null() in inner_types:
            return inner_types.difference([cls.null()]).pop()
        else:
            return cls.python()
    elif check_type(type(None)):
        return cls.null()
    elif check_type(bool):
        return cls.bool()
    elif check_type(str):
        return cls.string()
    elif check_type(bytes):
        return cls.binary()
    elif check_type(int):
        return cls.int64()
    elif check_type(float):
        return cls.float64()
    elif check_type(datetime.datetime):
        # cannot derive timezone from type
        return cls.timestamp(TimeUnit.us(), timezone=None)
    elif check_type(datetime.date):
        return cls.date()
    elif check_type(datetime.time):
        return cls.time(TimeUnit.us())
    elif check_type(datetime.timedelta):
        return cls.duration(TimeUnit.us())
    elif check_type(daft.file.VideoFile):
        return cls.file(MediaType.video())
    elif check_type(daft.file.AudioFile):
        return cls.file(MediaType.audio())
    elif check_type(daft.file.ImageFile):
        return cls.file(MediaType.image())
    elif check_type(daft.file.File):
        return cls.file(MediaType.unknown())
    elif check_type(list):
        if len(args) == 0:
            inner_dtype = cls.python()
        elif len(args) == 1:
            inner_dtype = cls.infer_from_type(args[0])
        else:
            raise TypeError(f"Python list type cannot have more than one type argument, found: {t}")

        return cls.list(inner_dtype)
    elif is_typeddict(origin):
        field_types = typing.get_type_hints(origin)
        if any(not isinstance(t, str) for t in field_types):
            warnings.warn(
                f"Expected all TypedDict keys to be strings, found: {field_types}. Defaulting to Map[Python, Python] type."
            )
            return cls.map(cls.python(), cls.python())

        field_dtypes = {k: cls.infer_from_type(v) for k, v in field_types.items()}
        return cls.struct(field_dtypes)
    elif check_type(dict):
        if len(args) == 0:
            key_dtype = cls.python()
            value_dtype = cls.python()
        elif len(args) == 2:
            key_dtype = cls.infer_from_type(args[0])
            value_dtype = cls.infer_from_type(args[1])
        else:
            raise TypeError(f"Python dict type must have exactly two type arguments, found: {t}")

        # dict type can also be turned into struct, but we are unable to derive the struct keys from the type alone
        return cls.map(key_dtype, value_dtype)
    elif check_type(tuple):
        if len(args) == 0:
            # tuple -> List[Python]
            return cls.list(cls.python())
        if len(args) == 2 and args[1] is Ellipsis:
            # tuple[inner_type, ...] -> List[inner_type]
            inner_dtype = cls.infer_from_type(args[0])
            return cls.list(inner_dtype)
        else:
            # tuple[type0, type1, ...] -> Struct[_0: type0, _1: type1, ...]
            field_dtypes = {f"_{i}": cls.infer_from_type(arg) for i, arg in enumerate(args)}
            return cls.struct(field_dtypes)
    elif check_type(decimal.Decimal):
        warnings.warn(
            "Cannot derive precision and scale from decimal.Decimal type, defaulting to DataType.python()"
        )
        return cls.python()
    elif check_type(daft.series.Series):
        warnings.warn(
            "Cannot derive inner type from daft.Series type, defaulting to DataType.python() for Series inner type"
        )
        return cls.list(cls.python())
    elif check_type("pydantic.BaseModel"):
        import pydantic

        if not (parse("2.0.0") <= parse(pydantic.__version__) < parse("3.0.0")):
            raise ValueError(
                f"Daft only supports DataType inference for Pydantic V2, found Pydantic V{pydantic.__version__}"
            )

        model: pydantic.BaseModel = origin

        serialize_by_alias = model.model_config.get("serialize_by_alias", False)

        field_dtypes = {}
        for attr_name, field_info in model.model_fields.items():
            if serialize_by_alias:
                if field_info.serialization_alias is not None:
                    serialized_name = field_info.serialization_alias
                elif field_info.alias is not None:
                    serialized_name = field_info.alias
                else:
                    serialized_name = attr_name
            else:
                serialized_name = attr_name
            field_dtypes[serialized_name] = cls.infer_from_type(field_info.annotation)

        for attr_name, field_info in model.model_computed_fields.items():
            if serialize_by_alias:
                if field_info.alias is not None:
                    serialized_name = field_info.alias
                else:
                    serialized_name = attr_name
            else:
                serialized_name = attr_name
            field_dtypes[serialized_name] = cls.infer_from_type(field_info.return_type)

        return cls.struct(field_dtypes)
    elif check_type("PIL.Image.Image"):
        return cls.image()
    elif check_type("jaxtyping.AbstractArray"):
        return cls._infer_from_jaxtyping(origin)
    elif check_type("numpy.ndarray"):
        inner_dtype = None
        if len(args) == 2:
            # https://numpy.org/doc/2.3/reference/typing.html#numpy.typing.NDArray
            numpy_dtype_args = typing.get_args(args[1])
            if len(numpy_dtype_args) == 1:
                inner_dtype = cls.infer_from_type(numpy_dtype_args[0])

        if inner_dtype is None:
            warnings.warn(
                f"Cannot derive inner type from {t}, defaulting to DataType.python() for ndarray inner type"
            )
            inner_dtype = cls.python()

        return cls.tensor(inner_dtype)

    elif check_type("torch.FloatTensor"):
        return cls.tensor(cls.float32())
    elif check_type("torch.DoubleTensor"):
        return cls.tensor(cls.float64())
    elif check_type("torch.ByteTensor"):
        return cls.tensor(cls.uint8())
    elif check_type("torch.CharTensor"):
        return cls.tensor(cls.int8())
    elif check_type("torch.ShortTensor"):
        return cls.tensor(cls.int16())
    elif check_type("torch.IntTensor"):
        return cls.tensor(cls.int32())
    elif check_type("torch.LongTensor"):
        return cls.tensor(cls.int64())
    elif check_type("torch.BoolTensor"):
        return cls.tensor(cls.bool())
    elif (
        check_type("torch.Tensor")
        or check_type("tensorflow.Tensor")
        or check_type("jax.Array")
        or check_type("cupy.ndarray")
    ):
        return cls.tensor(cls.python())
    elif check_type("numpy.generic"):
        return cls._infer_from_numpy_scalar_dtype(origin)
    elif check_type("pandas.Series"):
        warnings.warn(
            "Cannot derive inner type from pandas.Series type, defaulting to DataType.python() for Series inner type"
        )
        return cls.list(cls.python())
    else:
        return cls.python()

is_binary #

is_binary() -> bool

Check if this is a binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.binary  # or daft.DataType.binary()
>>> assert dtype.is_binary()
Source code in daft/datatype.py
def is_binary(self) -> builtins.bool:
    """Check if this is a binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.binary  # or daft.DataType.binary()
        >>> assert dtype.is_binary()
    """
    return self._dtype.is_binary()

is_boolean #

is_boolean() -> bool

Check if this is a boolean type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool  # or daft.DataType.bool()
>>> assert dtype.is_boolean()
Source code in daft/datatype.py
def is_boolean(self) -> builtins.bool:
    """Check if this is a boolean type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool  # or daft.DataType.bool()
        >>> assert dtype.is_boolean()
    """
    return self._dtype.is_boolean()

is_date #

is_date() -> bool

Check if this is a date type.

Examples:

>>> import daft
>>> dtype = daft.DataType.date  # or daft.DataType.date()
>>> assert dtype.is_date()
Source code in daft/datatype.py
def is_date(self) -> builtins.bool:
    """Check if this is a date type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.date  # or daft.DataType.date()
        >>> assert dtype.is_date()
    """
    return self._dtype.is_date()

is_decimal128 #

is_decimal128() -> bool

Check if this is a decimal128 type.

Examples:

>>> import daft
>>> dtype = daft.DataType.decimal128(precision=10, scale=2)
>>> assert dtype.is_decimal128()
Source code in daft/datatype.py
def is_decimal128(self) -> builtins.bool:
    """Check if this is a decimal128 type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.decimal128(precision=10, scale=2)
        >>> assert dtype.is_decimal128()
    """
    return self._dtype.is_decimal128()

is_duration #

is_duration() -> bool

Check if this is a duration type.

Examples:

>>> import daft
>>> dtype = daft.DataType.duration(timeunit="ns")
>>> assert dtype.is_duration()
Source code in daft/datatype.py
def is_duration(self) -> builtins.bool:
    """Check if this is a duration type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.duration(timeunit="ns")
        >>> assert dtype.is_duration()
    """
    return self._dtype.is_duration()

is_embedding #

is_embedding() -> bool

Check if this is an embedding type.

Examples:

>>> import daft
>>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
>>> assert dtype.is_embedding()
Source code in daft/datatype.py
def is_embedding(self) -> builtins.bool:
    """Check if this is an embedding type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.embedding(daft.DataType.float32(), 512)
        >>> assert dtype.is_embedding()
    """
    return self._dtype.is_embedding()

is_extension #

is_extension() -> bool

Check if this is an extension type.

Examples:

>>> import daft
>>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
>>> assert dtype.is_extension()
Source code in daft/datatype.py
def is_extension(self) -> builtins.bool:
    """Check if this is an extension type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.extension("custom", daft.DataType.int64())
        >>> assert dtype.is_extension()
    """
    return self._dtype.is_extension()

is_file #

is_file() -> bool

Check if this is a file type.

Examples:

>>> import daft
>>> dtype = daft.DataType.file()
>>> assert dtype.is_file()
Source code in daft/datatype.py
def is_file(self) -> builtins.bool:
    """Check if this is a file type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.file()
        >>> assert dtype.is_file()
    """
    return self._dtype.is_file()

is_fixed_shape_image #

is_fixed_shape_image() -> bool

Check if this is a fixed shape image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
>>> assert dtype.is_fixed_shape_image()
Source code in daft/datatype.py
def is_fixed_shape_image(self) -> builtins.bool:
    """Check if this is a fixed shape image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image(mode="RGB", height=224, width=224)
        >>> assert dtype.is_fixed_shape_image()
    """
    return self._dtype.is_fixed_shape_image()

is_fixed_shape_sparse_tensor #

is_fixed_shape_sparse_tensor() -> bool

Check if this is a fixed shape sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_sparse_tensor()
Source code in daft/datatype.py
def is_fixed_shape_sparse_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_sparse_tensor()
    """
    return self._dtype.is_fixed_shape_sparse_tensor()

is_fixed_shape_tensor #

is_fixed_shape_tensor() -> bool

Check if this is a fixed shape tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
>>> assert dtype.is_fixed_shape_tensor()
Source code in daft/datatype.py
def is_fixed_shape_tensor(self) -> builtins.bool:
    """Check if this is a fixed shape tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32(), shape=(2, 3))
        >>> assert dtype.is_fixed_shape_tensor()
    """
    return self._dtype.is_fixed_shape_tensor()

is_fixed_size_binary #

is_fixed_size_binary() -> bool

Check if this is a fixed size binary type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_binary(size=10)
>>> assert dtype.is_fixed_size_binary()
Source code in daft/datatype.py
def is_fixed_size_binary(self) -> builtins.bool:
    """Check if this is a fixed size binary type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_binary(size=10)
        >>> assert dtype.is_fixed_size_binary()
    """
    return self._dtype.is_fixed_size_binary()

is_fixed_size_list #

is_fixed_size_list() -> bool

Check if this is a fixed size list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
>>> assert dtype.is_fixed_size_list()
Source code in daft/datatype.py
def is_fixed_size_list(self) -> builtins.bool:
    """Check if this is a fixed size list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.fixed_size_list(daft.DataType.int64(), size=10)
        >>> assert dtype.is_fixed_size_list()
    """
    return self._dtype.is_fixed_size_list()

is_float16 #

is_float16() -> bool

Check if this is a 16-bit float type.

Source code in daft/datatype.py
def is_float16(self) -> builtins.bool:
    """Check if this is a 16-bit float type."""
    return self._dtype.is_float16()

is_float32 #

is_float32() -> bool

Check if this is a 32-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float32  # or daft.DataType.float32()
>>> assert dtype.is_float32()
Source code in daft/datatype.py
def is_float32(self) -> builtins.bool:
    """Check if this is a 32-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float32  # or daft.DataType.float32()
        >>> assert dtype.is_float32()
    """
    return self._dtype.is_float32()

is_float64 #

is_float64() -> bool

Check if this is a 64-bit float type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64  # or daft.DataType.float64()
>>> assert dtype.is_float64()
Source code in daft/datatype.py
def is_float64(self) -> builtins.bool:
    """Check if this is a 64-bit float type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64  # or daft.DataType.float64()
        >>> assert dtype.is_float64()
    """
    return self._dtype.is_float64()

is_image #

is_image() -> bool

Check if this is an image type.

Examples:

>>> import daft
>>> dtype = daft.DataType.image()
>>> assert dtype.is_image()
Source code in daft/datatype.py
def is_image(self) -> builtins.bool:
    """Check if this is an image type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.image()
        >>> assert dtype.is_image()
    """
    return self._dtype.is_image()

is_int16 #

is_int16() -> bool

Check if this is a 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int16  # or daft.DataType.int16()
>>> assert dtype.is_int16()
Source code in daft/datatype.py
def is_int16(self) -> builtins.bool:
    """Check if this is a 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int16  # or daft.DataType.int16()
        >>> assert dtype.is_int16()
    """
    return self._dtype.is_int16()

is_int32 #

is_int32() -> bool

Check if this is a 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int32  # or daft.DataType.int32()
>>> assert dtype.is_int32()
Source code in daft/datatype.py
def is_int32(self) -> builtins.bool:
    """Check if this is a 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int32  # or daft.DataType.int32()
        >>> assert dtype.is_int32()
    """
    return self._dtype.is_int32()

is_int64 #

is_int64() -> bool

Check if this is a 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64  # or daft.DataType.int64()
>>> assert dtype.is_int64()
Source code in daft/datatype.py
def is_int64(self) -> builtins.bool:
    """Check if this is a 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64  # or daft.DataType.int64()
        >>> assert dtype.is_int64()
    """
    return self._dtype.is_int64()

is_int8 #

is_int8() -> bool

Check if this is an 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int8  # or daft.DataType.int8()
>>> assert dtype.is_int8()
Source code in daft/datatype.py
def is_int8(self) -> builtins.bool:
    """Check if this is an 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int8  # or daft.DataType.int8()
        >>> assert dtype.is_int8()
    """
    return self._dtype.is_int8()

is_integer #

is_integer() -> bool

Check if this is an integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.int64  # or daft.DataType.int64()
>>> assert dtype.is_integer()
Source code in daft/datatype.py
def is_integer(self) -> builtins.bool:
    """Check if this is an integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.int64  # or daft.DataType.int64()
        >>> assert dtype.is_integer()
    """
    return self._dtype.is_integer()

is_interval #

is_interval() -> bool

Check if this is an interval type.

Examples:

>>> import daft
>>> dtype = daft.DataType.interval  # or daft.DataType.interval()
>>> assert dtype.is_interval()
Source code in daft/datatype.py
def is_interval(self) -> builtins.bool:
    """Check if this is an interval type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.interval  # or daft.DataType.interval()
        >>> assert dtype.is_interval()
    """
    return self._dtype.is_interval()

is_list #

is_list() -> bool

Check if this is a list type.

Examples:

>>> import daft
>>> dtype = daft.DataType.list(daft.DataType.int64())
>>> assert dtype.is_list()
Source code in daft/datatype.py
def is_list(self) -> builtins.bool:
    """Check if this is a list type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.list(daft.DataType.int64())
        >>> assert dtype.is_list()
    """
    return self._dtype.is_list()

is_logical #

is_logical() -> bool

Check if this is a logical type.

Examples:

>>> import daft
>>> dtype = daft.DataType.bool  # or daft.DataType.bool()
>>> assert not dtype.is_logical()
Source code in daft/datatype.py
def is_logical(self) -> builtins.bool:
    """Check if this is a logical type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.bool  # or daft.DataType.bool()
        >>> assert not dtype.is_logical()
    """
    return self._dtype.is_logical()

is_map #

is_map() -> bool

Check if this is a map type.

Examples:

>>> import daft
>>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
>>> assert dtype.is_map()
Source code in daft/datatype.py
def is_map(self) -> builtins.bool:
    """Check if this is a map type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.map(daft.DataType.string(), daft.DataType.int64())
        >>> assert dtype.is_map()
    """
    return self._dtype.is_map()

is_null #

is_null() -> bool

Check if this is a null type.

Examples:

>>> import daft
>>> dtype = daft.DataType.null  # or daft.DataType.null()
>>> dtype.is_null()
True
Source code in daft/datatype.py
def is_null(self) -> builtins.bool:
    """Check if this is a null type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.null  # or daft.DataType.null()
        >>> dtype.is_null()
        True
    """
    return self._dtype.is_null()

is_numeric #

is_numeric() -> bool

Check if this is a numeric type.

Examples:

>>> import daft
>>> dtype = daft.DataType.float64  # or daft.DataType.float64()
>>> assert dtype.is_numeric()
Source code in daft/datatype.py
def is_numeric(self) -> builtins.bool:
    """Check if this is a numeric type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.float64  # or daft.DataType.float64()
        >>> assert dtype.is_numeric()
    """
    return self._dtype.is_numeric()

is_python #

is_python() -> bool

Check if this is a python object type.

Examples:

>>> import daft
>>> dtype = daft.DataType.python  # or daft.DataType.python()
>>> assert dtype.is_python()
Source code in daft/datatype.py
def is_python(self) -> builtins.bool:
    """Check if this is a python object type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.python  # or daft.DataType.python()
        >>> assert dtype.is_python()
    """
    return self._dtype.is_python()

is_sparse_tensor #

is_sparse_tensor() -> bool

Check if this is a sparse tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
>>> assert dtype.is_sparse_tensor()
Source code in daft/datatype.py
def is_sparse_tensor(self) -> builtins.bool:
    """Check if this is a sparse tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.sparse_tensor(daft.DataType.float32())
        >>> assert dtype.is_sparse_tensor()
    """
    return self._dtype.is_sparse_tensor()

is_string #

is_string() -> bool

Check if this is a string type.

Examples:

>>> import daft
>>> dtype = daft.DataType.string  # or daft.DataType.string()
>>> assert dtype.is_string()
Source code in daft/datatype.py
def is_string(self) -> builtins.bool:
    """Check if this is a string type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.string  # or daft.DataType.string()
        >>> assert dtype.is_string()
    """
    return self._dtype.is_string()

is_struct #

is_struct() -> bool

Check if this is a struct type.

Examples:

>>> import daft
>>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
>>> assert dtype.is_struct()
Source code in daft/datatype.py
def is_struct(self) -> builtins.bool:
    """Check if this is a struct type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.struct({"a": daft.DataType.int64()})
        >>> assert dtype.is_struct()
    """
    return self._dtype.is_struct()

is_temporal #

is_temporal() -> bool

Check if this is a temporal type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_temporal()
Source code in daft/datatype.py
def is_temporal(self) -> builtins.bool:
    """Check if this is a temporal type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_temporal()
    """
    return self._dtype.is_temporal()

is_tensor #

is_tensor() -> bool

Check if this is a tensor type.

Examples:

>>> import daft
>>> dtype = daft.DataType.tensor(daft.DataType.float32())
>>> assert dtype.is_tensor()
Source code in daft/datatype.py
def is_tensor(self) -> builtins.bool:
    """Check if this is a tensor type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.tensor(daft.DataType.float32())
        >>> assert dtype.is_tensor()
    """
    return self._dtype.is_tensor()

is_time #

is_time() -> bool

Check if this is a time type.

Examples:

>>> import daft
>>> dtype = daft.DataType.time(timeunit="ns")
>>> assert dtype.is_time()
Source code in daft/datatype.py
def is_time(self) -> builtins.bool:
    """Check if this is a time type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.time(timeunit="ns")
        >>> assert dtype.is_time()
    """
    return self._dtype.is_time()

is_timestamp #

is_timestamp() -> bool

Check if this is a timestamp type.

Examples:

>>> import daft
>>> dtype = daft.DataType.timestamp(timeunit="ns")
>>> assert dtype.is_timestamp()
Source code in daft/datatype.py
def is_timestamp(self) -> builtins.bool:
    """Check if this is a timestamp type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.timestamp(timeunit="ns")
        >>> assert dtype.is_timestamp()
    """
    return self._dtype.is_timestamp()

is_uint16 #

is_uint16() -> bool

Check if this is an unsigned 16-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint16  # or daft.DataType.uint16()
>>> assert dtype.is_uint16()
Source code in daft/datatype.py
def is_uint16(self) -> builtins.bool:
    """Check if this is an unsigned 16-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint16  # or daft.DataType.uint16()
        >>> assert dtype.is_uint16()
    """
    return self._dtype.is_uint16()

is_uint32 #

is_uint32() -> bool

Check if this is an unsigned 32-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint32  # or daft.DataType.uint32()
>>> assert dtype.is_uint32()
Source code in daft/datatype.py
def is_uint32(self) -> builtins.bool:
    """Check if this is an unsigned 32-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint32  # or daft.DataType.uint32()
        >>> assert dtype.is_uint32()
    """
    return self._dtype.is_uint32()

is_uint64 #

is_uint64() -> bool

Check if this is an unsigned 64-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint64  # or daft.DataType.uint64()
>>> assert dtype.is_uint64()
Source code in daft/datatype.py
def is_uint64(self) -> builtins.bool:
    """Check if this is an unsigned 64-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint64  # or daft.DataType.uint64()
        >>> assert dtype.is_uint64()
    """
    return self._dtype.is_uint64()

is_uint8 #

is_uint8() -> bool

Check if this is an unsigned 8-bit integer type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uint8  # or daft.DataType.uint8()
>>> assert dtype.is_uint8()
Source code in daft/datatype.py
def is_uint8(self) -> builtins.bool:
    """Check if this is an unsigned 8-bit integer type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uint8  # or daft.DataType.uint8()
        >>> assert dtype.is_uint8()
    """
    return self._dtype.is_uint8()

is_union #

is_union() -> bool

Check if this is a union type.

Examples:

>>> import daft
>>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
>>> assert dtype.is_union()
Source code in daft/datatype.py
def is_union(self) -> builtins.bool:
    """Check if this is a union type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.union({"i": daft.DataType.int32(), "f": daft.DataType.float64()}, type_ids=[0, 1])
        >>> assert dtype.is_union()
    """
    return self._dtype.is_union()

is_uuid #

is_uuid() -> bool

Check if this is a UUID type.

Examples:

>>> import daft
>>> dtype = daft.DataType.uuid()
>>> assert dtype.is_uuid()
Source code in daft/datatype.py
def is_uuid(self) -> builtins.bool:
    """Check if this is a UUID type.

    Examples:
        >>> import daft
        >>> dtype = daft.DataType.uuid()
        >>> assert dtype.is_uuid()
    """
    return self._dtype.is_uuid()

list #

list(dtype: DataType) -> DataType

Create a List DataType: Variable-length list, where each element in the list has type dtype.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dtype | DataType | DataType of each element in the list | required |
Source code in daft/datatype.py
@datatype_constructor
@classmethod
def list(cls, dtype: DataType) -> DataType:
    """Create a List DataType: Variable-length list, where each element in the list has type ``dtype``.

    Args:
        dtype: DataType of each element in the list
    """
    return cls._from_pydatatype(PyDataType.list(dtype._dtype))

map #

map(key_type: DataType, value_type: DataType) -> DataType

Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| key_type | DataType | DataType of the keys in the map | required |
| value_type | DataType | DataType of the values in the map | required |
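
The "list of structs with two fields" layout described above can be pictured with plain Python objects. This is only an illustration of the logical shape, assuming insertion order is kept; `to_map_entries` is a hypothetical helper and says nothing about how Daft stores the data physically.

```python
def to_map_entries(mapping):
    """Lay a dict out as a list of {"key", "value"} structs (illustrative only)."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


# One map value becomes one list; each pair becomes one two-field struct.
entries = to_map_entries({"a": 1, "b": 2})
```

Under this picture, a column of type `map(string, int64)` holds one such list per row, and every struct in it has the same key and value types.
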
Source code in daft/datatype.py
@datatype_constructor
@classmethod
def map(cls, key_type: DataType, value_type: DataType) -> DataType:
    """Create a Map DataType: A map is a nested type of key-value pairs that is implemented as a list of structs with two fields, key and value.

    Args:
        key_type: DataType of the keys in the map
        value_type: DataType of the values in the map
    """
    return cls._from_pydatatype(PyDataType.map(key_type._dtype, value_type._dtype))

sparse_tensor #

sparse_tensor(dtype: DataType, shape: tuple[int, ...] | None = None, use_offset_indices: bool = False) -> DataType

Create a SparseTensor DataType: SparseTensor arrays store n-dimensional arrays in the COO (coordinate) sparse tensor representation, with elements of the provided dtype and, optionally, the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

The use_offset_indices parameter determines how the indices of the SparseTensor are stored:

- False (default): Indices represent the actual positions of nonzero values.
- True: Indices represent the offsets between consecutive nonzero values. This can improve compression efficiency, especially when nonzero values are clustered together, as offsets between them are often zero, making them easier to compress.
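The difference between the two index modes can be sketched in plain Python. The helpers to_offsets and from_offsets are hypothetical and use simple delta encoding over flattened positions; Daft's internal encoding may differ in detail, for example in how the first index or adjacent positions are counted.

```python
from itertools import accumulate


def to_offsets(positions: list[int]) -> list[int]:
    """Delta-encode sorted positions: keep the first, then store the gaps."""
    return [p - prev for prev, p in zip([0] + positions[:-1], positions)]


def from_offsets(offsets: list[int]) -> list[int]:
    """Recover absolute positions with a running sum."""
    return list(accumulate(offsets))


positions = [1000, 1001, 1002, 1005]  # clustered nonzero positions
offsets = to_offsets(positions)

assert offsets == [1000, 1, 1, 3]
assert from_offsets(offsets) == positions
```

After the first entry, the offsets are small and repetitive, which is the property that makes them easier to compress than the absolute positions.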

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dtype | DataType | The type of the data contained within the tensor elements. | required |
| shape | tuple[int, ...] \| None | The shape of each SparseTensor in the column. This is None by default, which allows the shapes of each tensor element to vary. | None |
| use_offset_indices | bool | Determines how indices are represented. Defaults to False (storing actual indices). If True, stores offsets between nonzero indices. | False |

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def sparse_tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
    use_offset_indices: builtins.bool = False,
) -> DataType:
    """Create a SparseTensor DataType: SparseTensor arrays implemented as 'COO Sparse Tensor' representation of n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    The ``use_offset_indices`` parameter determines how the indices of the SparseTensor are stored:
    - ``False`` (default): Indices represent the actual positions of nonzero values.
    - ``True``: Indices represent the offsets between consecutive nonzero values.
    This can improve compression efficiency, especially when nonzero values are clustered together,
    as offsets between them are often zero, making them easier to compress.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each SparseTensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
        use_offset_indices: Determines how indices are represented.
            Defaults to `False` (storing actual indices). If `True`, stores offsets between nonzero indices.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or not shape or any(not isinstance(n, int) for n in shape):
            raise ValueError("SparseTensor shape must be a non-empty tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.sparse_tensor(dtype._dtype, shape, use_offset_indices))

struct #

struct(fields: dict[str, DataType]) -> DataType

Create a Struct DataType: a nested type which has names mapped to child types.

Examples:

>>> from daft import DataType
>>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| fields | dict[str, DataType] | Nested fields of the Struct | required |

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def struct(cls, fields: dict[str, DataType]) -> DataType:
    """Create a Struct DataType: a nested type which has names mapped to child types.

    Examples:
        >>> struct_type = DataType.struct({"name": DataType.string(), "age": DataType.int64()})

    Args:
        fields: Nested fields of the Struct
    """
    return cls._from_pydatatype(PyDataType.struct({name: datatype._dtype for name, datatype in fields.items()}))

tensor #

tensor(dtype: DataType, shape: tuple[int, ...] | None = None) -> DataType

Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided dtype as elements, each of the provided shape.

If a shape is given, each ndarray in the column will have this shape.

If shape is not given, the ndarrays in the column can have different shapes. This is much more flexible, but will result in a less compact representation and may make some operations less efficient.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dtype | DataType | The type of the data contained within the tensor elements. | required |
| shape | tuple[int, ...] \| None | The shape of each tensor in the column. This is None by default, which allows the shapes of each tensor element to vary. | None |

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def tensor(
    cls,
    dtype: DataType,
    shape: tuple[int, ...] | None = None,
) -> DataType:
    """Create a tensor DataType: tensor arrays contain n-dimensional arrays of data of the provided ``dtype`` as elements, each of the provided ``shape``.

    If a ``shape`` is given, each ndarray in the column will have this shape.

    If ``shape`` is not given, the ndarrays in the column can have different shapes. This is much more flexible,
    but will result in a less compact representation and may make some operations less efficient.

    Args:
        dtype: The type of the data contained within the tensor elements.
        shape: The shape of each tensor in the column. This is ``None`` by default, which allows the shapes of
            each tensor element to vary.
    """
    if shape is not None:
        if not isinstance(shape, tuple) or any(not isinstance(n, int) for n in shape):
            raise ValueError("Tensor shape must be a tuple of ints, but got: ", shape)
    return cls._from_pydatatype(PyDataType.tensor(dtype._dtype, shape))
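The shape checks in the tensor and sparse_tensor listings on this page differ in one respect: the sparse_tensor check also rejects an empty tuple. The plain-Python sketch below mirrors those two conditions without calling Daft. The helper check_shape and its allow_empty flag are hypothetical, and any further validation performed inside PyDataType is not shown on this page and is not modeled here.

```python
def check_shape(shape, allow_empty: bool) -> bool:
    """Return True if shape would pass the tuple-of-ints check."""
    if shape is None:
        return True  # no shape: elements may vary in shape
    if not isinstance(shape, tuple):
        return False  # lists and other sequences are rejected
    if not allow_empty and not shape:
        return False  # sparse_tensor-style check rejects ()
    return all(isinstance(n, int) for n in shape)


assert check_shape((2, 3), allow_empty=True)
assert not check_shape([2, 3], allow_empty=True)   # a list, not a tuple
assert check_shape((), allow_empty=True)           # tensor-style check accepts ()
assert not check_shape((), allow_empty=False)      # sparse_tensor-style check rejects ()
assert not check_shape((2, 3.0), allow_empty=False)
```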

time #

time(timeunit: TimeUnit | str) -> DataType

Time DataType. Supported timeunits are "us", "ns".

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def time(cls, timeunit: TimeUnit | str) -> DataType:
    """Time DataType. Supported timeunits are "us", "ns"."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.time(timeunit._timeunit))

timestamp #

timestamp(timeunit: TimeUnit | str, timezone: str | None = None) -> DataType

Timestamp DataType.

Source code in daft/datatype.py
@datatype_constructor
@classmethod
def timestamp(cls, timeunit: TimeUnit | str, timezone: str | None = None) -> DataType:
    """Timestamp DataType."""
    if isinstance(timeunit, str):
        timeunit = TimeUnit.from_str(timeunit)
    return cls._from_pydatatype(PyDataType.timestamp(timeunit._timeunit, timezone))

to_arrow_dtype #

to_arrow_dtype() -> pa.DataType
Source code in daft/datatype.py
def to_arrow_dtype(self) -> pa.DataType:
    _ensure_registered_super_ext_type()
    return self._dtype.to_arrow()

union #

union(fields: dict[str, DataType], type_ids: list[int], mode: str | UnionMode = 'sparse') -> DataType

Create a Union DataType: a union of named fields, each with its own type.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| fields | dict[str, DataType] | Mapping of field names to their DataTypes | required |
| type_ids | list[int] | Type IDs (one per field) used to identify which variant is stored | required |
| mode | str \| UnionMode | Union mode, either "sparse" or "dense" (default: "sparse") | 'sparse' |

Examples:

>>> import daft
>>> union_type = daft.DataType.union(
...     {"i": daft.DataType.int32(), "f": daft.DataType.float64()},
...     type_ids=[0, 1],
...     mode="sparse",
... )
Source code in daft/datatype.py
@datatype_constructor
@classmethod
def union(
    cls,
    fields: dict[str, DataType],
    type_ids: builtins.list[int],
    mode: str | UnionMode = "sparse",
) -> DataType:
    """Create a Union DataType: a union of named fields, each with its own type.

    Args:
        fields: Mapping of field names to their DataTypes
        type_ids: Type IDs (one per field) used to identify which variant is stored
        mode: Union mode, either ``"sparse"`` or ``"dense"`` (default: ``"sparse"``)

    Examples:
        >>> import daft
        >>> union_type = daft.DataType.union(
        ...     {"i": daft.DataType.int32(), "f": daft.DataType.float64()},
        ...     type_ids=[0, 1],
        ...     mode="sparse",
        ... )
    """
    if isinstance(mode, str):
        mode = UnionMode.from_mode_string(mode)
    return cls._from_pydatatype(PyDataType.union({name: dt._dtype for name, dt in fields.items()}, type_ids, mode))