Type Conversions#

Daft to Python#

This table shows how Daft DataTypes map to Python types. The mapping applies in places such as Series.to_pylist, Expression.cast to the Python type, and arguments passed to functions decorated with @daft.func.

| Daft DataType | Python Type |
| --- | --- |
| Null | None |
| Boolean | bool |
| Utf8 | str |
| Binary, FixedSizeBinary | bytes |
| Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64 | int |
| Timestamp | datetime.datetime |
| Date | datetime.date |
| Time | datetime.time |
| Duration | datetime.timedelta |
| Interval | not supported |
| Float32, Float64 | float |
| Decimal128 | decimal.Decimal |
| List[T], FixedSizeList[T, n] | list[T] |
| Struct[k1: T1, k2: T2, ...] | { "k1": <T1>, "k2": <T2>, ... } |
| Map[K, V] | list[tuple[K, V]] (default) or dict[K, V] with maps_as_pydicts |
| Tensor[T], FixedShapeTensor[T, [...]] | numpy.typing.NDArray[T] |
| SparseTensor[T], FixedShapeSparseTensor[T, [...]] | { "values": <T>, "indices": [<int>], "shape": [<int>] } |
| Embedding[T] | numpy.typing.NDArray[T] |
| Image | numpy.typing.NDArray[numpy.uint8 \| numpy.uint16 \| numpy.float32] |
| Python | Any |
| Extension[T] | T |

For Map[K, V] conversions, Daft defaults to list[tuple[K, V]] to preserve duplicate keys and ordering. If you pass maps_as_pydicts="lossy" or maps_as_pydicts="strict", Daft converts maps to Python dicts:

  • "lossy" keeps the last value for duplicate keys and emits a warning when duplicates are encountered.
  • "strict" raises an exception when duplicate keys are encountered.
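
The difference between the two dict modes can be illustrated in plain Python. The helper below is a hypothetical sketch of the behavior described above, applied to the default list[tuple[K, V]] form. It is not Daft's implementation: the name entries_to_dict is made up for illustration, and ValueError is only an assumed stand-in for whatever exception Daft raises in "strict" mode.

```python
import warnings


def entries_to_dict(entries, mode="lossy"):
    """Convert a list of (key, value) pairs to a dict.

    Hypothetical helper that mimics the documented maps_as_pydicts modes.
    """
    if mode not in ("lossy", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    result = {}
    for key, value in entries:
        if key in result:
            if mode == "strict":
                # Assumption: ValueError stands in for Daft's actual exception.
                raise ValueError(f"duplicate map key: {key!r}")
            warnings.warn(f"duplicate map key {key!r}; keeping the last value")
        result[key] = value  # later values overwrite earlier ones
    return result


# A map value with a duplicated key "a", in the default list-of-tuples form.
entries = [("a", 1), ("b", 2), ("a", 3)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lossy = entries_to_dict(entries, mode="lossy")
```

The list-of-tuples default keeps all three entries, whereas the "lossy" dict keeps only the last value stored under "a".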

Python to Daft#

From Python Type#

This table shows the mapping from Python types to Daft types, such as when inferring the return type from the type hints of a function decorated with @daft.func.

To check the inferred DataType for a Python type, use DataType.infer_from_type.

| Python Type | Daft DataType |
| --- | --- |
| NoneType | Null |
| bool | Boolean |
| str | Utf8 |
| bytes | Binary |
| int | Int64 |
| float | Float64 |
| datetime.datetime | Timestamp[us] |
| datetime.date | Date |
| datetime.time | Time[us] |
| datetime.timedelta | Duration[us] |
| list[T] | List[T] |
| dict[K, V] | Map[K, V] |
| typing.TypedDict("...", { "k1": T1, "k2": T2, ... }) | Struct[k1: T1, k2: T2, ...] |
| tuple[T0, T1, ..., TN] (no ellipsis in actual type) | Struct[_0: T0, _1: T1, ..., _N: TN] |
| tuple[T, ...] | List[T] |
| pydantic.BaseModel with serialized fields f1: T1, f2: T2, ... | Struct[f1: T1, f2: T2, ...] |
| numpy.ndarray, torch.Tensor, tensorflow.Tensor, jax.Array, cupy.ndarray | Tensor[Python] |
| numpy.typing.NDArray[T] | Tensor[T] |
| torch.FloatTensor | Tensor[Float32] |
| torch.DoubleTensor | Tensor[Float64] |
| torch.ByteTensor | Tensor[UInt8] |
| torch.CharTensor | Tensor[Int8] |
| torch.ShortTensor | Tensor[Int16] |
| torch.IntTensor | Tensor[Int32] |
| torch.LongTensor | Tensor[Int64] |
| torch.BoolTensor | Tensor[Boolean] |
| jaxtyping types (see jaxtyping) | Tensor or FixedShapeTensor |
| numpy.bool_ | Boolean |
| numpy.int8 | Int8 |
| numpy.uint8 | UInt8 |
| numpy.int16 | Int16 |
| numpy.uint16 | UInt16 |
| numpy.int32 | Int32 |
| numpy.uint32 | UInt32 |
| numpy.int64 | Int64 |
| numpy.uint64 | UInt64 |
| numpy.float32 | Float32 |
| numpy.float64 | Float64 |
| numpy.datetime64 | Timestamp[us] |
| pandas.Series | List[Python] |
| PIL.Image.Image | Image[MIXED] |
| daft.Series | List[Python] |
| daft.File | File |
| Everything else | Python |
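
The two tuple rows in the table above differ only in whether the type hint ends with an ellipsis. The following is a minimal sketch in plain Python that uses typing.get_args to tell them apart. The names describe_tuple and _DAFT_NAMES are hypothetical, the lookup covers only a few scalar rows from the table, and the code builds a descriptive string rather than calling Daft.

```python
import typing

# Assumption: a tiny subset of the table above, for illustration only.
_DAFT_NAMES = {
    bool: "Boolean",
    str: "Utf8",
    bytes: "Binary",
    int: "Int64",
    float: "Float64",
}


def describe_tuple(tp):
    """Return a Daft-style description string for a tuple type hint."""
    args = typing.get_args(tp)
    if len(args) == 2 and args[1] is Ellipsis:
        # tuple[T, ...]: homogeneous and variable length, so a list.
        return f"List[{_DAFT_NAMES.get(args[0], 'Python')}]"
    # tuple[T0, T1, ...] without an ellipsis: positional struct fields.
    fields = ", ".join(
        f"_{i}: {_DAFT_NAMES.get(t, 'Python')}" for i, t in enumerate(args)
    )
    return f"Struct[{fields}]"


fixed = describe_tuple(tuple[int, str])
variable = describe_tuple(tuple[float, ...])
```

To see what Daft itself infers for a given hint, DataType.infer_from_type is the authoritative check.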

jaxtyping#

The jaxtyping library provides data type and shape type annotations for array/tensor types from various libraries, including NumPy, PyTorch, TensorFlow, and JAX. Daft is able to natively infer the inner dtype and shape from jaxtyping types.

Examples:

  • jaxtyping.Float64[jaxtyping.Array, "1 2 3 4"] -> FixedShapeTensor[Float64, [1, 2, 3, 4]]
  • jaxtyping.Int8[torch.Tensor, "dim1 dim2"] -> Tensor[Int8]

Dtype Inference#

The following table shows the mapping from jaxtyping types to Daft DataTypes. The Daft DataType corresponds to the inner type of the resulting Tensor or FixedShapeTensor.

| jaxtyping Type | Daft DataType |
| --- | --- |
| Bool | Boolean |
| Int8 | Int8 |
| UInt8 | UInt8 |
| Int16 | Int16 |
| UInt16 | UInt16 |
| Int32 | Int32 |
| UInt32 | UInt32 |
| Int64, Int, Integer | Int64 |
| UInt64, UInt | UInt64 |
| Float32 | Float32 |
| Float64, Float, Real | Float64 |
| Everything else | Python |

Shape Inference#

The second generic parameter of a jaxtyping type is a string of space-separated symbols representing the shape of the array. Daft will attempt to infer the tensor shape from the string.

  • If all dimensions are fixed-size, Daft will infer a FixedShapeTensor with those dimensions.
    • E.g. "1 2 3", "rows=4 cols=3", or "" (scalar shape)
  • Otherwise, Daft will infer a Tensor.
    • E.g. "dim1 dim2", "512 512 _", or "... 1 2 3"
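
The rule above can be approximated in a few lines of plain Python. This is a simplified sketch, not Daft's parser: is_fixed_shape and inferred_kind are hypothetical names, and the check handles only the forms shown in the examples (integer sizes, name=size pairs, and the empty string), treating every other symbol as a variable dimension.

```python
def is_fixed_shape(shape: str) -> bool:
    """Return True if every dimension in the shape string has a fixed size."""
    for token in shape.split():
        # "rows=4" names a dimension and fixes its size; keep only the size.
        size = token.split("=", 1)[-1]
        if not size.isdigit():
            return False  # named, anonymous ("_"), or variadic ("...") dimension
    return True  # also covers "" (no dimensions, i.e. a scalar shape)


def inferred_kind(shape: str) -> str:
    """Name the tensor kind the documented rule would pick for this shape."""
    return "FixedShapeTensor" if is_fixed_shape(shape) else "Tensor"


kinds = {s: inferred_kind(s) for s in ["1 2 3", "rows=4 cols=3", "", "dim1 dim2"]}
```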

From Python Object#

In addition to the mappings in the table above, Daft applies the rules below when it converts Python objects to Daft types without an explicitly specified type, such as in daft.from_pydict and Series.from_pylist. In these cases, Daft can derive information from the object itself that is not present in its type, which allows a more precise mapping to Daft types.

To check the inferred DataType for a Python object, use DataType.infer_from_object.

| Python Object | Daft Type |
| --- | --- |
| int value greater than 2^63-1 (max i64 value) | UInt64 |
| dict with fields: { "k1": <T1>, "k2": <T2>, ... } | Struct[k1: T1, k2: T2, ...] |
| decimal.Decimal with N digits after the decimal point | Decimal128[precision=38, scale=N] |
| pandas.Series with element type T | List[T] |
| daft.Series with element type T | List[T] |
| numpy.ndarray, torch.Tensor, tensorflow.Tensor, jax.Array, or cupy.ndarray with numpy dtype T | Tensor[T] |
| numpy.datetime64 with U = datetime unit | Date if U = "Y", "M", "W", or "D"; Timestamp[s] if U = "h", "m", or "s"; Timestamp[ms] if U = "ms"; Timestamp[us] if U = "us"; Timestamp[ns] if U = "ns", "ps", "fs", or "as" |
| PIL.Image.Image with M = image mode (supported modes: L, LA, RGB, RGBA) | Image[M] |
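
Three of the value-dependent rules above (large ints, decimal scale, and numpy.datetime64 units) can be written out as a plain-Python sketch. The function names are hypothetical, and the code only returns descriptive strings based on the table. It does not call Daft and does not cover cases the table leaves unspecified, such as ints above the UInt64 range or non-finite decimals.

```python
import decimal

I64_MAX = 2**63 - 1  # largest value that fits in Int64


def describe_int(value: int) -> str:
    """Ints map to Int64 unless they exceed the i64 maximum."""
    return "UInt64" if value > I64_MAX else "Int64"


def describe_decimal(value: decimal.Decimal) -> str:
    """The scale is the number of digits after the decimal point."""
    if not value.is_finite():
        raise ValueError("only finite decimals are handled in this sketch")
    exponent = value.as_tuple().exponent  # e.g. -3 for Decimal("1.250")
    scale = max(0, -exponent)  # assumption: positive exponents give scale 0
    return f"Decimal128[precision=38, scale={scale}]"


# Datetime unit -> Daft type, transcribed from the table above.
_DATETIME64_UNITS = {
    **dict.fromkeys(["Y", "M", "W", "D"], "Date"),
    **dict.fromkeys(["h", "m", "s"], "Timestamp[s]"),
    "ms": "Timestamp[ms]",
    "us": "Timestamp[us]",
    **dict.fromkeys(["ns", "ps", "fs", "as"], "Timestamp[ns]"),
}


def describe_datetime64_unit(unit: str) -> str:
    """Look up the Daft type for a numpy.datetime64 unit string."""
    return _DATETIME64_UNITS[unit]


big = describe_int(2**63)
scaled = describe_decimal(decimal.Decimal("1.250"))
```

Note that the unit keys are case-sensitive: "M" (month) maps to Date, while "m" (minute) maps to Timestamp[s]. DataType.infer_from_object remains the way to confirm what Daft actually infers for a given value.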