Daft Documentation
Using External APIs
User guide coming soon!