Catalogs and Tables
Daft integrates with various catalog implementations using its Catalog and Table interfaces. These are high-level APIs for managing catalog objects (tables and namespaces) that also make it easy to use Daft's existing daft.read_* and df.write_* APIs with open table formats such as Iceberg and Delta Lake.
Catalog
Interface for Python catalog implementations.
A Catalog is a service for discovering, accessing, and querying tabular and non-tabular data. You can instantiate a Catalog using one of the static from_ methods.
Methods:
| Name | Description |
|---|---|
create_function | Registers a function in this catalog. |
create_namespace | Creates a namespace in this catalog. |
create_namespace_if_not_exists | Creates a namespace in this catalog if it does not already exist. |
create_table | Creates a table in this catalog. |
create_table_if_not_exists | Creates a table in this catalog if it does not already exist. |
drop_namespace | Drops a namespace from this catalog. |
drop_table | Drops a table from this catalog. |
from_glue | Creates a Daft Catalog backed by the AWS Glue service, with optional client or session. |
from_gravitino | Create a Daft Catalog from a Gravitino metalake. |
from_iceberg | Create a Daft Catalog from a PyIceberg catalog object. |
from_paimon | Create a Daft Catalog from a pypaimon catalog object. |
from_postgres | Create a Daft Catalog from a PostgreSQL connection string. |
from_pydict | Returns an in-memory catalog from a dictionary of table-like objects. |
from_s3tables | Creates a Daft Catalog from an S3 Tables bucket ARN, with optional client or session. |
from_unity | Create a Daft Catalog from a Unity Catalog client. |
get_function | Gets a function from the catalog by identifier, or raises if the function does not exist. |
get_table | Gets a table by its identifier, or raises if the table does not exist. |
has_namespace | Returns True if the namespace exists, otherwise False. |
has_table | Returns True if the table exists, otherwise False. |
list_namespaces | List namespaces in the catalog which match the given pattern. |
list_tables | List tables in the catalog which match the given pattern. |
read_table | Returns the table as a DataFrame or raises an exception if it does not exist. |
write_table | Writes a DataFrame to a table in this catalog. |
Attributes:
| Name | Type | Description |
|---|---|---|
name | str | Returns the catalog's name. |
create_function
create_function(identifier: Identifier | str, function: Function | Callable[..., Any]) -> None
Registers a function in this catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | function identifier | required |
function | Function \| Callable | the function to register. | required |
create_namespace
create_namespace(identifier: Identifier | str) -> None
Creates a namespace in this catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | namespace identifier | required |
create_namespace_if_not_exists
create_namespace_if_not_exists(identifier: Identifier | str) -> None
Creates a namespace in this catalog if it does not already exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | namespace identifier | required |
create_table
create_table(identifier: Identifier | str, source: Schema | DataFrame, properties: Properties | None = None, partition_fields: list[PartitionField] | None = None) -> Table
Creates a table in this catalog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
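identifier | Identifier \| str | table identifier | required |
source | Schema \| DataFrame | table source object such as a Schema or DataFrame. | required |
properties | Properties \| None | optional table properties | None |
partition_fields | list[PartitionField] \| None | optional partition fields for the new table | None |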
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | new table instance. |
create_table_if_not_exists
create_table_if_not_exists(identifier: Identifier | str, source: Schema | DataFrame, properties: Properties | None = None) -> Table
Creates a table in this catalog if it does not already exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | table identifier | required |
source | Schema \| DataFrame | table source object such as a Schema or DataFrame. | required |
properties | Properties \| None | optional table properties | None |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | the existing table if it exists, otherwise the new table instance. |
drop_namespace
drop_namespace(identifier: Identifier | str) -> None
Drops a namespace from this catalog.
drop_table
drop_table(identifier: Identifier | str) -> None
Drops a table from this catalog.
from_glue
from_glue(name: str, client: object | None = None, session: object | None = None) -> Catalog
Creates a Daft Catalog backed by the AWS Glue service, with optional client or session.
Terms
- AWS Glue -> Daft Catalog
- AWS Glue Database -> Daft Namespace
- AWS Glue Table -> Daft Table
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | glue database name | required |
client | object \| None | optional boto3 client | None |
session | object \| None | optional boto3 session | None |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | new daft catalog instance backed by AWS Glue. |
from_gravitino
from_gravitino(endpoint: str, metalake_name: str, auth_type: Literal['simple', 'oauth2'] = 'simple', username: str | None = None, password: str | None = None, token: str | None = None) -> Catalog
Create a Daft Catalog from a Gravitino metalake.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
endpoint | str | Gravitino server endpoint URL. | required |
metalake_name | str | Name of the metalake to connect to. | required |
auth_type | str | Authentication type, either 'simple' or 'oauth2'. | 'simple' |
username | str \| None | Username for simple authentication. | None |
password | str \| None | Password for simple authentication. | None |
token | str \| None | Bearer token for OAuth2 authentication. | None |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance backed by the Gravitino metalake. |
from_iceberg
from_iceberg(catalog: object) -> Catalog
Create a Daft Catalog from a PyIceberg catalog object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
catalog | object | a PyIceberg catalog instance | required |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance backed by the PyIceberg catalog. |
from_paimon
from_paimon(catalog: object, name: str = 'paimon') -> Catalog
Create a Daft Catalog from a pypaimon catalog object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
catalog | object | a pypaimon catalog instance | required |
name | str | name to assign to this catalog. Defaults to 'paimon'. | 'paimon' |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance backed by the pypaimon catalog. |
from_postgres
from_postgres(connection_string: str, extensions: list[str] | None = ['vector']) -> Catalog
Create a Daft Catalog from a PostgreSQL connection string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
connection_string | str | a PostgreSQL connection string | required |
extensions | list[str] \| None | List of PostgreSQL extensions to create if they don't exist. For each extension, a "CREATE EXTENSION IF NOT EXISTS" statement is run. | ['vector'] |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance connected to a PostgreSQL database. |
Warning
This feature is early in development and will likely undergo API changes.
from_pydict
from_pydict(tables: dict[Identifier | str, object], name: str = 'default') -> Catalog
Returns an in-memory catalog from a dictionary of table-like objects.
The table-like objects can be pydicts, dataframes, or a Table implementation. For qualified tables, namespaces are created if necessary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tables | dict[Identifier \| str, object] | a dictionary of table-like objects (pydicts, dataframes, and tables) | required |
name | str | catalog name | 'default' |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | new in-memory catalog instance with the given name |
from_s3tables
from_s3tables(table_bucket_arn: str, client: object | None = None, session: object | None = None) -> Catalog
Creates a Daft Catalog from an S3 Tables bucket ARN, with optional client or session.
If neither a boto3 client nor session is provided, the Iceberg REST client will be used under the hood.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table_bucket_arn | str | ARN of the S3 Tables bucket | required |
client | object \| None | a boto3 client | None |
session | object \| None | a boto3 session | None |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance backed by S3 Tables. |
from_unity
from_unity(catalog: object) -> Catalog
Create a Daft Catalog from a Unity Catalog client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
catalog | object | a Unity Catalog client instance | required |
Returns:
| Name | Type | Description |
|---|---|---|
Catalog | Catalog | a new Catalog instance backed by the Unity catalog. |
get_function
get_function(identifier: Identifier | str) -> Function
Gets a function from the catalog by identifier, or raises if the function does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | function identifier, where the last part is the function name and the preceding parts form the namespace. | required |
Returns:
| Type | Description |
|---|---|
Function | A Function instance. |
Raises:
| Type | Description |
|---|---|
NotFoundError | if the function does not exist. |
get_table
get_table(identifier: Identifier | str) -> Table
Gets a table by its identifier, or raises if the table does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
identifier | Identifier \| str | table identifier | required |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | the matching table. |
has_namespace
has_namespace(identifier: Identifier | str) -> bool
Returns True if the namespace exists, otherwise False.
has_table
has_table(identifier: Identifier | str) -> bool
Returns True if the table exists, otherwise False.
list_namespaces
list_namespaces(pattern: str | None = None) -> list[Identifier]
List namespaces in the catalog which match the given pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pattern | str | pattern to match such as a namespace prefix | None |
Returns:
| Type | Description |
|---|---|
list[Identifier] | list of namespace identifiers matching the pattern. |
list_tables
list_tables(pattern: str | None = None) -> list[Identifier]
List tables in the catalog which match the given pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pattern | str \| None | Pattern to match table names. Pattern syntax is catalog-dependent; Native/Memory and Postgres catalogs use SQL LIKE syntax. | None |
Returns:
| Type | Description |
|---|---|
list[Identifier] | list of table identifiers matching the pattern. |
read_table
read_table(identifier: Identifier | str, **options: dict[str, Any]) -> DataFrame
Returns the table as a DataFrame or raises an exception if it does not exist.
write_table
write_table(identifier: Identifier | str, df: DataFrame, mode: Literal['append', 'overwrite'] = 'append', **options: dict[str, Any]) -> None
Writes the DataFrame to the table with the given identifier, appending by default or overwriting when mode is 'overwrite'.
Identifier
Identifier(*parts: str)
A reference (path) to a catalog object.
Creates an Identifier from its parts.
Methods:
| Name | Description |
|---|---|
drop | Returns a new Identifier with the first n parts removed. |
from_sql | Parses an Identifier from an SQL string, normalizing to lowercase if specified. |
from_str | Parses an Identifier from a dot-delimited Python string without normalization. |
drop
drop(n: int = 1) -> Identifier
Returns a new Identifier with the first n parts removed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | Number of parts to drop from the beginning. Defaults to 1. | 1 |
Returns:
| Name | Type | Description |
|---|---|---|
Identifier | Identifier | A new Identifier with the first n parts removed. |
Raises:
| Type | Description |
|---|---|
ValueError | If dropping n parts would result in an empty Identifier. |
from_sql
from_sql(input: str, normalize: bool = False) -> Identifier
Parses an Identifier from an SQL string, normalizing to lowercase if specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | str | input SQL string | required |
normalize | bool | flag to case-normalize the identifier text | False |
Returns:
| Name | Type | Description |
|---|---|---|
Identifier | Identifier | new identifier instance |
from_str
from_str(input: str) -> Identifier
Parses an Identifier from a dot-delimited Python string without normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
input | str | input identifier string | required |
Returns:
| Name | Type | Description |
|---|---|---|
Identifier | Identifier | new identifier instance |
Table
Interface for Python table implementations.
Methods:
| Name | Description |
|---|---|
append | Appends the DataFrame to this table. |
from_df | Returns a read-only table backed by the DataFrame. |
from_gravitino | Returns a Daft Table instance from a Gravitino table. |
from_iceberg | Creates a Daft Table instance from an Iceberg table. |
from_paimon | Create a Daft Table from a pypaimon table object. |
from_pydict | Returns a read-only table backed by the given data. |
from_unity | Returns a Daft Table instance from a Unity table. |
overwrite | Overwrites this table with the given DataFrame. |
read | Creates a new DataFrame from this table. |
schema | Returns the table's schema. |
select | Creates a new DataFrame from the table applying the provided expressions. |
show | Shows the first n rows from this table. |
write | Writes the DataFrame to this table. |
Attributes:
| Name | Type | Description |
|---|---|---|
name | str | Returns the table's name. |
append
append(df: DataFrame, **options: Any) -> None
Appends the DataFrame to this table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | dataframe to append | required |
**options | Any | additional format-dependent write options | {} |
from_df
from_df(name: str, dataframe: DataFrame) -> Table
Returns a read-only table backed by the DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | table name | required |
dataframe | DataFrame | table source dataframe | required |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | new table instance |
from_gravitino
from_gravitino(table: object) -> Table
Returns a Daft Table instance from a Gravitino table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table | object | a Gravitino table instance | required |
from_iceberg
from_iceberg(table: object) -> Table
Creates a Daft Table instance from an Iceberg table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table | object | a PyIceberg table | required |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | new Daft table instance |
from_paimon
from_paimon(table: object) -> Table
Create a Daft Table from a pypaimon table object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table | object | a pypaimon table instance | required |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | a new Table instance backed by the pypaimon table. |
from_pydict
from_pydict(name: str, data: dict[str, InputListType]) -> Table
Returns a read-only table backed by the given data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
name | str | table name | required |
data | dict[str, InputListType] | keys are column names and values are Python lists, NumPy arrays, or Arrow arrays. | required |
Returns:
| Name | Type | Description |
|---|---|---|
Table | Table | new read-only table instance |
from_unity
from_unity(table: object) -> Table
Returns a Daft Table instance from a Unity table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
table | object | a Unity table instance | required |
overwrite
overwrite(df: DataFrame, **options: Any) -> None
Overwrites this table with the given DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | dataframe to overwrite this table with | required |
**options | Any | additional format-dependent write options | {} |
read
read(**options: Any) -> DataFrame
Creates a new DataFrame from this table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
**options | Any | additional format-dependent read options | {} |
Returns:
| Name | Type | Description |
|---|---|---|
DataFrame | DataFrame | new DataFrame instance |
schema
schema() -> Schema
Returns the table's schema.
select
select(*columns: ColumnInputType) -> DataFrame
Creates a new DataFrame from the table applying the provided expressions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
*columns | Expression \| str | columns to select from this table | () |
Returns:
| Name | Type | Description |
|---|---|---|
DataFrame | DataFrame | new DataFrame instance with the selected columns |
show
show(n: int = 8) -> None
Shows the first n rows from this table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n | int | number of rows to show | 8 |
Returns:
| Type | Description |
|---|---|
None | None |
write
write(df: DataFrame, mode: Literal['append', 'overwrite'] = 'append', **options: Any) -> None
Writes the DataFrame to this table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
df | DataFrame | dataframe to write | required |
mode | str | write mode such as 'append' or 'overwrite' | 'append' |
**options | Any | additional format-dependent write options | {} |