Reading from and Writing to Unity Catalog
Unity Catalog is an open-source catalog developed by Databricks. It lets users work with data assets such as tables (Parquet, CSV, Iceberg, Delta), volumes (which store raw files), functions, and models.
To use Daft with Unity Catalog, install Daft with the unity extra:
pip install -U "daft[unity]"
Warning
These APIs are in beta and may be subject to change as the Unity Catalog continues to be developed.
Connecting to the Unity Catalog
Daft includes an abstraction for Unity Catalog: the UnityCatalog class in daft.unity_catalog. For more information, see the Unity Catalog Documentation.
Authentication options
You can authenticate either with a personal access token (PAT) or with OAuth credentials.
To use OAuth, create a Databricks service principal and generate an OAuth secret for it. See the Databricks docs for the exact steps: Databricks OAuth M2M docs.
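As a minimal sketch of the PAT option, the snippet below reads the workspace URL and token from a mapping such as os.environ and passes them to the UnityCatalog class from daft.unity_catalog. The variable names DATABRICKS_HOST and DATABRICKS_TOKEN and both helper functions are assumptions made for this example, not part of Daft. Only token-based authentication is shown.

```python
from typing import Mapping

# Setting names chosen for this sketch (an assumption, not a Daft requirement).
REQUIRED_SETTINGS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def unity_settings(env: Mapping[str, str]) -> dict:
    """Collect the endpoint and token needed for PAT authentication."""
    missing = [key for key in REQUIRED_SETTINGS if not env.get(key)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return {
        # Drop a trailing slash so the endpoint has a single canonical form.
        "endpoint": env["DATABRICKS_HOST"].rstrip("/"),
        "token": env["DATABRICKS_TOKEN"],
    }


def connect(env: Mapping[str, str]):
    """Create a UnityCatalog client and print the catalogs it can see."""
    # Imported lazily so unity_settings works without the unity extra installed.
    from daft.unity_catalog import UnityCatalog

    unity = UnityCatalog(**unity_settings(env))
    print(unity.list_catalogs())
    return unity
```

Calling connect(os.environ) needs a reachable workspace and valid credentials; unity_settings can be checked on its own beforehand.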
Loading a Dataframe from a Delta Lake table in Unity Catalog
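A table is addressed by its fully qualified name (catalog, schema, table), and the handle returned by UnityCatalog.load_table is passed to daft.read_deltalake. The sketch below assumes a connected UnityCatalog client named unity; the helper functions and the placeholder names are hypothetical.

```python
def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three name parts into catalog.schema.table."""
    parts = (catalog, schema, table)
    if not all(parts):
        raise ValueError("catalog, schema and table must all be non-empty")
    return ".".join(parts)


def read_unity_table(unity, catalog: str, schema: str, table: str):
    """Return a lazy Daft DataFrame over a Delta Lake table in Unity Catalog."""
    import daft  # imported lazily; needs the unity extra at call time

    unity_table = unity.load_table(qualified_table_name(catalog, schema, table))
    return daft.read_deltalake(unity_table)


# Usage, given a connected client `unity`:
# df = read_unity_table(unity, "my_catalog_name", "my_schema_name", "my_table_name")
# df.show()
```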
Subsequent filter operations on the Daft DataFrame df are optimized to take advantage of Delta Lake features such as partition pruning.
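For example, an equality filter on a partition column can let Daft skip partitions instead of scanning the whole table. This sketch assumes df is a DataFrame loaded from a Unity Catalog Delta Lake table that has a partition column; the column name partition_key and the helper function are hypothetical.

```python
def filter_partition(df, column: str, value):
    """Keep only rows whose `column` equals `value`.

    The filter stays lazy, so the optimizer can take it into account
    when planning the Delta Lake scan.
    """
    return df.where(df[column] == value)


# Usage, given a Daft DataFrame `df`:
# df = filter_partition(df, "partition_key", 1)
# df.explain(show_all=True)  # inspect the optimized plan
# df.show()
```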
See also Delta Lake for more information about how to work with the Delta Lake tables provided by the Unity Catalog.
Downloading files in Unity Catalog volumes
Daft supports downloading files from Unity Catalog volumes using Expression.download(). File paths that start with vol+dbfs:/ or dbfs:/ are downloaded using the configuration in IOConfig.unity. This configuration can be created with UnityCatalog.to_io_config, or derived automatically from the global session.
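The explicit route can be sketched as follows: build an IOConfig from the client with to_io_config and pass it to the download expression. The helper functions, the /Volumes/catalog/schema/volume path layout, and the column names are assumptions made for this example, and the io_config keyword on download() should be checked against the installed Daft version.

```python
def volume_url(catalog: str, schema: str, volume: str, relative_path: str) -> str:
    """Build a vol+dbfs:/ URL for a file stored in a Unity Catalog volume."""
    # Assumed layout: /Volumes/<catalog>/<schema>/<volume>/<path>
    return f"vol+dbfs:/Volumes/{catalog}/{schema}/{volume}/{relative_path.lstrip('/')}"


def download_volume_files(unity, urls):
    """Return a DataFrame with a `bytes` column holding each file's contents."""
    import daft  # imported lazily; needs the unity extra at call time

    # Derive the IO configuration (including IOConfig.unity) from the client.
    io_config = unity.to_io_config()
    df = daft.from_pydict({"files": list(urls)})
    return df.with_column("bytes", df["files"].download(io_config=io_config))


# Usage, given a connected client `unity`:
# url = volume_url("my_catalog", "my_schema", "my_volume", "my_file.txt")
# download_volume_files(unity, [url]).show()
```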
Roadmap
- Unity Iceberg integration for reading tables using the Iceberg interface instead of the Delta Lake interface
Please open an issue on the Daft repository if you have a use case that Daft does not currently cover. For the overall Daft development plan, see the Daft Roadmap.