# Development Guide

> 💡 Already set up? See our quick tutorial on how to add a new expression to Daft.
## Development Environment

To set up your development environment:

- Install uv. On macOS and Linux, you can run `curl -LsSf https://astral.sh/uv/install.sh | sh`.
- Install the Rust compilation toolchain.
- Install Node.js (22.x LTS) and npm in order to build the docs and the daft-dashboard functionality.
- Install cmake. If you use Homebrew, you can run `brew install cmake`.
- Install protoc. You will need this for release builds (`make build-release`). With Homebrew, run `brew install protobuf`.
- Clone the Daft repo: `git clone git@github.com:Eventual-Inc/Daft.git`
- Run `make .venv` from your newly cloned Daft repository to create a new virtual environment with all of Daft's development dependencies installed.
- Run `make hooks` to install pre-commit hooks. These run tooling on every commit to ensure that your code meets Daft's development standards.
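Before running `make .venv`, you may want to confirm that the prerequisites listed above are on your `PATH`. The following minimal Python sketch is not part of the Daft repository. It uses `shutil.which` to look up each tool. The executable names it checks are assumptions about a typical installation. For example, rustup provides `cargo` and `rustc`, and Homebrew's `protobuf` formula provides `protoc`. Remember that `protoc` is only needed for release builds.

```python
import shutil

# Assumed executable names for the prerequisites above (typical installs).
REQUIRED_TOOLS = ("uv", "cargo", "rustc", "node", "npm", "cmake", "protoc", "git")


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return a mapping of tool name -> absolute path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    missing = [name for name, path in check_prerequisites().items() if path is None]
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```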
## Developing

- `make build`: recompile your code after modifying any Rust code in `src/`
- `DAFT_RUNNER=native make test`: run tests; you can set additional run parameters through `EXTRA_ARGS`
- `DAFT_RUNNER=ray make test`: set the runner to the Ray runner and run tests
- `make docs`: build docs
- `make docs-serve`: build docs and serve them from a development server
- `make format`: format all Python and Rust code
- `make lint`: lint all Python and Rust code
- `make check-format`: check that all Python and Rust code is formatted (alias: `make format-check`)
- `make precommit`: run all pre-commit hooks; you must install pre-commit first (`pip install pre-commit`)
- `make build-release`: perform a full release build of Daft
- `make build-whl`: recompile your code after modifying any Rust code in `src/` for development, but only generate a `.whl` file without installing it
- `make clean`: clean all build artifacts, including the Python virtual environment. You can skip cleaning the virtual environment by setting `SKIP_VENV=true`
### Note about developing daft-dashboard

> **Note:** If you just want to use the Daft Dashboard (not modify it), see the Daft Dashboard user guide. The instructions below are only needed for contributing to the dashboard's frontend code.

To enable or work on the daft-dashboard functionality, you need Node.js (LTS) and npm. Install Node.js, then run `npm install` and `npm run build` in the `src/daft-dashboard/frontend` directory.
Next, make sure Daft is installed and launch the dashboard with the `daft dashboard` command.

Before executing a specific Daft job, enable reporting of query execution data to the dashboard by setting the `DAFT_DASHBOARD_URL` environment variable.

You can then view the dashboard in a web browser, for example at http://127.0.0.1:3238.
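If you launch jobs from a Python driver, one way to scope the variable to a single job is to pass a modified environment to a child process. This is a sketch under assumptions. The helpers `dashboard_env` and `run_job_with_dashboard` are hypothetical and are not part of Daft. The URL defaults to the address mentioned above.

```python
import os
import subprocess
import sys

# Default address from the text above; adjust if your dashboard runs elsewhere.
DEFAULT_DASHBOARD_URL = "http://127.0.0.1:3238"


def dashboard_env(url=DEFAULT_DASHBOARD_URL, base=None):
    """Return a copy of `base` (default: os.environ) with DAFT_DASHBOARD_URL set.

    The original mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["DAFT_DASHBOARD_URL"] = url
    return env


def run_job_with_dashboard(script_path, url=DEFAULT_DASHBOARD_URL):
    """Run a Daft job script in a child Python process that reports to the dashboard."""
    return subprocess.run([sys.executable, script_path], env=dashboard_env(url), check=True)
```

For example, `run_job_with_dashboard("my_job.py")` runs a hypothetical `my_job.py` with reporting enabled and leaves your shell environment untouched. Because the variable is set before the child process starts, it is already present when Daft is imported.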
## Developing with Ray

Running a development version of Daft on a local Ray cluster is as simple as including `daft.set_runner_ray()` in your Python script and then building and executing it as usual.

To use a remote Ray cluster, run the following steps on the same operating system version as your Ray nodes. This ensures that your compiled binaries are executable on the Ray nodes:

1. `mkdir wd`: create the working directory, which holds all the files to be submitted to Ray for a job.
2. `ln -s ../daft wd/daft`: create a symbolic link to the `daft` Python module inside the working directory. Relative link targets are resolved from inside `wd`, hence the `../`.
3. `make build-release`: build an optimized release so that the module is small enough to be uploaded to Ray successfully. Run this again after modifying any Rust code in `src/`.
4. `ray job submit --working-dir wd --address "http://<head_node_host>:8265" -- python script.py`: submit `wd/script.py` to run on Ray.
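The directory setup and the submit command can also be scripted. This Python sketch rests on a few assumptions:

- It is run from the root of the Daft repository, where the `daft` package directory lives.
- You have already run `make build-release`.
- Your job script is named `script.py`.

`prepare_working_dir` and `ray_submit_command` are hypothetical helpers and are not part of Daft or Ray. The symlink uses an absolute target, so it resolves correctly from inside the working directory.

```python
import shutil
from pathlib import Path


def prepare_working_dir(repo_root, script, wd_name="wd"):
    """Create the Ray working directory, link the daft package, and copy the job script.

    Mirrors `mkdir wd` and the `daft` symlink step, plus placing the script in wd/.
    """
    repo_root = Path(repo_root).resolve()
    wd = repo_root / wd_name
    wd.mkdir(exist_ok=True)

    link = wd / "daft"
    if not link.is_symlink():
        # Absolute target: avoids relative-path resolution from inside wd/.
        link.symlink_to(repo_root / "daft", target_is_directory=True)

    script = Path(script).resolve()
    destination = wd / script.name
    if script != destination.resolve():  # skip copying a file onto itself
        shutil.copy2(script, destination)
    return wd


def ray_submit_command(head_node_host, script_name="script.py", wd="wd"):
    """Build the `ray job submit` command from the steps above as an argument list."""
    return [
        "ray", "job", "submit",
        "--working-dir", str(wd),
        "--address", f"http://{head_node_host}:8265",
        "--", "python", script_name,
    ]
```

You can pass the returned list directly to `subprocess.run`, or print it as a shell command with `shlex.join`.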
## Debugging

The debugging feature uses a special VS Code launch configuration to start the Python debugger with a script at `tools/attach_debugger.py`, which takes the target script's name as input. This script finds the process ID, updates the `launch.json` file, compiles the target script, and runs it. A Rust debugger (LLDB) is then attached to the same Python process, so both debuggers work together. Breakpoints in Python code hit the Python debugger, while breakpoints in Rust code hit the Rust debugger.
### Preparation

- CodeLLDB extension for Visual Studio Code: this extension is useful for debugging Rust code invoked from Python.
- Set up the virtual environment interpreter (Ctrl+Shift+P -> Python: Select Interpreter -> `.venv`).
- Debug settings in `launch.json`: this file is usually found in the `.vscode` folder of your project root. See the official VS Code documentation for more information about the `launch.json` file.
`launch.json`:

```json
{
    "configurations": [
        {
            "name": "Debug Rust/Python",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/tools/attach_debugger.py",
            "args": [
                "${file}"
            ],
            "console": "internalConsole",
            "serverReadyAction": {
                "pattern": "pID = ([0-9]+)",
                "action": "startDebugging",
                "name": "Rust LLDB"
            }
        },
        {
            "name": "Rust LLDB",
            "pid": "0",
            "type": "lldb",
            "request": "attach",
            "program": "${command:python.interpreterPath}",
            "stopOnEntry": false,
            "sourceLanguages": [
                "rust"
            ],
            "presentation": {
                "hidden": true
            }
        }
    ]
}
```
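The two configurations cooperate through `serverReadyAction`. When the debug console output matches `pID = ([0-9]+)`, VS Code starts the hidden `Rust LLDB` configuration, which attaches to the `pid` stored in `launch.json`.

The following Python sketch illustrates that handshake. It is hypothetical and is not the contents of `tools/attach_debugger.py`:

- `set_lldb_pid` fills in the `pid` field of the `Rust LLDB` entry.
- `announce_pid` produces a line in the format the pattern expects.

```python
import json
import os
from pathlib import Path


def set_lldb_pid(launch_json_path, pid, config_name="Rust LLDB"):
    """Write `pid` into the named attach configuration of a launch.json file.

    Assumes plain JSON: VS Code tolerates comments and trailing commas in
    launch.json, but the json module does not.
    """
    path = Path(launch_json_path)
    launch = json.loads(path.read_text())
    for config in launch["configurations"]:
        if config.get("name") == config_name:
            config["pid"] = str(pid)  # the sample configuration stores pid as a string
            break
    else:
        raise KeyError(f"no configuration named {config_name!r}")
    path.write_text(json.dumps(launch, indent=4))


def announce_pid(pid=None):
    """Return the line that matches serverReadyAction's "pID = ([0-9]+)" pattern."""
    return f"pID = {os.getpid() if pid is None else pid}"
```

In this hypothetical flow, a launcher would run `set_lldb_pid(".vscode/launch.json", os.getpid())` and then `print(announce_pid(), flush=True)`. It would only execute the target script after that, so LLDB is attached before any Rust code runs.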
### Running the debugger

1. Create a Python script containing Daft code. Ensure that your virtual environment is set up correctly.
2. Set breakpoints in any `.rs` or `.py` file.
3. In the Run and Debug panel on the left, select Debug Rust/Python from the drop-down menu at the top and click the Start Debugging button. This starts a debugging session using the file that is currently open in the VS Code editor.
At this point, your debugger should stop on breakpoints in any .rs file located within the codebase.
Note: On some Linux systems, the LLDB debugger will not attach unless ptrace protection is disabled. To disable it, run the following command (the setting is reset on reboot):

```shell
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
```
## Testing

We run test suites across Python and Rust. Python tests focus on high-level DataFrame and Expression functionality, while Rust tests validate individual kernel implementations at a lower level.

### Python tests

Our Python tests are located in the `tests` directory. You can run all of them at once with `make test`.
To run specific tests, set the runner in the environment (for example, `DAFT_RUNNER=native`). Then run the tests directly with `pytest`, or with `make test EXTRA_ARGS="..."`.

To see debug logs from tests, set pytest's `--log-cli-level` option (for example, `--log-cli-level=DEBUG`) and disable output capturing with `-s`.
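If you prefer to drive test runs from a small Python helper, the following sketch builds the argument list and environment for a `pytest` run under a chosen runner. `pytest_invocation` is a hypothetical helper. `DAFT_RUNNER` and its `native`/`ray` values come from the `make test` targets listed earlier, and `--log-cli-level` and `-s` are standard pytest options. This sketch only accepts the two runners shown in this guide.

```python
import os
import sys

VALID_RUNNERS = ("native", "ray")


def pytest_invocation(runner, targets, debug_logs=False, base_env=None):
    """Return (argv, env) for running pytest on `targets` with the given Daft runner."""
    if runner not in VALID_RUNNERS:
        raise ValueError(f"runner must be one of {VALID_RUNNERS}, got {runner!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["DAFT_RUNNER"] = runner

    argv = [sys.executable, "-m", "pytest", *targets]
    if debug_logs:
        # Show log records at DEBUG level live, and disable output capturing.
        argv += ["--log-cli-level=DEBUG", "-s"]
    return argv, env
```

For example, `argv, env = pytest_invocation("native", ["tests/path/to/test_file.py"], debug_logs=True)` followed by `subprocess.run(argv, env=env, check=True)` runs one (placeholder) test file with debug logging.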
### Rust tests

Our Rust tests are distributed across crates. You can run all of them with `cargo test --no-default-features --workspace`.

Rust tests that call into Python require the `--features python` flag and the `libpython3.*` shared library (`.so` on Linux, `.dylib` on macOS). Make sure the library is installed. Common locations on different operating systems are:
| Operating System | Package Manager | Architecture | Library Path Pattern |
|---|---|---|---|
| Ubuntu/Debian | apt | x86_64 | `/usr/lib/x86_64-linux-gnu/libpython3.x.so.1.0` |
| Ubuntu/Debian | apt | Other | `/usr/lib/libpython3.x.so.1.0` |
| Red Hat/CentOS | yum/dnf | x86_64 | `/usr/lib64/libpython3.x.so.1.0` |
| macOS (Homebrew) | Homebrew | Intel | `/usr/local/opt/python@3.x/lib/libpython3.x.dylib` |
| macOS (Homebrew) | Homebrew | Apple Silicon | `/opt/homebrew/opt/python@3.x/lib/libpython3.x.dylib` |
| macOS (System) | Installer | All | `/Library/Frameworks/Python.framework/Versions/3.x/lib/libpython3.x.dylib` |
Tip: you can use Python itself to find the full path to the Python library.
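One way to do this is with the standard-library `sysconfig` module. `LIBDIR` is the directory Python was configured to install its library into, and `LDLIBRARY` is the library's configured file name. The sketch below treats those values as hints rather than facts, because they depend on how your interpreter was built:

- Statically linked builds report a `.a` archive.
- macOS framework builds report a path inside `Python.framework`.
- Debian places the library in a multiarch subdirectory.

To cover these cases, the sketch also tries a few common file names and returns the first path that actually exists.

```python
import os
import sysconfig


def find_libpython():
    """Return a path to an existing libpython shared library, or None.

    Searches LIBDIR (plus the Debian-style multiarch subdirectory, if any)
    for the configured library name and a few common file names.
    """
    libdir = sysconfig.get_config_var("LIBDIR")
    if not libdir:
        return None  # e.g. Windows builds don't define LIBDIR
    dirs = [libdir]
    multiarch = sysconfig.get_config_var("MULTIARCH")
    if multiarch:
        dirs.append(os.path.join(libdir, multiarch))

    version = sysconfig.get_config_var("VERSION")  # e.g. "3.12"
    names = [
        sysconfig.get_config_var("LDLIBRARY"),
        f"libpython{version}.so.1.0",
        f"libpython{version}.so",
        f"libpython{version}.dylib",
    ]
    for directory in dirs:
        for name in names:
            if not name or name.endswith(".a"):
                continue  # skip missing values and static archives
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    print(find_libpython())
```

The dynamic linker typically needs to find the directory containing the returned file. Common ways to point it there are `LD_LIBRARY_PATH` on Linux and `DYLD_LIBRARY_PATH` on macOS.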
Set the environment variables needed to locate the Python library, then run the tests with the `--features python` flag enabled.
## Benchmarking

Benchmark tests are located in `tests/benchmarks`. If you would like to run benchmarks, make sure to first run `make build-release` instead of `make build` in order to compile an optimized build of Daft.

- `pytest tests/benchmarks/[test_file.py] -m benchmark`: run all benchmarks in a file
- `pytest tests/benchmarks/[test_file.py] -k [test_name] -m benchmark`: run a specific benchmark in a file

More information about writing and using benchmarks can be found in the pytest-benchmark docs.
## Adding new expressions

New expressions are a very common feature request, so we want to make it easy for new contributors to add them. Adding a new expression requires implementing it in Rust and then exposing it to Python.

### Step 1: Implement the function in Rust

Add your function to the appropriate crate (`daft-functions-json`, `daft-functions-utf8`, etc.). For more advanced use cases, see the existing implementations in `daft-functions-utf8`.
### Step 2: Register the function

With the function implemented, we're not quite done yet. It also needs to be registered in `FUNCTION_REGISTRY`, the global registry of all expressions and functions.

Whichever crate or module you are working in, there should be a `daft_dsl::functions::FunctionModule` implementation that registers all of its functions, so all you need to do is add your new struct there. For the utf8 functions, it is defined in `src/daft-functions-utf8/src/lib.rs`.
### Step 3: Add Python bindings

Create an expression method in `daft/expressions/expressions.py`.

For functions with additional arguments, you need to convert all of those arguments to expressions before calling the `function_registry` function.

Add a Series method in `daft/series.py`. For Series, the method simply delegates to the expression implementation, so you can call the helper method `_eval_expressions`. This applies both to functions without additional arguments and to functions with them.
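To make the moving parts of Steps 2 and 3 concrete, here is a deliberately tiny, self-contained Python model of the pattern:

- Implementations are registered by name in a global registry.
- The expression method converts extra arguments to expressions and builds a call that is looked up by name.
- The Series method delegates to expression evaluation.

Every name here (`TOY_REGISTRY`, `register`, `Expression`, `col`, `lit`, `Series`, `_delegate`, `count_matches`) is a hypothetical stand-in. Daft's real registry lives in Rust, and its Python classes have different signatures.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Toy global registry: function name -> implementation operating on Python lists.
TOY_REGISTRY: dict[str, Callable[..., list]] = {}


def register(name):
    """Decorator that adds an implementation to the toy registry (cf. Step 2)."""
    def wrap(fn):
        TOY_REGISTRY[name] = fn
        return fn
    return wrap


@register("count_matches")
def _count_matches(values, patterns):
    # Element-wise kernel: count occurrences of each pattern in each string.
    return [v.count(p) for v, p in zip(values, patterns)]


@dataclass
class Expression:
    """Toy expression node: a column reference, a literal, or a registry call."""
    kind: str                  # "col", "lit", or "call"
    value: Any = None          # column name, literal value, or function name
    args: list = field(default_factory=list)

    def count_matches(self, pattern):
        # Convert extra arguments to expressions before building the call (cf. Step 3).
        return Expression("call", "count_matches", [self, lit(pattern)])

    def evaluate(self, columns, length):
        if self.kind == "col":
            return columns[self.value]
        if self.kind == "lit":
            return [self.value] * length  # broadcast the literal to the column length
        evaluated = [arg.evaluate(columns, length) for arg in self.args]
        return TOY_REGISTRY[self.value](*evaluated)


def col(name):
    return Expression("col", name)


def lit(value):
    return value if isinstance(value, Expression) else Expression("lit", value)


@dataclass
class Series:
    """Toy Series whose methods delegate to the expression implementation."""
    values: list

    def _delegate(self, expr):
        return Series(expr.evaluate({"self": self.values}, len(self.values)))

    def count_matches(self, pattern):
        return self._delegate(col("self").count_matches(pattern))
```

With this model, `Series(["banana", "apple", "cherry"]).count_matches("a")` yields a Series with values `[3, 1, 0]`. The expression and Series paths share one registered kernel, which is the property Step 4's tests check.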
#### Docstring Template

We follow Google-style Python docstrings.
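For reference, a Google-style docstring starts with a one-line summary. It is followed by `Args:` and `Returns:` sections and, optionally, `Raises:` and `Examples:` sections. The function below is a hypothetical plain-Python illustration of the style, not a Daft expression.

```python
def count_vowels(text: str, include_y: bool = False) -> int:
    """Counts the vowels in a string.

    Args:
        text: The string to inspect.
        include_y: Whether to treat "y" as a vowel. Defaults to False.

    Returns:
        The number of vowels in ``text``, compared case-insensitively.

    Raises:
        TypeError: If ``text`` is not a string.

    Examples:
        >>> count_vowels("Daft")
        1
        >>> count_vowels("Daft rhymes", include_y=True)
        3
    """
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}")
    vowels = set("aeiouy" if include_y else "aeiou")
    return sum(1 for ch in text.lower() if ch in vowels)
```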
### Step 4: Write tests

For testing, you can add a new file or update an existing one in `tests/expressions/`.

We have a `test_expression` fixture that does most of the heavy lifting and ensures that the APIs are consistent across expressions, Series, and SQL. For example, it can be used to test the `extract` function.
## Pull Requests

### Best practices for pull requests

For the best chance of having your pull request accepted, please follow these guidelines:

- Include unit tests for all changes and new features. Pull requests without tests will not be merged.
- Keep changes focused. Aim to solve one problem per pull request and avoid unrelated changes.
- Review before submitting. Whenever possible, ask another contributor to review your code first, or perform a thorough self-review. Ask yourself: Is it clear why these changes are being made? Are they easy to understand?
- Use Conventional Commit messages for pull request titles. For example:
    - New feature: `feat: adding API`
    - Bug fix: `fix: issue with API`
    - Documentation: `docs: adding API documentation`
- Test error cases. Ensure your tests cover failure scenarios and provide clear, user-friendly error messages.
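To check a title locally before opening a pull request, a regular expression covering the basic Conventional Commits header shape, `type(optional scope)!: description`, is enough for a quick sanity check. This is a sketch with one assumption in particular: the allowed types extend the three shown above (`feat`, `fix`, `docs`) with other commonly used ones. The project's CI may enforce a different list.

```python
import re

# Assumed set of allowed types: feat/fix/docs come from the examples above;
# the rest are common Conventional Commits types and may differ from Daft's CI.
ALLOWED_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "perf", "ci", "build")

TITLE_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. feat(sql): ...
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, space, non-empty description
)


def is_conventional_title(title: str) -> bool:
    """Return True if `title` looks like a Conventional Commit header."""
    return TITLE_PATTERN.match(title) is not None
```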
### Review process

- Draft vs. Open status
    - Leave your pull request in Draft status if it is still a work in progress.
    - Mark it as Open once it is ready for review.
- Checks on GitHub
    - Ensure all automated checks have passed. PRs with failing checks will not be prioritized for review.
- Reviewer assignment
    - Contributors with write access: add a reviewer to the Assignee field.
    - Other contributors: reviewers will be assigned to your PR shortly.
- During review
    - Assignees will review your PR and provide feedback if changes are needed.
    - Address review comments promptly to keep momentum. PRs without author engagement for more than a week may lose priority.
    - Iterate until assignees approve your PR.
- Merging
    - Once the build is passing and approvals are in place, committers will merge the PR.