Working with JSON and Nested Data

Daft provides powerful capabilities for working with JSON data and nested data structures. Whether you're processing API responses, log files, or complex hierarchical data, Daft's JSON modality makes it easy to parse, query, and manipulate structured data.
JSON

If you have a column of JSON strings, Daft provides the .jq() method to run jq-style filters on them. For example, to extract a value from a JSON object:
Python:

import daft

df = daft.from_pydict({
    "json": [
        '{"a": 1, "b": 2}',
        '{"a": 3, "b": 4}',
    ],
})
# Run the jq filter ".a" against every JSON string in the column
df = df.with_column("a", df["json"].jq(".a"))
df.collect()

SQL:

import daft

df = daft.from_pydict({
    "json": [
        '{"a": 1, "b": 2}',
        '{"a": 3, "b": 4}',
    ],
})
# daft.sql picks up the DataFrame named `df` from the calling Python scope
df = daft.sql("""
    SELECT
        json,
        json_query(json, '.a') AS a
    FROM df
""")
df.collect()

Output

╭──────────────────┬──────╮
│ json             ┆ a    │
│ ---              ┆ ---  │
│ Utf8             ┆ Utf8 │
╞══════════════════╪══════╡
│ {"a": 1, "b": 2} ┆ 1    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ {"a": 3, "b": 4} ┆ 3    │
╰──────────────────┴──────╯

(Showing first 2 of 2 rows)

Daft uses jaq as the underlying executor, so you can find the full list of supported filters in the jaq documentation.
Extracting and Flattening Nested Data

When working with nested data (such as log files, metadata, or deserialized JSON), we often need to extract specific fields or flatten the entire structure into individual columns. Daft provides two main approaches for this:

- Extracting specific fields: using the [] operator to access nested fields
- Flattening all fields: using .unnest() or the * wildcard to expand all nested fields into separate columns

Consider the following example reading from the nebius/SWE-rebench dataset.
Output

╭─────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ column_name ┆ type                                                                                                                                                                                                                        │
╞═════════════╪═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ meta        ┆ Struct[commit_name: Utf8, failed_lite_validators: List[Utf8], has_test_patch: Boolean, is_lite: Boolean, llm_score: Struct[difficulty_score: Int64, issue_text_score: Int64, test_score: Int64], num_modified_files: Int64] │
╰─────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

We can extract a specific field from the struct by using the [] operator. For example, to extract the difficulty_score from the llm_score struct:
Output

╭──────────────────╮
│ difficulty_score │
│ ---              │
│ Int64            │
╞══════════════════╡
│ 2                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1                │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 0                │
╰──────────────────╯

(Showing first 8 rows)

If we want to extract all the nested columns, we can use the .unnest() expression or the wildcard * to access all fields of the meta struct column.
Output

╭─────────────┬────────────────────────────────┬────────────────┬─────────┬─────────────────────────────────────────────────────────────────────────────┬────────────────────╮
│ commit_name ┆ failed_lite_validators         ┆ has_test_patch ┆ is_lite ┆ llm_score                                                                   ┆ num_modified_files │
│ ---         ┆ ---                            ┆ ---            ┆ ---     ┆ ---                                                                         ┆ ---                │
│ Utf8        ┆ List[Utf8]                     ┆ Boolean        ┆ Boolean ┆ Struct[difficulty_score: Int64, issue_text_score: Int64, test_score: Int64] ┆ Int64              │
╞═════════════╪════════════════════════════════╪════════════════╪═════════╪═════════════════════════════════════════════════════════════════════════════╪════════════════════╡
│ head_commit ┆ [has_short_problem_statement,… ┆ true           ┆ false   ┆ {difficulty_score: 2,                                                       ┆ 5                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ [has_many_modified_files, has… ┆ true           ┆ false   ┆ {difficulty_score: 1,                                                       ┆ 5                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ [has_removed_files, has_many_… ┆ true           ┆ false   ┆ {difficulty_score: 2,                                                       ┆ 6                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ []                             ┆ true           ┆ true    ┆ {difficulty_score: 2,                                                       ┆ 1                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ []                             ┆ true           ┆ true    ┆ {difficulty_score: 0,                                                       ┆ 1                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ []                             ┆ true           ┆ true    ┆ {difficulty_score: 0,                                                       ┆ 1                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ []                             ┆ true           ┆ true    ┆ {difficulty_score: 1,                                                       ┆ 1                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ head_commit ┆ [has_hyperlinks, has_issue_re… ┆ true           ┆ false   ┆ {difficulty_score: 0,                                                       ┆ 3                  │
│             ┆                                ┆                ┆         ┆ issue_t…                                                                    ┆                    │
╰─────────────┴────────────────────────────────┴────────────────┴─────────┴─────────────────────────────────────────────────────────────────────────────┴────────────────────╯

(Showing first 8 rows)