Extracts the values for the given top-level keys from a JSON object string.
Spark's json_tuple returns one column per requested key (c0, c1, ...). To fit Daft's single-output expression model, this returns a Struct whose field names are the requested keys, each typed as String. Use .get("key") to pull individual fields out.
Behavior:
- Non-string scalar values (numbers, booleans) are stringified without surrounding quotes (e.g. "1", "true").
- Nested objects/arrays are returned as their JSON-encoded string form.
- Missing keys yield NULL for that field only; the row itself is still valid as long as the input parses as a JSON object.
- Malformed JSON, non-object roots, and NULL inputs yield a row-level NULL (is_null() returns True); every child field is also NULL.
- Field names must be unique; passing a duplicate raises an error.
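The rules above can be modeled in plain Python. The following is only a sketch of the documented semantics, not Daft's implementation: `json_tuple_reference` is a hypothetical helper, a Python `dict` stands in for the Struct, and `None` stands in for NULL. The compact encoding of nested values, the handling of JSON `null` values, and the use of `ValueError` for duplicates are assumptions, since the text above does not specify them.

```python
import json
from typing import Optional


def json_tuple_reference(text: Optional[str], *fields: str) -> Optional[dict]:
    """Hypothetical pure-Python model of the behavior rules (not Daft's code)."""
    if not fields:
        raise ValueError("json_tuple requires at least one field name")
    if len(set(fields)) != len(fields):
        # Assumption: the docs only say "raises an error"; ValueError is a guess.
        raise ValueError("field names must be unique")

    if text is None:
        return None  # NULL input -> row-level NULL
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None  # malformed JSON -> row-level NULL
    if not isinstance(obj, dict):
        return None  # non-object root (array, scalar) -> row-level NULL

    row = {}
    for key in fields:
        value = obj.get(key)
        if value is None:
            # Missing key -> NULL for this field only.
            # Assumption: an explicit JSON null is treated the same way.
            row[key] = None
        elif isinstance(value, str):
            row[key] = value  # strings pass through unchanged
        else:
            # Numbers and booleans become "1" / "true"; objects and arrays
            # become JSON text. Compact separators are an assumption.
            row[key] = json.dumps(value, separators=(",", ":"))
    return row


print(json_tuple_reference('{"a": 1, "b": "x"}', "a", "b"))
print(json_tuple_reference('{"a": 2}', "a", "b"))
print(json_tuple_reference("[1, 2]", "a"))
```

The second call shows the difference between a valid row with one NULL field and the third call's row-level NULL for a non-object root.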
Parameters:
| Name | Type | Description | Default |
| expr | Expression | A string expression containing JSON. | required |
| *fields | str | One or more top-level keys to extract. | () |
Returns:
| Name | Type | Description |
| Expression | Expression | A Struct expression with one String field per key. |
Examples:
>>> import daft
>>> from daft.functions import json_tuple
>>>
>>> df = daft.from_pydict({"col": ['{"a": 1, "b": "x"}', '{"a": 2}', None]})
>>> df = df.with_column("t", json_tuple(df["col"], "a", "b"))
>>> df.select(df["t"].get("a").alias("a"), df["t"].get("b").alias("b")).collect()
╭────────┬────────╮
│ a      ┆ b      │
│ ---    ┆ ---    │
│ String ┆ String │
╞════════╪════════╡
│ 1      ┆ x      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2      ┆ None   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ None   ┆ None   │
╰────────┴────────╯
(Showing first 3 of 3 rows)
Source code in daft/functions/str.py
def json_tuple(expr: Expression, *fields: str) -> Expression:
    """Extracts the values for the given top-level keys from a JSON object string.

    Spark's ``json_tuple`` returns one column per requested key (``c0``,
    ``c1``, ...). To fit Daft's single-output expression model, this returns
    a ``Struct`` whose field names are the requested keys, each typed as
    ``String``. Use ``.get("key")`` to pull individual fields out.

    Behavior:
        * Non-string scalar values (numbers, booleans) are stringified without
          surrounding quotes (e.g. ``"1"``, ``"true"``).
        * Nested objects/arrays are returned as their JSON-encoded string form.
        * Missing keys yield ``NULL`` for that field only; the row itself is
          still valid as long as the input parses as a JSON object.
        * Malformed JSON, non-object roots, and ``NULL`` inputs yield a
          row-level ``NULL`` (``is_null()`` returns ``True``); every child
          field is also ``NULL``.
        * Field names must be unique; passing a duplicate raises an error.

    Args:
        expr: A string expression containing JSON.
        *fields: One or more top-level keys to extract.

    Returns:
        Expression: A ``Struct`` expression with one ``String`` field per key.

    Examples:
        >>> import daft
        >>> from daft.functions import json_tuple
        >>>
        >>> df = daft.from_pydict({"col": ['{"a": 1, "b": "x"}', '{"a": 2}', None]})
        >>> df = df.with_column("t", json_tuple(df["col"], "a", "b"))
        >>> df.select(df["t"].get("a").alias("a"), df["t"].get("b").alias("b")).collect()
        ╭────────┬────────╮
        │ a      ┆ b      │
        │ ---    ┆ ---    │
        │ String ┆ String │
        ╞════════╪════════╡
        │ 1      ┆ x      │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 2      ┆ None   │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ None   ┆ None   │
        ╰────────┴────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not fields:
        raise ValueError("json_tuple requires at least one field name")
    field_lits = [lit(f) for f in fields]
    return Expression._call_builtin_scalar_fn("json_tuple", expr, *field_lits)