daft.functions.json_tuple

json_tuple

json_tuple(expr: Expression, *fields: str) -> Expression

Extracts the values for the given top-level keys from a JSON object string.

Spark's json_tuple returns one column per requested key (c0, c1, ...). To fit Daft's single-output expression model, this returns a Struct whose field names are the requested keys, each typed as String. Use .get("key") to pull individual fields out.

Behavior:

  • Non-string scalar values (numbers, booleans) are stringified without surrounding quotes (e.g. "1", "true").
  • Nested objects/arrays are returned as their JSON-encoded string form.
  • Missing keys yield NULL for that field only; the row itself is still valid as long as the input parses as a JSON object.
  • Malformed JSON, non-object roots, and NULL inputs yield a row-level NULL (is_null() returns True); every child field is also NULL.
  • Field names must be unique; passing a duplicate raises an error.
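
The rules above can be made concrete with a small pure-Python reference model. This is only a sketch of the documented semantics, not Daft's implementation: `json_tuple_reference` is a hypothetical helper, a plain dict stands in for the Struct, and `None` stands in for NULL. Two details are assumptions that the text above does not specify: nested values are re-encoded with compact separators, and an explicit JSON `null` value is mapped to `None` in the same way as a missing key.

```python
import json
from typing import Optional


def json_tuple_reference(text: Optional[str], *fields: str) -> Optional[dict]:
    """Hypothetical pure-Python model of the documented json_tuple rules."""
    if not fields:
        raise ValueError("json_tuple requires at least one field name")
    if len(set(fields)) != len(fields):
        raise ValueError("field names must be unique")

    # NULL input -> row-level NULL.
    if text is None:
        return None
    try:
        parsed = json.loads(text)
    except ValueError:
        # Malformed JSON -> row-level NULL (JSONDecodeError subclasses ValueError).
        return None
    if not isinstance(parsed, dict):
        # Non-object root (array, number, string, ...) -> row-level NULL.
        return None

    row = {}
    for key in fields:
        value = parsed.get(key)
        if value is None:
            # Missing key -> NULL for this field only.
            # Assumption: an explicit JSON null is treated the same way.
            row[key] = None
        elif isinstance(value, str):
            # String values are returned as-is.
            row[key] = value
        else:
            # Numbers and booleans become "1", "true", ...; nested objects
            # and arrays become their JSON-encoded string form.
            # Assumption: compact separators; real whitespace may differ.
            row[key] = json.dumps(value, separators=(",", ":"))
    return row
```

Under these assumptions, the inputs `'{"a": 1, "b": "x"}'`, `'{"a": 2}'`, and `None` with fields `"a"` and `"b"` give `{"a": "1", "b": "x"}`, `{"a": "2", "b": None}`, and `None` respectively.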

Parameters:

Name      Type         Description                                                              Default
expr      Expression   A string expression containing JSON.                                     required
*fields   str          One or more top-level keys to extract. Passing none raises ValueError.   ()

Returns:

Type         Description
Expression   A Struct expression with one String field per key.

Examples:

>>> import daft
>>> from daft.functions import json_tuple
>>>
>>> df = daft.from_pydict({"col": ['{"a": 1, "b": "x"}', '{"a": 2}', None]})
>>> df = df.with_column("t", json_tuple(df["col"], "a", "b"))
>>> df.select(df["t"].get("a").alias("a"), df["t"].get("b").alias("b")).collect()
╭────────┬────────╮
│ a      ┆ b      │
│ ---    ┆ ---    │
│ String ┆ String │
╞════════╪════════╡
│ 1      ┆ x      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2      ┆ None   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ None   ┆ None   │
╰────────┴────────╯
(Showing first 3 of 3 rows)
Source code in daft/functions/str.py
def json_tuple(expr: Expression, *fields: str) -> Expression:
    """Extracts the values for the given top-level keys from a JSON object string.

    Spark's ``json_tuple`` returns one column per requested key (``c0``,
    ``c1``, ...). To fit Daft's single-output expression model, this returns
    a ``Struct`` whose field names are the requested keys, each typed as
    ``String``. Use ``.get("key")`` to pull individual fields out.

    Behavior:

    * Non-string scalar values (numbers, booleans) are stringified without
      surrounding quotes (e.g. ``"1"``, ``"true"``).
    * Nested objects/arrays are returned as their JSON-encoded string form.
    * Missing keys yield ``NULL`` for that field only; the row itself is
      still valid as long as the input parses as a JSON object.
    * Malformed JSON, non-object roots, and ``NULL`` inputs yield a
      row-level ``NULL`` (``is_null()`` returns ``True``); every child
      field is also ``NULL``.
    * Field names must be unique; passing a duplicate raises an error.

    Args:
        expr: A string expression containing JSON.
        *fields: One or more top-level keys to extract.

    Returns:
        Expression: A ``Struct`` expression with one ``String`` field per key.

    Examples:
        >>> import daft
        >>> from daft.functions import json_tuple
        >>>
        >>> df = daft.from_pydict({"col": ['{"a": 1, "b": "x"}', '{"a": 2}', None]})
        >>> df = df.with_column("t", json_tuple(df["col"], "a", "b"))
        >>> df.select(df["t"].get("a").alias("a"), df["t"].get("b").alias("b")).collect()
        ╭────────┬────────╮
        │ a      ┆ b      │
        │ ---    ┆ ---    │
        │ String ┆ String │
        ╞════════╪════════╡
        │ 1      ┆ x      │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 2      ┆ None   │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ None   ┆ None   │
        ╰────────┴────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    if not fields:
        raise ValueError("json_tuple requires at least one field name")
    field_lits = [lit(f) for f in fields]
    return Expression._call_builtin_scalar_fn("json_tuple", expr, *field_lits)