
daft.functions.split

split

split(expr: Expression, split_on: str | Expression) -> Expression

Splits each string on the given delimiter string, producing a list of strings.

Parameters:

expr (Expression, required): The string expression to split.

split_on (str | Expression, required): The delimiter to split on. Either a single string used for every row, or an expression (for example, another column) that supplies a delimiter for each row.
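The two accepted forms of `split_on` can be modeled in plain Python. This is a minimal sketch, not Daft's implementation: `split_column` is a hypothetical helper that treats a `str` as one delimiter applied to every row and a list as one delimiter per row, which is our reading of "a column to pick such patterns from".

```python
from typing import List, Sequence, Union


def split_column(values: Sequence[str], split_on: Union[str, Sequence[str]]) -> List[List[str]]:
    """Hypothetical pure-Python model of split(expr, split_on); not Daft code."""
    if isinstance(split_on, str):
        # A scalar delimiter is broadcast to every row.
        delimiters = [split_on] * len(values)
    else:
        # A column of delimiters supplies one delimiter per row.
        if len(split_on) != len(values):
            raise ValueError("split_on column must have one entry per row")
        delimiters = list(split_on)
    # str.split with an explicit separator matches it literally, not as a regex.
    return [value.split(delim) for value, delim in zip(values, delimiters)]


print(split_column(["a.b.c", "1.2.3"], "."))       # [['a', 'b', 'c'], ['1', '2', '3']]
print(split_column(["a-b", "c|d|e"], ["-", "|"]))  # [['a', 'b'], ['c', 'd', 'e']]
```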

Returns:

Expression: A List[String] expression containing, for each input string, the list of substrings produced by splitting it.
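A quick sanity check on the shape of each List[String] value, sketched with Python's `str.split` as a stand-in. The assumption (not confirmed by this page) is that Daft's literal split agrees with `str.split` on these inputs: splitting on a non-empty delimiter yields one more element than there are delimiter occurrences, and adjacent or trailing delimiters produce empty strings rather than being dropped. `parts_for` is a hypothetical helper.

```python
from typing import List


def parts_for(value: str, delim: str) -> List[str]:
    # Python's literal split, used here as a stand-in for one row of the output.
    if not delim:
        raise ValueError("this sketch only models non-empty delimiters")
    return value.split(delim)


for value in ["daft.distributed.query", "a..b", "trailing.", "no-delimiter"]:
    parts = parts_for(value, ".")
    # Each occurrence of the delimiter adds exactly one element to the list.
    assert len(parts) == value.count(".") + 1
    print(value, "->", parts)
```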

Examples:

>>> import daft
>>> from daft.functions import split
>>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
>>> df.with_column("split", split(df["data"], ".")).collect()
╭────────────────────────┬────────────────────────────╮
│ data                   ┆ split                      │
│ ---                    ┆ ---                        │
│ String                 ┆ List[String]               │
╞════════════════════════╪════════════════════════════╡
│ daft.distributed.query ┆ [daft, distributed, query] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b.c                  ┆ [a, b, c]                  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1.2.3                  ┆ [1, 2, 3]                  │
╰────────────────────────┴────────────────────────────╯
(Showing first 3 of 3 rows)
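The output above also shows that `split_on` is matched as a literal string: `.` is a regular-expression wildcard, yet `a.b.c` becomes `[a, b, c]`. The plain-Python sketch below reproduces the example's lists with `str.split` and contrasts them with what a regex split on `.` would return. It models the result only and is not how Daft computes it.

```python
import re

rows = ["daft.distributed.query", "a.b.c", "1.2.3"]

# Literal split on ".", matching the List[String] column in the example output.
literal = [row.split(".") for row in rows]
print(literal)
# [['daft', 'distributed', 'query'], ['a', 'b', 'c'], ['1', '2', '3']]

# If "." were treated as a regular expression it would match every character,
# leaving only empty strings.
print(re.split(".", "a.b.c"))
# ['', '', '', '', '', '']
```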
Source code in daft/functions/str.py
def split(expr: Expression, split_on: str | Expression) -> Expression:
    r"""Splits each string on the given delimiter string, producing a list of strings.

    Args:
        expr: The expression to split.
        split_on: The delimiter string to split on, or an expression (such as a column) supplying a delimiter for each row.

    Returns:
        Expression: A List[String] expression containing the string splits for each string in the column.

    Examples:
        >>> import daft
        >>> from daft.functions import split
        >>> df = daft.from_pydict({"data": ["daft.distributed.query", "a.b.c", "1.2.3"]})
        >>> df.with_column("split", split(df["data"], ".")).collect()
        ╭────────────────────────┬────────────────────────────╮
        │ data                   ┆ split                      │
        │ ---                    ┆ ---                        │
        │ String                 ┆ List[String]               │
        ╞════════════════════════╪════════════════════════════╡
        │ daft.distributed.query ┆ [daft, distributed, query] │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ a.b.c                  ┆ [a, b, c]                  │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 1.2.3                  ┆ [1, 2, 3]                  │
        ╰────────────────────────┴────────────────────────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    return Expression._call_builtin_scalar_fn("split", expr, split_on)