Explode a list expression.
A row is created for each item in the lists, and the other non-exploded output columns are broadcasted to match.
If exploding multiple columns at once, all list lengths must match.
Note
Since this changes the cardinality of the dataframe, we only allow a single explode per projection (select, with_columns). If you need to do multiple explodes, each one must be done separately.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| list_expr | List Expression | Expression to explode. | required |
| ignore_empty_and_null | bool | If True, drops rows where the list is empty or null. If False (default), empty lists and null values each produce a single row with a null value. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| Expression | Expression | Expression representing the exploded list. |
See also
DataFrame.explode
Examples:
Explode one column, broadcast the rest:
>>> import daft
>>> from daft.functions import explode
>>>
>>> df = daft.from_pydict({"id": [1, 2, 3], "sentence": ["lorem ipsum", "foo bar baz", "hi"]})
>>>
>>> df.with_column("word", explode(df["sentence"].split(" "))).show()
╭───────┬─────────────┬────────╮
│ id    ┆ sentence    ┆ word   │
│ ---   ┆ ---         ┆ ---    │
│ Int64 ┆ String      ┆ String │
╞═══════╪═════════════╪════════╡
│ 1     ┆ lorem ipsum ┆ lorem  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 1     ┆ lorem ipsum ┆ ipsum  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ foo    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ bar    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2     ┆ foo bar baz ┆ baz    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3     ┆ hi          ┆ hi     │
╰───────┴─────────────┴────────╯
(Showing first 6 of 6 rows)
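The effect of ignore_empty_and_null described in the parameter table can be modelled in plain Python. The sketch below is not Daft's implementation: explode_rows is a hypothetical helper that works on a list of dicts, written only to illustrate the documented row-count behaviour for empty and null lists.

```python
from typing import Any


def explode_rows(
    rows: list[dict[str, Any]],
    column: str,
    ignore_empty_and_null: bool = False,
) -> list[dict[str, Any]]:
    """Plain-Python model of the documented explode semantics (hypothetical helper)."""
    out = []
    for row in rows:
        items = row[column]
        if items is None or len(items) == 0:
            if ignore_empty_and_null:
                continue  # drop the row entirely
            items = [None]  # keep a single row holding a null value
        for item in items:
            # The other columns are broadcast: copied onto every output row.
            out.append({**row, column: item})
    return out


rows = [
    {"id": 1, "words": ["lorem", "ipsum"]},
    {"id": 2, "words": []},
    {"id": 3, "words": None},
]

kept = explode_rows(rows, "words")
dropped = explode_rows(rows, "words", ignore_empty_and_null=True)
```

With the default, kept has four rows, because ids 2 and 3 each contribute one row whose words value is null. With ignore_empty_and_null=True, dropped has only the two rows produced by id 1.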
Explode multiple columns with the same lengths:
>>> df.select(
...     explode(df["sentence"].split(" ")).alias("word"),
...     explode(df["sentence"].capitalize().split(" ")).alias("capitalized_word"),
... ).show()
╭────────┬──────────────────╮
│ word   ┆ capitalized_word │
│ ---    ┆ ---              │
│ String ┆ String           │
╞════════╪══════════════════╡
│ lorem  ┆ Lorem            │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ipsum  ┆ ipsum            │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ foo    ┆ Foo              │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ bar    ┆ bar              │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ baz    ┆ baz              │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ hi     ┆ Hi               │
╰────────┴──────────────────╯
(Showing first 6 of 6 rows)
This will error because exploded lengths are different:
>>> # df.select(
>>> #     df["sentence"]
>>> #     .split(" ")
>>> #     .explode()
>>> #     .alias("word"),
>>> #     df["sentence"]
>>> #     .split("a")
>>> #     .explode()
>>> #     .alias("split_on_a")
>>> # ).show()
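The length requirement can be illustrated without Daft. The following plain-Python sketch uses a hypothetical explode_together helper, which is an assumption made for illustration and not how Daft implements or reports the check. It shows why splitting the same sentences on " " and on "a" produces lists that cannot be exploded side by side.

```python
def explode_together(
    left: list[list[str]], right: list[list[str]]
) -> list[tuple[str, str]]:
    """Explode two list columns in lockstep (hypothetical model, not Daft's code)."""
    out = []
    for row_index, (a, b) in enumerate(zip(left, right)):
        if len(a) != len(b):
            # Rows cannot be paired item by item when the lengths differ.
            raise ValueError(
                f"row {row_index}: list lengths differ ({len(a)} vs {len(b)})"
            )
        out.extend(zip(a, b))
    return out


sentences = ["lorem ipsum", "foo bar baz", "hi"]
by_space = [s.split(" ") for s in sentences]
by_a = [s.split("a") for s in sentences]
capitalized = [s.capitalize().split(" ") for s in sentences]

# Same lengths per row: the pairs line up.
pairs = explode_together(by_space, capitalized)

# Different lengths per row: the first row already fails.
try:
    explode_together(by_space, by_a)
    error = None
except ValueError as exc:
    error = str(exc)
```

Here "lorem ipsum" yields two items when split on " " but only one when split on "a", so the first row already has no consistent pairing.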
Source code in daft/functions/list.py
def explode(list_expr: Expression, ignore_empty_and_null: bool = False) -> Expression:
    """Explode a list expression.

    A row is created for each item in the lists, and the other non-exploded output columns are broadcasted to match.
    If exploding multiple columns at once, all list lengths must match.

    Note:
        Since this changes the cardinality of the dataframe, we only allow a single explode per projection (`select`, `with_columns`).
        If you need to do multiple explodes, each one must be done separately.

    Args:
        list_expr (List Expression): expression to explode.
        ignore_empty_and_null: If True, drops rows where the list is empty or null.
            If False (default), empty lists and null values each produce a single row with a null value.

    Returns:
        Expression: Expression representing the exploded list.

    Tip: See also
        [`DataFrame.explode`](https://docs.daft.ai/en/stable/api/dataframe/#daft.DataFrame.explode)

    Examples:
        Explode one column, broadcast the rest:

        >>> import daft
        >>> from daft.functions import explode
        >>>
        >>> df = daft.from_pydict({"id": [1, 2, 3], "sentence": ["lorem ipsum", "foo bar baz", "hi"]})
        >>>
        >>> df.with_column("word", explode(df["sentence"].split(" "))).show()
        ╭───────┬─────────────┬────────╮
        │ id    ┆ sentence    ┆ word   │
        │ ---   ┆ ---         ┆ ---    │
        │ Int64 ┆ String      ┆ String │
        ╞═══════╪═════════════╪════════╡
        │ 1     ┆ lorem ipsum ┆ lorem  │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 1     ┆ lorem ipsum ┆ ipsum  │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 2     ┆ foo bar baz ┆ foo    │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 2     ┆ foo bar baz ┆ bar    │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 2     ┆ foo bar baz ┆ baz    │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ 3     ┆ hi          ┆ hi     │
        ╰───────┴─────────────┴────────╯
        <BLANKLINE>
        (Showing first 6 of 6 rows)

        Explode multiple columns with the same lengths:

        >>> df.select(
        ...     explode(df["sentence"].split(" ")).alias("word"),
        ...     explode(df["sentence"].capitalize().split(" ")).alias("capitalized_word"),
        ... ).show()
        ╭────────┬──────────────────╮
        │ word   ┆ capitalized_word │
        │ ---    ┆ ---              │
        │ String ┆ String           │
        ╞════════╪══════════════════╡
        │ lorem  ┆ Lorem            │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ ipsum  ┆ ipsum            │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ foo    ┆ Foo              │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ bar    ┆ bar              │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ baz    ┆ baz              │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ hi     ┆ Hi               │
        ╰────────┴──────────────────╯
        <BLANKLINE>
        (Showing first 6 of 6 rows)
        >>>

        This will error because exploded lengths are different:

        >>> # df.select(
        >>> #     df["sentence"]
        >>> #     .split(" ")
        >>> #     .explode()
        >>> #     .alias("word"),
        >>> #     df["sentence"]
        >>> #     .split("a")
        >>> #     .explode()
        >>> #     .alias("split_on_a")
        >>> # ).show()
    """
    from daft.expressions import lit

    return Expression._call_builtin_scalar_fn("explode", list_expr, lit(ignore_empty_and_null))