daft.functions.list_distinct

list_distinct(list_expr: Expression) -> Expression

Returns a list of unique elements in each list, preserving order of first occurrence and ignoring nulls.

Parameters:

Name       Type             Description                Default
list_expr  List Expression  The input list expression  required

Returns:

Name        Type             Description
Expression  List Expression  An expression whose lists contain only their unique, non-null elements

Examples:

>>> import daft
>>> from daft.functions import list_distinct
>>> df = daft.from_pydict({"a": [[1, 2, 2, 3], [4, 4, 6, 2], [6, 7, 1], [None, 1, None, 1]]})
>>> df.select(list_distinct(df["a"])).show()
╭─────────────╮
│ a           │
│ ---         │
│ List[Int64] │
╞═════════════╡
│ [1, 2, 3]   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 6, 2]   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [6, 7, 1]   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [1]         │
╰─────────────╯
(Showing first 4 of 4 rows)

Note that null values are ignored, so a list containing only nulls becomes an empty list:

>>> df = daft.from_pydict({"a": [[None, None], [1, None, 1], [None]]})
>>> df.select(list_distinct(df["a"])).show()
╭─────────────╮
│ a           │
│ ---         │
│ List[Int64] │
╞═════════════╡
│ []          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [1]         │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ []          │
╰─────────────╯
(Showing first 3 of 3 rows)
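
The per-list behavior shown in the examples above can be modelled in plain Python. The sketch below is an illustration only, not Daft's implementation (which dispatches to a built-in scalar function). The helper name `distinct_preserving_order` is hypothetical, and the sketch assumes the list elements are hashable.

```python
from typing import Hashable, Iterable, List, Optional


def distinct_preserving_order(values: Iterable[Optional[Hashable]]) -> List[Hashable]:
    """Hypothetical pure-Python model of list_distinct for a single list."""
    seen = set()   # elements already emitted
    unique = []    # output, in order of first occurrence
    for value in values:
        if value is None:
            # Nulls are ignored, matching the documented behavior.
            continue
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


# Apply the model row by row, mirroring the inputs used in the examples.
rows = [[1, 2, 2, 3], [4, 4, 6, 2], [None, 1, None, 1], [None]]
result = [distinct_preserving_order(row) for row in rows]
print(result)  # [[1, 2, 3], [4, 6, 2], [1], []]
```

A set tracks membership while a separate list carries the output, which is what keeps the order of first occurrence that a bare `set(values)` would lose.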
Source code in daft/functions/list.py
def list_distinct(list_expr: Expression) -> Expression:
    """Returns a list of unique elements in each list, preserving order of first occurrence and ignoring nulls.

    Args:
        list_expr (List Expression): The input list expression

    Returns:
        Expression (List Expression): an expression with lists containing only unique elements

    Examples:
        >>> import daft
        >>> from daft.functions import list_distinct
        >>> df = daft.from_pydict({"a": [[1, 2, 2, 3], [4, 4, 6, 2], [6, 7, 1], [None, 1, None, 1]]})
        >>> df.select(list_distinct(df["a"])).show()
        ╭─────────────╮
        │ a           │
        │ ---         │
        │ List[Int64] │
        ╞═════════════╡
        │ [1, 2, 3]   │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ [4, 6, 2]   │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ [6, 7, 1]   │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ [1]         │
        ╰─────────────╯
        <BLANKLINE>
        (Showing first 4 of 4 rows)

        Note that null values are ignored:

        >>> df = daft.from_pydict({"a": [[None, None], [1, None, 1], [None]]})
        >>> df.select(list_distinct(df["a"])).show()
        ╭─────────────╮
        │ a           │
        │ ---         │
        │ List[Int64] │
        ╞═════════════╡
        │ []          │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ [1]         │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ []          │
        ╰─────────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)

    """
    return Expression._call_builtin_scalar_fn("list_distinct", list_expr)