daft.functions.value_counts

value_counts

value_counts(list_expr: Expression) -> Expression

Counts the occurrences of each distinct value in the list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| list_expr | List Expression | List expression whose distinct values are counted. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Expression | Map Expression | A Map<X, UInt64> expression where the keys are the distinct elements of the original list (of type X) and the values are UInt64 counts of how many times each element appears in the list. |
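
To make the shape of the result concrete, the per-row behaviour can be modelled in plain Python. This is only a sketch of the semantics described above, not Daft's implementation: the helper name `value_counts_row` is hypothetical, it uses the standard library's `collections.Counter`, and it does not attempt to model how Daft treats nulls or the ordering of map entries.

```python
from collections import Counter


def value_counts_row(values):
    """Hypothetical pure-Python model of one row: distinct value -> count."""
    # Counter tallies how many times each distinct element appears.
    return dict(Counter(values))


# The same two rows used in the Daft example on this page.
rows = [["a", "b", "a"], ["b", "c", "b", "c"]]
for row in rows:
    print(row, "->", value_counts_row(row))
```

Each input list yields one mapping, which corresponds to one Map value per row in the resulting column.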

Note

This function does not work for nested types. For example, it will not produce a map with lists as keys.
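
A loose analogy in plain Python may help with the intuition: Python lists cannot be dictionary keys either, so counting nested lists fails until the inner values are converted to a flat, hashable form. The snippet below is an illustration in standard-library Python only. It makes no claim about why Daft has this restriction or how Daft reports it, and the variable names are hypothetical.

```python
from collections import Counter

nested = [[1, 2], [1, 2], [3]]

try:
    # Lists are unhashable in Python, so they cannot serve as dict keys.
    Counter(nested)
    failed = False
except TypeError as exc:
    failed = True
    print("cannot count nested lists directly:", exc)

# Converting each inner list to a tuple gives hashable keys that can be counted.
counts = dict(Counter(tuple(inner) for inner in nested))
print(counts)
```

The general idea of mapping nested elements to a non-nested representation before counting may carry over to Daft, but that is an assumption and should be checked against the library's documentation.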

Examples:

>>> import daft
>>> from daft.functions import value_counts
>>> df = daft.from_pydict({"letters": [["a", "b", "a"], ["b", "c", "b", "c"]]})
>>> df.with_column("value_counts", value_counts(df["letters"])).collect()
╭──────────────┬─────────────────────╮
│ letters      ┆ value_counts        │
│ ---          ┆ ---                 │
│ List[String] ┆ Map[String: UInt64] │
╞══════════════╪═════════════════════╡
│ [a, b, a]    ┆ {"a": 2, "b": 1}    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [b, c, b, c] ┆ {"b": 2, "c": 2}    │
╰──────────────┴─────────────────────╯
(Showing first 2 of 2 rows)
Source code in daft/functions/list.py
def value_counts(list_expr: Expression) -> Expression:
    """Counts the occurrences of each distinct value in the list.

    Args:
        list_expr (List Expression): expression to count the occurrences of each distinct value in.

    Returns:
        Expression (Map Expression):
            A Map<X, UInt64> expression where the keys are distinct elements from the
            original list of type X, and the values are UInt64 counts representing
            the number of times each element appears in the list.

    Note:
        This function does not work for nested types. For example, it will not produce a map
        with lists as keys.

    Examples:
        >>> import daft
        >>> from daft.functions import value_counts
        >>> df = daft.from_pydict({"letters": [["a", "b", "a"], ["b", "c", "b", "c"]]})
        >>> df.with_column("value_counts", value_counts(df["letters"])).collect()
        ╭──────────────┬─────────────────────╮
        │ letters      ┆ value_counts        │
        │ ---          ┆ ---                 │
        │ List[String] ┆ Map[String: UInt64] │
        ╞══════════════╪═════════════════════╡
        │ [a, b, a]    ┆ {"a": 2, "b": 1}    │
        ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ [b, c, b, c] ┆ {"b": 2, "c": 2}    │
        ╰──────────────┴─────────────────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)
    """
    return Expression._call_builtin_scalar_fn("list_value_counts", list_expr)