
Aggregations

When performing aggregations such as sum, mean, and count, Daft lets you group data by one or more keys and aggregate within each group.

Calling df.groupby() returns a GroupedDataFrame: a view of the original DataFrame that also records which keys to group on. Calling one of its aggregation methods runs the aggregation within each group and returns a new DataFrame.

GroupedDataFrame

GroupedDataFrame(df: DataFrame, group_by: ExpressionsProjection)

Methods:

Name               Description
agg                Performs aggregations on this GroupedDataFrame. Accepts a mix of different aggregation expressions in one call.
any_value          Returns an arbitrary value from each group.
count              Performs grouped count on this GroupedDataFrame.
count_distinct     Performs grouped count of distinct values on this GroupedDataFrame.
list_agg           Collects the values of each group into a list.
list_agg_distinct  Collects the distinct values of each group into a list (ignoring nulls).
map_groups         Applies a user-defined function to each group. By default, the resulting column takes the name of the first input column.
max                Performs grouped max on this GroupedDataFrame.
mean               Performs grouped mean on this GroupedDataFrame.
min                Performs grouped min on this GroupedDataFrame.
product            Performs grouped product on this GroupedDataFrame.
skew               Performs grouped skew on this GroupedDataFrame.
stddev             Performs grouped standard deviation on this GroupedDataFrame.
string_agg         Performs grouped string concatenation on this GroupedDataFrame.
sum                Performs grouped sum on this GroupedDataFrame.
var                Performs grouped variance on this GroupedDataFrame.

Attributes:

Name      Type                   Description
df        DataFrame              The DataFrame being grouped.
group_by  ExpressionsProjection  The expressions to group on.

df

df: DataFrame

group_by

group_by: ExpressionsProjection

agg

agg(*to_agg: Expression | Iterable[Expression]) -> DataFrame

Performs aggregations on this GroupedDataFrame. Accepts a mix of different aggregation expressions in one call.

Parameters:

Name     Type                                     Description                                                              Default
*to_agg  Union[Expression, Iterable[Expression]]  aggregation expressions, given as separate arguments or as one iterable  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped aggregations

Examples:

>>> import daft
>>> from daft import col
>>> df = daft.from_pydict(
...     {
...         "pet": ["cat", "dog", "dog", "cat"],
...         "age": [1, 2, 3, 4],
...         "name": ["Alex", "Jordan", "Sam", "Riley"],
...     }
... )
>>> grouped_df = df.groupby("pet").agg(
...     df["age"].min().alias("min_age"),
...     df["age"].max().alias("max_age"),
...     df["pet"].count().alias("count"),
...     df["name"].any_value(),
... )
>>> grouped_df = grouped_df.sort("pet")
>>> grouped_df.show()
╭────────┬─────────┬─────────┬────────┬────────╮
│ pet    ┆ min_age ┆ max_age ┆ count  ┆ name   │
│ ---    ┆ ---     ┆ ---     ┆ ---    ┆ ---    │
│ String ┆ Int64   ┆ Int64   ┆ UInt64 ┆ String │
╞════════╪═════════╪═════════╪════════╪════════╡
│ cat    ┆ 1       ┆ 4       ┆ 2      ┆ Alex   │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ dog    ┆ 2       ┆ 3       ┆ 2      ┆ Jordan │
╰────────┴─────────┴─────────┴────────┴────────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def agg(self, *to_agg: Expression | Iterable[Expression]) -> DataFrame:
    """Perform aggregations on this GroupedDataFrame. Allows for mixed aggregations.

    Args:
        *to_agg (Union[Expression, Iterable[Expression]]): aggregation expressions

    Returns:
        DataFrame: DataFrame with grouped aggregations

    Examples:
        >>> import daft
        >>> from daft import col
        >>> df = daft.from_pydict(
        ...     {
        ...         "pet": ["cat", "dog", "dog", "cat"],
        ...         "age": [1, 2, 3, 4],
        ...         "name": ["Alex", "Jordan", "Sam", "Riley"],
        ...     }
        ... )
        >>> grouped_df = df.groupby("pet").agg(
        ...     df["age"].min().alias("min_age"),
        ...     df["age"].max().alias("max_age"),
        ...     df["pet"].count().alias("count"),
        ...     df["name"].any_value(),
        ... )
        >>> grouped_df = grouped_df.sort("pet")
        >>> grouped_df.show()
        ╭────────┬─────────┬─────────┬────────┬────────╮
        │ pet    ┆ min_age ┆ max_age ┆ count  ┆ name   │
        │ ---    ┆ ---     ┆ ---     ┆ ---    ┆ ---    │
        │ String ┆ Int64   ┆ Int64   ┆ UInt64 ┆ String │
        ╞════════╪═════════╪═════════╪════════╪════════╡
        │ cat    ┆ 1       ┆ 4       ┆ 2      ┆ Alex   │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ dog    ┆ 2       ┆ 3       ┆ 2      ┆ Jordan │
        ╰────────┴─────────┴─────────┴────────┴────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    to_agg_list = (
        list(to_agg[0])
        if (len(to_agg) == 1 and not isinstance(to_agg[0], Expression))
        else list(typing.cast("tuple[Expression]", to_agg))
    )

    for expr in to_agg_list:
        if not isinstance(expr, Expression):
            raise ValueError(f"GroupedDataFrame.agg() only accepts expression type, received: {type(expr)}")

    return self.df._agg(to_agg_list, group_by=self.group_by)

any_value

any_value(*cols: ColumnInputType) -> DataFrame

Returns an arbitrary value from each group for every requested column.

Values for each column are not guaranteed to come from the same row.

Parameters:

Name   Type                    Description     Default
*cols  Union[str, Expression]  columns to get  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with one arbitrary value per group for each column.

Source code in daft/dataframe/dataframe.py
def any_value(self, *cols: ColumnInputType) -> DataFrame:
    """Returns an arbitrary value on this GroupedDataFrame.

    Values for each column are not guaranteed to be from the same row.

    Args:
        *cols (Union[str, Expression]): columns to get

    Returns:
        DataFrame: DataFrame with any values.
    """
    return self.df._apply_agg_fn(Expression.any_value, cols, self.group_by)

count

count(*cols: ColumnInputType) -> DataFrame

Performs grouped count on this GroupedDataFrame.

Parameters:

Name   Type                    Description       Default
*cols  Union[str, Expression]  columns to count  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped count per column.

Source code in daft/dataframe/dataframe.py
def count(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped count on this GroupedDataFrame.

    Returns:
        DataFrame: DataFrame with grouped count per column.
    """
    return self.df._apply_agg_fn(Expression.count, cols, self.group_by)

count_distinct

count_distinct(*cols: ColumnInputType) -> DataFrame

Performs grouped count of distinct values on this GroupedDataFrame.

Parameters:

Name   Type                    Description                        Default
*cols  Union[str, Expression]  columns to count distinct values  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped count of distinct values per column.

Examples:

>>> import daft
>>> df = daft.from_pydict({"keys": ["a", "a", "a", "b", "b", "b"], "vals": [1, 1, 2, 3, 3, 3]})
>>> df = df.groupby("keys").count_distinct("vals")
>>> df = df.sort("keys")
>>> df.show()
╭────────┬────────╮
│ keys   ┆ vals   │
│ ---    ┆ ---    │
│ String ┆ UInt64 │
╞════════╪════════╡
│ a      ┆ 2      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ b      ┆ 1      │
╰────────┴────────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def count_distinct(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped count of distinct values on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to count distinct values

    Returns:
        DataFrame: DataFrame with grouped count of distinct values per column.

    Examples:
        >>> import daft
        >>> df = daft.from_pydict({"keys": ["a", "a", "a", "b", "b", "b"], "vals": [1, 1, 2, 3, 3, 3]})
        >>> df = df.groupby("keys").count_distinct("vals")
        >>> df = df.sort("keys")
        >>> df.show()
        ╭────────┬────────╮
        │ keys   ┆ vals   │
        │ ---    ┆ ---    │
        │ String ┆ UInt64 │
        ╞════════╪════════╡
        │ a      ┆ 2      │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
        │ b      ┆ 1      │
        ╰────────┴────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return self.df._apply_agg_fn(Expression.count_distinct, cols, self.group_by)

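list_agg

list_agg(*cols: ColumnInputType) -> DataFrame

Collects the values of each group into a list.

Parameters:

Name   Type                    Description                   Default
*cols  Union[str, Expression]  columns to collect into lists  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped list per column.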

Source code in daft/dataframe/dataframe.py
def list_agg(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped list on this GroupedDataFrame.

    Returns:
        DataFrame: DataFrame with grouped list per column.
    """
    return self.df._apply_agg_fn(Expression.list_agg, cols, self.group_by)

list_agg_distinct

list_agg_distinct(*cols: ColumnInputType) -> DataFrame

Collects the distinct values of each group into a list (ignoring nulls).

Parameters:

Name   Type                    Description                 Default
*cols  Union[str, Expression]  columns to form into a set  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped list of distinct values per column.

Source code in daft/dataframe/dataframe.py
def list_agg_distinct(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped list distinct on this GroupedDataFrame (ignoring nulls).

    Args:
        *cols (Union[str, Expression]): columns to form into a set

    Returns:
        DataFrame: DataFrame with grouped list distinct per column.
    """
    return self.df._apply_agg_fn(Expression.list_agg_distinct, cols, self.group_by)

map_groups

map_groups(udf: Expression) -> DataFrame

Applies a user-defined function to each group. By default, the resulting column takes the name of the first input column.

Parameters:

Name  Type        Description                                    Default
udf   Expression  User-defined function to apply to each group.  required

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with the result of the UDF for each group

Examples:

>>> import daft, statistics
>>>
>>> df = daft.from_pydict({"group": ["a", "a", "a", "b", "b", "b"], "data": [1, 20, 30, 4, 50, 600]})
>>>
>>> @daft.udf(return_dtype=daft.DataType.float64())
... def std_dev(data):
...     return [statistics.stdev(data)]
>>>
>>> df = df.groupby("group").map_groups(std_dev(df["data"]))
>>> df = df.sort("group")
>>> df.show()
╭────────┬────────────────────╮
│ group  ┆ data               │
│ ---    ┆ ---                │
│ String ┆ Float64            │
╞════════╪════════════════════╡
│ a      ┆ 14.730919862656235 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b      ┆ 331.62026476076517 │
╰────────┴────────────────────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def map_groups(self, udf: Expression) -> DataFrame:
    """Apply a user-defined function to each group. The name of the resultant column will default to the name of the first input column.

    Args:
        udf (Expression): User-defined function to apply to each group.

    Returns:
        DataFrame: DataFrame with grouped aggregations

    Examples:
        >>> import daft, statistics
        >>>
        >>> df = daft.from_pydict({"group": ["a", "a", "a", "b", "b", "b"], "data": [1, 20, 30, 4, 50, 600]})
        >>>
        >>> @daft.udf(return_dtype=daft.DataType.float64())
        ... def std_dev(data):
        ...     return [statistics.stdev(data)]
        >>>
        >>> df = df.groupby("group").map_groups(std_dev(df["data"]))
        >>> df = df.sort("group")
        >>> df.show()
        ╭────────┬────────────────────╮
        │ group  ┆ data               │
        │ ---    ┆ ---                │
        │ String ┆ Float64            │
        ╞════════╪════════════════════╡
        │ a      ┆ 14.730919862656235 │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ b      ┆ 331.62026476076517 │
        ╰────────┴────────────────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return self.df._map_groups(udf, group_by=self.group_by)

max

max(*cols: ColumnInputType) -> DataFrame

Performs grouped max on this GroupedDataFrame.

Parameters:

Name   Type                    Description                      Default
*cols  Union[str, Expression]  columns to take the maximum of  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped max.

Source code in daft/dataframe/dataframe.py
def max(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped max on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to max

    Returns:
        DataFrame: DataFrame with grouped max.
    """
    return self.df._apply_agg_fn(Expression.max, cols, self.group_by)

mean

mean(*cols: ColumnInputType) -> DataFrame

Performs grouped mean on this GroupedDataFrame.

Parameters:

Name   Type                    Description        Default
*cols  Union[str, Expression]  columns to average  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped mean.

Source code in daft/dataframe/dataframe.py
def mean(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped mean on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to mean

    Returns:
        DataFrame: DataFrame with grouped mean.
    """
    return self.df._apply_agg_fn(Expression.mean, cols, self.group_by)

min

min(*cols: ColumnInputType) -> DataFrame

Performs grouped min on this GroupedDataFrame.

Parameters:

Name   Type                    Description                      Default
*cols  Union[str, Expression]  columns to take the minimum of  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped min.

Source code in daft/dataframe/dataframe.py
def min(self, *cols: ColumnInputType) -> DataFrame:
    """Perform grouped min on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to min

    Returns:
        DataFrame: DataFrame with grouped min.
    """
    return self.df._apply_agg_fn(Expression.min, cols, self.group_by)

product

product(*cols: ColumnInputType) -> DataFrame

Performs grouped product on this GroupedDataFrame.

Parameters:

Name   Type                    Description          Default
*cols  Union[str, Expression]  columns to multiply  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped products.

Examples:

>>> import daft
>>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [1, 2, 3, 100]})
>>> df = df.groupby("keys").product()
>>> df = df.sort("keys")
>>> df.show()
╭────────┬───────╮
│ keys   ┆ col_a │
│ ---    ┆ ---   │
│ String ┆ Int64 │
╞════════╪═══════╡
│ a      ┆ 6     │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ b      ┆ 100   │
╰────────┴───────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def product(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped product on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to product

    Returns:
        DataFrame: DataFrame with grouped products.

    Examples:
        >>> import daft
        >>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [1, 2, 3, 100]})
        >>> df = df.groupby("keys").product()
        >>> df = df.sort("keys")
        >>> df.show()
        ╭────────┬───────╮
        │ keys   ┆ col_a │
        │ ---    ┆ ---   │
        │ String ┆ Int64 │
        ╞════════╪═══════╡
        │ a      ┆ 6     │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ b      ┆ 100   │
        ╰────────┴───────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return self.df._apply_agg_fn(Expression.product, cols, self.group_by)

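skew

skew(*cols: ColumnInputType) -> DataFrame

Performs grouped skew on this GroupedDataFrame.

Parameters:

Name   Type                    Description                           Default
*cols  Union[str, Expression]  columns to compute the skewness of  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with the grouped skew per column.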

Source code in daft/dataframe/dataframe.py
def skew(self, *cols: ColumnInputType) -> DataFrame:
    """Performs grouped skew on this GroupedDataFrame.

    Returns:
        DataFrame: DataFrame with the grouped skew per column.
    """
    return self.df._apply_agg_fn(Expression.skew, cols, self.group_by)

stddev

stddev(*cols: ColumnInputType, ddof: int = 1) -> DataFrame

Performs grouped standard deviation on this GroupedDataFrame.

Parameters:

Name   Type                    Description                                                                                   Default
*cols  Union[str, Expression]  columns to compute the standard deviation of                                                  ()
ddof   int                     Delta degrees of freedom used in the denominator N - ddof. 1 gives the sample standard deviation.  1

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped standard deviation.

Examples:

>>> import daft
>>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [0, 1, 2, 100]})
>>> df = df.groupby("keys").stddev()
>>> df = df.sort("keys")
>>> df.show()
╭────────┬─────────╮
│ keys   ┆ col_a   │
│ ---    ┆ ---     │
│ String ┆ Float64 │
╞════════╪═════════╡
│ a      ┆ 1       │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ b      ┆ None    │
╰────────┴─────────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def stddev(self, *cols: ColumnInputType, ddof: int = 1) -> DataFrame:
    """Performs grouped standard deviation on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to stddev
        ddof (int): Delta degrees of freedom used in the denominator `N - ddof`.
            Defaults to 1 (sample standard deviation).

    Returns:
        DataFrame: DataFrame with grouped standard deviation.

    Examples:
        >>> import daft
        >>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [0, 1, 2, 100]})
        >>> df = df.groupby("keys").stddev()
        >>> df = df.sort("keys")
        >>> df.show()
        ╭────────┬─────────╮
        │ keys   ┆ col_a   │
        │ ---    ┆ ---     │
        │ String ┆ Float64 │
        ╞════════╪═════════╡
        │ a      ┆ 1       │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ b      ┆ None    │
        ╰────────┴─────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return self.df._apply_agg_fn(lambda expr: Expression.stddev(expr, ddof), cols, self.group_by)

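string_agg

string_agg(*cols: ColumnInputType, delimiter: str | None = None) -> DataFrame

Performs grouped string concatenation on this GroupedDataFrame.

Parameters:

Name       Type                    Description                                        Default
*cols      Union[str, Expression]  columns to concatenate                             ()
delimiter  str | None              string placed between the values of each group  None

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with the concatenated string of each group, per column.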

Source code in daft/dataframe/dataframe.py
def string_agg(self, *cols: ColumnInputType, delimiter: str | None = None) -> DataFrame:
    """Performs grouped string concat on this GroupedDataFrame.

    Returns:
        DataFrame: DataFrame with grouped string concatenated per column.
    """
    return self.df._apply_agg_fn(lambda expr: Expression.string_agg(expr, delimiter=delimiter), cols, self.group_by)

sum

sum(*cols: ColumnInputType) -> DataFrame

Performs grouped sum on this GroupedDataFrame.

Parameters:

Name   Type                    Description     Default
*cols  Union[str, Expression]  columns to sum  ()

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped sums.

Source code in daft/dataframe/dataframe.py
def sum(self, *cols: ColumnInputType) -> DataFrame:
    """Perform grouped sum on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to sum

    Returns:
        DataFrame: DataFrame with grouped sums.
    """
    return self.df._apply_agg_fn(Expression.sum, cols, self.group_by)

var

var(*cols: ColumnInputType, ddof: int = 1) -> DataFrame

Performs grouped variance on this GroupedDataFrame.

Parameters:

Name   Type                    Description                                                                         Default
*cols  Union[str, Expression]  columns to compute the variance of                                                  ()
ddof   int                     Delta degrees of freedom used in the denominator N - ddof. 1 gives the sample variance.  1

Returns:

Name       Type       Description
DataFrame  DataFrame  DataFrame with grouped variance.

Examples:

>>> import daft
>>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [0, 1, 2, 100]})
>>> df = df.groupby("keys").var()
>>> df = df.sort("keys")
>>> df.show()
╭────────┬─────────╮
│ keys   ┆ col_a   │
│ ---    ┆ ---     │
│ String ┆ Float64 │
╞════════╪═════════╡
│ a      ┆ 1       │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ b      ┆ None    │
╰────────┴─────────╯
(Showing first 2 of 2 rows)
Source code in daft/dataframe/dataframe.py
def var(self, *cols: ColumnInputType, ddof: int = 1) -> DataFrame:
    """Performs grouped variance on this GroupedDataFrame.

    Args:
        *cols (Union[str, Expression]): columns to compute variance for
        ddof (int): Delta degrees of freedom used in the denominator `N - ddof`.
            Defaults to 1 (sample variance).

    Returns:
        DataFrame: DataFrame with grouped variance.

    Examples:
        >>> import daft
        >>> df = daft.from_pydict({"keys": ["a", "a", "a", "b"], "col_a": [0, 1, 2, 100]})
        >>> df = df.groupby("keys").var()
        >>> df = df.sort("keys")
        >>> df.show()
        ╭────────┬─────────╮
        │ keys   ┆ col_a   │
        │ ---    ┆ ---     │
        │ String ┆ Float64 │
        ╞════════╪═════════╡
        │ a      ┆ 1       │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
        │ b      ┆ None    │
        ╰────────┴─────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return self.df._apply_agg_fn(lambda expr: Expression.var(expr, ddof), cols, self.group_by)