daft.functions.monotonically_increasing_id

monotonically_increasing_id

monotonically_increasing_id() -> Expression

Generates a column of monotonically increasing unique ids.

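The implementation places the partition index in the upper 28 bits of the ID and the row's position within its partition in the lower 36 bits. This supports up to 2^28 ≈ 268 million partitions and 2^36 ≈ 68.7 billion rows per partition. The IDs are therefore unique and monotonically increasing but not consecutive: the first row of partition p receives the ID p × 2^36.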
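
The layout can be sketched in plain Python. `encode_id` and `decode_id` below are hypothetical helpers written for illustration, not part of Daft's API; they assume the ID is exactly `(partition_index << 36) | row_index`, which is what the 28/36-bit split described above implies.

```python
# Hypothetical helpers mirroring the documented bit layout (not Daft APIs):
# upper 28 bits = partition index, lower 36 bits = row index within the partition.
ROW_BITS = 36
PARTITION_BITS = 28
MAX_ROWS_PER_PARTITION = 1 << ROW_BITS  # 2^36 = 68,719,476,736
MAX_PARTITIONS = 1 << PARTITION_BITS    # 2^28 = 268,435,456


def encode_id(partition: int, row: int) -> int:
    """Pack a (partition, row) pair into a single 64-bit ID."""
    if not 0 <= partition < MAX_PARTITIONS:
        raise ValueError(f"partition {partition} does not fit in {PARTITION_BITS} bits")
    if not 0 <= row < MAX_ROWS_PER_PARTITION:
        raise ValueError(f"row {row} does not fit in {ROW_BITS} bits")
    return (partition << ROW_BITS) | row


def decode_id(id_: int) -> tuple[int, int]:
    """Split a 64-bit ID back into its (partition, row) components."""
    return id_ >> ROW_BITS, id_ & (MAX_ROWS_PER_PARTITION - 1)


print(encode_id(0, 1))         # 1
print(encode_id(1, 0))         # 68719476736
print(decode_id(68719476737))  # (1, 1)
# The largest encodable pair fills all 64 bits of a UInt64.
print(encode_id(MAX_PARTITIONS - 1, MAX_ROWS_PER_PARTITION - 1) == 2**64 - 1)  # True
```

Because the partition index occupies the high bits, every ID in partition p is smaller than every ID in partition p + 1, which is what makes the column monotonic without any coordination between partitions.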

Returns:

Expression: A UInt64 expression that generates monotonically increasing unique IDs.

Examples:

>>> import daft
>>> from daft.functions import monotonically_increasing_id
>>> daft.set_runner_ray()
>>>
>>> df = daft.from_pydict({"a": [1, 2, 3, 4]}).into_partitions(2)
>>> df = df.with_column("id", monotonically_increasing_id())
>>> df.show()
╭───────┬─────────────╮
│ a     ┆ id          │
│ ---   ┆ ---         │
│ Int64 ┆ UInt64      │
╞═══════╪═════════════╡
│ 1     ┆ 0           │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2     ┆ 1           │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ 68719476736 │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4     ┆ 68719476737 │
╰───────┴─────────────╯
(Showing first 4 of 4 rows)
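
Decoding the `id` values printed above with the 28/36-bit layout shows where the jump from 1 to 68719476736 comes from. This is a plain-Python check on the displayed numbers only (no Daft calls); it suggests that `into_partitions(2)` placed `a = 1, 2` in partition 0 and `a = 3, 4` in partition 1, with row indices restarting at 0 in each partition.

```python
# Pure-Python decoding of the IDs shown in the example output; no Daft calls.
ROW_BITS = 36
ROW_MASK = (1 << ROW_BITS) - 1  # the low 36 bits hold the row index

example_a = [1, 2, 3, 4]
example_ids = [0, 1, 68719476736, 68719476737]

# Split each ID into (partition index, row index within the partition).
decoded = [(id_ >> ROW_BITS, id_ & ROW_MASK) for id_ in example_ids]
for a, (partition, row) in zip(example_a, decoded):
    print(f"a={a}: partition={partition}, row={row}")
# a=1: partition=0, row=0
# a=2: partition=0, row=1
# a=3: partition=1, row=0
# a=4: partition=1, row=1

# The IDs are strictly increasing (hence unique) ...
assert all(x < y for x, y in zip(example_ids, example_ids[1:]))
# ... but not consecutive: crossing into partition 1 skips 2^36 - 1 values.
assert example_ids[2] - example_ids[1] == (1 << ROW_BITS) - 1
```
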
Source code in daft/functions/misc.py
def monotonically_increasing_id() -> Expression:
    """Generates a column of monotonically increasing unique ids.

    The implementation puts the partition number in the upper 28 bits, and the row number in each partition
    in the lower 36 bits. This allows for 2^28 ≈ 268 million partitions and 2^36 ≈ 68 billion rows per partition.

    Returns:
        Expression (UInt64 Expression): An expression that generates monotonically increasing IDs

    Examples:
        >>> import daft
        >>> from daft.functions import monotonically_increasing_id
        >>> daft.set_runner_ray()  # doctest: +SKIP
        >>>
        >>> df = daft.from_pydict({"a": [1, 2, 3, 4]}).into_partitions(2)
        >>> df = df.with_column("id", monotonically_increasing_id())
        >>> df.show()  # doctest: +SKIP
        ╭───────┬─────────────╮
        │ a     ┆ id          │
        │ ---   ┆ ---         │
        │ Int64 ┆ UInt64      │
        ╞═══════╪═════════════╡
        │ 1     ┆ 0           │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 2     ┆ 1           │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 3     ┆ 68719476736 │
        ├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ 4     ┆ 68719476737 │
        ╰───────┴─────────────╯
        <BLANKLINE>
        (Showing first 4 of 4 rows)

    """
    f = native.get_function_from_registry("monotonically_increasing_id")
    return Expression._from_pyexpr(f())