
daft.functions.substring_index

substring_index

substring_index(expr: Expression, delim: str | Expression, count: int | Expression) -> Expression

Returns the part of the string in expr that precedes the count-th occurrence of the delimiter delim.

If count is positive, returns everything to the left of the count-th occurrence of delim, counting from the left. If count is negative, returns everything to the right of the |count|-th occurrence of delim, counting from the right. This is compatible with Spark's substring_index function.
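To make the positive and negative cases concrete, here is a minimal pure-Python sketch of the semantics described above, applied to a single string. substring_index_reference is a hypothetical helper written for illustration; it is not Daft's implementation. Its edge-case handling (a count of 0 or an empty delimiter yields an empty string; fewer than |count| occurrences yields the whole input) mirrors Spark's behavior as we understand it and is an assumption about Daft, not something this page states.

```python
def substring_index_reference(s: str, delim: str, count: int) -> str:
    """Hypothetical helper: Spark-style substring_index applied to one string."""
    # Assumption: edge cases follow Spark; not confirmed for Daft.
    if not delim or count == 0:
        return ""
    if count > 0:
        # Scan left to right for the count-th occurrence of delim.
        idx = -1
        for _ in range(count):
            idx = s.find(delim, idx + 1)
            if idx == -1:
                # Fewer than count occurrences: return the whole string.
                return s
        # Keep everything to the left of that occurrence.
        return s[:idx]
    # Negative count: scan right to left for the |count|-th occurrence.
    # pos is the start index of the latest match; the initial value makes
    # the first search cover the entire string.
    pos = len(s) - len(delim) + 1
    for _ in range(-count):
        # Find the last match that starts at or before pos - 1.
        pos = s.rfind(delim, 0, pos - 1 + len(delim))
        if pos == -1:
            # Fewer than |count| occurrences: return the whole string.
            return s
    # Keep everything to the right of that occurrence.
    return s[pos + len(delim):]


print(substring_index_reference("www.apache.org", ".", 2))   # www.apache
print(substring_index_reference("www.apache.org", ".", -2))  # apache.org
```

Each scan resumes one character after the previous match rather than after the whole delimiter, a choice borrowed from Spark's implementation (treat it as an assumption), so multi-character delimiters may overlap. For single-character delimiters such as "." this makes no difference.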

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expr | Expression | The string expression. | required |
| delim | str \| Expression | The delimiter string. | required |
| count | int \| Expression | The number of occurrences of the delimiter. | required |

Returns:

| Type | Description |
| --- | --- |
| Expression | A String expression containing the extracted substring. |

Examples:

```python
>>> import daft
>>> from daft.functions import substring_index
>>> df = daft.from_pydict({"x": ["www.apache.org", "a.b.c.d"]})
>>> df = df.select(substring_index(df["x"], ".", 2))
>>> df.show()
╭────────────╮
│ x          │
│ ---        │
│ String     │
╞════════════╡
│ www.apache │
├╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a.b        │
╰────────────╯
(Showing first 2 of 2 rows)
```
Source code in daft/functions/str.py

```python
def substring_index(
    expr: Expression,
    delim: str | Expression,
    count: int | Expression,
) -> Expression:
    """Returns the substring from string before count occurrences of the delimiter.

    If count is positive, returns everything to the left of the final delimiter (counting from left).
    If count is negative, returns everything to the right of the final delimiter (counting from right).
    This is compatible with Spark's substring_index function.

    Args:
        expr: The string expression
        delim: The delimiter string
        count: The number of occurrences of the delimiter

    Returns:
        Expression: a String expression with the substring result

    Examples:
        >>> import daft
        >>> from daft.functions import substring_index
        >>> df = daft.from_pydict({"x": ["www.apache.org", "a.b.c.d"]})
        >>> df = df.select(substring_index(df["x"], ".", 2))
        >>> df.show()
        ╭────────────╮
        │ x          │
        │ ---        │
        │ String     │
        ╞════════════╡
        │ www.apache │
        ├╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ a.b        │
        ╰────────────╯
        <BLANKLINE>
        (Showing first 2 of 2 rows)

    """
    return Expression._call_builtin_scalar_fn("substring_index", expr, delim, count)
```