
daft.functions.regexp

regexp

regexp(expr: Expression, pattern: str | Expression) -> Expression

Check whether each string matches the given regular expression pattern in a string column.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| expr | Expression | String expression to search in | required |
| pattern | str \| Expression | Regex pattern to search for, given either as a string or as a column to pick per-row patterns from | required |
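The sketch below is a minimal pure-Python model that makes the two forms of `pattern` concrete. It is not Daft's implementation. The function name `regexp_model` is hypothetical, plain lists stand in for columns, and nulls are not handled. It also assumes unanchored ("match anywhere") semantics through Python's `re.search`, whose syntax may differ from Daft's regex engine in edge cases. A `str` pattern is applied to every row. A sequence of patterns is paired with the values row by row, which is what passing a column expression means.

```python
import re
from typing import List, Sequence, Union


def regexp_model(values: Sequence[str], pattern: Union[str, Sequence[str]]) -> List[bool]:
    """Hypothetical model of row-wise regex matching (not Daft's code).

    Assumes "match anywhere" semantics (re.search) and non-null inputs.
    """
    if isinstance(pattern, str):
        # A scalar pattern is broadcast to every row.
        patterns = [pattern] * len(values)
    else:
        # A "column" of patterns is paired with the values row by row.
        if len(pattern) != len(values):
            raise ValueError("pattern column must have the same length as expr")
        patterns = list(pattern)
    return [re.search(pat, value) is not None for value, pat in zip(values, patterns)]


# Scalar pattern: same result as the Daft example on this page.
print(regexp_model(["foo", "bar", "baz"], "ba."))  # [False, True, True]

# Per-row patterns: "^f" vs "foo", "z$" vs "bar", "o" vs "baz".
print(regexp_model(["foo", "bar", "baz"], ["^f", "z$", "o"]))  # [True, False, False]
```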

Returns:

| Type | Description |
|------|-------------|
| Expression | A Boolean expression indicating whether each value matches the provided pattern |

Examples:

>>> import daft
>>> from daft.functions import regexp
>>>
>>> df = daft.from_pydict({"x": ["foo", "bar", "baz"]})
>>> df.with_column("match", regexp(df["x"], "ba.")).collect()
╭────────┬───────╮
│ x      ┆ match │
│ ---    ┆ ---   │
│ String ┆ Bool  │
╞════════╪═══════╡
│ foo    ┆ false │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ bar    ┆ true  │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ baz    ┆ true  │
╰────────┴───────╯
(Showing first 3 of 3 rows)
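The example above cannot tell "matches somewhere in the string" apart from "matches the whole string", because every value is exactly three characters long. If `regexp` uses unanchored matching, like Rust's `Regex::is_match` or Python's `re.search` (an assumption worth verifying against Daft itself), then `"ba."` would also be true for a value such as `"foobar"`. Anchoring the pattern with `^` and `$` restricts it to whole-string matches. The sketch below uses Python's `re` only as a stand-in to show the difference.

```python
import re

values = ["foo", "bar", "baz", "foobar", "bart"]

# Unanchored: true if any substring fits "ba." (re.search semantics).
unanchored = [re.search(r"ba.", v) is not None for v in values]

# Anchored: ^ and $ require the entire string to fit "ba.".
anchored = [re.search(r"^ba.$", v) is not None for v in values]

print(unanchored)  # [False, True, True, True, True]
print(anchored)    # [False, True, True, False, False]
```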
Source code in daft/functions/str.py
def regexp(expr: Expression, pattern: str | Expression) -> Expression:
    """Check whether each string matches the given regular expression pattern in a string column.

    Args:
        expr: String expression to search in
        pattern: Regex pattern to search for as string or as a column to pick values from

    Returns:
        Expression: a Boolean expression indicating whether each value matches the provided pattern

    Examples:
        >>> import daft
        >>> from daft.functions import regexp
        >>>
        >>> df = daft.from_pydict({"x": ["foo", "bar", "baz"]})
        >>> df.with_column("match", regexp(df["x"], "ba.")).collect()
        ╭────────┬───────╮
        │ x      ┆ match │
        │ ---    ┆ ---   │
        │ String ┆ Bool  │
        ╞════════╪═══════╡
        │ foo    ┆ false │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ bar    ┆ true  │
        ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
        │ baz    ┆ true  │
        ╰────────┴───────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)

    """
    return Expression._call_builtin_scalar_fn("regexp_match", expr, pattern)