daft.functions.count_matches

count_matches

count_matches(expr: Expression, patterns: Any, *, whole_words: bool = False, case_sensitive: bool = True) -> Expression

Counts the number of times a pattern, or any of several patterns, appears in a string.

If whole_words is true, matches are counted only when they are whole words. This also applies to multi-word patterns. For example, on the string "abc def", the patterns "def" and "abc def" would match, but "bc de", "abc d", and "abc " (with the trailing space) would not.
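The whole-word rule can be pictured as a boundary check on the characters immediately before and after a candidate match. The sketch below is a plain-Python illustration, not Daft's implementation; the helper names `is_whole_word_match` and `whole_word_hits`, and the choice of `str.isalnum()` as the definition of a word character, are assumptions made for the example.

```python
def is_whole_word_match(text: str, start: int, end: int) -> bool:
    """Return True if text[start:end] is not glued to a word character on either side.

    Hypothetical helper: treats str.isalnum() characters as word characters.
    """
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok


def whole_word_hits(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text that pass the boundary check."""
    hits = 0
    pos = text.find(pattern)
    while pos != -1:
        if is_whole_word_match(text, pos, pos + len(pattern)):
            hits += 1
        pos = text.find(pattern, pos + 1)
    return hits
```

Under these assumptions, "def" and "abc def" each pass the check once on "abc def", while "bc de", "abc d", and "abc " fail because a letter sits directly next to the match.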

If case_sensitive is false, case is ignored. This applies only to ASCII characters; uppercase and lowercase forms of non-ASCII Unicode characters are still considered distinct.
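To make the ASCII-only restriction concrete, here is a small plain-Python model of ASCII case folding. It is a sketch of the documented behavior only; `ascii_lower` and `equal_ignoring_ascii_case` are hypothetical names and are not part of Daft.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the letters A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def equal_ignoring_ascii_case(a: str, b: str) -> bool:
    """Compare two strings while ignoring case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)
```

With this model, "Hello" and "hELLO" compare equal, while "É" and "é" do not, because neither character is in the ASCII range.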

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| expr | Expression | The expression to check. | required |
| patterns | Any | A pattern or a list of patterns. | required |
| whole_words | bool | Whether to only match whole word(s). Defaults to false. | False |
| case_sensitive | bool | Whether the matching should be case sensitive. Defaults to true. | True |
Note

If a pattern is a substring of another pattern, the longest pattern is matched first. For example, in the string "hello world", with patterns "hello", "world", and "hello world", one match is counted for "hello world".
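The longest-first rule in the note can be modeled in plain Python as a left-to-right scan that, at each position, tries the longest pattern first and then skips past whatever it matched. This is only a sketch of the documented behavior, not Daft's implementation; the function name `count_leftmost_longest` is hypothetical, and the assumption that matches never overlap is inferred from the note's "hello world" example.

```python
def count_leftmost_longest(text: str, patterns: list[str]) -> int:
    """Count non-overlapping matches, preferring the longest pattern at each position."""
    # Try longer patterns first; ignore empty patterns, which would match everywhere.
    ordered = sorted((p for p in patterns if p), key=len, reverse=True)
    count = 0
    i = 0
    while i < len(text):
        for p in ordered:
            if text.startswith(p, i):
                count += 1
                i += len(p)  # consume the match so shorter patterns cannot re-match inside it
                break
        else:
            i += 1  # no pattern starts here; move on one character
    return count
```

In this model, "hello world" with the patterns "hello", "world", and "hello world" yields one match, whereas dropping "hello world" from the pattern list yields two.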

Source code in daft/functions/str.py
def count_matches(
    expr: Expression,
    patterns: Any,
    *,
    whole_words: bool = False,
    case_sensitive: bool = True,
) -> Expression:
    """Counts the number of times a pattern, or multiple patterns, appear in a string.

    If whole_words is true, then matches are only counted if they are whole words. This
    also applies to multi-word strings. For example, on the string "abc def", the strings
    "def" and "abc def" would be matched, but "bc de", "abc d", and "abc " (with the space)
    would not.

    If case_sensitive is false, then case will be ignored. This only applies to ASCII
    characters; unicode uppercase/lowercase will still be considered distinct.

    Args:
        expr: The expression to check.
        patterns: A pattern or a list of patterns.
        whole_words: Whether to only match whole word(s). Defaults to false.
        case_sensitive: Whether the matching should be case sensitive. Defaults to true.

    Note:
        If a pattern is a substring of another pattern, the longest pattern is matched first.
        For example, in the string "hello world", with patterns "hello", "world", and "hello world",
        one match is counted for "hello world".
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    if not isinstance(patterns, Expression):
        series = item_to_series("items", patterns)
        patterns = Expression._from_pyexpr(list_lit(series._series))

    return Expression._call_builtin_scalar_fn(
        "count_matches", expr, patterns, whole_words=whole_words, case_sensitive=case_sensitive
    )