daft.functions.jaro_winkler_similarity
jaro_winkler_similarity
jaro_winkler_similarity(left: Expression, right: Expression) -> Expression
Compute the Jaro-Winkler similarity between two strings.
This is the Jaro similarity with a prefix bonus for strings sharing a common prefix (up to 4 characters). Returns a value between 0.0 (no similarity) and 1.0 (identical strings).
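The description above can be made concrete with a small pure-Python sketch. This is not Daft's implementation; it is a textbook Jaro-Winkler under stated assumptions: a prefix scale of 0.1, a prefix cap of 4 characters, and the bonus applied unconditionally (some implementations apply it only above a Jaro threshold, and this page does not say which Daft does). The names `jaro_similarity`, `jaro_winkler`, `prefix_scale`, and `max_prefix` are hypothetical.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity (illustrative, not Daft's code)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk the matched characters of both strings in order; each pair that
    # differs counts as half a transposition.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2

    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(
    s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4
) -> float:
    """Jaro similarity plus a bonus for a shared prefix (assumed cap: 4)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


for left, right in [("martha", "marhta"), ("dwayne", "duane"), ("dixon", "dicksonx")]:
    print(left, right, round(jaro_winkler(left, right), 4))
```

For "martha" and "marhta", all six characters match with one transposition, giving a Jaro score of about 0.9444; the shared prefix "mar" adds 3 × 0.1 × (1 − 0.9444), for roughly 0.9611, which agrees with the example table on this page.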
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| left | Expression | The left string expression to compare. | required |
| right | Expression | The right string expression to compare against. | required |
Returns:
| Type | Description |
|---|---|
| Expression | The Jaro-Winkler similarity (0.0 to 1.0) for each pair of strings. Returns null when either input is null. |
Examples:
╭────────┬──────────┬────────────────────╮
│ x ┆ y ┆ similarity │
│ --- ┆ --- ┆ --- │
│ String ┆ String ┆ Float64 │
╞════════╪══════════╪════════════════════╡
│ martha ┆ marhta ┆ 0.9611111111111111 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ dwayne ┆ duane ┆ 0.8400000000000001 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ dixon ┆ dicksonx ┆ 0.8133333333333332 │
╰────────┴──────────┴────────────────────╯
(Showing first 3 of 3 rows)
Source code in daft/functions/str.py