daft.functions.damerau_levenshtein_distance
damerau_levenshtein_distance
damerau_levenshtein_distance(left: Expression, right: Expression) -> Expression
Compute the Damerau-Levenshtein distance between two strings.
This extends the Levenshtein distance by also counting transpositions of two adjacent characters as a single edit operation (in addition to insertions, deletions, and substitutions).
Note
This computes the Optimal String Alignment (OSA) variant, which does not allow a substring to be edited more than once. Results may differ from the true Damerau-Levenshtein distance for inputs with overlapping transpositions (e.g., "CA" to "ABC" is 3 under OSA but 2 under true Damerau-Levenshtein). OSA does not satisfy the triangle inequality.
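To make the OSA restriction concrete, here is a minimal pure-Python sketch of the textbook OSA dynamic program. It is an illustration only and is not Daft's implementation: the helper name `osa_distance` is hypothetical, and it compares Python `str` code points, which is an assumption about what counts as a "character".

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal String Alignment distance (hypothetical helper, not part of Daft)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the OSA distance between the prefixes a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters counts as one edit.
            # The swapped pair cannot be edited again afterwards, which is
            # what separates OSA from unrestricted Damerau-Levenshtein.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("abc", "bac"))  # one adjacent transposition
print(osa_distance("CA", "ABC"))   # the overlapping-transposition case from the note
```

With this sketch, `osa_distance("CA", "AC")` and `osa_distance("AC", "ABC")` are each 1, while `osa_distance("CA", "ABC")` is 3, so going through "AC" is shorter than the direct distance. That is the triangle-inequality violation mentioned in the note.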
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| left | Expression | The left string expression to compare. | required |
| right | Expression | The right string expression to compare against. | required |
Returns:
| Type | Description |
|---|---|
| Expression | The Damerau-Levenshtein (OSA) distance for each pair of strings. Returns null when either input is null. |
Examples:
╭────────┬────────┬──────────╮
│ x      ┆ y      ┆ distance │
│ ---    ┆ ---    ┆ ---      │
│ String ┆ String ┆ Int64    │
╞════════╪════════╪══════════╡
│ abc    ┆ bac    ┆ 1        │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ abc    ┆ acb    ┆ 1        │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│        ┆ abc    ┆ 3        │
╰────────┴────────┴──────────╯
(Showing first 3 of 3 rows)

Source code in daft/functions/str.py