daft.functions.regexp_extract
regexp_extract
regexp_extract(expr: Expression, pattern: str | Expression, index: int = 0) -> Expression
For each string in a string column, extracts the specified capture group from the first match of a regex pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| expr | Expression | String expression to extract from | required |
| pattern | str \| Expression | The regex pattern to match | required |
| index | int | Index of the capture group to extract; 0 returns the entire match | 0 |
Returns:
| Name | Type | Description |
|---|---|---|
| Expression | Expression | A String expression containing the extracted match or capture group |
Note
If index is 0, the entire match is returned. If the pattern does not match or the group does not exist, a null value is returned.
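The per-row behaviour described in the note can be modelled with Python's standard `re` module. This is a hypothetical reference helper (`regexp_extract_py`), not Daft's implementation; Daft evaluates patterns with its own regex engine, so supported syntax may differ from `re`.

```python
import re
from typing import Optional


def regexp_extract_py(s: Optional[str], pattern: str, index: int = 0) -> Optional[str]:
    """Hypothetical pure-Python model of regexp_extract for a single value."""
    if s is None:
        return None  # null input stays null
    m = re.search(pattern, s)  # first match anywhere in the string
    if m is None:
        return None  # pattern does not match -> null
    if index < 0 or index > m.re.groups:
        return None  # requested group does not exist -> null
    # Group 0 is the entire match; an optional group that did not
    # participate in the match also yields None.
    return m.group(index)


print(regexp_extract_py("123-456", r"(\d)(\d*)"))     # 123
print(regexp_extract_py("123-456", r"(\d)(\d*)", 1))  # 1
print(regexp_extract_py("abc", r"(\d)(\d*)"))         # None
print(regexp_extract_py("123-456", r"(\d)(\d*)", 5))  # None
```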
Examples:
╭─────────┬────────╮
│ x ┆ match │
│ --- ┆ --- │
│ String ┆ String │
╞═════════╪════════╡
│ 123-456 ┆ 123 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 789-012 ┆ 789 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 345-678 ┆ 345 │
╰─────────┴────────╯
(Showing first 3 of 3 rows)

Extract the first capture group:
╭─────────┬────────╮
│ x ┆ match │
│ --- ┆ --- │
│ String ┆ String │
╞═════════╪════════╡
│ 123-456 ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 789-012 ┆ 7 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 345-678 ┆ 3 │
╰─────────┴────────╯
(Showing first 3 of 3 rows)

See Also
Source code in daft/functions/str.py