daft.functions.decode

decode

decode(bytes: Expression, charset: ENCODING_CHARSET) -> Expression

Decodes binary values using the specified character set.

Note that if the charset is "utf-8" or "utf8", this is equivalent to cast(bytes, daft.DataType.string()).

If an invalid encoding is encountered, an error will be raised. To handle invalid encodings, use try_decode instead.
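
The difference between raising on invalid input and tolerating it can be illustrated with a plain-Python analogy. This is only a sketch built on the standard library, not Daft's implementation: `strict_decode` and `lenient_decode` are hypothetical helper names, and the sketch assumes that try_decode yields a null for values that cannot be decoded.

```python
import base64
import binascii
from typing import Optional


def strict_decode(value: bytes) -> bytes:
    """Hypothetical stand-in for decode(..., "base64"): raises on bad input."""
    # validate=True rejects any character outside the base64 alphabet.
    return base64.b64decode(value, validate=True)


def lenient_decode(value: bytes) -> Optional[bytes]:
    """Hypothetical stand-in for try_decode: maps bad input to None (assumed)."""
    try:
        return strict_decode(value)
    except binascii.Error:
        return None


values = [b"aGVsbG8sIHdvcmxkIQ==", b"not base64!!"]
decoded = [lenient_decode(v) for v in values]
```

Here the first value decodes to the bytes for "hello, world!", while the second is replaced by None instead of aborting the whole computation.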

Parameters:

Name     Type               Description                                  Default
bytes    Binary Expression  The expression to decode.                    required
charset  str                The decoding character set (utf-8, base64).  required

Returns:

Name        Type               Description
Expression  Binary Expression  A binary expression with the decoded values.

Examples:

>>> import daft
>>> from daft.functions import decode
>>> df = daft.from_pydict({"bytes": [b"aGVsbG8sIHdvcmxkIQ=="]})
>>> df.select(decode(df["bytes"], "base64")).show()
╭──────────────────╮
│ bytes            │
│ ---              │
│ Binary           │
╞══════════════════╡
│ b"hello, world!" │
╰──────────────────╯
(Showing first 1 of 1 rows)
Source code in daft/functions/binary.py
def decode(bytes: Expression, charset: ENCODING_CHARSET) -> Expression:
    """Decodes binary values using the specified character set.

    Note that if the charset is "utf-8" or "utf8", then this is equivalent
    to cast(bytes, daft.DataType.string())

    If an invalid encoding is encountered, an error will be raised.
    To handle invalid encodings, use `try_decode` instead.

    Args:
        bytes (Binary Expression): The expression to decode.
        charset (str): The decoding character set (utf-8, base64).

    Returns:
        Expression (Binary Expression): A binary expression with the decoded values.

    Examples:
        >>> import daft
        >>> from daft.functions import decode
        >>> df = daft.from_pydict({"bytes": [b"aGVsbG8sIHdvcmxkIQ=="]})
        >>> df.select(decode(df["bytes"], "base64")).show()
        ╭──────────────────╮
        │ bytes            │
        │ ---              │
        │ Binary           │
        ╞══════════════════╡
        │ b"hello, world!" │
        ╰──────────────────╯
        <BLANKLINE>
        (Showing first 1 of 1 rows)
    """
    if charset.lower() in ("utf-8", "utf8"):
        return bytes.cast(DataType.string())

    return Expression._call_builtin_scalar_fn("decode", bytes, codec=charset)