Start a conditional expression, similar to SQL CASE WHEN.
If the condition is true, the `then` value is returned. Otherwise, the next `when` condition is evaluated. If no condition is true, the result is the value given in the `otherwise` clause, or null if no `otherwise` clause is provided.
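The first-match rule described above can be modeled in plain Python. This is only a sketch of the semantics, not Daft's implementation: `case_when` and `Branch` are hypothetical names, and predicates are ordinary Python callables applied to one value at a time rather than Daft expressions.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A branch pairs a predicate with the value to return when it holds.
Branch = Tuple[Callable[[Any], bool], Any]


def case_when(value: Any, branches: Sequence[Branch], otherwise: Optional[Any] = None) -> Any:
    """Return the result of the first branch whose predicate is true for `value`."""
    for predicate, result in branches:
        if predicate(value):
            return result
    # No branch matched: fall back to the default, which is None (null) if omitted.
    return otherwise


branches = [(lambda x: x > 3, "high")]
with_default = [case_when(x, branches, otherwise="low") for x in [1, 2, 3, 4, 5]]
without_default = [case_when(x, branches) for x in [1, 2, 3]]
print(with_default)     # ['low', 'low', 'low', 'high', 'high']
print(without_default)  # [None, None, None]
```

The second list shows the null fallback: with no `otherwise` value, every unmatched input maps to `None`.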
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `condition` | `Expression \| bool` | The Boolean expression to evaluate | required |
| `then` | `Expression \| Any` | Expression to return when the condition is true | required |
Returns:
| Type | Description |
|---|---|
| `WhenExpr` | A `WhenExpr` that can be chained with more `when` clauses and ended with `otherwise` |
Examples:
Simple conditional assignment:
>>> import daft
>>> from daft.functions import when
>>>
>>> df = daft.from_pydict({"x": [1, 2, 3, 4, 5]})
>>> df = df.select(when(df["x"] > 3, then="high").otherwise("low").alias("category"))
>>> df.show()
╭──────────╮
│ category │
│ ---      │
│ String   │
╞══════════╡
│ low      │
├╌╌╌╌╌╌╌╌╌╌┤
│ low      │
├╌╌╌╌╌╌╌╌╌╌┤
│ low      │
├╌╌╌╌╌╌╌╌╌╌┤
│ high     │
├╌╌╌╌╌╌╌╌╌╌┤
│ high     │
╰──────────╯

(Showing first 5 of 5 rows)
Multiple conditions using chained when clauses:
>>> df = daft.from_pydict({"score": [85, 92, 78, 65, 88]})
>>> df = df.select(
...     when(df["score"] >= 90, then="A")
...     .when(df["score"] >= 80, then="B")
...     .when(df["score"] >= 70, then="C")
...     .otherwise("F")
...     .alias("grade")
... )
>>> df.show()
╭────────╮
│ grade  │
│ ---    │
│ String │
╞════════╡
│ B      │
├╌╌╌╌╌╌╌╌┤
│ A      │
├╌╌╌╌╌╌╌╌┤
│ C      │
├╌╌╌╌╌╌╌╌┤
│ F      │
├╌╌╌╌╌╌╌╌┤
│ B      │
╰────────╯

(Showing first 5 of 5 rows)
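Because the first true condition wins, the order of chained clauses matters when conditions overlap. The plain-Python sketch below illustrates this with the same scores; `grade` and the rule lists are hypothetical helpers written for this illustration and do not use Daft.

```python
def grade(score, rules, default="F"):
    """Return the label of the first (threshold, label) rule that the score meets."""
    for threshold, label in rules:
        if score >= threshold:
            return label
    return default


scores = [85, 92, 78, 65, 88]

# Most restrictive threshold first, as in the chained example above.
descending = [(90, "A"), (80, "B"), (70, "C")]
# Least restrictive threshold first: ">= 70" shadows the later rules.
ascending = [(70, "C"), (80, "B"), (90, "A")]

ordered = [grade(s, descending) for s in scores]
shadowed = [grade(s, ascending) for s in scores]
print(ordered)   # ['B', 'A', 'C', 'F', 'B']
print(shadowed)  # ['C', 'C', 'C', 'F', 'C']
```

With the thresholds in ascending order, every score of 70 or more matches the first rule, so the "A" and "B" branches are never reached.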
Using complex conditions and returning different data types:
>>> df = daft.from_pydict({"name": ["Alice", "Bob", "Charlie"], "age": [25, 17, 35]})
>>> df = df.select(
...     df["name"],
...     when((df["age"] >= 18) & (df["age"] < 65), then=df["age"])
...     .when(df["age"] < 18, then=-1)
...     .otherwise(0)
...     .alias("working_age"),
... )
>>> df.show()
╭─────────┬─────────────╮
│ name    ┆ working_age │
│ ---     ┆ ---         │
│ String  ┆ Int64       │
╞═════════╪═════════════╡
│ Alice   ┆ 25          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Bob     ┆ -1          │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Charlie ┆ 35          │
╰─────────┴─────────────╯

(Showing first 3 of 3 rows)
Handling null values:
>>> df = daft.from_pydict({"value": [10, None, 20, 0]})
>>> df = df.select(
...     when(df["value"].is_null(), then="missing")
...     .when(df["value"] == 0, then="zero")
...     .when(df["value"] > 15, then="high")
...     .otherwise("normal")
...     .alias("status")
... )
>>> df.show()
╭─────────╮
│ status  │
│ ---     │
│ String  │
╞═════════╡
│ normal  │
├╌╌╌╌╌╌╌╌╌┤
│ missing │
├╌╌╌╌╌╌╌╌╌┤
│ high    │
├╌╌╌╌╌╌╌╌╌┤
│ zero    │
╰─────────╯

(Showing first 4 of 4 rows)
Without otherwise clause (returns null when no conditions match):
>>> df = daft.from_pydict({"x": [1, 2, 3]})
>>> df = df.select(when(df["x"] > 1, then="big").alias("result"))
>>> df.show()
╭────────╮
│ result │
│ ---    │
│ String │
╞════════╡
│ None   │
├╌╌╌╌╌╌╌╌┤
│ big    │
├╌╌╌╌╌╌╌╌┤
│ big    │
╰────────╯

(Showing first 3 of 3 rows)
Source code in daft/functions/misc.py
def when(condition: Expression | bool, then: Expression | Any) -> WhenExpr:
    """Start a conditional expression, similar to SQL CASE WHEN.

    If the condition is true, the `then` value will be returned. Otherwise, the next `when` condition will be evaluated.
    If no conditions are true, the value will be set to the value provided in the `otherwise` clause, or null if not provided.

    Args:
        condition: The Boolean expression to evaluate
        then: Expression to return when the condition is true

    Returns:
        A WhenExpr that can be chained with more `when` clauses and ended with `otherwise`

    Examples:
        Simple conditional assignment:

        >>> import daft
        >>> from daft.functions import when
        >>>
        >>> df = daft.from_pydict({"x": [1, 2, 3, 4, 5]})
        >>> df = df.select(when(df["x"] > 3, then="high").otherwise("low").alias("category"))
        >>> df.show()
        ╭──────────╮
        │ category │
        │ ---      │
        │ String   │
        ╞══════════╡
        │ low      │
        ├╌╌╌╌╌╌╌╌╌╌┤
        │ low      │
        ├╌╌╌╌╌╌╌╌╌╌┤
        │ low      │
        ├╌╌╌╌╌╌╌╌╌╌┤
        │ high     │
        ├╌╌╌╌╌╌╌╌╌╌┤
        │ high     │
        ╰──────────╯
        <BLANKLINE>
        (Showing first 5 of 5 rows)

        Multiple conditions using chained `when` clauses:

        >>> df = daft.from_pydict({"score": [85, 92, 78, 65, 88]})
        >>> df = df.select(
        ...     when(df["score"] >= 90, then="A")
        ...     .when(df["score"] >= 80, then="B")
        ...     .when(df["score"] >= 70, then="C")
        ...     .otherwise("F")
        ...     .alias("grade")
        ... )
        >>> df.show()
        ╭────────╮
        │ grade  │
        │ ---    │
        │ String │
        ╞════════╡
        │ B      │
        ├╌╌╌╌╌╌╌╌┤
        │ A      │
        ├╌╌╌╌╌╌╌╌┤
        │ C      │
        ├╌╌╌╌╌╌╌╌┤
        │ F      │
        ├╌╌╌╌╌╌╌╌┤
        │ B      │
        ╰────────╯
        <BLANKLINE>
        (Showing first 5 of 5 rows)

        Using complex conditions and returning different data types:

        >>> df = daft.from_pydict({"name": ["Alice", "Bob", "Charlie"], "age": [25, 17, 35]})
        >>> df = df.select(
        ...     df["name"],
        ...     when((df["age"] >= 18) & (df["age"] < 65), then=df["age"])
        ...     .when(df["age"] < 18, then=-1)
        ...     .otherwise(0)
        ...     .alias("working_age"),
        ... )
        >>> df.show()
        ╭─────────┬─────────────╮
        │ name    ┆ working_age │
        │ ---     ┆ ---         │
        │ String  ┆ Int64       │
        ╞═════════╪═════════════╡
        │ Alice   ┆ 25          │
        ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ Bob     ┆ -1          │
        ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
        │ Charlie ┆ 35          │
        ╰─────────┴─────────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)

        Handling null values:

        >>> df = daft.from_pydict({"value": [10, None, 20, 0]})
        >>> df = df.select(
        ...     when(df["value"].is_null(), then="missing")
        ...     .when(df["value"] == 0, then="zero")
        ...     .when(df["value"] > 15, then="high")
        ...     .otherwise("normal")
        ...     .alias("status")
        ... )
        >>> df.show()
        ╭─────────╮
        │ status  │
        │ ---     │
        │ String  │
        ╞═════════╡
        │ normal  │
        ├╌╌╌╌╌╌╌╌╌┤
        │ missing │
        ├╌╌╌╌╌╌╌╌╌┤
        │ high    │
        ├╌╌╌╌╌╌╌╌╌┤
        │ zero    │
        ╰─────────╯
        <BLANKLINE>
        (Showing first 4 of 4 rows)

        Without `otherwise` clause (returns null when no conditions match):

        >>> df = daft.from_pydict({"x": [1, 2, 3]})
        >>> df = df.select(when(df["x"] > 1, then="big").alias("result"))
        >>> df.show()
        ╭────────╮
        │ result │
        │ ---    │
        │ String │
        ╞════════╡
        │ None   │
        ├╌╌╌╌╌╌╌╌┤
        │ big    │
        ├╌╌╌╌╌╌╌╌┤
        │ big    │
        ╰────────╯
        <BLANKLINE>
        (Showing first 3 of 3 rows)
    """
    return WhenExpr([]).when(condition, then)
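The one-line body above starts from an empty `WhenExpr` and delegates to its `when` method, which suggests a chainable builder that accumulates branches. The following is a minimal plain-Python sketch of that pattern under stated assumptions: `MiniWhen`, `mini_when`, and `evaluate` are hypothetical names for illustration only, conditions are Python callables applied to a single value, and nothing here reflects how Daft's `WhenExpr` is actually implemented.

```python
from typing import Any, Callable, Tuple


class MiniWhen:
    """Hypothetical immutable builder that collects (condition, value) branches."""

    def __init__(self, branches: Tuple = (), default: Any = None) -> None:
        self._branches = branches
        self._default = default

    def when(self, condition: Callable[[Any], bool], then: Any) -> "MiniWhen":
        # Return a new builder with one more branch appended.
        return MiniWhen(self._branches + ((condition, then),), self._default)

    def otherwise(self, default: Any) -> "MiniWhen":
        # Return a new builder with the fallback value set.
        return MiniWhen(self._branches, default)

    def evaluate(self, value: Any) -> Any:
        # The first branch whose condition holds wins; otherwise use the default.
        for condition, then in self._branches:
            if condition(value):
                return then
        return self._default


def mini_when(condition: Callable[[Any], bool], then: Any) -> MiniWhen:
    """Entry point mirroring the shape of the function above: empty builder, then one branch."""
    return MiniWhen().when(condition, then)


expr = (
    mini_when(lambda v: v is None, "missing")
    .when(lambda v: v == 0, "zero")
    .when(lambda v: v > 15, "high")
    .otherwise("normal")
)
statuses = [expr.evaluate(v) for v in [10, None, 20, 0]]
print(statuses)  # ['normal', 'missing', 'high', 'zero']
```

Each call returns a new builder rather than mutating the old one, so a partially built chain can be reused safely. In this sketch the `None` check must come first, since `None > 15` would raise a `TypeError` in plain Python.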