scala - How to flatMap a nested DataFrame in Spark
I have a nested string, shown below, and I want to flatMap it to produce unique rows in Spark.

My DataFrame contains:

a,b,"x,y,z",d

I want to convert it to produce output like:

a,b,x,d
a,b,y,d
a,b,z,d

How can I do that? Basically, how can I flatMap and apply a function to each row inside a DataFrame?

Thanks
Spark 2.0+

Use Dataset.flatMap:

import spark.implicits._

val ds = df.as[(String, String, String, String)]
ds.flatMap { case (x1, x2, x3, x4) => x3.split(",").map((x1, x2, _, x4)) }.toDF
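Put end to end, a minimal sketch of this approach (assuming it runs in spark-shell, where a SparkSession named spark and its implicits are already in scope):

val df = Seq(("a", "b", "x,y,z", "d")).toDF("x1", "x2", "x3", "x4")
val ds = df.as[(String, String, String, String)]
ds.flatMap { case (x1, x2, x3, x4) =>
  x3.split(",").map((x1, x2, _, x4))  // one output tuple per element of the split column
}.toDF("x1", "x2", "x3", "x4").show()

// +---+---+---+---+
// | x1| x2| x3| x4|
// +---+---+---+---+
// |  a|  b|  x|  d|
// |  a|  b|  y|  d|
// |  a|  b|  z|  d|
// +---+---+---+---+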
Spark 1.3+

Use the split and explode functions:

import org.apache.spark.sql.functions.{explode, split}

val df = Seq(("a", "b", "x,y,z", "d")).toDF("x1", "x2", "x3", "x4")
df.withColumn("x3", explode(split($"x3", ",")))
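Note that the second argument to split is interpreted as a regular expression, so a delimiter like | would need escaping (split($"x3", "\\|")); a plain comma works as-is.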
Spark 1.x

Use DataFrame.explode (deprecated in Spark 2.x):

df.explode($"x3")(_.getAs[String](0).split(",").map(Tuple1(_)))
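Unlike the withColumn variant above, DataFrame.explode appends the generated column rather than replacing x3 (for a Tuple1 result the new column is named _1), so a sketch of cleaning up afterwards, assuming that default name, would be:

df.explode($"x3")(_.getAs[String](0).split(",").map(Tuple1(_)))
  .drop("x3")                    // remove the original comma-separated column
  .withColumnRenamed("_1", "x3") // restore the original column name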