Apply Spark Column operations directly on a Series
See original GitHub issue.
In issue #1492, it was noted that the following operation is possible on a column of a Koalas DataFrame:
>>> import databricks.koalas as ks
>>> df = ks.DataFrame(["example"], columns=["column"])
>>> from pyspark.sql import functions as F
>>> df["column"] = F.trim(F.upper(F.col("column")))
>>> df
    column
0  EXAMPLE
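For reference, the same transformation expressed directly against a plain Spark DataFrame is a withColumn call; this is only an illustrative sketch of the equivalent Spark-side operation, not a claim about what Koalas does internally:
>>> from pyspark.sql import SparkSession, functions as F
>>> spark = SparkSession.builder.getOrCreate()
>>> # Build a one-row Spark DataFrame and rewrite the column with trim(upper(...))
>>> sdf = spark.createDataFrame([("example",)], ["column"])
>>> sdf.withColumn("column", F.trim(F.upper(F.col("column")))).show()
+-------+
| column|
+-------+
|EXAMPLE|
+-------+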
By any chance, is there currently a way to apply Spark SQL/Column functions to a Koalas Series? I imagine it looking something like this:
>>> import databricks.koalas as ks
>>> import pyspark.sql.functions as F
>>> kss = ks.Series(["example"])
>>> kss.apply(F.trim(F.upper(kss.spark_column)))
0    EXAMPLE
Name: 0, dtype: object
Issue Analytics
- Created 3 years ago
- Comments: 14 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
🎉 I opened a PR. The idea is basically to collect everything related to Spark itself into a .spark namespace, and this specific issue can be resolved through it (see the sketch below). Given the current pandas APIs such as Series.transform, Series.apply, DataFrame.apply, DataFrame.transform, etc., I named it Series.spark.transform and made the usage similar. It also shares the same limitation.
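As an illustration, a minimal sketch of how the original example could be written once the accessor is available, assuming Series.spark.transform takes a function from Spark Column to Spark Column; exact repr details may differ between Koalas versions:
>>> import databricks.koalas as ks
>>> from pyspark.sql import functions as F
>>> kss = ks.Series(["example"])
>>> # The function receives the Series' underlying Spark Column and must
>>> # return a Column of the same length.
>>> kss.spark.transform(lambda c: F.trim(F.upper(c)))
0    EXAMPLE
dtype: object
This keeps the computation entirely on the Spark side, so any pyspark.sql.functions expression that preserves the column length can be applied without converting the data to pandas.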