[OP] Add apply lambda op to use with UDFs
Is your feature request related to a problem? Please describe.
df.apply(udf) and df.apply(lambda x: …) are two methods commonly used to apply UDFs in W&D model preprocessing and feature engineering.
Describe the solution you’d like
We should be able to create UDFs and use them with df.apply or df.apply(lambda x: ...).
Additional context
cudf does not have apply or apply-lambda functionality yet. There is an applymap method that applies an elementwise function to transform the values in the column.
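For reference, the elementwise shape of applymap can be sketched with pandas, whose column API cudf mirrors; Series.map is used here as the pandas stand-in, and the note about GPU compilation is an assumption about how cudf executes the function rather than something stated in this issue:

```python
import pandas as pd

views = pd.Series([1, 5, 10, 3])
# Elementwise transform: the function sees one scalar value at a time.
# cudf's Series.applymap has the same shape, but compiles the function
# to run on the GPU rather than calling it per element in Python.
doubled = views.map(lambda v: v * 2)
```

Because the function only ever receives a single scalar, this covers per-value transforms but not row-wise UDFs that combine several columns, which is what this issue asks for.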
Pandas example:
import pandas as pd

# ctr_udf was not defined in the issue; inferred from the output below: clicks / views
def ctr_udf(row): return row['clicks'] / row['views']

pdf = pd.DataFrame({'display_id': ['1', '2', '3', '4'], 'clicks': [0, 1, 1, 0], 'views': [1, 5, 10, 3]})
pdf.loc[:, 'ctr'] = pdf.apply(ctr_udf, axis=1)
pdf.head()
   display_id  clicks  views  ctr
0           1       0      1  0.0
1           2       1      5  0.2
2           3       1     10  0.1
3           4       0      3  0.0
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
No, I think you can apply it like:
I don’t think we should be letting people apply Python functions at the row level. Can we restrict this to operations happening at the dataframe or series level instead? If we are calling a Python function per row, we won’t be able to achieve acceptable performance (which is why there isn’t an apply function in cudf right now).
For your example of calculating CTR, we could add something where the calculation per GDF chunk looks like:
instead of using an apply function. The idea here is that we are declaring the work that needs to be done in Python, but the actual work is done on the GPU using cudf. (Also take a look at https://github.com/NVIDIA/NVTabular/pull/84#discussion_r438953563 for how to express a similar idea as an NVT operator.)
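The column-level formulation the maintainer describes can be sketched as below. This is a minimal sketch using pandas, whose API cudf mirrors; the CTR formula (clicks / views) is inferred from the example output earlier in the issue, and the comment about GPU execution describes cudf behavior assumed from this discussion, not code shown in the thread:

```python
import pandas as pd  # on a GPU, `import cudf` exposes the same column API

df = pd.DataFrame({'display_id': ['1', '2', '3', '4'],
                   'clicks': [0, 1, 1, 0],
                   'views': [1, 5, 10, 3]})

# One vectorized column operation instead of a Python call per row;
# with cudf this is dispatched as GPU work over the whole chunk
df['ctr'] = df['clicks'] / df['views']
```

The trade-off is exactly the one raised above: the row-wise pdf.apply(ctr_udf, axis=1) form is more general, but the column-arithmetic form keeps the per-element work out of the Python interpreter.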