[OP] Add apply lambda op to use with UDFs

Is your feature request related to a problem? Please describe.

df.apply(func) and df.apply(lambda x: ...) are two methods commonly used to apply UDFs in Wide & Deep (W&D) model preprocessing and feature engineering.

Describe the solution you'd like

We should be able to create UDFs and use them with df.apply or df.apply(lambda x: ...).

Additional context

cuDF does not have apply or apply-lambda functionality yet. There is an applymap method that applies an elementwise function to transform the values in a Column.
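To make the gap concrete, here is a minimal sketch that uses pandas as a stand-in for cuDF (so none of this is cuDF's API). An elementwise function, in the style of applymap, receives one scalar at a time, so it can rescale `views` but cannot compute a ratio of two columns; a row-wise apply receives the whole row and can.

```python
import pandas as pd

df = pd.DataFrame({"clicks": [0, 1, 1, 0], "views": [1, 5, 10, 3]})

# Elementwise (applymap-style): the function sees a single scalar from one column.
views_x10 = df["views"].map(lambda v: v * 10)

# Row-wise (apply with axis=1): the function sees a whole row as a Series,
# so it can combine several columns, e.g. clicks / views.
ctr = df.apply(lambda row: row["clicks"] / row["views"], axis=1)

print(views_x10.tolist())  # [10, 50, 100, 30]
print(ctr.tolist())        # [0.0, 0.2, 0.1, 0.0]
```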

Pandas example:

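import pandas as pd
ctr_udf = lambda row: row['clicks'] / row['views']  # assumed definition (not in the original), consistent with the output below
pdf = pd.DataFrame({'display_id': ['1', '2', '3', '4'], 'clicks': [0, 1, 1, 0], 'views': [1, 5, 10, 3]})
pdf.loc[:, 'ctr'] = pdf.apply(ctr_udf, axis=1)
pdf.head()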

  display_id  clicks  views  ctr
0          1       0      1  0.0
1          2       1      5  0.2
2          3       1     10  0.1
3          4       0      3  0.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
bschifferer commented, Jun 12, 2020

No, I think you can apply it like:

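import cudf
gdf = cudf.DataFrame({'clicks': [0, 1, 1, 0], 'views': [1, 5, 10, 3]})  # sample data mirroring the pandas example
f = lambda gdf: gdf['clicks'] / gdf['views']  # whole-column division, evaluated once per dataframe
gdf['ctr'] = f(gdf)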

2 reactions
benfred commented, Jun 11, 2020

I don’t think we should be letting people apply python functions at the row level. Can we restrict this to operations happening at the dataframe or series level instead? If we are calling a python function per row we won’t be able to have acceptable performance (which is why there isn’t an ‘apply’ function in cudf right now).
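The cost difference can be made visible without a GPU. The sketch below uses pandas on the CPU as a stand-in and simply counts how often each Python function is invoked: the row-level UDF is called once per row, while the frame-level UDF is called once and performs a single vectorized division. The function names `ctr_row` and `ctr_frame` are illustrative, not from the issue.

```python
import pandas as pd

df = pd.DataFrame({"clicks": [0, 1, 1, 0] * 250, "views": [1, 5, 10, 3] * 250})
calls = {"row": 0, "frame": 0}

def ctr_row(row):
    # Row-level UDF: the Python interpreter runs this once per row.
    calls["row"] += 1
    return row["clicks"] / row["views"]

def ctr_frame(frame):
    # Frame-level UDF: called once; the division is one vectorized operation.
    calls["frame"] += 1
    return frame["clicks"] / frame["views"]

rowwise = df.apply(ctr_row, axis=1)
columnwise = ctr_frame(df)

assert rowwise.equals(columnwise)  # same values either way
print(calls)  # the row-level count grows with len(df); the frame-level count stays at 1
```

With a GPU dataframe the gap is wider still, since a per-row Python call cannot run on the device at all.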

For your example of calculating CTR, we could add something where the calculation per GDF chunk looks like:

gdf['ctr'] = gdf['clicks'] / gdf['views']

instead of using an apply function. The idea here is that we declare the work that needs to be done in Python, but the actual work is done on the GPU using cuDF. (Also take a look at https://github.com/NVIDIA/NVTabular/pull/84#discussion_r438953563 for how to express a similar idea as an NVT operator.)
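As a rough illustration of "declaring the work per chunk", here is a minimal sketch of what a dataframe-level lambda operator might look like. `FrameLambdaOp` and its `transform` method are hypothetical names invented for this example, not NVTabular's actual operator API, and pandas chunks stand in for cuDF ones; with cuDF, the same column arithmetic would run on the GPU.

```python
from typing import Callable, Iterable, Iterator

import pandas as pd

class FrameLambdaOp:
    """Hypothetical operator (not NVTabular's API): wraps a function that maps a
    whole DataFrame chunk to a new column, so the user only declares the work."""

    def __init__(self, name: str, fn: Callable[[pd.DataFrame], pd.Series]):
        self.name = name
        self.fn = fn

    def transform(self, chunks: Iterable[pd.DataFrame]) -> Iterator[pd.DataFrame]:
        for chunk in chunks:
            out = chunk.copy()
            # fn runs once per chunk; the arithmetic inside is vectorized over the chunk.
            out[self.name] = self.fn(chunk)
            yield out

df = pd.DataFrame({"display_id": ["1", "2", "3", "4"],
                   "clicks": [0, 1, 1, 0],
                   "views": [1, 5, 10, 3]})
chunks = [df.iloc[:2], df.iloc[2:]]  # stand-in for the GDF chunks a pipeline would stream

ctr_op = FrameLambdaOp("ctr", lambda gdf: gdf["clicks"] / gdf["views"])
result = pd.concat(list(ctr_op.transform(chunks)))
print(result)
```

The design choice is that the user-supplied function only ever sees whole chunks, so the number of Python calls scales with the number of chunks rather than the number of rows.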