[OP] Add apply lambda op to use with UDFs
Is your feature request related to a problem? Please describe.
df.apply(udf) and df.apply(lambda x: …) are two methods commonly used to apply UDFs in W&D model preprocessing and feature engineering.
Describe the solution you’d like
We should be able to create UDFs and use them with df.apply or df.apply(lambda x: ...).
Additional context
cudf does not have apply or apply-lambda functionality yet. There is an applymap method that applies an elementwise function to transform the values in the column.
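For reference, the elementwise shape of applymap can be sketched with pandas, whose column API cudf mirrors; Series.map is used here as the pandas stand-in, and the note about GPU compilation is an assumption about how cudf executes the function rather than something stated in this issue:

```python
import pandas as pd

views = pd.Series([1, 5, 10, 3])
# Elementwise transform: the function sees one scalar value at a time.
# cudf's Series.applymap has the same shape, but compiles the function
# to run on the GPU rather than calling it per element in Python.
doubled = views.map(lambda v: v * 2)
```

Because the function only ever receives a single scalar, this covers per-value transforms but not row-wise UDFs that combine several columns, which is what this issue asks for.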
Pandas example:
import pandas as pd

# ctr_udf was not defined in the issue; inferred from the output below: clicks / views
def ctr_udf(row): return row['clicks'] / row['views']

pdf = pd.DataFrame({'display_id': ['1', '2', '3', '4'], 'clicks': [0, 1, 1, 0], 'views': [1, 5, 10, 3]})
pdf.loc[:, 'ctr'] = pdf.apply(ctr_udf, axis=1)
pdf.head()
   display_id  clicks  views  ctr
0           1       0      1  0.0
1           2       1      5  0.2
2           3       1     10  0.1
3           4       0      3  0.0
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
No, I think you can apply it like:
I don’t think we should be letting people apply Python functions at the row level. Can we restrict this to operations happening at the dataframe or series level instead? If we are calling a Python function per row, we won’t be able to achieve acceptable performance (which is why there isn’t an apply function in cudf right now).
For your example of calculating CTR, we could add something where the calculation per GDF chunk looks like:
instead of using an apply function. The idea here is that we are declaring the work that needs to be done in Python, but the actual work is done on the GPU using cudf. (Also take a look at https://github.com/NVIDIA/NVTabular/pull/84#discussion_r438953563 for how to express a similar idea as an NVT operator.)
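The column-level formulation the maintainer describes can be sketched as below. This is a minimal sketch using pandas, whose API cudf mirrors; the CTR formula (clicks / views) is inferred from the example output earlier in the issue, and the comment about GPU execution describes cudf behavior assumed from this discussion, not code shown in the thread:

```python
import pandas as pd  # on a GPU, `import cudf` exposes the same column API

df = pd.DataFrame({'display_id': ['1', '2', '3', '4'],
                   'clicks': [0, 1, 1, 0],
                   'views': [1, 5, 10, 3]})

# One vectorized column operation instead of a Python call per row;
# with cudf this is dispatched as GPU work over the whole chunk
df['ctr'] = df['clicks'] / df['views']
```

The trade-off is exactly the one raised above: the row-wise pdf.apply(ctr_udf, axis=1) form is more general, but the column-arithmetic form keeps the per-element work out of the Python interpreter.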