[FEATURE REQUEST]: UDF to return custom business objects
See original GitHub issue.
I have a C# component which is presently running in Azure Data Lake, and I am planning to move it to Spark and reuse the same component. My example scenario:
The C# component takes a Manager dataset as input:
mgrId | name |
---|---|
11 | ABC |
22 | DEF |
The C# component returns a List<Reportee>, where Reportee is defined as:
class Reportee { public int EmpId; public string Name; public string Role; public int MgrId; }
Reportee dataset
empId | name | role | mgrId |
---|---|---|---|
100 | pqr | admin | 11 |
200 | stu | reader | 11 |
300 | wxy | reader | 22 |
Intended UDF:
var udf = Udf<int, List<Reportee>>(mgrId => component.Execute(mgrId));
For each row in my Manager dataset, I have to call the UDF to get the final result in Spark as:
mgrId | mgrname | empname | empid | Role |
---|---|---|---|---|
11 | ABC | pqr | 100 | admin |
11 | ABC | stu | 200 | reader |
22 | DEF | wxy | 300 | reader |
Issue Analytics
- State:
- Created: 4 years ago
- Comments: 25 (16 by maintainers)
A UDT (user-defined type) as the return type of a UDF will not be supported (the UDT API in Spark has been private since 2.0, and it has not gotten much traction in PRs, etc.).
However, we plan to achieve something similar using StructType. This is how it’s done in PySpark.
I did a quick prototype; this feature will be available in the coming weeks.
Why not something like
Udf_SomeOtherName<T>(T t, Schema s)
for UDFs that return Row? It's awkward to allow a schema for almost all other types that don't even need it.
So we have two options: Row and GenericRow. Converting between them is hard, especially if you are going from an object with less info to more info (GenericRow -> Row, how can that logically be possible?). We can explore both options and get some early feedback.