Support for Array type hints in APIs that take Python native functions (e.g., DataFrame.apply)

import numpy as np
import databricks.koalas as ks

# Open question: which return annotation expresses a Series of arrays?
def tokenizeDF(col1) -> ks.Series[np.array(... ???)]:
    pass

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
Callum027 commented, May 18, 2020

Hi there, I was wondering if there has been any progress with this? I’d like to be able to annotate my functions that return lists, so that Koalas doesn’t have to infer the return type and incur a performance penalty.

from typing import List
import databricks.koalas as ks

# Equivalent Spark UDF return type: ArrayType(StringType())
def to_list(x: str) -> List[str]:
    return [x]

ks.Series(["x", "y"]).apply(to_list)

Result:

Traceback (most recent call last)
<ipython-input-17-13d49d0c81a1> in <module>
      6     return [x]
      7 
----> 8 ks.Series(["x", "y"]).apply(to_list)

/opt/conda/lib/python3.7/site-packages/databricks/koalas/series.py in apply(self, func, args, **kwds)
   2664                 raise ValueError(
   2665                     "Expected the return type of this function to be of scalar type, "
-> 2666                     "but found type {}".format(sig_return)
   2667                 )
   2668             return_schema = sig_return.tpe

ValueError: Expected the return type of this function to be of scalar type, but found type UnknownType[typing.List[str]]
0 reactions
itholic commented, Nov 10, 2020

I will work on this from now on. Maybe we can support it the same way as the List[...] one.
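
For readers wondering what resolving a List[...] return hint involves, here is a minimal sketch using only the standard library. It is not Koalas's implementation: the names spark_type_name, return_type_name and _SCALARS are hypothetical, the type-name strings are assumed to follow Spark's short notation, and typing.get_origin / typing.get_args require Python 3.8 or later.

```python
from typing import List, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python scalar types to Spark-style type names.
# The strings are an assumption modelled on Spark's short type notation.
_SCALARS = {str: "string", int: "bigint", float: "double", bool: "boolean"}


def spark_type_name(tpe) -> str:
    """Return a Spark-style type name for a scalar or List[...] hint."""
    if tpe in _SCALARS:
        return _SCALARS[tpe]
    # List[X] has origin `list` and a single type argument X.
    if get_origin(tpe) is list:
        args = get_args(tpe)
        if len(args) == 1:
            return "array<{}>".format(spark_type_name(args[0]))
    raise TypeError("Unsupported type hint: {!r}".format(tpe))


def return_type_name(func) -> str:
    """Read the return annotation of `func` and map it to a type name."""
    hints = get_type_hints(func)
    if "return" not in hints:
        raise TypeError("{} has no return annotation".format(func.__name__))
    return spark_type_name(hints["return"])


# Same function as in the comment above.
def to_list(x: str) -> List[str]:
    return [x]


print(return_type_name(to_list))  # array<string>
```

The recursion on the element type is what lets nested hints such as List[List[int]] resolve as well. Anything that is neither a known scalar nor a List[...] is rejected rather than guessed.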


