Document that we don't support the compatibility with non-Koalas APIs yet.
Seems like people want to convert their code directly from pandas to Koalas. One case I often observe is that they want to convert code that works together with Python built-in functions such as max, min, or list/generator comprehensions, e.g.:
import pandas as pd
data = []
for a in pd.Series([1, 2, 3]):
    data.append(a)
pd.DataFrame(data)
In Koalas, such an example does not work. We should preemptively document this and guide users to stick to Koalas APIs only.
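One way to read "stick to Koalas APIs" is to replace the built-ins and the explicit loop with methods on the Series itself. The sketch below uses pandas so that it runs locally; it assumes a Koalas Series exposes the same `max`, `min`, and `to_frame` methods, and the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Instead of the built-ins max(s) / min(s), call the Series methods.
largest = s.max()
smallest = s.min()

# Instead of looping over the Series to build a list and then a DataFrame,
# convert the Series directly.
df = s.to_frame()
```

Nothing here iterates over the Series in Python, which is the part that does not carry over to Koalas.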
Issue Analytics
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
I think we had better move them to Best Practice. We could phrase it, for example, as below. Feel free to reword or rephrase.
Title: Use Koalas APIs directly whenever possible
Contents: While Koalas has APIs similar to pandas, some APIs are not explicitly supported. For example, Python built-in functions such as min, max, etc. require the given argument to be iterable. Koalas does not implement __iter__() yet, to prevent users from collecting all data into the client (driver) side from the cluster. See the example below:
pandas datasets live locally and are iterable … blah blah …
Koalas performs it in a distributed manner … blah blah
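To make the mechanism concrete, here is a minimal pure-Python sketch. `DistributedSeries` is a hypothetical stand-in, not the Koalas class: it deliberately has no `__iter__`, so the built-in `max` rejects it, while its own `max` method reduces each partition separately and only combines the small partial results.

```python
class DistributedSeries:
    """Hypothetical stand-in for a cluster-backed series (not the Koalas class)."""

    def __init__(self, partitions):
        # Each inner list plays the role of a partition held by a worker.
        self._partitions = partitions

    # No __iter__ or __getitem__ on purpose: iterating would pull every row
    # to the client (driver) side.

    def max(self):
        # Reduce each non-empty partition on its own, then combine the partial results.
        return max(max(p) for p in self._partitions if p)


s = DistributedSeries([[1, 2], [3]])

try:
    max(s)  # the built-in needs an iterable argument
    builtin_worked = True
except TypeError:
    builtin_worked = False

result = s.max()  # the method works without iterating over all rows
```

The built-in call fails with a TypeError because the object is not iterable, whereas the method call succeeds.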
… Another common pattern from pandas users is to rely on list or generator comprehensions …:
In Koalas, you can do it via:
NumPy universal functions are supported and can be used naturally in most cases. -> This was added in https://github.com/databricks/koalas/pull/1096, https://github.com/databricks/koalas/pull/1106, and https://github.com/databricks/koalas/pull/1128, FYI.
Using to_numpy should still be discouraged and be the last resort.
Okay~ I'll open a PR. Thank you.