Koalas DataFrame should only have Pandas corresponding APIs (not Spark APIs)

This is part of https://github.com/databricks/koalas/issues/119

I think the Koalas DataFrame should strictly expose only APIs that have a Pandas counterpart, although there might be a few exceptions where there are strong reasons.

This means the Koalas DataFrame should not have Spark DataFrame-specific APIs such as explain() or selectExpr(). My current thought is to provide an API like koalas_df.to_spark() (borrowed from @ueshin’s idea in an offline discussion) so that users can still use the Spark APIs.

Koalas API usages

koalas_df.loc[...]
koalas_df.drop(...)  # works as Pandas API

Spark API usages

koalas_df.to_spark().explain()
koalas_df.to_spark().selectExpr()
koalas_df.to_spark().drop(...)  # works as Spark API

This makes it clear which calls follow Koalas (Pandas) semantics and which follow Spark semantics.
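To make the proposed separation concrete, here is a minimal sketch in plain Python. It does not use Koalas or PySpark: ToySparkFrame and ToyKoalasFrame are hypothetical stand-ins written only for illustration, and their internals are assumptions. Only the method names drop, explain and to_spark come from the proposal above.

```python
class ToySparkFrame:
    """Hypothetical stand-in for a Spark DataFrame (not the real pyspark class)."""

    def __init__(self, rows):
        self.rows = list(rows)  # list of dicts, one dict per row

    def drop(self, *cols):
        # Spark-style drop: column names passed as positional arguments
        return ToySparkFrame(
            [{k: v for k, v in row.items() if k not in cols} for row in self.rows]
        )

    def explain(self):
        # Spark-only API; this toy version just returns a description string
        return f"ToyPlan(rows={len(self.rows)})"


class ToyKoalasFrame:
    """Hypothetical stand-in for a Koalas DataFrame with only Pandas-style methods."""

    def __init__(self, rows):
        self._sdf = ToySparkFrame(rows)

    def drop(self, columns):
        # Pandas-style drop: a `columns` keyword taking a list of names
        return ToyKoalasFrame(self._sdf.drop(*columns).rows)

    def to_spark(self):
        # The single, explicit escape hatch to the Spark-style API
        return self._sdf


kdf = ToyKoalasFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

pandas_style = kdf.drop(columns=["b"])   # works as the Pandas-style API
spark_style = kdf.to_spark().drop("b")   # works as the Spark-style API
plan = kdf.to_spark().explain()          # Spark-only call, reached via to_spark()
has_explain = hasattr(kdf, "explain")    # False: not exposed on the wrapper
```

With this shape, a reader can tell from the call site alone which semantics apply: anything after to_spark() is Spark-style, everything else is Pandas-style.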

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

thunterdb commented, Apr 23, 2019 (1 reaction)

I think that users should be able to pick what they need, in particular things like .cache() or .repartition(), which can be necessary. Doing koala_df.to_spark().cache().to_koalas() loses the index and metadata, so it is a no-go. Other users should also chime in though.

rxin commented, May 1, 2019 (0 reactions)

Closing this as it will be part of the design philosophy doc.

Top Results From Across the Web

Interoperability between Koalas and Apache Spark - Databricks
Koalas is useful for not only pandas users but also PySpark users ... Koalas translates pandas APIs into the logical plan of Spark...

Design Principles — Koalas 1.8.2 documentation
The Koalas DataFrame is meant to provide the best of pandas and Spark under a single API, with easy and clear conversions between...

databricks.koalas.DataFrame — Koalas 1.8.2 documentation
Koalas DataFrame that corresponds to pandas DataFrame logically. This holds Spark DataFrame internally. _internal – an internal immutable Frame to manage ...

Working with pandas and PySpark - Koalas - Read the Docs
PySpark users can access to full PySpark APIs by calling DataFrame.to_spark() . Koalas DataFrame and Spark DataFrame are virtually interchangeable. For example, ...

Koalas: pandas API on Apache Spark — Koalas 1.8.2 ...
The Koalas project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache...
