Does Spark DataFrame.limit guarantee any order?

See original GitHub issue

This is just a question, not an issue.
Out of curiosity, does spark.DataFrame.limit guarantee row order?

That method is used to implement many methods, such as .head and .iloc.
In pandas, the data is small and the order can be guaranteed. But in Koalas, the data is distributed. Is it guaranteed that those methods preserve order?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
HyukjinKwon commented, Nov 22, 2019

Yes, it’s a bit questionable at this moment. Maybe we should do one of these: …

  1. Disallow all such cases
  2. Document the cases in which the order is deterministic
  3. Document that the index should always be sorted before such operations to guarantee order
  4. Internally always sort on the index before such operations in such cases
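
As a rough illustration of option 4, here is a minimal sketch in plain Python rather than Spark. It is a toy model, not how Spark or Koalas implements limit: the partitions list, the partition_order argument, and both helper functions are hypothetical names introduced only for this example. It shows that taking the first n rows depends on the order in which partitions happen to be visited, while sorting on the index first returns the same rows every time.

```python
# Toy model only: each inner list stands in for one partition of
# (index, value) rows. This is NOT Spark's actual limit implementation.
partitions = [
    [(3, "d"), (0, "a")],
    [(4, "e"), (1, "b")],
    [(2, "c"), (5, "f")],
]


def unordered_limit(parts, n, partition_order):
    """Take the first n rows, visiting partitions in the given order."""
    rows = []
    for i in partition_order:
        rows.extend(parts[i])
        if len(rows) >= n:
            break
    return rows[:n]


def sorted_limit(parts, n):
    """Sort all rows on the index (first tuple field), then take the first n."""
    all_rows = [row for part in parts for row in part]
    return sorted(all_rows, key=lambda row: row[0])[:n]


# Two hypothetical partition visiting orders yield different "first 3" rows.
first_run = unordered_limit(partitions, 3, [0, 1, 2])
second_run = unordered_limit(partitions, 3, [2, 0, 1])

# Sorting on the index first does not depend on the visiting order.
stable = sorted_limit(partitions, 3)
```

The trade-off is cost: sorting before every limit touches all rows, which is presumably why the list above also considers documenting the behavior instead of always sorting.
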
1 reaction
HyukjinKwon commented, Nov 22, 2019

Yes, that’s correct 😃.
