Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

df.cache() question

See original GitHub issue

PySpark allows for giving an argument for caching type. How can we pass this in Koalas?


>>> df.persist(pyspark.StorageLevel.MEMORY_ONLY)
DataFrame[id: bigint, name: string]

>>> df.persist(pyspark.StorageLevel.DISK_ONLY)
DataFrame[id: bigint, name: string]

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5 (3 by maintainers)

github_iconTop GitHub Comments

itholiccommented, Mar 31, 2020


Btw, the default value is MEMORY_AND_DISK instead of MEMORY_ONLY.

Thanks for the comment and fixing the mistake!

Sure, I’ll add the DataFrame.persist() soon.

@Harshitg I’ll try to make the DataFrame.persist() method until next release so that you can use the caching type after then. 😸

itholiccommented, Mar 31, 2020

@Harshitg My pleasure 😄

Read more comments on GitHub >

github_iconTop Results From Across the Web

where does df.cache() is stored - apache spark - Stack Overflow
df.cache() calls the persist() method which stores on storage level as MEMORY_AND_DISK , but you can change the storage level.
Read more >
Best practices for caching in Spark SQL - Towards Data Science
cache() if the df contains lots of columns and only a small subset will be needed in follow-up queries. Use the caching only...
Read more >
Best practice for cache(), count(), and take() - Databricks
cache() is an Apache Spark transformation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than...
Read more >
df.cache() is not working on jdbc table - Cloudera Community
By using df.cache() I cannot see any query in rdbms executed for reading data unless I do It means that data is...
Read more >
Spark DataFrame Cache and Persist Explained
Spark DataFrame or Dataset cache() method by default saves it to storage level ` MEMORY_AND_DISK ` because recomputing the in-memory columnar ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Post

No results found

github_iconTop Related Hashnode Post

No results found