

df.cache() question


PySpark lets you pass a storage level when persisting a DataFrame. How can we pass this in Koalas?

Spark

>>> df.persist(pyspark.StorageLevel.MEMORY_ONLY)
DataFrame[id: bigint, name: string]

>>> df.persist(pyspark.StorageLevel.DISK_ONLY)
DataFrame[id: bigint, name: string]

Issue Analytics

State:
Created 3 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction

itholic commented, Mar 31, 2020

Btw, the default value is MEMORY_AND_DISK instead of MEMORY_ONLY.

Thanks for the comment and fixing the mistake!

Sure, I’ll add DataFrame.persist() soon.

@Harshitg I’ll try to add the DataFrame.persist() method by the next release so that you can specify the storage level from then on. 😸

0 reactions

itholic commented, Mar 31, 2020

@Harshitg My pleasure 😄


Top Results From Across the Web

where does df.cache() is stored - apache spark - Stack Overflow

df.cache() calls the persist() method which stores on storage level as MEMORY_AND_DISK , but you can change the storage level.

Best practices for caching in Spark SQL - Towards Data Science

cache() if the df contains lots of columns and only a small subset will be needed in follow-up queries. Use the caching only...

Best practice for cache(), count(), and take() - Databricks

cache() is an Apache Spark transformation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than...

df.cache() is not working on jdbc table - Cloudera Community

By using df.cache() I cannot see any query in rdbms executed for reading data unless I do df.show(). It means that data is...

Spark DataFrame Cache and Persist Explained

Spark DataFrame or Dataset cache() method by default saves it to storage level `MEMORY_AND_DISK` because recomputing the in-memory columnar ...

