Persist/checkpoint to disk
Do we have some function to persist data to disk when not using a cluster? It would just be a small function that calls compute, writes the data to disk, and then loads it back again. I'm currently writing my own wrapper function. For example:
ddf = ddf.checkpoint(to_parquet, filename...)
# do more work with ddf, but computation is faster since ddf is persisted
a = ddf[...]
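Such a wrapper can already be written in a few lines with the existing API. A minimal sketch, assuming parquet as the on-disk format; the name checkpoint_to_parquet and the path argument are illustrative, not an existing dask method:

import dask.dataframe as dd

def checkpoint_to_parquet(ddf, path):
    # Write the dataframe to parquet (this triggers computation), then read
    # it back so later operations start from the materialized files instead
    # of re-running the original task graph.
    ddf.to_parquet(path)
    return dd.read_parquet(path)

# usage: ddf = checkpoint_to_parquet(ddf, "checkpoint.parquet")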
This is sidelong related to the concept of persist in Intake, which specifically has exactly one file format to output for each type of data source. In dask, we could say that the canonical storage for dataframes is parquet and for arrays zarr… or something like that, but this is not a simple problem at all. It may be that Intake or some other pipeline-like system would be a good layer over dask to handle intermediate persistence.

related: https://github.com/dask/dask/pull/4025
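As a rough illustration of that "one canonical format per collection type" idea, a dispatching helper might look like the sketch below (assuming parquet for dataframes and zarr for arrays, with the pyarrow and zarr packages installed; persist_to_disk is a hypothetical name, not part of dask or Intake):

import dask.array as da
import dask.dataframe as dd

def persist_to_disk(obj, path):
    # Hypothetical dispatch on collection type: pick one canonical on-disk
    # format per type, write it out, then reload so downstream work reads
    # from disk rather than re-computing the graph.
    if isinstance(obj, dd.DataFrame):
        obj.to_parquet(path)
        return dd.read_parquet(path)
    if isinstance(obj, da.Array):
        obj.to_zarr(path)
        return da.from_zarr(path)
    raise TypeError(f"no canonical on-disk format for {type(obj)!r}")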