Control Consistency of PersistentDataset


Is your feature request related to a problem? Please describe.
As the warning in the source code already notes, PersistentDataset does not check whether any of the transformations have changed, so the cache becomes outdated once the transformations are modified.

Describe the solution you’d like
Calculate the hash of the very first element of the dataset (after calling _pre_first_random_transform) and store it in the cache folder. If the stored hash matches the one calculated during __init__ of PersistentDataset, the cache is still valid and no deterministic transformations have changed. If the hashes do not match, either a new cache folder could be created or the old cache could be overwritten.
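A minimal sketch of the first-element check proposed here, using only the Python standard library. The function names, the hash file name, and the choice of pickle plus SHA-256 are assumptions made for illustration. This is not MONAI code, and the plain callables stand in for whatever deterministic transformations _pre_first_random_transform would apply.

```python
import hashlib
import pickle
from pathlib import Path

HASH_FILE = "first_item.sha256"  # hypothetical file name inside the cache folder


def first_item_hash(data, deterministic_transforms):
    """Apply the deterministic transforms to the first element and hash the result."""
    item = data[0]
    for transform in deterministic_transforms:
        item = transform(item)
    # Assumption: the transformed item pickles to the same bytes on every run.
    # Array data would more likely be hashed from its raw bytes instead.
    return hashlib.sha256(pickle.dumps(item, protocol=4)).hexdigest()


def cache_is_valid(cache_dir, data, deterministic_transforms):
    """Return True if the stored hash matches the freshly computed one.

    On a mismatch (or when no hash has been stored yet) the new hash is
    written, and the caller is expected to rebuild or relocate the cache.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    hash_path = cache_dir / HASH_FILE
    current = first_item_hash(data, deterministic_transforms)
    if hash_path.exists() and hash_path.read_text().strip() == current:
        return True
    hash_path.write_text(current)
    return False
```

A dataset constructor could call cache_is_valid once at start-up and clear or replace the cache folder when it returns False. Only the first element is inspected, so a change that happens to leave that element untouched would go unnoticed.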

Describe alternatives you’ve considered
Alternatively, the deterministic transformations themselves could be hashed, but I am not sure how that would work. It might be more memory efficient, but it is potentially harder to implement because most transforms raise TypeError: Object of type *** is not JSON serializable.
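One way this alternative might be approached is sketched below. Each transform is described by its class name and instance attributes, the description is serialized with json.dumps, and repr is used as a fallback for values that are not JSON serializable. The Scale class and the transform_fingerprint helper are hypothetical and are not part of MONAI. The repr fallback is only stable for values whose repr does not embed a memory address.

```python
import hashlib
import json


class Scale:
    """Hypothetical deterministic transform, used only for illustration."""

    def __init__(self, factor, dtype=float):
        self.factor = factor
        self.dtype = dtype  # a type object is not JSON serializable

    def __call__(self, item):
        return [self.dtype(v * self.factor) for v in item]


def transform_fingerprint(transforms):
    """Hash a JSON description of the transforms' classes and attributes."""
    description = [
        {
            "type": f"{type(t).__module__}.{type(t).__qualname__}",
            # Objects without a __dict__ contribute only their type name.
            "params": getattr(t, "__dict__", {}),
        }
        for t in transforms
    ]
    # default=repr avoids the TypeError for values json cannot encode.
    encoded = json.dumps(description, sort_keys=True, default=repr)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Unlike hashing the first element, this never touches the data, but it cannot detect a change in a transform's code when its attributes stay the same.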

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

2 reactions
wyli commented, Sep 10, 2020

Thanks, I think this is a valid UI issue, but hashing the transforms/data is non-trivial. In the future we may have interfaces to initialise the transform chain from a user-provided config file (e.g. JSON/YAML), so we could track changes at the config level. Let’s keep this ticket open for discussion…

0 reactions
lukasfolle commented, Sep 22, 2020

@wyli That sounds great! Could you give a hint as to when it’s planned to be implemented?


