Inconsistent type of kwarg 'store' across write and update
The eager write functions appear to expect the argument supplied to store to be a store object directly, whereas the update function appears to expect a factory (a Python callable). Can this be standardized one way or the other, please?
from functools import partial
from tempfile import TemporaryDirectory

import numpy as np
import pandas as pd
from storefact import get_store_from_url

from kartothek.io.eager import store_dataframes_as_dataset
from kartothek.io.eager import update_dataset_from_dataframes

df = pd.DataFrame(
    {
        "A": 1.,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

dataset_dir = TemporaryDirectory()
store = get_store_from_url(f"hfs://{dataset_dir.name}")

dm = store_dataframes_as_dataset(
    store,  # a store object works fine here
    "a_unique_dataset_identifier",
    df,
    metadata_version=4
)

another_df = pd.DataFrame(
    {
        "A": 2.,
        "B": pd.Timestamp("20190604"),
        "C": pd.Series(2, index=list(range(4)), dtype="float32"),
        "D": np.array([6] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "bar",
    }
)

store_factory = partial(get_store_from_url, f"hfs://{dataset_dir.name}")

dm = update_dataset_from_dataframes(
    [another_df],
    store=store_factory,  # but this needs to be a callable
    dataset_uuid="a_unique_dataset_identifier"
)
dm
Issue Analytics
- State:
- Created 4 years ago
- Comments:9 (1 by maintainers)
Store objects encapsulate connections to a storage service. In the methods that have a distributed computing backend, we pass the function arguments via pickle to the other workers. While pickle can preserve the state of an object's attributes, the connections it holds are no longer valid and cannot be transferred between processes. Thus we pass callables so that a new connection can be instantiated on each worker.

@lr4d - Agreed. After posting that comment, I realized that there's already an issue (#44) about documenting store factories, so adding a Gotchas document a bit further down the line may be a good idea. It could have a section on store factories and the reasoning behind them (as well as pitfalls, best practices, etc.).
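
To illustrate the pickle reasoning given above, here is a minimal sketch that uses only the standard library. `FileBackedStore` and `make_store` are hypothetical stand-ins invented for this example; they are not part of kartothek or storefact. An open file handle plays the role of the live connection. In this sketch the handle makes pickling fail outright, whereas a real store object might pickle but end up with an unusable connection on the worker.

```python
import pickle
import tempfile
from functools import partial


class FileBackedStore:
    """Hypothetical stand-in for a store object that holds a live connection."""

    def __init__(self, path):
        self.path = path
        # The open file handle plays the role of the connection.
        self._handle = open(path, "a+b")

    def close(self):
        self._handle.close()


def make_store(path):
    """Hypothetical store factory: opens a fresh handle wherever it is called."""
    return FileBackedStore(path)


# Create an empty temporary file for the stand-in store to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

# Attempt to pickle the store object itself, as a distributed backend would.
store = make_store(path)
try:
    pickle.dumps(store)
    store_is_picklable = True
except (TypeError, pickle.PicklingError):
    # The open handle cannot be serialized.
    store_is_picklable = False
finally:
    store.close()

# The factory is only a function reference plus its argument, so it pickles.
store_factory = partial(make_store, path)
restored_factory = pickle.loads(pickle.dumps(store_factory))

# On the "worker" side, calling the factory opens a brand-new connection.
worker_store = restored_factory()
worker_store.close()
```

The same pattern is what `partial(get_store_from_url, ...)` does in the snippet from the issue: only the recipe for building a store is shipped to the workers, not the store itself.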