In built-in object IO managers, handle loading objects for multiple partitions
If a daily asset depends on an hourly asset, then each partition of the daily asset will correspond to 24 partitions of the hourly asset.
When the daily asset is materialized and `load_input` is called to load the contents of the hourly asset, we should return the contents of those 24 partitions.
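To make the 24-to-1 relationship concrete, here is a small stdlib-only sketch that lists the hourly partition keys covered by one daily partition key. The key formats (`%Y-%m-%d` for days, `%Y-%m-%d-%H:%M` for hours) are assumptions meant to resemble Dagster's time-window defaults, and `hourly_keys_for_day` is a hypothetical helper, not a Dagster API. It uses naive datetimes, so it ignores time zones and DST transitions, where a day would not have exactly 24 hours.

```python
from datetime import datetime, timedelta

# Assumed key formats (believed to resemble Dagster's defaults; an assumption).
DAILY_FMT = "%Y-%m-%d"
HOURLY_FMT = "%Y-%m-%d-%H:%M"


def hourly_keys_for_day(daily_key: str) -> list:
    """Hypothetical helper: the 24 hourly partition keys inside one daily key."""
    day_start = datetime.strptime(daily_key, DAILY_FMT)
    # Naive datetimes: every day is treated as exactly 24 hours long.
    return [(day_start + timedelta(hours=h)).strftime(HOURLY_FMT) for h in range(24)]


keys = hourly_keys_for_day("2023-01-15")
print(len(keys), keys[0], keys[-1])  # prints: 24 2023-01-15-00:00 2023-01-15-23:00
```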
For the Pandas type handler of the Snowflake IO manager, we handle this by returning a single dataframe that contains the concatenated contents of all the hourly partitions.
For the built-in object store IO managers, we could simply return a list of the unpickled objects, one per partition, or potentially a dictionary keyed by partition.
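The two return shapes can be compared with a stdlib-only sketch. Here a plain dict of pickled bytes stands in for the object store, and `store_partition`, `load_as_list`, and `load_as_dict` are hypothetical helpers, not part of any Dagster IO manager.

```python
import pickle
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for an object store: one pickled blob per
# (asset name, partition key), similar in spirit to a per-partition file layout.
store: Dict[Tuple[str, str], bytes] = {}


def store_partition(asset: str, partition_key: str, obj: Any) -> None:
    store[(asset, partition_key)] = pickle.dumps(obj)


def load_as_list(asset: str, partition_keys: List[str]) -> List[Any]:
    # Option 1: objects in partition-key order; the keys themselves are dropped.
    return [pickle.loads(store[(asset, k)]) for k in partition_keys]


def load_as_dict(asset: str, partition_keys: List[str]) -> Dict[str, Any]:
    # Option 2: keyed by partition, so the caller knows where each object came from.
    # Dicts preserve insertion order (Python 3.7+), so key order is kept too.
    return {k: pickle.loads(store[(asset, k)]) for k in partition_keys}


hours = ["2023-01-15-00:00", "2023-01-15-01:00", "2023-01-15-02:00"]
for i, key in enumerate(hours):
    store_partition("hourly_asset", key, {"row_count": i * 10})

as_list = load_as_list("hourly_asset", hours)
as_dict = load_as_dict("hourly_asset", hours)
print(as_list)
print(as_dict["2023-01-15-01:00"])
```

The list keeps the call site simple, while the dict preserves which partition each object came from, which helps when the consumer needs per-hour logic.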
What we’ve heard:
Here’s a start on the implementation:
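```python
from dagster import AssetKey, IOManager, io_manager


class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        ...  # store obj (one stored object per partition if the asset is partitioned)

    def _load(self, asset_key: AssetKey):
        ...  # load the single stored object of an unpartitioned asset

    def _load_partition(self, asset_key: AssetKey, partition_key: str):
        ...  # load the object stored for one partition

    def load_input(self, context):
        if context.has_asset_partitions:
            partition_key_range = context.asset_partition_key_range
            if partition_key_range.start == partition_key_range.end:
                # A single upstream partition: return that object directly.
                return self._load_partition(context.asset_key, partition_key_range.start)
            else:
                # Several upstream partitions (e.g. 24 hourly partitions feeding
                # one daily partition): return one loaded object per partition.
                return [
                    self._load_partition(context.asset_key, partition_key)
                    for partition_key in context.asset_partition_keys
                ]
        else:
            return self._load(context.asset_key)


@io_manager
def my_io_manager():
    return MyIOManager()
```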
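To see how the branching in `load_input` behaves without a Dagster instance, the logic can be mirrored in plain Python and driven by a stand-in context built with `types.SimpleNamespace`. `InMemoryPartitionLoader` and `make_context` are hypothetical; the stand-in exposes only the attributes the method above reads (`has_asset_partitions`, `asset_partition_key_range`, `asset_partition_keys`, `asset_key`), and asset keys are plain strings instead of `AssetKey` objects.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Tuple


class InMemoryPartitionLoader:
    """Hypothetical stdlib-only mirror of MyIOManager.load_input's branching."""

    def __init__(self, data: Dict[Tuple[str, str], Any]):
        self.data = data  # (asset key, partition key) -> stored object

    def _load_partition(self, asset_key: str, partition_key: str) -> Any:
        return self.data[(asset_key, partition_key)]

    def load_input(self, context) -> Any:
        if not context.has_asset_partitions:
            raise NotImplementedError("unpartitioned loading is omitted in this sketch")
        key_range = context.asset_partition_key_range
        if key_range.start == key_range.end:
            # One upstream partition: return the bare object.
            return self._load_partition(context.asset_key, key_range.start)
        # Several upstream partitions: return a list, one item per partition key.
        return [self._load_partition(context.asset_key, k) for k in context.asset_partition_keys]


def make_context(asset_key: str, keys: List[str]) -> SimpleNamespace:
    # Stand-in context exposing only what load_input reads.
    return SimpleNamespace(
        has_asset_partitions=True,
        asset_key=asset_key,
        asset_partition_keys=keys,
        asset_partition_key_range=SimpleNamespace(start=keys[0], end=keys[-1]),
    )


data = {("hourly_asset", f"2023-01-15-{h:02d}:00"): h for h in range(24)}
loader = InMemoryPartitionLoader(data)

single = loader.load_input(make_context("hourly_asset", ["2023-01-15-05:00"]))
many = loader.load_input(
    make_context("hourly_asset", [f"2023-01-15-{h:02d}:00" for h in range(24)])
)
print(single, len(many))  # prints: 5 24
```

One consequence of this design shows up here: a single-partition input comes back as a bare object, while a multi-partition input comes back as a list, so downstream code has to handle both shapes. Always returning a list (or a dict keyed by partition) would be more uniform, at the cost of unwrapping in the single-partition case.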
Issue Analytics
- Created a year ago
- Comments:7 (7 by maintainers)
Top GitHub Comments
Interesting: this isn’t a pattern that we’ve used before with IO managers, but I think it makes sense.
You could do something like this:
That will run `load_input` for `daily_asset` without trying to materialize `hourly_asset` in the same run.

You might already be aware of this, but you could define a custom `PartitionMapping` to express this.
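A mapping between the daily and hourly assets has to relate partition keys in both directions. The direction from an hourly key to the daily key that contains it is sketched below in plain Python. This is not Dagster's `PartitionMapping` interface; `daily_key_for_hour` and `group_by_day` are hypothetical helpers, and the key formats (`%Y-%m-%d`, `%Y-%m-%d-%H:%M`) are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# Assumed key formats (an assumption, not taken from the issue).
DAILY_FMT = "%Y-%m-%d"
HOURLY_FMT = "%Y-%m-%d-%H:%M"


def daily_key_for_hour(hourly_key: str) -> str:
    """Hypothetical helper: the daily partition key containing an hourly key."""
    return datetime.strptime(hourly_key, HOURLY_FMT).strftime(DAILY_FMT)


def group_by_day(hourly_keys: List[str]) -> Dict[str, List[str]]:
    """Group hourly keys under the daily key each one feeds into."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in hourly_keys:
        groups[daily_key_for_hour(key)].append(key)
    return dict(groups)


hourly = ["2023-01-15-22:00", "2023-01-15-23:00", "2023-01-16-00:00"]
print(group_by_day(hourly))
```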