Creating a virtual zarr datacube from COG s3 objects
@cgohlke, as mentioned at https://github.com/fsspec/kerchunk/issues/78#issuecomment-1088696226, I’d like to try reading a time stack of GeoTIFFs via the Zarr library.
As a test, I’ve got 2 COGs on S3-API-accessible object storage that I’d like to create a virtual Zarr dataset for using the tifffile library.
I can get it going if the COGs are local, but I’m a little unsure how to handle the S3 objects.
I’m trying to do:
from tifffile import TiffSequence
import zarr
import fsspec
fs = fsspec.filesystem('s3', anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'})
flist = fs.ls('/rsignellbucket1/lcmap/cog')
s3_flist = [f's3://{f}' for f in flist]
# try with s3 urls:
image_sequence = TiffSequence(s3_flist, pattern=r'.*_(\d+)_V12_LCPRI.tif')
with image_sequence.aszarr() as store:
    dz = zarr.open(store, mode='r')
# try with filelike objects:
file_like_object_list = [fs.open(f) for f in flist]
image_sequence = TiffSequence(file_like_object_list, pattern=r'.*_(\d+)_V12_LCPRI.tif')
but neither one works. These buckets are publicly accessible (no egress fees!), so this code should be reproducible.
Here’s the Jupyter Notebook demonstrating the local success (except I’d like to use the native 2048x2048 chunking in the COG instead of treating the entire file as one chunk!) and the remote S3 failures.
What is the best way to approach this?
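For reference, the pattern passed to TiffSequence above is an ordinary regular expression, and as I understand it the capture group is what places each file along the stacked (time) axis. Here is a minimal sketch using only the standard re module to check what the pattern extracts. The object names are hypothetical stand-ins for the real bucket listing, and time_index is an illustrative helper, not part of tifffile.

```python
import re

# Pattern copied from the question; the single capture group is the year.
# Note: the "." before "tif" is unescaped, so it matches any character.
PATTERN = r'.*_(\d+)_V12_LCPRI.tif'

# Hypothetical object names; the real bucket listing may differ.
urls = [
    's3://rsignellbucket1/lcmap/cog/LCMAP_CU_1986_V12_LCPRI.tif',
    's3://rsignellbucket1/lcmap/cog/LCMAP_CU_1985_V12_LCPRI.tif',
]


def time_index(url: str) -> int:
    """Return the integer captured by PATTERN, or raise if the URL does not match."""
    match = re.match(PATTERN, url)
    if match is None:
        raise ValueError(f'{url!r} does not match {PATTERN!r}')
    return int(match.group(1))


# Sort the files by the captured index so the stack is in time order.
ordered = sorted(urls, key=time_index)
years = [time_index(u) for u in ordered]
print(years)
```

The pattern matches both the bare keys returned by fs.ls and the s3:// URLs, since the leading `.*` absorbs any prefix.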
Unless you have also updated numcodecs, you need to do what the linked line does to make sure zarr knows about it:
https://github.com/cgohlke/tifffile/blob/375d97f62df6482142b51f1b38a49bdd24d18a60/examples/issue125.py#L16
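The snippet itself is not reproduced in this thread. My assumption, not confirmed by the text above, is that the step in question is the codec registration that imagecodecs provides, which adds its codecs to the numcodecs registry that zarr consults when decoding chunks. A guarded sketch under that assumption (register_imagecodecs is an illustrative wrapper, not a library function):

```python
def register_imagecodecs() -> bool:
    """Register imagecodecs' codecs with numcodecs; return False if unavailable."""
    try:
        # Assumption: imagecodecs (with its numcodecs integration) is installed.
        import imagecodecs.numcodecs
    except ImportError:
        return False

    # Adds the imagecodecs codec classes to the numcodecs registry.
    imagecodecs.numcodecs.register_codecs()
    return True


registered = register_imagecodecs()
print('imagecodecs codecs registered:', registered)
```

If this is indeed the missing step, it would need to run once before zarr.open(...) is called on the store.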