
Creating a virtual zarr datacube from COG s3 objects

See original GitHub issue

@cgohlke, as mentioned at https://github.com/fsspec/kerchunk/issues/78#issuecomment-1088696226, I’d like to try reading a time stack of geotiffs via the Zarr library.

As a test, I’ve got 2 COGs on S3-API-accessible object storage that I’d like to create a virtual Zarr dataset from using the tifffile library.

I can get it going if the COGs are local, but I’m a little unsure how to handle the s3 objects.

I’m trying to do:



from tifffile import TiffSequence
import zarr
import fsspec

fs = fsspec.filesystem('s3', anon=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'})

flist = fs.ls('/rsignellbucket1/lcmap/cog')

s3_flist = [f's3://{f}' for f in flist]

# try with s3 urls:
image_sequence = TiffSequence(s3_flist, pattern=r'.*_(\d+)_V12_LCPRI.tif')
with image_sequence.aszarr() as store:
    dz = zarr.open(store, mode='r')

# try with file-like objects:
file_like_object_list = [fs.open(f) for f in flist]
image_sequence = TiffSequence(file_like_object_list, pattern=r'.*_(\d+)_V12_LCPRI.tif')
with image_sequence.aszarr() as store:
    dz = zarr.open(store, mode='r')

but neither one works. The bucket is public access (no egress fees!), so this code should be reproducible.

Here’s the Jupyter Notebook demonstrating the local success (though I’d like to use the COG’s native 2048x2048 chunking instead of one whole-file chunk!) and the remote s3 failures.

What is the best way to approach this?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 51 (10 by maintainers)

Top GitHub Comments

2 reactions
martindurant commented, May 13, 2022

Unless you have also updated numcodecs, you need to do

import imagecodecs.numcodecs
imagecodecs.numcodecs.register_codecs()

to make sure zarr knows about it.

Read more comments on GitHub >

Top Results From Across the Web

Model [geotiff] postprocessing at scale - what would you do?
Stored s3://bucket/folder/time [e.g. when the model was run] ... Creating a virtual zarr datacube from COG s3 objects · Issue #125 · cgohlke/tifffile ...
Read more >
Hosting and Accessing Cloud Optimized GeoTIFFs on AWS S3
This is the workflow that I've come up with for creating COGs using GDAL and hosting and accessing them on AWS S3.
Read more >
STAC, ZARR, COG, K8S and Data Cubes: The brave new ...
A discussion about cloud native geospatial in Canada - presented at GeoAlberta 2019.
Read more >
Pangeo Forge: Crowdsourcing Analysis-Ready, Cloud ...
Pangeo Forge is a new community-driven platform that accelerates science by providing high-level recipe frameworks alongside cloud compute ...
Read more >
D1.10 DataCube Integration Report v2 - European Commission
MEteoreological Environmental Earth Observation. ODC. Open Data Cube. VHR. Very High Resolution. VM. Virtual Machine. WP. Work Package. S3.
Read more >
