
local file operations for subsetting and file conversion


Add another layer of in-memory BytesIO objects for performing local operations on subsetted files from icepyx, such as:

- subsetting to valid data points using provided or calculated quality flags
- converting to file formats not available from NSIDC (such as zarr)
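A minimal sketch of the first operation, assuming the granule is an HDF5 file held in a BytesIO object and that, as with ATL06's atl06_quality_summary, a flag value of 0 marks good data. The helper subset_valid and the toy values are hypothetical; only the h5py and NumPy calls are real.

```python
import io

import h5py
import numpy as np


def subset_valid(source, var_path, flag_path, good_value=0):
    """Return the values of var_path where the quality flag equals good_value.

    source is an open h5py.File or Group; both datasets are assumed to be
    1-D and aligned point by point.
    """
    values = source[var_path][...]
    flags = source[flag_path][...]
    if values.shape != flags.shape:
        raise ValueError("variable and quality flag have different shapes")
    return values[flags == good_value]


# Build a tiny stand-in granule entirely in memory (illustrative values only).
buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    f.create_dataset("gt1l/land_ice_segments/h_li",
                     data=np.array([10.0, 11.5, 3.4e38, 12.0]))
    f.create_dataset("gt1l/land_ice_segments/atl06_quality_summary",
                     data=np.array([0, 0, 1, 0], dtype=np.int8))

buf.seek(0)
with h5py.File(buf, "r") as source:
    h_good = subset_valid(source,
                          "gt1l/land_ice_segments/h_li",
                          "gt1l/land_ice_segments/atl06_quality_summary")
print(h_good.tolist())  # [10.0, 11.5, 12.0]
```

Calculated flags (for example, a threshold on a residual variable) would slot in the same way: build a boolean mask and index the variable with it.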

A basic addition to https://github.com/icesat2py/icepyx/blob/master/icepyx/core/granules.py#L390 would look like this (with the file operations coming after):

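import io
import os
import h5py

# z is assumed to be the zipfile.ZipFile wrapping the NSIDC order response
for zfile in z.filelist:
    if zfile.is_dir():  # folder entries contain no data
        continue
    # Remove the subfolder name from the filepath
    zfile.filename = os.path.basename(zfile.filename)
    # read the member into memory; a fresh BytesIO already starts at position 0
    fileID = io.BytesIO(z.read(zfile))
    # open in-memory HDF5 file and perform operations
    with h5py.File(fileID, 'r') as source:
        ...  # local subsetting / file conversion goes here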
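The loop above relies on the surrounding granules.py code to provide z. As a self-contained sketch of the same in-memory pattern, the following builds a fake order zip (the folder and file names are hypothetical) and opens each member with h5py, which accepts Python file-like objects such as io.BytesIO, without writing anything to disk. Unlike the original snippet, it keeps the stripped basename in a local variable instead of renaming the ZipInfo entry; both read the same bytes.

```python
import io
import os
import zipfile

import h5py
import numpy as np


def read_zipped_granules(zip_bytes):
    """Open each HDF5 file in an in-memory zip and list its top-level groups.

    Returns {basename: sorted top-level keys}. Real code would do the local
    subsetting or conversion inside the `with h5py.File(...)` block instead.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for zfile in z.filelist:
            if zfile.is_dir():
                continue  # folder entries hold no data
            name = os.path.basename(zfile.filename)  # drop the subfolder
            file_obj = io.BytesIO(z.read(zfile))     # never touches disk
            with h5py.File(file_obj, "r") as source:
                results[name] = sorted(source.keys())
    return results


# Build a stand-in for an order: one HDF5 file inside a subfolder.
h5_buf = io.BytesIO()
with h5py.File(h5_buf, "w") as f:
    f.create_dataset("gt1l/land_ice_segments/h_li", data=np.arange(3.0))
    f.create_dataset("gt2l/land_ice_segments/h_li", data=np.arange(3.0))

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as z:
    z.writestr("order_123/granule_A.h5", h5_buf.getvalue())

print(read_zipped_granules(zip_buf.getvalue()))
# {'granule_A.h5': ['gt1l', 'gt2l']}
```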


Top GitHub Comments


fspaolo commented, Jun 16, 2020

Another use case we might want to think about: what happens if the user later finds out they need another variable? Will they need to download the data again (since we have not persisted the original files)?
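One possible answer, sketched here under the assumption that local disk space is acceptable: write the full granules from the order zip to a local folder before any in-memory subsetting, so a variable needed later can be read from those copies instead of placing a new order. The helper persist_granules and the folder names are hypothetical and not part of icepyx.

```python
import io
import os
import tempfile
import zipfile
from pathlib import Path


def persist_granules(zip_bytes, outdir):
    """Save every file in an order zip to outdir, flattening subfolders.

    Existing files are not overwritten, so calling this again is cheap.
    Returns the basenames of all data files in the zip.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    names = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for zinfo in z.filelist:
            if zinfo.is_dir():
                continue  # folder entries hold no data
            target = outdir / os.path.basename(zinfo.filename)
            if not target.exists():
                target.write_bytes(z.read(zinfo))
            names.append(target.name)
    return names


# Stand-in order zip with one file in a (hypothetical) subfolder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("order_123/granule_A.h5", b"placeholder bytes")

with tempfile.TemporaryDirectory() as tmp:
    print(persist_granules(buf.getvalue(), tmp))  # ['granule_A.h5']
    print(sorted(os.listdir(tmp)))                # ['granule_A.h5']
```

The trade-off is storage: keeping originals avoids re-downloading, at the cost of holding full files alongside any subsetted or converted outputs.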


JessicaS11 commented, Jun 16, 2020

This is a great discussion and a critical one for where icepyx goes next. It will be important to have a way for people to use and interact with data locally that does not depend on their having just downloaded it, which raises a few questions about where and when some of these subsetting and conversion operations should happen and what files are ultimately stored for the user. The modus operandi I’ve been using can be summarized as “make most of these decisions automatically for the user based on best practices and recommendations from the science team, assuming users just want some basic data without having to make many decisions, but implement those defaults in a way (i.e., with flags and keywords) that makes it easy for the heavy-data user to choose something different”. For instance, this is the idea behind the default automatic use of the NSIDC subsetter for spatial and temporal subsetting: most people don’t need full granules if they’ve already created a region of interest, so we only give them data where they’ve asked for it, but if they really want full granules, it’s easy to get them.
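The sensible-defaults, easy-overrides approach described above can be sketched with keyword arguments backed by a dataclass. Everything here (LocalOptions, plan, and the option names) is hypothetical and only illustrates the design pattern; it is not icepyx's API.

```python
from dataclasses import dataclass


@dataclass
class LocalOptions:
    """Hypothetical processing options: defaults target the typical user,
    and every field can be overridden by keyword."""
    spatial_subset: bool = True   # ask the server for region-of-interest data only
    quality_filter: bool = True   # drop points whose quality flag marks them bad
    keep_originals: bool = False  # also keep the full files on disk
    output_format: str = "hdf5"   # e.g. "zarr" for a converted copy

    def __post_init__(self):
        if self.output_format not in {"hdf5", "zarr"}:
            raise ValueError(f"unsupported output_format: {self.output_format!r}")


def plan(**overrides):
    """Return the processing steps implied by the defaults plus any overrides."""
    opts = LocalOptions(**overrides)
    steps = ["order: " + ("subsetted" if opts.spatial_subset else "full granules")]
    if opts.keep_originals:
        steps.append("save originals")
    if opts.quality_filter:
        steps.append("filter by quality flag")
    if opts.output_format != "hdf5":
        steps.append(f"convert to {opts.output_format}")
    return steps


print(plan())
# ['order: subsetted', 'filter by quality flag']
print(plan(spatial_subset=False, quality_filter=False, output_format="zarr"))
# ['order: full granules', 'convert to zarr']
```

Because every option has a default, a typical user calls plan() with no arguments, while a heavy-data user overrides only the keywords they care about.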
