local file operations for subsetting and file conversion
Adding another layer of in-memory BytesIO objects for performing local operations on subsetted files from icepyx:
- subsetting to valid data points using provided or calculated quality flags
- converting to different file formats not available from NSIDC (such as zarr)
A basic addition to https://github.com/icesat2py/icepyx/blob/master/icepyx/core/granules.py#L390 would look like this (with the file operations coming after):
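import io
import os
import h5py

# z is assumed to be the open zipfile.ZipFile from the surrounding download code
for zfile in z.filelist:
    # Remove the subfolder name from the filepath
    zfile.filename = os.path.basename(zfile.filename)
    # Read the archive member into an in-memory buffer instead of extracting it to disk
    fileID = io.BytesIO(z.read(zfile))
    fileID.seek(0)
    # open in-memory HDF5 file and perform operations
    with h5py.File(fileID, 'r') as source:
        pass  # file operations go here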
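The in-memory step can be tried in isolation, without a real order from NSIDC. The following is a minimal sketch using only the standard library; `iter_members_in_memory` and `make_demo_zip` are hypothetical helper names that are not part of icepyx, and the archive contents are placeholder bytes rather than real HDF5 data.

```python
import io
import os
import zipfile


def iter_members_in_memory(zip_bytes):
    """Yield (basename, BytesIO) for each file in a zip archive held in memory.

    Hypothetical helper for illustration; not part of icepyx.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for zfile in z.infolist():
            if zfile.is_dir():
                continue  # skip directory entries, keep only files
            # Remove the subfolder name from the filepath
            name = os.path.basename(zfile.filename)
            # A BytesIO built from bytes starts at position 0
            yield name, io.BytesIO(z.read(zfile))


def make_demo_zip():
    """Build a small zip archive in memory (placeholder bytes, not real HDF5)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as z:
        z.writestr("12345/granule_a.h5", b"not a real HDF5 file")
        z.writestr("12345/granule_b.h5", b"also a placeholder")
    return out.getvalue()


# Collect each member's bytes keyed by its flattened file name
members = {name: buf.read() for name, buf in iter_members_in_memory(make_demo_zip())}
```

With real granules, each yielded buffer could be handed to `h5py.File(buffer, 'r')`, since h5py accepts Python file-like objects, and the subsetting or conversion would happen inside that `with` block.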
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:9 (3 by maintainers)
Top GitHub Comments
Another use case we might want to think about is “what happens if the user finds out later that they need another variable”? Will the user need to download the data again (since we have not persisted the original files)?
This is a great discussion and a critical one for where icepyx goes next. It will be important to have a way for people to use and interact with data locally that does not depend on them having just downloaded it. That raises a few questions about where and when some of these subsetting and conversion operations should happen, and about which files are ultimately stored for the user. The modus operandi I’ve been using can be summarized as “make most of these decisions automatically for the user based on best practices and recommendations from the science team, assuming users just want some basic data without having to make many decisions, but implement those defaults in a way (i.e. with flags and keywords) that makes it easy for the heavy-data user to choose something different”. For instance, this is the idea behind the default automatic use of the NSIDC subsetter for spatial and temporal subsetting: most people don’t need full granules if they’ve already created a region of interest, so we only give them data where they’ve asked for it, but if they really want full granules, it’s easy to get them.
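The “sensible default, easy override” pattern described above might be sketched roughly as follows. `build_request` and every key in the dictionary it returns are hypothetical names chosen for illustration; they are not the icepyx or NSIDC API.

```python
def build_request(spatial_extent, date_range, subset=True, **subset_options):
    """Assemble a data request in which subsetting is on unless the caller opts out.

    Hypothetical function for illustration; not the icepyx API.
    """
    request = {"bbox": list(spatial_extent), "time": tuple(date_range)}
    if subset:
        # Default path: spatial and temporal subsetting to the region of interest,
        # with any extra keywords passed through for heavier users
        request["subset"] = {"spatial": True, "temporal": True, **subset_options}
    else:
        # Opt-out path: the user explicitly asked for full granules
        request["subset"] = None
    return request


# Most users: no decisions needed, subsetting happens automatically
default_request = build_request([-55, 68, -48, 71], ("2019-02-20", "2019-02-28"))

# Heavy-data users: one keyword switches to full granules
full_request = build_request(
    [-55, 68, -48, 71], ("2019-02-20", "2019-02-28"), subset=False
)
```

The same shape would extend to the local operations discussed in this issue, for example a keyword that controls whether the original files are kept so a later request for another variable does not force a second download.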