local file operations for subsetting and file conversion

Add another layer of in-memory BytesIO objects for performing local operations on subsetted files from icepyx:

  • subsetting to valid data points using provided or calculated quality flags
  • converting to different file formats not available from NSIDC (such as zarr)
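The first bullet can be sketched as a boolean mask applied to every dataset in a group. This is only an illustration: the group name `gt1l`, the dataset names `h` and `quality_flag`, the convention that `0` means valid, and the helper `subset_by_flag` are all hypothetical stand-ins, not names taken from icepyx or from a specific ICESat-2 product.

```python
import io

import h5py
import numpy as np


def subset_by_flag(source, group, flag_name, valid_value=0):
    """Return {dataset name: values where flag == valid_value} for one group.

    Hypothetical helper: only datasets whose first dimension matches the
    flag array are subsetted; anything else in the group is skipped.
    """
    grp = source[group]
    mask = grp[flag_name][:] == valid_value
    return {
        name: dset[:][mask]
        for name, dset in grp.items()
        if isinstance(dset, h5py.Dataset) and dset.shape[:1] == mask.shape
    }


# Build a tiny HDF5 file in memory to stand in for a downloaded granule.
buffer = io.BytesIO()
with h5py.File(buffer, "w") as f:
    g = f.create_group("gt1l")
    g["h"] = np.array([10.0, 11.0, 12.0, 13.0])
    g["quality_flag"] = np.array([0, 1, 0, 1])

# Reopen the same buffer read-only and keep only the valid points.
buffer.seek(0)
with h5py.File(buffer, "r") as source:
    subset = subset_by_flag(source, "gt1l", "quality_flag")
```

The arrays in `subset` are plain NumPy arrays, so they could then be written out in whatever format is chosen for the second bullet.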

A basic addition to https://github.com/icesat2py/icepyx/blob/master/icepyx/core/granules.py#L390 would look like this (with the file operations coming after):

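import io
import os
import h5py

for zfile in z.filelist:  # z: assumed to be the zipfile.ZipFile already open in granules.py
    # Remove the subfolder name from the filepath
    zfile.filename = os.path.basename(zfile.filename)
    fileID = io.BytesIO(z.read(zfile))
    fileID.seek(0)
    # open in-memory HDF5 file and perform operations
    with h5py.File(fileID, 'r') as source:
        pass  # file operations (subsetting, conversion) go here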
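The zip-handling part of this loop can be tried without a download. The sketch below uses only the standard library; the helper name `iter_members_in_memory` and the archive contents (`order_123/granule.h5` with placeholder bytes) are made up for illustration, and the code assumes the order arrives as a zip archive held in memory.

```python
import io
import os
import zipfile


def iter_members_in_memory(zip_bytes):
    """Yield (basename, BytesIO) for each file in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for zfile in z.filelist:
            if zfile.is_dir():
                continue  # skip directory entries
            # Remove the subfolder name from the filepath
            name = os.path.basename(zfile.filename)
            yield name, io.BytesIO(z.read(zfile))


# Stand-in for a downloaded order: one file inside a subfolder.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as z:
    z.writestr("order_123/granule.h5", b"placeholder bytes")

members = {
    name: buf.getvalue()
    for name, buf in iter_members_in_memory(archive.getvalue())
}
```

With real granules, each yielded buffer could be passed to `h5py.File(buffer, 'r')`, since h5py accepts Python file-like objects. Nothing is written to disk along the way.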

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
fspaolo commented, Jun 16, 2020

Another use case we might want to think about is “what happens if the user finds out later that they need another variable?” Will the user need to download the data again (since we have not persisted the original files)?

1 reaction
JessicaS11 commented, Jun 16, 2020

This is a great discussion and a critical one for where icepyx goes next. It will be important to have a way for people to use/interact with data locally that is not dependent on them having just downloaded it, which raises a few questions about where/when some of these subsetting and conversion operations should happen and what files are ultimately stored for the user. The modus operandi I’ve been using can be summarized as “make most of these decisions automatically for the user based on best practices and recommendations from the science team, assuming users just want some basic data without having to make many decisions, but implement those defaults in a way (i.e. with flags and keywords) that makes it easy for the heavy-data user to choose something different”. For instance, this is the idea behind the default automatic use of the NSIDC subsetter for spatial and temporal subsetting: most people don’t need full granules if they’ve already created a region of interest, so we only give them data where they’ve asked for it, but if they really want full granules, it’s easy to get them.
