
Add support for opening a dataset from memory without writing it to disk


Is it possible to read a Dataset from a gzip file without writing to disk?

This works like a charm:

import gzip
import shutil

import cfgrib

# Decompress the gzipped GRIB file to a file on disk, then open that path.
with gzip.open("some_file", "rb") as f_in:
    with open("tmp", "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

grib_files = cfgrib.open_datasets("tmp", backend_kwargs={"indexpath": ""})

I want something like this:

import gzip

import cfgrib

# Decompress the gzipped GRIB file entirely into memory.
with gzip.GzipFile("some_file", "rb") as zipfile:
    bytes_content = zipfile.read()

# Pass the in-memory bytes instead of a path; this is what fails.
grib_files = cfgrib.open_datasets(bytes_content, backend_kwargs={"indexpath": ""})

But I am getting the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 4: invalid start byte

Any ideas?

Kind Regards.
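
Since cfgrib.open_datasets delegates to ecCodes, which appears to read only from the filesystem, one workaround is to decompress into a temporary file rather than a persistent tmp file. A minimal sketch, assuming a local gzipped GRIB file named some_file:

import gzip
import os
import shutil
import tempfile

import cfgrib

# Decompress into a named temporary file; cfgrib needs a real path on disk.
with gzip.open("some_file", "rb") as f_in:
    with tempfile.NamedTemporaryFile(suffix=".grib", delete=False) as f_out:
        shutil.copyfileobj(f_in, f_out)
        tmp_path = f_out.name

try:
    datasets = cfgrib.open_datasets(tmp_path, backend_kwargs={"indexpath": ""})
    datasets = [ds.load() for ds in datasets]  # pull the data into memory
finally:
    os.unlink(tmp_path)  # the temporary file can go once the data is loaded

Calling .load() before deleting the file matters because the returned xarray Datasets are otherwise lazy and would try to read from the deleted path.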

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

3 reactions
juhi24 commented, Jan 13, 2021

I'm also looking forward to this feature. In the best-case scenario, cfgrib.open_datasets would accept a file-like object. In addition to in-memory objects, this could enable things like reading files directly from S3 object storage using boto3.
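
Until a file-like interface exists, a sketch of the S3 case under the same path-only constraint (the bucket and key names here are hypothetical): stream the object into a temporary file with boto3 and open that path.

import tempfile

import boto3
import cfgrib

s3 = boto3.client("s3")

with tempfile.NamedTemporaryFile(suffix=".grib") as tmp:
    # Hypothetical bucket/key; download_fileobj streams the object to the file.
    s3.download_fileobj("my-bucket", "forecasts/latest.grib2", tmp)
    tmp.flush()
    datasets = cfgrib.open_datasets(tmp.name, backend_kwargs={"indexpath": ""})
    datasets = [ds.load() for ds in datasets]  # load before the file is deleted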

1 reaction
Plantain commented, Sep 20, 2019

This is something that would be useful to me as well - the loading-from-memory part rather than the zip part. Currently we download GRIB2 files to memory with Python, then write them out to disk solely in order to be able to open them with cfgrib. Perhaps this issue should be renamed, as I'm not sure the gzip part is relevant. If you give me some pointers, I will take a look at getting this implemented.
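
As a possible pointer: the ecCodes Python bindings can already build a message handle from bytes in memory via codes_new_from_message, so the gap seems to be in cfgrib's index and dataset layer rather than in ecCodes itself. A sketch for a single in-memory GRIB message (the open() here stands in for a network download):

import eccodes

# Read the raw bytes of a GRIB file into memory.
with open("some_file.grib", "rb") as f:
    buf = f.read()

# Build a handle directly from the in-memory message; no path involved.
# codes_new_from_message decodes the first GRIB message in the buffer.
handle = eccodes.codes_new_from_message(buf)
try:
    print(eccodes.codes_get(handle, "shortName"))
    print(eccodes.codes_get(handle, "dataDate"))
finally:
    eccodes.codes_release(handle)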
