savez fails on large array of objects


I’m getting a RuntimeError “File size unexpectedly exceeded ZIP64 limit” when using savez to save a few arrays to a file. The largest is an array of matrices, each with a different number of rows, so the top-level array’s dtype is object. As best I can tell, NumPy decides whether to pass force_zip64=True to the zipfile open call by checking whether any array going into the archive has nbytes of at least 2^30. My array of matrices reports nbytes below 2^30 (for an object array, nbytes counts only the element pointers), but the actual data is larger than that (about 1.7 GB). I’m using Python 3.6 and NumPy 1.14.2 on Ubuntu 16.04.

The following code produces the error:

import numpy as np
# 800000 matrices with 50 to 99 rows each; NumPy 1.14 infers dtype=object here
test_data = np.asarray([np.random.rand(np.random.randint(50,100),4) for i in range(800000)])
np.savez('test', test_data=test_data)

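whereas replacing np.random.randint(50,100) with the constant 75 produces no error. In that case every matrix has the same shape, so np.asarray builds an ordinary float64 array of shape (800000, 75, 4) whose nbytes (about 1.9 GB) is above the 2^30 threshold.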
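The size mismatch described in the report can be seen on a much smaller array. The following is a minimal sketch, not part of the original issue: it uses 1000 matrices instead of 800000 and builds the object array explicitly, since newer NumPy releases raise an error when asked to infer an object dtype from ragged input. The names `ragged`, `reported` and `actual` are illustrative.

```python
import numpy as np

n = 1000  # far fewer than the 800000 matrices in the report
matrices = [np.random.rand(np.random.randint(50, 100), 4) for _ in range(n)]

# Fill a 1-D object array element by element so each slot holds one matrix.
ragged = np.empty(n, dtype=object)
for i, m in enumerate(matrices):
    ragged[i] = m

reported = ragged.nbytes                  # only the pointer table: n * itemsize
actual = sum(m.nbytes for m in matrices)  # the float64 data that actually gets written

print("nbytes reported by the object array:", reported)
print("bytes held by the matrices:", actual)
```

A size check based on `ragged.nbytes` alone therefore sees only a few kilobytes here, while the matrices themselves hold a few megabytes.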

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 14 (11 by maintainers)

Top GitHub Comments

1 reaction
belltailjp commented, Mar 18, 2019

Hi. I have reported #13153 and noticed that this issue is due to the same root cause.

The problem in current NumPy is that _savez does not determine the actual size of the data when it is a dict or a list of NumPy arrays. I think changing this part so that it computes the correct size would solve the issue while keeping the current conditional force_zip64 behavior.

Below is a rough sketch of this approach, as a modification of lines 727-728: https://github.com/numpy/numpy/blob/d89c4c7e7850578e5ee61e3e09abd86318906975/numpy/lib/npyio.py#L727-L728

    if sys.version_info >= (3, 6):
         # Since Python 3.6 it is possible to write directly to a ZIP file.
         for key, val in namedict.items():
             fname = key + '.npy'
+            nbytes = 0
+            if isinstance(val, dict):
+                nbytes = sum(v.nbytes for v in val.values())
+            elif isinstance(val, list):
+                nbytes = sum(v.nbytes for v in val)
             val = np.asanyarray(val)
+            nbytes += val.nbytes
-            force_zip64 = val.nbytes >= 2**30
+            force_zip64 = nbytes >= 2**30
             with zipf.open(fname, 'w', force_zip64=force_zip64) as fid:

Again, I know this change is still incomplete and dirty, but since it is a quite different approach from what is proposed above, I wanted to show the direction first.

This would handle both the list and the dict case while setting force_zip64 only when necessary (note that, for simplicity, the first line below passes the list without converting it to an np.array):

np.savez('test', [np.random.rand(np.random.randint(50,100),4) for i in range(800000)])
np.savez('test', {'a': np.random.rand(1024, 1024, 256)})
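The size accounting proposed in this comment can be tried outside NumPy. The sketch below is an assumption-laden illustration, not code from NumPy or from the issue: `payload_nbytes` is a hypothetical helper that sums `nbytes` over dicts, lists and object arrays, and the 2**30 cutoff is copied from the snippet above.

```python
import numpy as np

ZIP64_THRESHOLD = 2**30  # cutoff taken from the snippet above


def payload_nbytes(val):
    """Hypothetical helper: estimate the bytes of array data reachable from val."""
    if isinstance(val, dict):
        return sum(payload_nbytes(v) for v in val.values())
    if isinstance(val, (list, tuple)):
        return sum(payload_nbytes(v) for v in val)
    if isinstance(val, np.ndarray):
        if val.dtype == object:
            # Pointer table plus whatever each element holds.
            return val.nbytes + sum(payload_nbytes(v) for v in val.flat)
        return val.nbytes
    # Scalars and other objects: use the size NumPy reports for a 0-d array.
    return np.asanyarray(val).nbytes


def needs_zip64(val):
    """Decide force_zip64 from the estimated payload instead of val.nbytes."""
    return payload_nbytes(val) >= ZIP64_THRESHOLD


mats = [np.zeros((3, 4)), np.zeros((5, 4))]
print(payload_nbytes(mats))                 # 3*4*8 + 5*4*8 bytes
print(payload_nbytes({"a": np.zeros(10)}))  # 10*8 bytes
print(needs_zip64(mats))
```

This is only an estimate: pickling adds overhead, so a payload slightly under the cutoff could still need ZIP64.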
0 reactions
belltailjp commented, Mar 19, 2019

I agree, that would be the simplest and best option. I just felt that waiting (almost) indefinitely until NumPy drops Python 2 support was too long for this kind of bug, but since, as you mention, it will be ready in 1.17, that’s OK for me too (of course I want this fix right now, though :) ).
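For anyone who cannot wait for a fixed release, one possible workaround is to write the archive by hand and always request ZIP64. This is a sketch under assumptions, not NumPy's own fix: `savez_zip64` is a hypothetical helper, it writes an uncompressed archive, and unlike np.savez it does not append the .npz extension to the path.

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def savez_zip64(path, **arrays):
    """Hypothetical helper: write an uncompressed .npz, always enabling ZIP64."""
    with zipfile.ZipFile(path, mode="w", compression=zipfile.ZIP_STORED,
                         allowZip64=True) as zipf:
        for key, val in arrays.items():
            # force_zip64=True reserves the ZIP64 header up front, so the entry
            # may grow past the size limit while it is being written.
            with zipf.open(key + ".npy", "w", force_zip64=True) as fid:
                npy_format.write_array(fid, np.asanyarray(val), allow_pickle=True)
```

The result can be read back with np.load(path, allow_pickle=True); allow_pickle is needed because object arrays are stored as pickles.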
