
Trouble saving CSR matrix to S3 using scipy.sparse.save_npz

See original GitHub issue

I’m running into an issue when trying to save a CSR matrix to AWS S3 with the test code below:

import numpy as np
import pandas as pd
from scipy import sparse
import s3fs

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
csr = sparse.csr_matrix(df.values)

s3 = s3fs.S3FileSystem(anon=False)
s3_path = "<an_aws_s3_path>"
f = s3.open(s3_path, 'wb')
sparse.save_npz(f, csr)

If this is not currently supported, could you suggest a good way to achieve it?

Thanks.

Error trace:

...
  File "/Users/yangzhou/code/sml/core/util/csr_matrix_wrapper.py", line 37, in save
    sparse.save_npz(f, self.csr)
  File "/usr/local/lib/python3.7/site-packages/scipy/sparse/_matrix_io.py", line 78, in save_npz
    np.savez_compressed(file, **arrays_dict)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 667, in savez_compressed
    _savez(file, args, kwds, True)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 695, in _savez
    zipf = zipfile_factory(file, mode="w", compression=compression)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 112, in zipfile_factory
    return zipfile.ZipFile(file, *args, **kwargs)
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py", line 1214, in __init__
    self.fp.seek(self.start_dir)
  File "/usr/local/lib/python3.7/site-packages/s3fs/core.py", line 1278, in seek
    raise ValueError('Seek only available in read mode')
ValueError: Seek only available in read mode
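The trace shows the chain: save_npz calls np.savez_compressed, which opens a zipfile.ZipFile on the file object, and the ZipFile constructor calls seek() on it, which the s3fs file opened in 'wb' mode refuses. One common workaround, sketched here under the assumption that the serialized matrix fits in memory, is to write the .npz into a seekable io.BytesIO buffer first and then copy its bytes to the remote file in a single write(). The helper name save_npz_to_stream is hypothetical, not part of SciPy or s3fs.

```python
import io

import scipy.sparse as sparse


def save_npz_to_stream(out, matrix, compressed=True):
    """Hypothetical helper: save `matrix` in .npz format to a write-only stream.

    save_npz writes into an in-memory, seekable io.BytesIO buffer, so zipfile
    can seek freely; `out` then only needs a write() method.
    Assumes the serialized matrix fits in memory.
    """
    buf = io.BytesIO()
    sparse.save_npz(buf, matrix, compressed=compressed)
    out.write(buf.getvalue())


# Usage with the s3fs objects from the question (s3_path is a placeholder):
# with s3.open(s3_path, "wb") as f:
#     save_npz_to_stream(f, csr)
```

Opening the S3 file in a with block also makes sure it is closed, which is when s3fs finishes the upload.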

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Jul 2, 2019

I suppose you’d prefer

    if not self.readable():
        raise OSError('Seek only available in read mode')

and then you don’t have to edit zipfile
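Why the exception type matters: in write mode, the zipfile.ZipFile constructor calls tell() and then seek() on the file object, and it treats an OSError (or AttributeError) from seek() as a sign that the stream is not seekable. It then writes each entry's CRC and sizes after its data instead of seeking back to patch the header; any other exception, such as ValueError, propagates. The sketch below reproduces this with a hypothetical NoSeekStream class standing in for an s3fs file opened in 'wb' mode; it is an illustration, not s3fs code.

```python
import io
import zipfile


class NoSeekStream(io.BytesIO):
    """Hypothetical stand-in for a write-mode remote file: write() and
    tell() work, but seek() always raises `seek_error`."""

    def __init__(self, seek_error):
        super().__init__()
        self.seek_error = seek_error

    def seek(self, *args, **kwargs):
        raise self.seek_error("Seek only available in read mode")


def try_zip(seek_error):
    """Write a one-member zip archive to a NoSeekStream.

    Returns the archive bytes on success, or the exception that was raised.
    """
    stream = NoSeekStream(seek_error)
    try:
        with zipfile.ZipFile(stream, mode="w") as zf:
            zf.writestr("hello.txt", b"hello")
    except Exception as exc:  # report the failure instead of crashing
        return exc
    return stream.getvalue()


print(repr(try_zip(ValueError)))  # the ValueError escapes ZipFile.__init__
data = try_zip(OSError)           # OSError: zipfile simply stops seeking
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.read("hello.txt"))   # b'hello'
```

With OSError the archive is written successfully and reads back normally, which is why changing the exception raised by seek() avoids any change to zipfile itself.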

0 reactions
xuchen commented, Feb 10, 2020

Top Results From Across the Web

Persisting a Large scipy.sparse.csr_matrix - Stack Overflow
save_npz is using the basic numpy savez to the matrix attributes (3 main arrays) to a zip archive. For some reason, possibly some...

scipy.sparse.save_npz — SciPy v1.9.3 Manual
Save a sparse matrix to a file using .npz format. Either the file name (string) or an open file (file-like object) where...

How to Create a Sparse Matrix in Python - GeeksforGeeks
Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in the matrix are of no...

scipy.sparse.save_npz — SciPy v1.10.0.dev0+2302.7620ef0 ...
scipy.sparse.save_npz(file, matrix, compressed=True). Save a sparse matrix to a file using .npz format. Parameters: file: str or file-like object.
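The Stack Overflow snippet above describes save_npz as storing the matrix's component arrays in a zip archive, which is why zipfile appears in the error trace. A quick way to check this, sketched with a small arbitrary matrix, is to save into memory and list the archive members. For a CSR matrix the members are expected to include data.npy, indices.npy, indptr.npy, shape.npy and format.npy, though the exact set may vary between SciPy versions.

```python
import io
import zipfile

import numpy as np
import scipy.sparse as sparse

# Small example matrix; the values are arbitrary.
csr = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]]))

buf = io.BytesIO()
sparse.save_npz(buf, csr)  # compressed=True by default -> np.savez_compressed

buf.seek(0)
print(zipfile.is_zipfile(buf))  # True: the .npz file is a zip archive

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    print(sorted(zf.namelist()))  # one .npy member per stored array

# np.load exposes the same members by name, without the .npy suffix.
buf.seek(0)
with np.load(buf) as npz:
    print(npz["indptr"], npz["indices"], npz["data"])
```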
