struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Hello, I am using joblib to parallelize the computation of a feature matrix: a large NumPy array of floats (~7k rows and ~10k columns, ~70M values).
My code breaks at this point:
user_item_features = Parallel(n_jobs=n_jobs)(
    delayed(self._compute_features)(data, recommender, users_list)
    for users_list in users_list_chunks
)
with this error:
Traceback (most recent call last):
File "entity2rec/node2vec_recommender.py", line 138, in <module>
n_jobs=args.workers, supervised=False)
File "/home/semantic/Repositories/entity2rec/entity2rec/evaluator.py", line 255, in features
users_list_chunks, n_jobs)
File "/home/semantic/Repositories/entity2rec/entity2rec/evaluator.py", line 269, in _compute_features_parallel
for users_list in users_list_chunks)
File "/home/semantic/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 789, in __call__
self.retrieve()
File "/home/semantic/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 699, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/semantic/anaconda3/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/home/semantic/anaconda3/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/home/semantic/anaconda3/lib/python3.6/site-packages/joblib/pool.py", line 372, in send
self._writer.send_bytes(buffer.getvalue())
File "/home/semantic/anaconda3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/semantic/anaconda3/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
I have obtained this error on Linux with different versions of Python:
- python 3.6.6, 3.6.3, 3.6.0
- joblib 0.11
Any help would be appreciated. Thank you for your work. Enrico
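For context, the failing call at the bottom of the traceback can be reproduced in isolation. The `!i` format used by `multiprocessing.connection` is a 4-byte signed integer, so any pickled payload of 2**31 bytes (2 GiB) or more overflows the message-length header. A minimal demonstration:

```python
import struct

MAX_I32 = 2**31 - 1  # largest message length the "!i" header can encode

# A payload length that still fits in the 4-byte signed big-endian header.
assert struct.pack("!i", MAX_I32) == b"\x7f\xff\xff\xff"

# One byte more and struct.pack fails exactly as in the traceback above.
try:
    struct.pack("!i", MAX_I32 + 1)
except struct.error as exc:
    print(exc)
```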
Issue Analytics
- State: closed
- Created 5 years ago
- Comments: 9 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Alright, it’s caused by a limitation in a low-level multiprocessing routine. If your dictionary does not change often, it’s better to serialize it to disk (using pickle.load / pickle.dump from the standard library, which is faster than joblib for this kind of object) and then load it in your workers in your own code, instead of passing it as an argument to the parallel function.
We cannot do anything at the joblib level in this case, so closing.
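A minimal sketch of that workaround, with hypothetical names (`big_object`, `compute_features`, the chunking) standing in for the original code. Only the short file path crosses the process boundary, so the multi-gigabyte pickle never goes through the multiprocessing pipe:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the large object that broke the pipe.
big_object = {"feature_row_%d" % i: list(range(10)) for i in range(100)}

# 1) Serialize the large object to disk once, in the parent process.
path = os.path.join(tempfile.gettempdir(), "shared_data.pkl")
with open(path, "wb") as f:
    pickle.dump(big_object, f, protocol=pickle.HIGHEST_PROTOCOL)

def compute_features(path, users_chunk):
    # 2) Each worker loads the object itself; only the short file
    #    path was pickled and sent over the connection.
    with open(path, "rb") as f:
        data = pickle.load(f)
    return [len(data) for _ in users_chunk]

# 3) Pass the path, not the object, to the parallel calls, e.g.:
# results = Parallel(n_jobs=n_jobs)(
#     delayed(compute_features)(path, chunk) for chunk in users_list_chunks
# )
print(compute_features(path, [1, 2, 3]))
```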
Hi @ogrisel, I’m facing a similar issue and can’t upgrade to Python 3.8 due to other compatibility issues. Could you elaborate a little more on the workaround you mentioned here? Do you mean using the multiprocessing backend, which uses the native pickle library to serialize data? (ref: https://joblib.readthedocs.io/en/latest/auto_examples/serialization_and_wrappers.html)