Array > 2GB hitting `msgpack` limit
I know that there are supposed to be alternate protocols used for larger arrays, but I'm not sure what needs to be done to use them (or whether they play nicely with `map_blocks`?).
MRE:
```python
import numpy
import dask.array as da
from distributed import Client

client = Client('127.0.0.1:8786')

def increment_by_one(my_array):
    return my_array + 1

data = numpy.random.random(300000000)
chunks = (10000,)
data = da.from_array(data, chunks=chunks)
output = da.map_blocks(increment_by_one, data)
output.compute()
```
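The numbers in the error line up with the array size: 300,000,000 float64 values at 8 bytes each is 2.4 GB, just over msgpack's `max_bin_len` of 2^31 − 1 bytes. A quick check using only the MRE's own numbers (no 2.4 GB allocation needed):

```python
import numpy

# 300 million float64 values, 8 bytes each.
payload_bytes = 300000000 * numpy.dtype('float64').itemsize
print(payload_bytes)  # 2400000000, close to the 2400000161 in the error below
print(2**31 - 1)      # 2147483647, msgpack's max_bin_len
```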
Traceback:
```
distributed.utils - ERROR - 2400000161 exceeds max_bin_len(2147483647)
Traceback (most recent call last):
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/utils.py", line 207, in log_errors
    yield
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/client.py", line 460, in _handle_report
    six.reraise(*clean_exception(**msg))
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/tornado/gen.py", line 1024, in run
    yielded = self.gen.send(value)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/core.py", line 258, in read
    msg = protocol.loads(frames)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/protocol.py", line 152, in loads
    msg = loads_msgpack(small_header, small_payload)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/protocol.py", line 256, in loads_msgpack
    return msgpack.loads(payload, encoding='utf8')
  File "pandas/msgpack/_unpacker.pyx", line 138, in pandas.msgpack._unpacker.unpackb (pandas/msgpack/_unpacker.cpp:2059)
ValueError: 2400000161 exceeds max_bin_len(2147483647)
```
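From the traceback, the failure happens while deserializing a single msgpack message, presumably the task graph in which `da.from_array` has embedded the full 2.4 GB numpy array. A minimal sketch of a variant that sidesteps the limit by generating the data on the cluster and keeping the oversized result off the client connection; the chunk size and the `zarr` step are illustrative choices, not from the original report:

```python
import dask.array as da
from distributed import Client

client = Client('127.0.0.1:8786')

def increment_by_one(my_array):
    return my_array + 1

# Build the array on the workers instead of embedding a 2.4 GB
# numpy array in the task graph.
data = da.random.random(300000000, chunks=(10000000,))
output = da.map_blocks(increment_by_one, data)

# Reduce on the cluster rather than pulling the full array back...
total = output.sum().compute()

# ...or stream the result to disk chunk by chunk (requires the
# optional `zarr` package):
# output.to_zarr('output.zarr')
```

If the source data genuinely has to start on the client, `Client.scatter` is the usual route for shipping large objects to the workers outside the task graph.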
Issue Analytics
- State:
- Created 7 years ago
- Comments: 9 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hey @RokoMijic – can you open a new issue for this? Thanks!
OK, I’ll see whether I can do it.
I eventually resolved the problem by writing my own outer merge function and doing it in Pandas, as Dask was unstable and crashed every time.
https://stackoverflow.com/questions/61026417/how-do-you-efficiently-outer-merge-large-pandas-dataframes-whilst-preserving-dat/
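For context, a rough sketch of the kind of chunked pandas outer merge that comment describes; `chunked_outer_merge` and its parameters are hypothetical names for illustration, not the commenter's actual code (see the linked question for their approach):

```python
import pandas as pd

def chunked_outer_merge(left, right, on, chunk_size=1000000):
    # Hypothetical helper: outer merge on a single key column `on`,
    # processing `left` in chunks to cap peak memory.
    pieces = []
    matched = pd.Series(False, index=right.index)
    for start in range(0, len(left), chunk_size):
        chunk = left.iloc[start:start + chunk_size]
        # Left-merge each chunk so matching rows of `right` attach to it.
        pieces.append(chunk.merge(right, on=on, how='left'))
        matched |= right[on].isin(chunk[on])
    # Rows of `right` whose key never matched `left` complete the outer join.
    pieces.append(right.loc[~matched])
    return pd.concat(pieces, ignore_index=True, sort=False)
```

Splitting only the left frame keeps each `merge` call bounded, and the final `isin` bookkeeping restores the right-only rows that a plain left merge would drop.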