Array > 2GB hitting `msgpack` limit

I know there are supposed to be alternate protocols for larger arrays, but I’m not sure what needs to be done to use them (or whether they don’t play nicely with map_blocks).

MRE:

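import numpy
import dask.array as da
from distributed import Client

# Connect to an already-running scheduler
client = Client('127.0.0.1:8786')


def increment_by_one(my_array):
    return my_array + 1


# 300,000,000 float64 values is about 2.4 GB held in client memory
data = numpy.random.random(300000000)
chunks = (10000,)
data = da.from_array(data, chunks=chunks)
output = da.map_blocks(increment_by_one, data)
output.compute()  # fails with the ValueError shown in the traceback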

Traceback:

distributed.utils - ERROR - 2400000161 exceeds max_bin_len(2147483647)
Traceback (most recent call last):
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/utils.py", line 207, in log_errors
    yield
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/client.py", line 460, in _handle_report
    six.reraise(*clean_exception(**msg))
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/tornado/concurrent.py", line 237, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/tornado/gen.py", line 1024, in run
    yielded = self.gen.send(value)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/core.py", line 258, in read
    msg = protocol.loads(frames)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/protocol.py", line 152, in loads
    msg = loads_msgpack(small_header, small_payload)
  File "/home/gil/anaconda/envs/dasknumbagpu/lib/python3.5/site-packages/distributed/protocol.py", line 256, in loads_msgpack
    return msgpack.loads(payload, encoding='utf8')
  File "pandas/msgpack/_unpacker.pyx", line 138, in pandas.msgpack._unpacker.unpackb (pandas/msgpack/_unpacker.cpp:2059)
ValueError: 2400000161 exceeds max_bin_len(2147483647)
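
The numbers in the error can be checked by hand. The sketch below is plain arithmetic, not part of dask or msgpack: it compares the raw size of a float64 array with the `max_bin_len` value printed in the traceback. The helper names (`float64_payload_bytes`, `fits_in_one_bin`) are hypothetical and exist only for illustration. The reported size is 2400000161 bytes, and the extra 161 bytes beyond the raw array are presumably serialization overhead; that part is an assumption.

```python
import struct

# The max_bin_len value printed in the error message (2**31 - 1).
MSGPACK_MAX_BIN_LEN = 2147483647


def float64_payload_bytes(n_elements):
    """Raw size in bytes of n_elements float64 values (hypothetical helper)."""
    return n_elements * struct.calcsize("d")


def fits_in_one_bin(n_elements):
    """True if the raw float64 data alone stays under the limit (hypothetical helper)."""
    return float64_payload_bytes(n_elements) <= MSGPACK_MAX_BIN_LEN


n = 300000000  # array length used in the MRE
print(float64_payload_bytes(n))   # raw bytes for the whole array
print(fits_in_one_bin(n))         # whether that fits under max_bin_len
print(MSGPACK_MAX_BIN_LEN // 8)   # largest float64 count that would fit
```

Under this reading, any single message that carries the whole array will exceed the limit, regardless of how small the dask chunks are.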

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

gforsyth commented, Apr 2, 2020 (1 reaction)

Hey @RokoMijic – can you open a new issue for this? Thanks!

RokoMijic commented, Apr 5, 2020 (0 reactions)

OK, I’ll see whether I can do it.

I eventually resolved the problem by writing my own outer merge function and doing it in Pandas, as Dask was unstable and crashed every time.

https://stackoverflow.com/questions/61026417/how-do-you-efficiently-outer-merge-large-pandas-dataframes-whilst-preserving-dat/
