
Minsize argument for Dict.empty()


Feature request

Being able to allocate memory for a Dict based on problem-specific information would minimize resizes and increase performance. A Dict is initialized by a call to the function numba_dict_new_minsize in dictobject.c, which fixes the starting number of buckets at D_MINSIZE = 8. That allows only 5 entries before a resize is needed.

The code comments state that this is suitable for the common case of a small dictionary used for passing keyword arguments. I believe an optional size argument would let users achieve significantly better performance with large hash tables, since a table sized up front avoids repeated resizing as it fills. This would be in line with Numba's focus on fast numerical computation.
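The 5-entry limit follows from how much of the table may be filled before a resize. The sketch below assumes Numba's dictobject.c uses the same rule as CPython, USABLE_FRACTION(n) = (n << 1) / 3 (two thirds of the buckets); the helper names usable_fraction and buckets_for are hypothetical and only illustrate how a size argument could be turned into a starting bucket count.

```python
# Assumption: mirrors a C macro USABLE_FRACTION(n) ((n << 1) / 3),
# i.e. at most two thirds of the buckets may hold entries.
D_MINSIZE = 8


def usable_fraction(n_buckets: int) -> int:
    """Number of entries a table of n_buckets can hold before resizing."""
    return (n_buckets << 1) // 3


def buckets_for(n_entries: int) -> int:
    """Hypothetical helper: smallest power-of-two bucket count, at least
    D_MINSIZE, that can hold n_entries without a resize."""
    n_buckets = D_MINSIZE
    while usable_fraction(n_buckets) < n_entries:
        n_buckets <<= 1
    return n_buckets


print(usable_fraction(D_MINSIZE))  # 5
print(buckets_for(1_000_000))      # 2097152
```

Under these assumptions, a dictionary expected to hold one million keys could start at 2,097,152 buckets instead of starting at 8 and resizing repeatedly as it grows.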

I’ve tried to modify Numba so that I could pass a size argument to Dict.empty() and have it call numba_dict_new instead of numba_dict_new_minsize. Unfortunately, I haven’t been successful so far. Please consider adding such an option in a future release. Thank you.

Issue Analytics

Created a year ago
Comments: 9 (4 by maintainers)

Top GitHub Comments


stefanfed commented, May 31, 2022

Apologies for the delay in responding. I’m working on uploading the (unsuccessful) changes I made to a fork.

I believe typed lists already have this feature through the empty_list method, where the argument is called allocated.


gmarkall commented, Jun 20, 2022

I was able to fix the LLVM error by adding a single line to _helpermod.c. Thanks for pointing me there.

Ah - I hadn’t noticed that _helpermod.c already contained some of the dict_new methods - thanks for spotting that it’s simpler than I suggested it could be 🙂

Because Py_ssize_t is signed, it overflows when the next size is 2^63 on 64-bit, for example. I really doubt anyone has anywhere near enough memory to get there, so this won't be a problem on 64-bit (though it could cause a crash if someone tried it). However, it could be a substantial problem on 32-bit. I wasn't sure how to resolve this - I'd appreciate your advice.
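The overflow described here can be modelled in Python by emulating a signed integer of a given width. This is a sketch, not Numba's code: round_up_size is a hypothetical name, and it assumes the C side rounds a requested size up to a power of two by repeated doubling from D_MINSIZE. The guard checks before doubling, which is one way a C loop could detect the problem without triggering signed overflow (undefined behaviour in C).

```python
D_MINSIZE = 8


def round_up_size(size: int, bits: int) -> int:
    """Round size up to a power of two, as a signed `bits`-wide integer would.

    Raises OverflowError where doubling would exceed the maximum value of
    a signed integer (such as Py_ssize_t) of that width.
    """
    max_signed = 2 ** (bits - 1) - 1  # largest value of the signed type
    newsize = D_MINSIZE
    while newsize < size:
        if newsize > max_signed // 2:
            # Doubling again would exceed max_signed: signed overflow in C.
            raise OverflowError(f"size {size} too large for {bits}-bit Py_ssize_t")
        newsize <<= 1
    return newsize
```

With bits=32, the largest size this accepts is 2^30; any request above it raises instead of wrapping around.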

If my maths is correct:

In [2]: 2 ** (8 * 4 - 2)
Out[2]: 1073741824

On a 32-bit system, that is still a large size (1GB of contiguous allocation) for a system that can only have 2GB of addressable memory per process (on both Windows and Linux x86, at least). I'd be inclined not to worry about this (do let me know if my calculation or assumptions seem incorrect, though, please 🙂)
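One reading of the expression 2 ** (8 * 4 - 2): a 32-bit Py_ssize_t has 8 * 4 = 32 bits, one of which is the sign bit, so its maximum is 2^31 - 1, and 2^30 is the largest power of two below that. The next doubling, 2^31, is the first size that overflows. A quick check of that arithmetic in plain Python:

```python
bits = 8 * 4                      # bits in a 32-bit Py_ssize_t
max_ssize = 2 ** (bits - 1) - 1   # largest signed 32-bit value
largest_pow2 = 2 ** (bits - 2)    # the expression from the comment above

# 2^30 fits, but doubling it once more would not.
assert largest_pow2 <= max_ssize < 2 * largest_pow2
print(largest_pow2, max_ssize)    # 1073741824 2147483647
```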

Currently, the only functionality of numba_dict_new_minsize is to call numba_dict_new with size=D_MINSIZE. Since we can guarantee D_MINSIZE is a power of 2, we don’t need to round it up. Therefore, in numba_dict_new, I only attempt to round up if size != D_MINSIZE. An alternative is to copy all the code from numba_dict_new to numba_dict_new_minsize without the rounding up part. Would you say that’s a good idea?

Probably not - it sounds more succinct as it is.
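The arrangement discussed above, where numba_dict_new_minsize just delegates to numba_dict_new, can be modelled as follows. This is a Python sketch of the control flow, not the code in the fork: dict_new and dict_new_minsize are stand-ins for the C functions, they return the bucket count that would be allocated, and raising sizes below D_MINSIZE up to it is an assumption.

```python
D_MINSIZE = 8  # a power of two, so it never needs rounding


def dict_new(size: int) -> int:
    """Stand-in for numba_dict_new: returns the bucket count it would allocate."""
    if size != D_MINSIZE:
        # Round any other requested size up to the next power of two,
        # never going below D_MINSIZE (assumption).
        n = D_MINSIZE
        while n < size:
            n <<= 1
        size = n
    return size


def dict_new_minsize() -> int:
    """Stand-in for numba_dict_new_minsize: simply delegates with D_MINSIZE."""
    return dict_new(D_MINSIZE)
```

Because the loop already leaves a power of two unchanged, the size != D_MINSIZE check only skips a few iterations; it does not change the result, which is one reason keeping the delegation is simpler than duplicating the body.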

Should I open a pull request?

Yes, I think your progress is looking great and we’re a lot of the way there with this, so I think opening a PR would be totally appropriate and a really helpful way for us to iterate this to completion.

I noticed that you require unit tests. I'm not 100% sure about this part. What kind of tests would you like me to write?

I'll have a look through what the possibilities are here, since I think you're correct that it's not obvious how to test this. In the meantime, if you make any changes you were going to make and open the PR, I'll add some suggestions for testing in comments on the PR.

Thanks for your work and perseverance so far!
