Minsize argument for Dict.empty()
Feature request
Being able to preallocate memory for a Dict based on problem-specific information, such as the expected number of keys, would minimize resizes and increase performance. A Dict is initialized by a call to the function numba_dict_new_minsize in dictobject.c. The starting number of buckets is fixed at D_MINSIZE = 8, which, with a maximum load of two thirds of the buckets, allows only 5 entries before a resize.
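The 8-bucket, 5-entry figure follows from the load-factor rule. Below is a minimal sketch, assuming Numba's dictobject.c keeps CPython's USABLE_FRACTION(n) = 2n/3 rule and power-of-two table sizes. buckets_for is a hypothetical helper, not part of Numba, showing how a requested number of entries could be turned into a bucket count.

```python
D_MINSIZE = 8  # starting bucket count quoted in the issue


def usable_fraction(n_buckets: int) -> int:
    """Entries a table of n_buckets can hold before a resize (2/3 load, CPython-style)."""
    return (n_buckets << 1) // 3


def buckets_for(n_entries: int) -> int:
    """Hypothetical helper: smallest power-of-two bucket count >= D_MINSIZE holding n_entries."""
    if n_entries < 0:
        raise ValueError("n_entries must be non-negative")
    size = D_MINSIZE
    while usable_fraction(size) < n_entries:
        size <<= 1  # tables stay a power of two
    return size


print(usable_fraction(D_MINSIZE))  # 5
print(buckets_for(6))              # 16
print(buckets_for(1_000_000))      # 2097152
```

Under these assumptions, a sixth entry already forces the default table to grow, and a million entries need 2**21 buckets.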
The code comments state that this is suitable for the common case of a small dictionary used for passing keyword arguments. I believe an optional size argument would give users the ability to achieve significantly better performance with large hash tables. This would be in line with Numba’s focus on fast numerical computation.
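To give a rough sense of how many resizes a large table goes through, here is a small model, not Numba code. It assumes the growth rule matches CPython 3.7's GROWTH_RATE (a full table is rebuilt at the smallest power of two that is at least three times the number of used entries) and ignores deletions.

```python
D_MINSIZE = 8  # default starting bucket count quoted in the issue


def usable_fraction(n_buckets: int) -> int:
    """Entries a table of n_buckets can hold before it must grow (2/3 load)."""
    return (n_buckets << 1) // 3


def count_resizes(n_inserts: int, initial_buckets: int = D_MINSIZE) -> int:
    """Count table resizes while inserting n_inserts distinct keys, no deletions.

    Model only: assumes CPython 3.7-style growth, where a full table is
    rebuilt at the smallest power of two >= max(D_MINSIZE, 3 * used).
    """
    size, used, resizes = initial_buckets, 0, 0
    for _ in range(n_inserts):
        if usable_fraction(size) - used <= 0:
            new_size = D_MINSIZE
            while new_size < 3 * used:
                new_size <<= 1
            size = new_size
            resizes += 1
        used += 1
    return resizes


# Default start vs. a table presized to 2**21 buckets (enough for 10**6 keys at 2/3 load).
print(count_resizes(1_000_000))                             # 18
print(count_resizes(1_000_000, initial_buckets=2_097_152))  # 0
```

In this model, filling a default-sized dict with a million keys rebuilds the table 18 times, and every rebuild re-inserts all existing entries; a presized table avoids all of them.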
I’ve tried to modify Numba so that I could pass a size argument to Dict.empty() and have it call numba_dict_new instead of numba_dict_new_minsize. Unfortunately, I haven’t been successful so far. Please consider adding such an option in a future release. Thank you.

Apologies for the delay in responding. I’m working on uploading the (unsuccessful) changes I made to a fork.
I believe typed lists have this feature under the empty_list method. The argument is called allocated.
Ah - I hadn’t noticed that _helpermod.c already contained some of the dict_new methods - thanks for spotting that it’s simpler than I suggested it could be 🙂
If my maths is correct:
On a 32-bit system, that is still a large size (1GB of contiguous allocation) for a system that can only have 2GB of addressable memory per process (on both Windows and Linux x86, at least). I’d be inclined not to worry about this (do let me know if I seem to have an incorrect calculation or assumption here, though, please 🙂)
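One way a size argument could guard against the 32-bit concern is to reject requests whose byte count cannot be represented before ever calling numba_dict_new. The sketch below is hypothetical: check_bucket_request is not a Numba function, bytes_per_bucket is a caller-supplied estimate rather than the real dictobject.c layout, the power-of-two requirement is assumed from CPython's dict tables, and the limit is assumed to be the largest signed 32-bit Py_ssize_t.

```python
PY_SSIZE_T_MAX_32 = 2**31 - 1  # largest Py_ssize_t on a 32-bit build


def check_bucket_request(n_buckets: int, bytes_per_bucket: int,
                         size_max: int = PY_SSIZE_T_MAX_32) -> int:
    """Return the total byte count for n_buckets, or raise if it is unusable.

    bytes_per_bucket is a hypothetical estimate; it is not taken from
    Numba's actual memory layout.
    """
    # Assumed requirement: table sizes are positive powers of two.
    if n_buckets <= 0 or n_buckets & (n_buckets - 1):
        raise ValueError("bucket count must be a positive power of two")
    total = n_buckets * bytes_per_bucket
    # Python ints do not overflow, so compare against the C limit explicitly.
    if total > size_max:
        raise OverflowError(
            f"{total} bytes exceeds the {size_max}-byte limit of a 32-bit size")
    return total


print(check_bucket_request(2**20, 16))  # 16777216
```

With 16 bytes per bucket, 2**27 buckets would already need 2**31 bytes, one more than a signed 32-bit size can hold.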
Probably not - it sounds more succinct as it is.
Yes, I think your progress is looking great and we’re a lot of the way there with this, so I think opening a PR would be totally appropriate and a really helpful way for us to iterate this to completion.
I’ll look into what the possibilities are here, since I think you’re correct that it’s not obvious how to test this. In the meantime, if you make any changes you were planning and open the PR, I’ll add some suggestions for testing in comments on the PR.
Thanks for your work and perseverance so far!