Discussion: DLPack requires 256B alignment of data pointers
Recently we found that DLPack has this requirement noted in the header: https://github.com/dmlc/dlpack/blob/a02bce20e2dfa0044b9b2ef438e8eaa7b0f95e96/include/dlpack/dlpack.h#L139-L141. Would this be an issue for all adopting libraries? As far as I know, CuPy doesn't do any alignment check and takes the pointer (`DLTensor.data`) as is, and IIRC quite a few other libraries do the same.
cc: @seberg @rgommers @tqchen @kmaehashi @oleksandr-pavlyk @jakirkham @edloper @szha
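For concreteness, the "fine print" being discussed amounts to a simple check that consumers could (but currently don't) perform. Below is a hypothetical sketch; `MockDLTensor` and `is_256b_aligned` are illustrative names, not the real dlpack.h definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the relevant DLTensor fields -- NOT the real
 * dlpack.h, just enough to illustrate the check under discussion. */
typedef struct {
    void    *data;        /* base pointer; dlpack.h says 256-byte aligned */
    uint64_t byte_offset; /* offset (in bytes) to the first element */
} MockDLTensor;

/* The check CuPy (and others) currently skip: is the base pointer
 * aligned to 256 bytes, as the dlpack.h comment requires? */
int is_256b_aligned(const MockDLTensor *t) {
    return ((uintptr_t)t->data % 256) == 0;
}
```

A consumer that wanted to enforce the header's wording would reject (or copy) any capsule for which this returns false.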
Issue Analytics
- Created: 2 years ago
- Comments: 19 (11 by maintainers)
Top GitHub Comments
It looks like the answer changed here (or at least my reading of it): (xref https://github.com/dmlc/dlpack/pull/83#issuecomment-964382536). Let me try to summarize: the alignment requirement applies to `data`, while the first element lives at `data + byte_offset`. Hence aligned loads are not possible today unless `data + byte_offset` happens to be aligned already.

These are the options:

- A1: require the `data` pointer to always be aligned (using a nonzero `byte_offset`), and do the gradual evolution plan in my comment above.
- A2: drop the alignment requirement from `dlpack.h`. No library needs to make any changes (except if current handling of `byte_offset` is buggy, like @seberg pointed out for PyTorch). NumPy and other new implementers then just use `byte_offset=0` always (easiest), and we're done.

The current status is that the fine print in `dlpack.h` requires alignment (option A1), but no one adheres to it or enforces it. This state is not very useful: it requires a >1 year evolution plan, and apparently there's no gain because of the third bullet above. So it looks like the best choices are either A2 or A3. A3 seems strictly better than A2, and most of the work it requires (versioning/extensibility) is work we wanted to do for other reasons already.

So here's a new proposal: exporters use `byte_offset = 0` with `data` pointing to the first element in memory, and the note in `dlpack.h` about this topic is updated to reflect the current state, the desired future state, and a link to a new issue on the DLPack repo with more info (outcome of this discussion to be summarized on that issue).

At this point, all I care about is to have a definite answer on which exports NumPy is supposed to reject due to insufficient alignment (or read-only data, for that matter), and to have that answer added to `dlpack.h`.
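To make the two conventions concrete, here is a hypothetical C sketch (struct and function names are mine, not DLPack's) of an A1-style export versus the proposed `byte_offset = 0` export, for a view starting partway into a buffer. A correct consumer computes the same first-element address either way:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the two DLTensor fields in question. */
typedef struct {
    char    *data;
    uint64_t byte_offset;
} Export;

/* Option A1 style: keep `data` at the (aligned) allocation base and
 * describe the view via a nonzero byte_offset. */
static Export export_aligned_base(char *base, uint64_t view_start) {
    Export e = { base, view_start };
    return e;
}

/* Proposed convention: `data` points directly at the first element,
 * and byte_offset is always 0. */
static Export export_offset_zero(char *base, uint64_t view_start) {
    Export e = { base + view_start, 0 };
    return e;
}

/* Either way, a correct consumer must read from data + byte_offset. */
static char *first_element(const Export *e) {
    return e->data + e->byte_offset;
}
```

The point of option A2/A3 is that with `byte_offset` pinned to 0, consumers that (buggily) ignore `byte_offset` still land on the right address.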
pytorch has this:
And it supports slicing, right? So torch can't possibly have export guarantees beyond the `itemsize`. It seems safe to assume that nobody else cared about it enough to provide alignment beyond what their library guarantees (otherwise this question would have come up more often).

Now, I don't know what their allocation strategy is, so I don't know whether they even guarantee itemsize-alignment for `complex128` on 32-bit platforms. GCC only guarantees 8 bytes there, while complex-double has 16. So even itemsize-aligned is probably broken (albeit only for complex doubles).
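The complex-double point can be checked directly: the itemsize of `complex128` is 16 bytes, but the alignment the C type itself requires is only that of `double` (8 on common ABIs), so itemsize-alignment is a stronger guarantee than either the type or an 8-byte-aligned `malloc` provides. The helper name below is mine, for illustration:

```c
#include <assert.h>
#include <stdalign.h>   /* alignof */
#include <stddef.h>
#include <stdint.h>

/* "complex128" occupies two doubles (itemsize 16), but the required
 * alignment of the type is only that of double (8 on common ABIs).
 * So "itemsize-aligned" (16) is stricter than what a 32-bit malloc,
 * which GCC guarantees only 8-byte alignment for, will give you. */

/* Hypothetical helper: is pointer p aligned to the array's itemsize? */
static int itemsize_aligned(const void *p, size_t itemsize) {
    return ((uintptr_t)p % itemsize) == 0;
}
```

(The assertions below reflect typical x86-64/i386 ABIs; exotic platforms may differ.)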
As for their `fromDLPack` function just below: it doesn't even check if `byte_offset != 0`, so maybe we should just always keep it 0 and not think about it for CPU data… (unless you want to open a bug report at pytorch after clarifying this).

As far as I am aware there is no versioning for
`__dlpack__` or `__dlpack_device__`, and no way to move the API/ABI forward without a dance using some kind of `try/except`. So, while I find it interesting, I will give up discussing what may be nice to have until that is clarified. (Yes, I am cross. I had explicitly warned about this.)

To be painfully clear: I very much still think that DLPack in its current form needs both extension and better extensibility to be the good implicit protocol that data-apis should, in my opinion, aim for. (I really don't hate DLPack, but I don't see that it was ever tuned or vetted for this task.)
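The `byte_offset` hazard discussed above can be sketched as follows (hypothetical names, not PyTorch's actual code): a consumer that ignores `byte_offset` silently reads from the wrong address whenever an exporter sets it nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the two DLTensor fields in question. */
typedef struct {
    double  *data;
    uint64_t byte_offset; /* in bytes */
} Capsule;

/* Buggy consumer: silently assumes byte_offset == 0. */
static double read_first_buggy(const Capsule *c) {
    return c->data[0];
}

/* Correct consumer: honours byte_offset. */
static double read_first_correct(const Capsule *c) {
    return *(const double *)((const char *)c->data + c->byte_offset);
}
```

This is exactly why "just always keep `byte_offset` 0" is attractive: it makes the buggy and correct readers agree.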
With that: sorry, I am going to try and leave you all in peace now… If you want to consider extending DLPack liberally and with versioning, I will be interested (but also happy to stay away to leave you in peace – seriously, just tell me). Until then, I will simply try to stop caring about it.