Discussion: DLPack requires 256B alignment of data pointers


Recently we found that DLPack has this requirement noted in the header: https://github.com/dmlc/dlpack/blob/a02bce20e2dfa0044b9b2ef438e8eaa7b0f95e96/include/dlpack/dlpack.h#L139-L141. Would this be an issue for all adopting libraries? As far as I know, CuPy doesn’t do any alignment check and takes the pointer (DLTensor.data) as is, and IIRC quite a few other libraries do the same.
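To make the requirement concrete, here is a small sketch of the check a consumer would have to perform if it actually enforced the fine print in dlpack.h (the helper name and pointer values are illustrative, not any library's real API):

```python
# dlpack.h's fine print says DLTensor.data must be 256-byte aligned;
# a consumer enforcing it would reject any capsule whose raw pointer
# value fails this test.
REQUIRED_ALIGNMENT = 256

def is_dlpack_aligned(data_ptr: int, alignment: int = REQUIRED_ALIGNMENT) -> bool:
    """Return True if the raw pointer value meets the required alignment."""
    return data_ptr % alignment == 0

print(is_dlpack_aligned(0x7F0000000000))      # -> True: base of an aligned allocation
print(is_dlpack_aligned(0x7F0000000000 + 8))  # -> False: e.g. a sliced view one float64 in
```

As the example shows, any sliced view whose first element is not a multiple of 256 bytes from an aligned base would be rejected under a strict reading.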

cc: @seberg @rgommers @tqchen @kmaehashi @oleksandr-pavlyk @jakirkham @edloper @szha

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments:19 (11 by maintainers)

Top GitHub Comments

2 reactions
rgommers commented, Nov 9, 2021

The use of byte_offset for opaque data pointers is interesting. Do you suggest also using it for non-opaque data pointers, to enforce/allow a bigger allocation alignment guarantee than the data alignment? Is that useful? Scanning the initial link @leofang posted, it sounded like it might be.

Yes this would be really helpful. Knowing the base allocation pointer, offset, and alignment often allows for much more efficient code.

It looks like the answer changed here (or at least my reading of it): (xref https://github.com/dmlc/dlpack/pull/83#issuecomment-964382536). Let me try to summarize:

  • Not all libraries have aligned allocators, certainly not to 256 bytes.
  • There is currently no place for the producer to put information about the alignment of the data in the DLPack capsule it is exporting. It cannot be added until there is a version attribute and a way to extend DLPack (a separate issue that needs tackling; as of today, neither exists).
  • The consumer cannot know whether it is safe to read memory between data and data + byte_offset. Hence aligned loads are not possible today unless data + byte_offset happens to be aligned already.
  • And no aligned loads for opaque data pointers may be possible at all, unless there’s a device/language guarantee about this (quoting @seberg: “Is a 256 byte alignment guaranteed for any possible opaque pointer?! Because if it is not, I don’t see any point for this guarantee”). It is not documented for which devices this is the case.
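The third bullet can be made concrete with a small arithmetic sketch (no real library API; addresses are illustrative). The only bytes the consumer knows it may read start at data + byte_offset, so rounding down to an aligned address is only safe when it does not cross below that boundary:

```python
def aligned_load_base(data: int, byte_offset: int, alignment: int = 256):
    """Return a usable aligned address for the first element, or None.

    Rounding the first-element address down to `alignment` is only safe
    when it does not land inside [data, data + byte_offset) -- bytes DLPack
    gives the consumer no stated right to read -- so here we only accept
    the case where data + byte_offset is already aligned.
    """
    first = data + byte_offset
    rounded = first - (first % alignment)
    return rounded if rounded == first else None

print(aligned_load_base(0x1000, 0))   # -> 4096: already 256-byte aligned
print(aligned_load_base(0x1000, 24))  # -> None: 0x1018 is not aligned
```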

These are the options:

  1. A1: required alignment. Require the data pointer to always be aligned (using nonzero byte_offset), and do the gradual evolution plan in my comment above.
  2. A2: no alignment. Remove the alignment requirement completely from dlpack.h. No library needs to make any changes (except if its current handling of byte_offset is buggy, like @seberg pointed out for PyTorch). NumPy and other new implementers then just use byte_offset=0 always (easiest), and we’re done.
  3. A3: optional alignment. Do not require alignment, but add a way to communicate from the producer to the consumer what the alignment of the data is.
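For contrast, here is what A1 and A2 mean on the producer side when exporting a sliced view (a toy sketch with illustrative addresses, not any library's actual export code):

```python
def export_a1(alloc_base: int, view_start: int) -> dict:
    """Option A1: `data` stays at the aligned allocation base and the
    view's start is carried entirely in `byte_offset`."""
    return {"data": alloc_base, "byte_offset": view_start - alloc_base}

def export_a2(alloc_base: int, view_start: int) -> dict:
    """Option A2: `data` points directly at the first element and
    byte_offset is always 0 (what most exporters do today)."""
    return {"data": view_start, "byte_offset": 0}

# A view starting 16 bytes into a 256-byte-aligned allocation:
print(export_a1(0x1000, 0x1010))  # -> {'data': 4096, 'byte_offset': 16}
print(export_a2(0x1000, 0x1010))  # -> {'data': 4112, 'byte_offset': 0}
```

Either way the consumer's first element sits at data + byte_offset; the difference is only whether `data` itself carries an alignment guarantee.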

The current status is that the fine print in dlpack.h requires alignment (option A1), but no one adheres to it or enforces it. This state is not very useful: it requires a >1 year evolution plan, and apparently there’s no gain because of the third bullet above. So it looks like the best choices are either A2 or A3. A3 seems strictly better than A2, and most of the work it requires (versioning/extensibility) is work we wanted to do for other reasons already.

So here’s a new proposal:

  • Decide that the long-term desired state is A3: optional alignment
  • NumPy and other new implementers to do whatever is simplest, i.e. to use byte_offset = 0 and data pointing to the first element in memory.
  • Update the comment in dlpack.h about this topic to reflect: current state, desired future state, and a link to a new issue on the DLPack repo with more info (outcome of this discussion to be summarized on that issue).
2 reactions
seberg commented, Oct 27, 2021

At this point, all I care about is to have a definite answer which exports NumPy is supposed to reject due to insufficient alignment (or readonly data for that matter). And that this answer is added to dlpack.h.

pytorch has this:

  atDLMTensor->tensor.dl_tensor.byte_offset = 0;

And it supports slicing, right? So, torch can’t possibly have export guarantees beyond the itemsize. It seems safe to assume that nobody else cared about it enough to provide alignment beyond what their library guarantees (otherwise, this question would have come up more often).

Now, I don’t know what their allocation strategy is, so I don’t know whether they even guarantee itemsize alignment for complex128 on 32-bit platforms. The GNU C library manual has this to say:

The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). (link)

which guarantees 8 bytes, while complex double needs 16. So even itemsize alignment is probably broken (albeit only for complex doubles).
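The arithmetic behind that claim, as a quick sketch:

```python
itemsize = 16            # numpy complex128: two float64s
malloc_align_32bit = 8   # glibc's malloc guarantee on 32-bit GNU systems

# An address malloc may legally return on a 32-bit system:
addr = 0x1008
print(addr % malloc_align_32bit == 0)  # -> True: a valid malloc result
print(addr % itemsize == 0)            # -> False: yet not itemsize-aligned
```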

As for their fromDLPack function just below: it doesn’t even check if byte_offset != 0, so maybe we should just always keep it 0 and not think about it for CPU data… (unless you want to open a bug report at pytorch after clarifying this).


As far as I am aware, there is no versioning for __dlpack__ or __dlpack_device__ and no way to move the API/ABI forward without a dance using some kind of try/except. So, while I find it interesting, I will give up discussing what may be nice to have until that is clarified. (Yes, I am cross; I had explicitly warned about this.)

To be painfully clear: I very much still think that DLPack in its current form needs both extension and better extensibility to be the good implicit protocol that data-apis should, in my opinion, aim for. (I really don’t hate DLPack, but I don’t see that it was ever tuned or vetted for this task.)

With that: sorry, I am going to try and leave you all in peace now… If you want to consider extending DLPack liberally and with versioning, I will be interested (but also happy to stay away and leave you in peace – seriously, just tell me). Until then, I will simply try to stop caring about it.
