[Feature Request] functions on elements of 1 dimension: reorder (concatenate), and chunk
Thank you for making our lives easier when working with tensors. I have the following suggestions, based on #50 and #20.
A. Reorder and concatenation of items of different shapes
A.1 Reorder elements of 1 dimension
As suggested in #50, an operation for reordering the elements along a dimension (e.g. channels) would be genuinely useful, especially for those working on images across different libraries (OpenCV, PIL). It is much better than juggling raw indices.
I fully agree with @remisphere that we can call it reorder without misleading users.
# instead of doing this
out = imgs[:, [2, 0, 1, 3], :, : ]
# we can use the below
einops.reorder(imgs, 'batch [rg b a -> b rg a] h w', rg=2, b=1, a=1)
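For comparison, here is a minimal self-contained sketch of the index-based version that the proposed pattern would replace (numpy, with illustrative sizes):
# plain-indexing equivalent (numpy sketch; shapes are illustrative)
import numpy as np
imgs = np.zeros((10, 4, 256, 256))   # batch, channels ordered [r, g, b, a], height, width
out = imgs[:, [2, 0, 1, 3], :, :]    # move channel 2 (b) in front of channels 0-1 (rg), keep a last
assert out.shape == (10, 4, 256, 256)  # same shape, only the channel order differs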
A.2 Concatenation of items of different sizes on 1 dimension
Since the operation only touches a single dimension, we can also concatenate multiple items whose sizes differ along that dimension. This easily handles the case mentioned in #20 and would be extremely useful for those who use concatenate in their code; I concatenate tensors of different shapes all the time. For example:
# the three tensors below have different sizes on the 2nd dim
print(x.shape) # [b, 10]
print(y.shape) # [b, 15]
print(z.shape) # [b, 20]
# we can concatenate them as
inputs = [x, y, z]
out = einops.reorder(inputs, 'batch [x y z -> x y z]', x=10, y=15, z=20)
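For reference, the same concatenation today is a plain np.concatenate (or torch.cat) call; a minimal numpy sketch with illustrative shapes:
# today: concatenate along the 2nd dim with np.concatenate (numpy sketch, shapes illustrative)
import numpy as np
b = 8
x, y, z = np.zeros((b, 10)), np.zeros((b, 15)), np.zeros((b, 20))
out = np.concatenate([x, y, z], axis=1)
assert out.shape == (b, 45)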
The reorder call above is consistent with how einops.rearrange concatenates inputs whose items all have the same shape. It is possible to split out back into its components x, y, z in three lines using the chunk function below:
x = einops.chunk(out, 'batch [x yz -> x]', x=10)         # only x's size is needed for the first chunk
y = einops.chunk(out, 'batch [x y z -> y]', x=10, y=15)  # x gives the offset, y the size
z = einops.chunk(out, 'batch [xy z -> z]', z=20)         # only z's size is needed for the last chunk
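For comparison, the same split with plain slicing requires tracking the offsets by hand, which is exactly what the named sizes above would avoid (numpy sketch with illustrative shapes):
# manual-slicing equivalent (numpy sketch; out is the (b, 45) concatenation of x, y, z)
import numpy as np
out = np.zeros((8, 45))
x = out[:, :10]     # first 10 columns
y = out[:, 10:25]   # next 15 columns
z = out[:, 25:]     # last 20 columns
assert x.shape[1] == 10 and y.shape[1] == 15 and z.shape[1] == 20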
B. Chunking along 1 dimension
In contrast with #50, I don’t think it is a good idea to merge chunking into reorder; these functionalities are better separated into the reorder above and a dedicated chunk. Chunking is used frequently when we want to sample parts of datasets and features.
Example in #50:
# remove the alpha channel and the bottom half of 256*256 images:
einops.chunk(imgs, 'batch [rg b a -> b rg] [top bottom -> top] w', rg=2, b=1, top=128, batch=10)
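The plain-indexing equivalent of that pattern, as a minimal numpy sketch with illustrative sizes:
# drop the alpha channel, move b first, and keep the top half of the height
import numpy as np
imgs = np.zeros((10, 4, 256, 256))   # batch, channels ordered [r, g, b, a], 256x256 images
out = imgs[:, [2, 0, 1], :128, :]    # channels reordered to (b, r, g), rows 0..127 kept
assert out.shape == (10, 3, 128, 256)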
Split a dataset into train and val:
train_len = int(len(dataset) * 0.8)
train_split = einops.chunk(dataset, '[train val -> train] c h w', train=train_len)
val_split = einops.chunk(dataset, '[train val -> val] c h w', train=train_len)
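For comparison, the plain-slicing version of this split, as a minimal numpy sketch (the dataset shape is illustrative):
# manual train/val split by slicing along the first axis
import numpy as np
dataset = np.zeros((1000, 3, 32, 32))   # n, c, h, w
train_len = int(len(dataset) * 0.8)
train_split = dataset[:train_len]       # first 80%
val_split = dataset[train_len:]         # remaining 20%
assert len(train_split) == 800 and len(val_split) == 200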
And we can get the full dataset back given train_split and val_split:
dataset = einops.reorder([train_split, val_split], '[train val -> train val] c h w', train=len(train_split), val=len(val_split))
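For comparison, a minimal numpy sketch of the recombination as it is done today (shapes continue the illustrative split above):
# recombining is a plain concatenate along the first axis, the inverse of the split
import numpy as np
train_split, val_split = np.zeros((800, 3, 32, 32)), np.zeros((200, 3, 32, 32))
dataset = np.concatenate([train_split, val_split], axis=0)
assert dataset.shape == (1000, 3, 32, 32)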
Top GitHub Comments
@p4perf4ce thanks for thinking this through out loud with examples.
I’ve been poking around with an operation (I’ve dubbed it rechunk) that has some overlap with your suggestion. One critical choice: how to specify which axis is modified?
Representing both input and output with their full shapes (as in your suggestion) packs too much into a single operation and does not focus on the axis being modified; it is unclear what a user should pay attention to.
In my experiments I’ve landed on something very similar: a “list” in the pattern, and a list as input/output. Your suggestion has an exceptional case for a single-element list, and introducing special cases should be avoided.
I have converged on:
It is possible to also support something like:
…but static code analysis would go crazy; it is easier to just always take and return lists.
The problem of an arbitrary number of inputs is a hard one. I’ve had something like this:
…too complex, and there is no need for this flexibility. The following could completely cover all the necessary cases.
There is a natural requirement to have an “inversion” of concatenation (which can work properly only if the pattern carries information about a single axis).
I can post a more detailed RFC with a suggestion if that’s interesting to discuss, but I won’t be able to dedicate time to implementing/supporting it.
Thanks for the discussion folks!
Brand new einops.pack and einops.unpack cover the common cases for concatenate and chunk, so closing this: https://github.com/arogozhnikov/einops/blob/master/docs/4-pack-and-unpack.ipynb
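For reference, a minimal sketch of how pack and unpack cover the concatenate/chunk examples above (numpy, illustrative shapes; see the linked notebook for the full API):
# pack concatenates along the axis marked *, unpack inverts it using the recorded shapes
import numpy as np
from einops import pack, unpack
x, y, z = np.zeros((8, 10)), np.zeros((8, 15)), np.zeros((8, 20))
packed, ps = pack([x, y, z], 'b *')      # packed.shape == (8, 45); ps records each item's packed shape
x2, y2, z2 = unpack(packed, ps, 'b *')   # recovers the original (8, 10), (8, 15), (8, 20) tensors
assert packed.shape == (8, 45) and x2.shape == (8, 10)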