[Feature Request] functions on elements of 1 dimension: reorder (concatenate), and chunk

Thank you for making our lives easier when working with tensors. I have the following suggestions based on #50 and #20.

A. Reorder and concatenation of items of different shapes

A.1 Reorder elements of 1 dimension

As suggested in #50, an operation for reordering the elements along the channel dimension would be really useful, especially for those who work on images with different libraries (OpenCV, PIL). It is much nicer than juggling raw indices.

I totally agree with @remisphere that we can use the name reorder without misleading users.

# instead of doing this
out = imgs[:, [2, 0, 1, 3], :, : ]
# we can use the below
einops.reorder(imgs, 'batch [rg b a -> b rg a] h w', rg=2, b=1, a=1)
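
Since `einops.reorder` is only a proposal in this issue, the following is a minimal NumPy sketch of what the pattern above would compute. The helper `reorder_groups` and its parameters are hypothetical and exist only for illustration; they are not part of einops.

```python
import numpy as np

def reorder_groups(x, axis, sizes, order):
    """Hypothetical helper: permute named groups of consecutive indices along one axis.

    sizes: dict mapping group name -> length, listed in input order.
    order: list of group names in the desired output order.
    """
    # Record the start offset of every group along the axis, in input order.
    starts, offset = {}, 0
    for name, size in sizes.items():
        starts[name] = offset
        offset += size
    assert offset == x.shape[axis], "group sizes must cover the whole axis"
    # Build the flat index list for the requested output order.
    index = [i for name in order
             for i in range(starts[name], starts[name] + sizes[name])]
    return np.take(x, index, axis=axis)

imgs = np.random.rand(10, 4, 8, 8)  # batch, channels (r g b a), h, w
out = reorder_groups(imgs, axis=1,
                     sizes={"rg": 2, "b": 1, "a": 1},
                     order=["b", "rg", "a"])
```

Under these assumptions, `out` matches the manual indexing `imgs[:, [2, 0, 1, 3], :, :]`.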

A.2 Concatenation of items of different sizes on 1 dimension

Since we only operate on a single dimension, we can concatenate multiple items that have different sizes along that dimension. This would easily handle the case mentioned in #20 and would be extremely useful for those who use concatenate in their code. I use this kind of function many times to concatenate tensors of different shapes. For example:

# the three tensors below have different sizes on the 2nd dim
print(x.shape) # [b, 10]
print(y.shape) # [b, 15]
print(z.shape) # [b, 20]

# we can concatenate them as
inputs = [x, y, z]
out = einops.reorder(inputs, 'batch [x y z -> x y z]', x=10, y=15, z=20)

The call above is consistent with how einops.rearrange concatenates a list of inputs whose items all have the same shape.

out can then be split back into its components x, y, z in three lines using the chunk function below:

x = einops.chunk(out, 'batch [x yz -> x]', x=10)
y = einops.chunk(out, 'batch [x y z -> y]', x=10, y=15)
z = einops.chunk(out, 'batch [xy z -> z]', z=20)
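
For comparison, here is a small sketch of the same round trip in plain NumPy: concatenate three tensors of different widths along the second axis, then split the result back. The concrete batch size and fill values are assumptions chosen only to make the example runnable.

```python
import numpy as np

b = 4  # assumed batch size, for illustration only
x = np.zeros((b, 10))
y = np.ones((b, 15))
z = np.full((b, 20), 2.0)

# Concatenate along the second axis: the result has 10 + 15 + 20 = 45 columns.
out = np.concatenate([x, y, z], axis=1)

# Split points are the cumulative sizes of all parts except the last.
sizes = [10, 15, 20]
split_points = np.cumsum(sizes)[:-1]
x2, y2, z2 = np.split(out, split_points, axis=1)
```

The sizes have to be tracked by hand here, which is the bookkeeping the proposed named patterns are meant to make explicit.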

B. Chunking along 1 dimension

In contrast to #50, I don’t think it is a good idea to merge chunking into reorder. We can keep the two functionalities separate as the reorder described above and a dedicated chunk. Chunking is used frequently when we want to sample parts of datasets and features.

Example in #50:

# remove the alpha channel and the bottom half of 256*256 images:
einops.chunk(imgs, 'batch [rg b a -> b rg] [top bottom -> top] w', rg=2, b=1, top=128, batch=10)
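
As a point of reference, the index-based NumPy version of this example might look like the sketch below. The array contents are random placeholders; only the shapes follow the example from #50.

```python
import numpy as np

imgs = np.random.rand(10, 4, 256, 256)  # batch, channels (r g b a), h, w

# Channels: keep b (index 2), then r and g (indices 0, 1); alpha (index 3) is dropped.
# Height: keep only the top 128 rows. Width is left untouched.
out = imgs[:, [2, 0, 1], :128, :]
```

The result has shape `(10, 3, 128, 256)`, but nothing in the indexing expression says which channel or which half of the image was kept, which is the readability gap the proposed pattern addresses.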

Split a dataset into train and val:

train_len = int(len(dataset) * 0.8)
train_split = einops.chunk(dataset, '[train val -> train] c h w', train=train_len)
val_split = einops.chunk(dataset, '[train val -> val] c h w', train=train_len)

And we can get the full dataset given train_split and val_split:

dataset = einops.reorder([train_split, val_split], '[train val -> train val] c h w', train=len(train_split), val=len(val_split))
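
The same split and merge can be sketched with plain slicing today. The dataset below is a small placeholder array with an assumed `(n, c, h, w)` layout, and `restored` is a name introduced here for the re-merged result.

```python
import numpy as np

# Placeholder dataset of 10 samples with layout (n, c, h, w).
dataset = np.arange(10 * 3 * 4 * 4).reshape(10, 3, 4, 4)

train_len = int(len(dataset) * 0.8)
train_split = dataset[:train_len]   # first 80% of the samples
val_split = dataset[train_len:]     # remaining 20%

# Concatenating along the first axis restores the original ordering.
restored = np.concatenate([train_split, val_split], axis=0)
```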

Issue Analytics

State:
Created 3 years ago
Comments: 10 (4 by maintainers)

Top GitHub Comments

3 reactions

arogozhnikov commented, Nov 7, 2021

@p4perf4ce thanks for thinking out loud about this with examples.

I’ve been poking around with the semantics of an operation (I’ve dubbed it rechunk) that has some overlap with your suggestion.

One critical choice: how to specify which axis is modified?

Representing both input and output in full shape (as in your suggestion) packs too much into a single operation and does not focus on the axis being modified. It is unclear what a user should focus on.

In my experiments I’ve landed on a very similar “list” in the pattern, with lists as input and output. Your suggestion has an exceptional case for a single element in a list, and introducing special cases should be avoided.

I have converged on:

[result] = rechunk([a, b, c], '[x,y,z] -> [x+y+z]', axis='b h w *')

It is possible to also support something like:

result = rechunk([a, b, c], '[x,y,z] -> x+y+z', axis='b h w *')

…but static code analysis would go crazy; it is easier to just always take and return lists.

The problem of an arbitrary number of inputs is a hard one. I’ve had something like this:

[result] = rechunk([a, b, c], '[*x] -> [concat(x)]', axis='b h w *')

…but that is too complex, and there is no need for this flexibility. The following could completely cover all necessary cases:

[result] = rechunk([a, b, c], 'concatenate', axis='b h w *')

There is a natural requirement to have an “inversion” of concatenation (which can work properly only if the pattern contains information about a single axis).
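
To illustrate why an inverse needs information about exactly one axis, here is a loose NumPy sketch. The functions `concat_along` and `split_along` are hypothetical stand-ins written for this example; they are not the `rechunk` design itself, and unlike the proposal above they return the sizes alongside the result instead of a list.

```python
import numpy as np

def concat_along(tensors, axis):
    """Hypothetical stand-in for rechunk(tensors, 'concatenate', axis=...).

    `axis` names every dimension and marks the one to concatenate with '*'.
    Returns the concatenated array and the per-input sizes along that axis.
    """
    names = axis.split()
    assert names.count("*") == 1, "exactly one axis must be marked with '*'"
    dim = names.index("*")
    sizes = [t.shape[dim] for t in tensors]
    return np.concatenate(tensors, axis=dim), sizes

def split_along(tensor, sizes, axis):
    """Inverse of concat_along: cut `tensor` back into parts of the given sizes."""
    dim = axis.split().index("*")
    return np.split(tensor, np.cumsum(sizes)[:-1], axis=dim)

a = np.zeros((2, 3, 3, 4))
b = np.ones((2, 3, 3, 5))
c = np.full((2, 3, 3, 6), 2.0)

result, sizes = concat_along([a, b, c], axis="b h w *")
a2, b2, c2 = split_along(result, sizes, axis="b h w *")
```

Because only the starred axis changes, the recorded sizes along that axis are all that is needed to undo the concatenation.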

I can post a more detailed RFC with a suggestion if that’s something interesting to discuss, but I won’t be able to dedicate time to implementing or supporting it.

2 reactions

arogozhnikov commented, Nov 8, 2022

Thanks for the discussion folks!

The brand-new einops.pack and einops.unpack cover the common cases for concatenate and chunk, so I’m closing this.

https://github.com/arogozhnikov/einops/blob/master/docs/4-pack-and-unpack.ipynb
