How to use a convolution operator as the expert?
Hi, I am trying to train a convolution-backbone network with MoE, and I have run into two difficulties. The first is that the current API does not seem directly usable: the FMoE class requires the hidden dimension (d_model) as a parameter, but a convolution layer does not define a hidden dimension explicitly.
Second, I find that the FMoE class cannot accept tensors with more than 2 dimensions, so I guess I cannot directly pass an image (with shape N, C, H, W) into the layer? My code snippet is:
from fmoe.layers import FMoE
import torch
from fmoe.gates import NaiveGate, SwitchGate
N = 3
num_expert = 2
hidden_size = 5
out_feature = 4
layer = torch.nn.Linear(in_features=hidden_size, out_features=out_feature).to("cuda")
layer.weight = torch.nn.Parameter(torch.ones_like(layer.weight))
my_moe = FMoE(num_expert=num_expert, d_model=hidden_size, top_k=1, expert=layer, gate=SwitchGate).to("cuda")
inputs = torch.rand((N, 1, hidden_size)).to("cuda")
print(my_moe(inputs))
Here I use a linear layer as the expert just to test the input dimensions. The error message is:
Traceback (most recent call last):
  File "/home/zyli/fastmoe/try.py", line 15, in <module>
    print(my_moe(inputs))
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyli/fastmoe/fmoe/layers.py", line 241, in forward
    experts=self.experts
  File "/home/zyli/fastmoe/fmoe/layers.py", line 78, in _fmoe_general_global_forward
    outp = tree.map_structure(gather_func, x)
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/tree/__init__.py", line 430, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/tree/__init__.py", line 430, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/zyli/fastmoe/fmoe/layers.py", line 75, in gather_func
    world_size,
  File "/home/zyli/fastmoe/fmoe/functions.py", line 171, in forward
    maybe_overlap=False)
  File "/home/zyli/fastmoe/fmoe/functions.py", line 89, in _local_gather
    inp_buf.index_copy_(0, pos, inp)
IndexError: index_copy_(): When source and destination are not scalars, their dimensionality must match. Source dimensionality (3), destination dimensionality (2)
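The direct trigger appears to be that the destination buffer built in _local_gather is 2-D (presumably tokens x d_model), while my input is 3-D. Flattening the leading dimensions before the call sidesteps that particular error in this toy snippet (a quick sketch; I have not verified what happens further downstream):

flat_inputs = inputs.reshape(-1, hidden_size)  # (N * 1, hidden_size), now 2-D
print(my_moe(flat_inputs))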
One possible solution I can think of is to first apply im2col to the input so that the convolution is turned into a matrix multiplication, but this incurs obvious overhead. The alternative is to modify the implementation of the FMoE class. Neither option is elegant, so is there a better way to do this?
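For reference, here is roughly what I mean by the im2col route, in plain PyTorch and without FMoE; the shapes, kernel size, and padding are made up for illustration. Each output pixel becomes a token of length C_in * k * k, so a Linear expert with d_model = C_in * k * k reproduces the k x k convolution:

import torch
import torch.nn.functional as F

N, C_in, H, W = 3, 8, 16, 16   # hypothetical batch / channels / spatial size
C_out, k = 12, 3               # hypothetical output channels and kernel size

x = torch.rand(N, C_in, H, W)

# im2col: extract k x k patches; stride 1 and padding 1 keep the spatial size.
cols = F.unfold(x, kernel_size=k, padding=1)              # (N, C_in*k*k, H*W)
tokens = cols.transpose(1, 2).reshape(-1, C_in * k * k)   # (N*H*W, C_in*k*k)

# A Linear layer on these tokens is equivalent to the convolution, so it could
# act as the expert; here it simply stands in for the MoE call (d_model = C_in*k*k).
linear = torch.nn.Linear(C_in * k * k, C_out)
out = linear(tokens)                                       # (N*H*W, C_out)

# col2im: reshape the tokens back into an image.
out_img = out.reshape(N, H * W, C_out).transpose(1, 2).reshape(N, C_out, H, W)

The unfold/reshape pair is exactly the overhead I am worried about, since it materialises every patch.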
Top GitHub Comments
You are right, I see your point now 🤔. Thanks again 😁.
Thanks 😁! I get your point. BTW, does FMoE not have documentation yet?