How to use a convolution operator as the expert?
Hi, I am trying to train a convolution-backbone network with MoE, and I have run into two difficulties. The first is that the current API does not seem directly usable: the FMoE class requires the hidden dimension (d_model) as a parameter, but a convolution layer does not define a hidden dimension explicitly.
Second, I find that the FMoE class cannot accept tensors with more than 2 dimensions, so I guess I cannot directly pass an image (with shape N, C, H, W) into the layer? My code snippet is:
from fmoe.layers import FMoE
import torch
from fmoe.gates import NaiveGate, SwitchGate
N = 3
num_expert = 2
hidden_size = 5
out_feature = 4
layer = torch.nn.Linear(in_features=hidden_size, out_features=out_feature).to("cuda")
layer.weight = torch.nn.Parameter(torch.ones_like(layer.weight))
my_moe = FMoE(num_expert=num_expert, d_model=hidden_size, top_k=1, expert=layer, gate=SwitchGate).to("cuda")
inputs = torch.rand((N, 1, hidden_size)).to("cuda")
print(my_moe(inputs))
Here I use a linear layer as the expert just to test the input dimensions. The error message is:
Traceback (most recent call last):
  File "/home/zyli/fastmoe/try.py", line 15, in <module>
    print(my_moe(inputs))
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zyli/fastmoe/fmoe/layers.py", line 241, in forward
    experts=self.experts
  File "/home/zyli/fastmoe/fmoe/layers.py", line 78, in _fmoe_general_global_forward
    outp = tree.map_structure(gather_func, x)
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/tree/__init__.py", line 430, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/zyli/anaconda3/envs/QMoE/lib/python3.7/site-packages/tree/__init__.py", line 430, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/zyli/fastmoe/fmoe/layers.py", line 75, in gather_func
    world_size,
  File "/home/zyli/fastmoe/fmoe/functions.py", line 171, in forward
    maybe_overlap=False)
  File "/home/zyli/fastmoe/fmoe/functions.py", line 89, in _local_gather
    inp_buf.index_copy_(0, pos, inp)
IndexError: index_copy_(): When source and destination are not scalars, their dimensionality must match. Source dimensionality (3), destination dimensionality (2)
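The direct trigger appears to be that the destination buffer built in _local_gather is 2-D (presumably tokens x d_model), while my input is 3-D. Flattening the leading dimensions before the call sidesteps that particular error in this toy snippet (a quick sketch; I have not verified what happens further downstream):

flat_inputs = inputs.reshape(-1, hidden_size)  # (N * 1, hidden_size), now 2-D
print(my_moe(flat_inputs))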
One possible solution I can think of is to first apply im2col to the input so that the convolution is turned into a matrix multiplication, but this incurs obvious overhead. The alternative is to modify the implementation of the FMoE class. Neither option is elegant, so is there a better way to do this?
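For reference, here is roughly what I mean by the im2col route, in plain PyTorch and without FMoE; the shapes, kernel size, and padding are made up for illustration. Each output pixel becomes a token of length C_in * k * k, so a Linear expert with d_model = C_in * k * k reproduces the k x k convolution:

import torch
import torch.nn.functional as F

N, C_in, H, W = 3, 8, 16, 16   # hypothetical batch / channels / spatial size
C_out, k = 12, 3               # hypothetical output channels and kernel size

x = torch.rand(N, C_in, H, W)

# im2col: extract k x k patches; stride 1 and padding 1 keep the spatial size.
cols = F.unfold(x, kernel_size=k, padding=1)              # (N, C_in*k*k, H*W)
tokens = cols.transpose(1, 2).reshape(-1, C_in * k * k)   # (N*H*W, C_in*k*k)

# A Linear layer on these tokens is equivalent to the convolution, so it could
# act as the expert; here it simply stands in for the MoE call (d_model = C_in*k*k).
linear = torch.nn.Linear(C_in * k * k, C_out)
out = linear(tokens)                                       # (N*H*W, C_out)

# col2im: reshape the tokens back into an image.
out_img = out.reshape(N, H * W, C_out).transpose(1, 2).reshape(N, C_out, H, W)

The unfold/reshape pair is exactly the overhead I am worried about, since it materialises every patch.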
Top GitHub Comments
You are right, I see your point now 🤔. Thanks again 😁.
Thanks 😁! I get your point. BTW, does FMoE not have documentation yet?