
Thanks for sharing the nice model implementation.


When I start training, the following warning appears: "No module named 'lightconv_cuda'". Do you also get the same message? I think it's a fairseq installation problem.

And I'm training with batch size 5… on an RTX 3090 with 24 GB of memory. Could the above problem be the cause?
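
For anyone hitting the same warning, here is a quick check for the missing extension; this is a sketch, and the fallback described in the comment is an assumption based on fairseq's usual behavior (using its pure-PyTorch lightweight convolution when the compiled kernel is absent), not something confirmed in this thread:

try:
    # The extension is compiled from fairseq/modules/lightconv_layer;
    # the warning above means this import fails.
    import lightconv_cuda
    print("lightconv_cuda extension found")
except ImportError as err:
    print(f"compiled extension missing: {err}")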

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

6 reactions
LEECHOONGHO commented, Sep 15, 2021

Thanks for your help. I’ll try it!

I ran into the same issue.

Is there any problem if I keep training despite the "No module named 'lightconv_cuda'" warning?

If you have solved the fairseq problem, can you share a little about your config and environment? I have also tried #5, but ran into too many errors.

Anyway, thank you very much @keonlee9420.

One more thing, just for discussion: why is the batch size of this model so small? The maximum I can set is 4, while in Tacotron2 it is 64 😃
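
(Not from this thread, but a common workaround when the GPU caps the batch size is gradient accumulation, which trades extra steps for a larger effective batch. A minimal self-contained sketch; the model, optimizer, and data below are placeholders, not names from this repo:)

import torch
import torch.nn as nn

# Placeholder stand-ins for the real training objects.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batches = [torch.randn(4, 8) for _ in range(64)]  # per-step batch size 4

accum_steps = 16  # 4 * 16 -> effective batch size 64
optimizer.zero_grad()
for step, x in enumerate(batches):
    loss = model(x).pow(2).mean()      # placeholder loss
    (loss / accum_steps).backward()    # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()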

No, I couldn't solve the fairseq installation problem. It may require reinstalling CUDA or upgrading it to 11.0.

Instead, I use my own lightweight_conv module. Insert the code below in Parallel-Tacotron2/model/blocks and remove the line "from fairseq.modules import LightweightConv" in the same file.

Whether you do this or not, the program runs, but you can only train with very low batch sizes. The loss stays around 70 and the model doesn't seem to train properly.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LightweightConv(nn.Module):
    """Drop-in replacement for fairseq's LightweightConv built on F.conv1d.
    Note: F.conv1d pads symmetrically, while fairseq's padding_l pads the
    left side only, so outputs can differ slightly at sequence edges."""

    def __init__(
        self,
        num_channels,
        kernel_size,
        padding_l,
        weight_softmax,
        num_heads,
        weight_dropout,
        stride=1,
        dilation=1,
        bias=True,
    ):
        super(LightweightConv, self).__init__()

        self.channels = num_channels
        self.heads = num_heads
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding_l
        self.dilation = dilation
        self.dropout_p = weight_dropout
        self.bias = bias
        self.weight_softmax = weight_softmax

        # One kernel per head, shared by all channels assigned to that head.
        self.weights = nn.Parameter(torch.Tensor(self.heads, 1, self.kernel_size), requires_grad=True)

        self.kernel_softmax = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(self.dropout_p)

        if self.bias:
            self.bias_weights = nn.Parameter(torch.randn(self.heads))
        else:
            # Register as None so reset_parameters() can test for it safely.
            self.register_parameter("bias_weights", None)

        self.reset_parameters()

    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weights)
        if self.bias_weights is not None:
            nn.init.constant_(self.bias_weights, 0.)

    def forward(self, x):
        # Input follows fairseq's TBC layout: [width, batch_size, channel].
        # contiguous() is required so the view() below is legal.
        x = x.permute(1, 2, 0).contiguous()
        # x.shape = [batch_size, channel, width]
        batch_size, in_channel, width = x.shape

        if self.weight_softmax:
            weights = self.kernel_softmax(self.weights)
        else:
            weights = self.weights

        weights = self.dropout(weights)

        # Fold the channels of each head into the batch dimension so a
        # grouped convolution applies one kernel per head.
        x = x.view(-1, self.heads, width)

        output = F.conv1d(
            x,
            weights,
            bias=self.bias_weights,  # None when bias=False
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            groups=self.heads,
        )

        # Restoring the original width assumes the convolution preserves it,
        # e.g. stride=1 and padding = dilation * (kernel_size - 1) // 2.
        output = output.view(batch_size, -1, width).permute(2, 0, 1)

        return output
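
For a quick sanity check of the module above, something like the following should work; the channel, head, and kernel values here are made up for illustration and are not the Parallel-Tacotron2 config:

conv = LightweightConv(
    num_channels=8,
    kernel_size=3,
    padding_l=1,           # (kernel_size - 1) // 2 keeps the width unchanged
    weight_softmax=True,
    num_heads=2,
    weight_dropout=0.1,
)
x = torch.randn(100, 4, 8)  # [width, batch, channels], fairseq's TBC layout
y = conv(x)
print(y.shape)              # torch.Size([100, 4, 8])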

1 reaction
keonlee9420 commented, Sep 13, 2021

Hi @LEECHOONGHO, thanks for your attention. Please refer to #5 for that. It should resolve your issue.
