
Errors thrown when using custom loss function (RMSLE)

See original GitHub issue

Hello! I’m super excited to dig into using pytorch_tabnet, but I’ve been banging my head against a wall for the past 2 nights on this issue, so I’m putting out a call for assistance.

I’ve got everything set up properly and confirmed that my data has no missing values and no values outside the defined dimensions.

I can train properly using the default (MSELoss) loss function, but for my particular problem I need to use either mean squared log error or, ideally, root mean squared log error.

I’ve defined a custom loss function as follows:

import torch
import torch.nn as nn

def rmsle_loss(y_pred, y_true):
    return torch.sqrt(nn.functional.mse_loss(torch.log(y_pred + 1), torch.log(y_true + 1)))

And I’m passing it to the model via the loss_fn=rmsle_loss parameter of .fit().

However, when I do this, I’m getting these dreaded errors.

Using CPU: index -1 is out of bounds for dimension 1 with size 22

Using GPU: CUDA error: device-side assert triggered

Both of these are being thrown at line 94 in sparsemax.py:

tau = input_cumsum.gather(dim, support_size - 1)

Note this ONLY happens when I’m using a custom loss function. I am able to train the model just fine using the default loss function, but since that’s not ideal for my domain, I really need to use the custom function. As I mentioned above, I’ve confirmed that there are no inf, NA, or out-of-bounds data in my training set.
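(For anyone hitting the same wall, the failure mode can be reproduced in isolation, independent of TabNet itself. This is a hedged illustration of what goes wrong, not code from the issue: torch.log returns nan for negative inputs and -inf at zero, so any prediction at or below -1 makes the loss non-finite, and the resulting gradients corrupt the weights before sparsemax ever runs.)

```python
import torch

# torch.log is only defined for positive inputs, so any prediction
# at or below -1 turns log(y_pred + 1) into nan (or -inf at exactly 0).
bad_pred = torch.tensor([-2.0, 0.5])
shifted = torch.log(bad_pred + 1)  # first entry is log(-1.0) -> nan

assert torch.isnan(shifted[0])      # non-finite loss follows
assert torch.isfinite(shifted[1])   # the in-range entry is fine
```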

Any thoughts? Help would be deeply appreciated!

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
Optimox commented, Sep 26, 2022

Thanks for the detailed information.

I’m not sure why the model is making predictions where 1 + y_pred < 0.

The model should not predict negative values if the training data is strictly positive, and that will probably be the case once training has finished. However, at the start the weights are randomly initialized, so negative predictions are very likely. Even after a few epochs, if the model needs to reach values as high as 10K, it’s hard to guarantee that no input will produce a negative score. So as long as you don’t explicitly prevent the model from making negative predictions, it can always happen.
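(One way to make the custom loss robust to early-epoch negatives, following the advice above, is to floor the predictions before taking the log. This is a sketch, not code from the thread; the clamp floor of 0.0 and the switch to torch.log1p are my assumptions.)

```python
import torch
import torch.nn as nn

def rmsle_loss_safe(y_pred, y_true):
    # Clamp predictions to be non-negative so log1p never sees values
    # below -1; log1p(x) == log(1 + x) and is more stable near zero.
    y_pred = torch.clamp(y_pred, min=0.0)
    return torch.sqrt(nn.functional.mse_loss(torch.log1p(y_pred), torch.log1p(y_true)))

# Even with a wildly negative early-epoch prediction, the loss stays finite:
loss = rmsle_loss_safe(torch.tensor([-5.0, 2.0]), torch.tensor([1.0, 2.0]))
assert torch.isfinite(loss)
```

Note that the clamp zeroes the gradient for negative predictions; a soft floor such as nn.functional.softplus is an alternative if that matters for your training dynamics.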

Good luck with tuning your model!

1 reaction
noahlh commented, Sep 23, 2022

Oh wow you are legendary @Optimox! Thanks for uncovering that and I’m glad I was (indirectly) able to help fix a bug 😃

I’m retraining now and there’s still a slight discrepancy (see below), but it’s now within range and likely for the reasons you mentioned, so I think we’re all good. Many many thanks.

epoch 0  | loss: 2.37828 | train_rmsle: 1.7417000532150269| valid_rmsle: 1.7888699769973755|  0:00:22s
epoch 1  | loss: 1.3471  | train_rmsle: 2.0144999027252197| valid_rmsle: 2.071079969406128|  0:00:45s
epoch 2  | loss: 1.01037 | train_rmsle: 2.018090009689331| valid_rmsle: 2.063570022583008|  0:01:06s
epoch 3  | loss: 0.83754 | train_rmsle: 1.5472899675369263| valid_rmsle: 1.5472899675369263|  0:01:28s
epoch 4  | loss: 0.76075 | train_rmsle: 0.9113900065422058| valid_rmsle: 0.9303600192070007|  0:01:49s
epoch 5  | loss: 0.71234 | train_rmsle: 0.7181299924850464| valid_rmsle: 0.7953600287437439|  0:02:12s
epoch 6  | loss: 0.67979 | train_rmsle: 0.6658599972724915| valid_rmsle: 0.7813699841499329|  0:02:34s
epoch 7  | loss: 0.65395 | train_rmsle: 0.6251800060272217| valid_rmsle: 0.7234600186347961|  0:02:56s
epoch 8  | loss: 0.63447 | train_rmsle: 0.6097800135612488| valid_rmsle: 0.704200029373169|  0:03:18s
epoch 9  | loss: 0.62041 | train_rmsle: 0.5897899866104126| valid_rmsle: 0.7026200294494629|  0:03:39s
epoch 10 | loss: 0.60307 | train_rmsle: 0.5744100213050842| valid_rmsle: 0.6758300065994263|  0:04:01s
epoch 11 | loss: 0.59601 | train_rmsle: 0.5818799734115601| valid_rmsle: 0.6536700129508972|  0:04:23s
epoch 12 | loss: 0.58429 | train_rmsle: 0.560479998588562| valid_rmsle: 0.6636599898338318|  0:04:45s
epoch 13 | loss: 0.5752  | train_rmsle: 0.5513899922370911| valid_rmsle: 0.6779299974441528|  0:05:08s
epoch 14 | loss: 0.56832 | train_rmsle: 0.5371400117874146| valid_rmsle: 0.6313999891281128|  0:05:29s
epoch 15 | loss: 0.5622  | train_rmsle: 0.5362799763679504| valid_rmsle: 0.6614099740982056|  0:05:51s