
scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered

See original GitHub issue

Describe the bug
When running on GPU, TabNet crashes with:

scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))
RuntimeError: CUDA error: device-side assert triggered

What is the current behavior?
It works when the matrix I use contains only integers, but fails with floats. I made sure that NaN values are imputed and that there are no Inf values, that the largest value fits into float32, and I also set the batch size to a very low level.
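The checks described above (no NaN, no Inf, values representable in float32) can be bundled into a small pre-fit guard. This is a minimal sketch with a hypothetical helper name, not part of pytorch-tabnet's API:

```python
import numpy as np

def check_features(X):
    """Basic sanity checks before handing a dense matrix to TabNet."""
    X = np.asarray(X, dtype=np.float64)
    # No NaN values should remain after imputation.
    assert not np.isnan(X).any(), "matrix still contains NaN values"
    # No +/-Inf values either.
    assert np.isfinite(X).all(), "matrix contains Inf values"
    # The largest magnitude must be representable in float32.
    assert np.abs(X).max() <= np.finfo(np.float32).max, "value overflows float32"
    return X.astype(np.float32)

X = check_features([[0.5, 1.0], [2.0, 3.5]])
print(X.dtype)  # float32
```

Running such a guard on the exact array passed to `fit` rules the data out as the cause before blaming the GPU kernel.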

If the current behavior is a bug, please provide the steps to reproduce.

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
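The `m__batch_size` spelling is sklearn's `stepname__parameter` routing for pipelines, which also explains the first traceback below. A minimal sketch of that mechanism, assuming a pipeline whose final step is named "m" (the estimator here is a stand-in, not TabNet):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A final step named "m", mirroring the step name assumed in the report above.
pipe = Pipeline([("scale", StandardScaler()), ("m", LogisticRegression())])

rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.array([0, 1] * 10)

# Fit params must be routed as "<stepname>__<param>"; a bare keyword such as
# batch_size=10 raises the ValueError shown in the traceback below.
pipe.fit(X, y, m__sample_weight=np.ones(20))
```

With routing, the pipeline strips the `m__` prefix and forwards `sample_weight` (or `batch_size`, in the report) to the named step's `fit`.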

Expected behavior

Screenshots

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)

Traceback (most recent call last):
  File "<ipython-input-2-286ba8f48dc1>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit
    fit_params_steps = self._check_fit_params(**fit_params)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params
    "=sample_weight)`.".format(pname))
ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
No early stopping will be performed, last training weights will be used.

Traceback (most recent call last):
  File "<ipython-input-3-0b8430e51e24>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit
    self._train_epoch(train_dataloader)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch
    batch_logs = self._train_batch(X, y)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch
    output, M_loss = self.network(X)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward
    return self.tabnet(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward
    out = self.feat_transformers[step](masked_x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward
    x = self.shared(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward
    scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))

RuntimeError: CUDA error: device-side assert triggered
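One general caveat when reading this traceback: CUDA device-side asserts are reported asynchronously, so the Python frame it blames (the harmless-looking `torch.sqrt` call) is usually not the operation that actually failed. Two standard techniques for localising the real culprit, sketched below:

```python
import os

# 1) Force synchronous kernel launches so the traceback points at the kernel
#    that actually asserted. This must be set before CUDA is first initialised
#    in the process (e.g. before importing the model and calling .to("cuda")).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# 2) Re-run the same failing batch on CPU, where the equivalent failure
#    surfaces as an ordinary Python exception with a readable message, e.g.:
#        model.cpu()(X_batch.cpu())

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Either technique typically turns the opaque "device-side assert triggered" into a specific, actionable error (an out-of-range index, a NaN in a loss, and so on).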

Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:

Additional context

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 25

Top GitHub Comments

2 reactions
ThomasWolf0701 commented, Nov 23, 2020

"Maybe entmax is working but sparsemax is creating this error, did you check that?"

I ran the grid search on the CPU using only entmax, and for the CPU this fixed the problem, even when varying the other parameters. I will also check on the GPU (CUDA).
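For context on the sparsemax-vs-entmax suggestion: TabNet's attentive transformer builds its feature masks with sparsemax by default (selectable via the model's `mask_type` parameter, e.g. `mask_type="entmax"`). Unlike softmax, sparsemax projects onto the probability simplex and produces exact zeros, which makes it harder to select. A minimal NumPy sketch of the sparsemax operator itself (illustrative only, not pytorch-tabnet's CUDA implementation):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016) for a 1-D array: the Euclidean
    projection of z onto the simplex, which can return exact zeros."""
    z = np.asarray(z, dtype=np.float64)
    z_sorted = np.sort(z)[::-1]                      # descending
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum              # which entries survive
    k_max = support.sum()
    tau = (cumsum[k_max - 1] - 1) / k_max            # threshold
    return np.clip(z - tau, 0.0, None)

p = sparsemax(np.array([2.0, 1.0, -1.0]))
print(p)  # exact zeros for the weaker logits; entries sum to 1
```

The hard zeros are the behavioural difference the comment is probing: swapping the mask to entmax changes which kernel runs on the GPU, which is why it can sidestep (or expose) a device-side assert.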

1 reaction
ThomasWolf0701 commented, Nov 23, 2020

"It can also be due to gradient fading or explosion"

Checked again, and the values I gave the model to fit the featureMatrix against were off, so the model probably could not fit anything sensible. I fixed this and the error still occurs. I will have a look at the other suggestions.

I have also attached the updated data: bugFix2.zip
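Since the targets turned out to be "off", it is worth guarding the labels as carefully as the features: a single non-finite target drives the loss to NaN, which downstream can surface as an opaque CUDA assert rather than a readable error. A hedged sketch with a hypothetical helper name:

```python
import numpy as np

def check_targets(y):
    """Reject non-finite regression targets before fitting, and report
    which rows are bad instead of failing later inside a CUDA kernel."""
    y = np.asarray(y, dtype=np.float64)
    bad = np.flatnonzero(~np.isfinite(y))
    if bad.size:
        raise ValueError(f"non-finite targets at rows {bad[:10].tolist()}")
    return y.astype(np.float32)

print(check_targets([1.0, 2.5, 0.0]).dtype)  # float32
```

Reporting the offending row indices makes it straightforward to trace the bad labels back to the upstream data-preparation step.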

Read more comments on GitHub >

Top Results From Across the Web

RuntimeError: CUDA error: device-side assert triggered
I'm putting my code here: with torch.no_grad(): retrieval_one_hot = torch.zeros(k, 10).cuda() for batch_idx, (inputs, targets, ...
Read more >
Very simple torch.tensor().to("cuda") gives CUDA error: device ...
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") ... RuntimeError: CUDA error: device-side assert triggered.
Read more >
PyTorch: test/test_cuda.py - Fossies
assertRaisesRegex (RuntimeError, "out of memory"): 398 torch.empty(application, dtype=torch.int8, device='cuda') 399 400 # ensure out of ...
Read more >
Python API: test/test_cuda.py Source File - Caffe2
33 # cause CUDA OOM error on Windows. 34 TEST_CUDA = torch.cuda.is_available(). 35 TEST_MULTIGPU = TEST_CUDA and torch.cuda.
Read more >
runtimeerror: cuda error: device-side assert triggered
After PyTorch's stable release was updated to 1.0, the problems of 3D models blindly running out of memory, and of torch.backends.cudnn.benchmark having to be False for 3D models, were finally solved!
Read more >
