scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered
Describe the bug
When running on the GPU, TabNet crashes with: scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered
What is the current behavior? It works when the matrix I use contains only integers but fails with floats. I made sure that NaN values are imputed and that there are no Inf values, checked that the largest value fits into float32, and also set the batch size to a very low value.
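For reference, the checks described above can be made explicit along these lines (a sketch; it assumes featureMatrix and values are the same objects passed to fit below):

import numpy as np

X = np.array(featureMatrix.sparse.to_dense().values, dtype=np.float64)
y = np.array(values, dtype=np.float64)

# No NaN or Inf should survive the imputation step
assert not np.isnan(X).any() and not np.isinf(X).any()
assert not np.isnan(y).any() and not np.isinf(y).any()

# The largest magnitude should still be representable in float32
assert np.abs(X).max() < np.finfo(np.float32).max
print("feature range:", X.min(), X.max(), "| target range:", y.min(), y.max())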
If the current behavior is a bug, please provide the steps to reproduce.
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
Expected behavior
Screenshots
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)

Traceback (most recent call last):
  File "<ipython-input-2-286ba8f48dc1>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit
    fit_params_steps = self._check_fit_params(**fit_params)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params
    "=sample_weight)`.".format(pname))
ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.
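For illustration, assuming the TabNet estimator is the pipeline step named "m" (as the later m__batch_size call suggests), the fit parameter has to be prefixed with that step name. A minimal sketch:

from sklearn.pipeline import Pipeline
from pytorch_tabnet.tab_model import TabNetRegressor

# Hypothetical pipeline; the actual preprocessing steps are not shown in the issue.
tab_model = Pipeline([("m", TabNetRegressor())])

# Fit parameters are routed to a step via <stepname>__<parameter>.
# TabNetRegressor expects a 2-D target, so y may need a reshape(-1, 1).
tab_model.fit(X, y, m__batch_size=10)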
tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)

No early stopping will be performed, last training weights will be used.
Traceback (most recent call last):
  File "<ipython-input-3-0b8430e51e24>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit
    self._train_epoch(train_dataloader)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch
    batch_logs = self._train_batch(X, y)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch
    output, M_loss = self.network(X)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward
    return self.tabnet(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward
    out = self.feat_transformers[step](masked_x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward
    x = self.shared(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward
    scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))
RuntimeError: CUDA error: device-side assert triggered
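Note that the line reported here is usually not the real culprit: CUDA device-side asserts surface asynchronously, so the failing operation is often elsewhere. A common way to localize it (a sketch, not part of the original report) is to force synchronous kernel launches, or to rerun the same fit on CPU to get a readable Python exception:

import os

# Must be set before CUDA is initialized (i.e. before the first GPU operation).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, run the identical fit on the CPU;
# device_name is a TabNet constructor argument.
from pytorch_tabnet.tab_model import TabNetRegressor
cpu_model = TabNetRegressor(device_name="cpu")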
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context
Top GitHub Comments
I ran the grid search on the CPU using only entmax, and on the CPU this fixed the problem, even when varying the other parameters. I will also check on the GPU (CUDA).
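For reference, a configuration along the lines described above might look like this (hyperparameters other than mask_type and device_name are illustrative, not taken from the issue):

from pytorch_tabnet.tab_model import TabNetRegressor

tab_model = TabNetRegressor(mask_type="entmax", device_name="cpu")
# batch_size=10 mirrors the very low batch size used in the report.
tab_model.fit(X_train, y_train.reshape(-1, 1), batch_size=10)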
“It can also be due to gradient fading or explosion”
Checked again, and the values I gave the model to fit the featureMatrix against were off, so the model was probably unable to fit anything sensible. Fixed this, but the error still occurs. I will have a look at the other suggestions.
Attached is the updated data: bugFix2.zip