
scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device)) RuntimeError: CUDA error: device-side assert triggered

See original GitHub issue

Describe the bug
When running on GPU, TabNet crashes with:

scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))
RuntimeError: CUDA error: device-side assert triggered

What is the current behavior?
It works when the matrix I use contains only integers, but fails with floats. I made sure that NaN values are imputed and that there are no Inf values, that the largest value fits into float32, and I also set the batch size to a very low level.
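The checks described above (no NaN, no Inf, values representable in float32) can be bundled into a small pre-fit guard. This is a minimal sketch with a hypothetical helper name, not part of pytorch-tabnet's API:

```python
import numpy as np

def check_features(X):
    """Basic sanity checks before handing a dense matrix to TabNet."""
    X = np.asarray(X, dtype=np.float64)
    # No NaN values should remain after imputation.
    assert not np.isnan(X).any(), "matrix still contains NaN values"
    # No +/-Inf values either.
    assert np.isfinite(X).all(), "matrix contains Inf values"
    # The largest magnitude must be representable in float32.
    assert np.abs(X).max() <= np.finfo(np.float32).max, "value overflows float32"
    return X.astype(np.float32)

X = check_features([[0.5, 1.0], [2.0, 3.5]])
print(X.dtype)  # float32
```

Running such a guard on the exact array passed to `fit` rules the data out as the cause before blaming the GPU kernel.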

If the current behavior is a bug, please provide the steps to reproduce.

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
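The `m__batch_size` spelling is sklearn's `stepname__parameter` routing for pipelines, which also explains the first traceback below. A minimal sketch of that mechanism, assuming a pipeline whose final step is named "m" (the estimator here is a stand-in, not TabNet):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A final step named "m", mirroring the step name assumed in the report above.
pipe = Pipeline([("scale", StandardScaler()), ("m", LogisticRegression())])

rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.array([0, 1] * 10)

# Fit params must be routed as "<stepname>__<param>"; a bare keyword such as
# batch_size=10 raises the ValueError shown in the traceback below.
pipe.fit(X, y, m__sample_weight=np.ones(20))
```

With routing, the pipeline strips the `m__` prefix and forwards `sample_weight` (or `batch_size`, in the report) to the named step's `fit`.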

Expected behavior

Screenshots

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)

Traceback (most recent call last):
  File "<ipython-input-2-286ba8f48dc1>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 329, in fit
    fit_params_steps = self._check_fit_params(**fit_params)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 248, in _check_fit_params
    "=sample_weight)`.".format(pname))
ValueError: Pipeline.fit does not accept the batch_size parameter. You can pass parameters to specific steps of your pipeline using the stepname__parameter format, e.g. `Pipeline.fit(X, y, logisticregression__sample_weight=sample_weight)`.

tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
No early stopping will be performed, last training weights will be used.

Traceback (most recent call last):
  File "<ipython-input-3-0b8430e51e24>", line 1, in <module>
    tab_model.fit(np.array(featureMatrix.sparse.to_dense().values), np.array(values), m__batch_size=10)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\sklearn\pipeline.py", line 335, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 173, in fit
    self._train_epoch(train_dataloader)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 349, in _train_epoch
    batch_logs = self._train_batch(X, y)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\abstract_model.py", line 384, in _train_batch
    output, M_loss = self.network(X)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 276, in forward
    return self.tabnet(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 151, in forward
    out = self.feat_transformers[step](masked_x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 375, in forward
    x = self.shared(x)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Thomas Wolf\anaconda3\envs\my-rdkit-env\lib\site-packages\pytorch_tabnet\tab_network.py", line 409, in forward
    scale = torch.sqrt(torch.FloatTensor([0.5]).to(x.device))

RuntimeError: CUDA error: device-side assert triggered
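One general caveat when reading this traceback: CUDA device-side asserts are reported asynchronously, so the Python frame it blames (the harmless-looking `torch.sqrt` call) is usually not the operation that actually failed. Two standard techniques for localising the real culprit, sketched below:

```python
import os

# 1) Force synchronous kernel launches so the traceback points at the kernel
#    that actually asserted. This must be set before CUDA is first initialised
#    in the process (e.g. before importing the model and calling .to("cuda")).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# 2) Re-run the same failing batch on CPU, where the equivalent failure
#    surfaces as an ordinary Python exception with a readable message, e.g.:
#        model.cpu()(X_batch.cpu())

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Either technique typically turns the opaque "device-side assert triggered" into a specific, actionable error (an out-of-range index, a NaN in a loss, and so on).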

Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:

Additional context

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 25

Top GitHub Comments

2 reactions
ThomasWolf0701 commented, Nov 23, 2020

"Maybe entmax is working but sparsemax is creating this error, did you check that?"

I ran the grid search on the CPU using only entmax, and for the CPU this fixed the problem, even when varying the other parameters. I will also check on the GPU (CUDA).
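For context on the sparsemax-vs-entmax suggestion: TabNet's attentive transformer builds its feature masks with sparsemax by default (selectable via the model's `mask_type` parameter, e.g. `mask_type="entmax"`). Unlike softmax, sparsemax projects onto the probability simplex and produces exact zeros, which makes it harder to select. A minimal NumPy sketch of the sparsemax operator itself (illustrative only, not pytorch-tabnet's CUDA implementation):

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016) for a 1-D array: the Euclidean
    projection of z onto the simplex, which can return exact zeros."""
    z = np.asarray(z, dtype=np.float64)
    z_sorted = np.sort(z)[::-1]                      # descending
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum              # which entries survive
    k_max = support.sum()
    tau = (cumsum[k_max - 1] - 1) / k_max            # threshold
    return np.clip(z - tau, 0.0, None)

p = sparsemax(np.array([2.0, 1.0, -1.0]))
print(p)  # exact zeros for the weaker logits; entries sum to 1
```

The hard zeros are the behavioural difference the comment is probing: swapping the mask to entmax changes which kernel runs on the GPU, which is why it can sidestep (or expose) a device-side assert.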

1 reaction
ThomasWolf0701 commented, Nov 23, 2020

"It can also be due to gradient fading or explosion"

Checked again, and the values I gave the model to fit the featureMatrix against were off, so the model probably could not fit anything sensible. I fixed this and the error still occurs. I will have a look at the other suggestions.

I have also attached the updated data: bugFix2.zip
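Since the targets turned out to be "off", it is worth guarding the labels as carefully as the features: a single non-finite target drives the loss to NaN, which downstream can surface as an opaque CUDA assert rather than a readable error. A hedged sketch with a hypothetical helper name:

```python
import numpy as np

def check_targets(y):
    """Reject non-finite regression targets before fitting, and report
    which rows are bad instead of failing later inside a CUDA kernel."""
    y = np.asarray(y, dtype=np.float64)
    bad = np.flatnonzero(~np.isfinite(y))
    if bad.size:
        raise ValueError(f"non-finite targets at rows {bad[:10].tolist()}")
    return y.astype(np.float32)

print(check_targets([1.0, 2.5, 0.0]).dtype)  # float32
```

Reporting the offending row indices makes it straightforward to trace the bad labels back to the upstream data-preparation step.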

Read more comments on GitHub >

Top Results From Across the Web

RuntimeError: CUDA error: device-side assert triggered
I'm putting my code here: with torch.no_grad(): retrieval_one_hot = torch.zeros(k, 10).cuda() for batch_idx, (inputs, targets, ...
Read more >
Very simple torch.tensor().to("cuda") gives CUDA error: device ...
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") ... RuntimeError: CUDA error: device-side assert triggered.
Read more >
PyTorch: test/test_cuda.py - Fossies
assertRaisesRegex (RuntimeError, "out of memory"): 398 torch.empty(application, dtype=torch.int8, device='cuda') 399 400 # ensure out of ...
Read more >
Python API: test/test_cuda.py Source File - Caffe2
33 # cause CUDA OOM error on Windows. 34 TEST_CUDA = torch.cuda.is_available(). 35 TEST_MULTIGPU = TEST_CUDA and torch.cuda.
Read more >
runtimeerror: cuda error: device-side assert triggered
After PyTorch's stable release was updated to 1.0, the problems of 3D models blindly running out of memory, and of torch.backends.cudnn.benchmark having to be False for 3D models, were finally solved!
Read more >
