[BUG] qml.qnn.TorchLayer breaks multi GPU usage
Expected behavior
Setup: 4 Nvidia GPUs
Expected return:
Let's use 4 GPUs!
Average loss over epoch 1: 0.4803
Average loss over epoch 2: 0.3553
Accuracy: 78.0%
Actual behavior
Got back: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument mat2 in method wrapper__bmm)
Additional information
The error occurs every time. The pair of GPUs named in the error message can vary between runs, for example:
RuntimeError: Expected all tensors …cuda:0 and cuda:1… or RuntimeError: Expected all tensors …cuda:3 and cuda:2…
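For context, the underlying failure can be reproduced in isolation: any Torch contraction between tensors that live on different CUDA devices raises this error. A minimal sketch, independent of PennyLane and assuming a machine with at least two GPUs (the tensors are just stand-ins for the simulator state and a gate matrix):

import torch

if torch.cuda.device_count() > 1:
    state = torch.rand(2, 2, device="cuda:0")  # stand-in for the simulator state on one GPU
    gate = torch.rand(2, 2, device="cuda:1")   # stand-in for a gate matrix on another GPU
    # Raises: RuntimeError: Expected all tensors to be on the same device,
    # but found at least two devices, cuda:1 and cuda:0!
    torch.einsum("ab,bc->ac", gate, state)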
Source code
# Copying and pasting code from: https://pennylane.ai/qml/demos/tutorial_qnn_module_torch.html
import torch
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons

# Set random seeds
torch.manual_seed(42)
np.random.seed(42)

X, y = make_moons(n_samples=200, noise=0.1)
y_ = torch.unsqueeze(torch.tensor(y), 1)  # used for one-hot encoded labels
y_hot = torch.scatter(torch.zeros((200, 2)), 1, y_, 1)

c = ["#1f77b4" if y_ == 0 else "#ff7f0e" for y_ in y]  # colours for each class
# Removing the plot since we don't need it for now
# plt.axis("off")
# plt.scatter(X[:, 0], X[:, 1], c=c)
# plt.show()

import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnode(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]

n_layers = 6
weight_shapes = {"weights": (n_layers, n_qubits)}
qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)

clayer_1 = torch.nn.Linear(2, 2)
clayer_2 = torch.nn.Linear(2, 2)
softmax = torch.nn.Softmax(dim=1)
layers = [clayer_1, qlayer, clayer_2, softmax]
# layers = [clayer_1, clayer_2, softmax]
model = torch.nn.Sequential(*layers)

if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = torch.nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

opt = torch.optim.SGD(model.parameters(), lr=0.2)
loss = torch.nn.L1Loss()

X = torch.tensor(X, requires_grad=True).float()
y_hot = y_hot.float()

batch_size = 5
batches = 200 // batch_size
data_loader = torch.utils.data.DataLoader(
    list(zip(X, y_hot)), batch_size=5, shuffle=True, drop_last=True
)

epochs = 2

for epoch in range(epochs):
    running_loss = 0

    for xs, ys in data_loader:
        opt.zero_grad()

        # Moving data to device as needed
        xs = xs.to(device)
        ys = ys.to(device)

        loss_evaluated = loss(model(xs), ys)
        loss_evaluated.backward()

        opt.step()

        running_loss += loss_evaluated

    avg_loss = running_loss / batches
    print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))

# Moving data to device as needed
X = X.to(device)
y_pred = model(X)
# predictions = torch.argmax(y_pred, axis=1).detach().numpy()
predictions = torch.argmax(y_pred, axis=1).detach().cpu().numpy()

correct = [1 if p == p_true else 0 for p, p_true in zip(predictions, y)]
accuracy = sum(correct) / len(correct)
print(f"Accuracy: {accuracy * 100}%")
Tracebacks
Let's use 4 GPUs!
Traceback (most recent call last):
File "/home/pennylane_error_gpu_no_plot.py", line 111, in <module>
loss_evaluated = loss(model(xs), ys)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/qnn/torch.py", line 277, in forward
reconstructor.append(self.forward(x))
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/qnn/torch.py", line 281, in forward
return self._evaluate_qnode(inputs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/qnn/torch.py", line 296, in _evaluate_qnode
return self.qnode(**kwargs).type(x.dtype)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/qnode.py", line 560, in __call__
res = qml.execute(
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/interfaces/batch/__init__.py", line 342, in execute
cache_execute(batch_execute, cache, return_tuple=False, expand_fn=expand_fn)(tapes)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/interfaces/batch/__init__.py", line 173, in wrapper
res = fn(execution_tapes.values(), **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/interfaces/batch/__init__.py", line 125, in fn
return original_fn(tapes, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/_qubit_device.py", line 289, in batch_execute
res = self.execute(circuit)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/devices/default_qubit_torch.py", line 233, in execute
return super().execute(circuit, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/_qubit_device.py", line 201, in execute
self.apply(circuit.operations, rotations=circuit.diagonalizing_gates, **kwargs)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/devices/default_qubit.py", line 216, in apply
self._state = self._apply_operation(self._state, operation)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/devices/default_qubit.py", line 247, in _apply_operation
return self._apply_unitary_einsum(state, matrix, wires)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/pennylane/devices/default_qubit.py", line 752, in _apply_unitary_einsum
return self._einsum(einsum_indices, mat, state)
File "/home/miniconda3/envs/py9/lib/python3.9/site-packages/torch/functional.py", line 327, in einsum
return _VF.einsum(equation, operands) # type: ignore[attr-defined]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper__bmm)
System information
>>> import pennylane as qml; qml.about()
Name: PennyLane
Version: 0.21.0
Summary: PennyLane is a Python quantum machine learning library by Xanadu Inc.
Home-page: https://github.com/XanaduAI/pennylane
Author:
Author-email:
License: Apache License 2.0
Location: /home/miniconda3/envs/py9/lib/python3.9/site-packages
Requires: autoray, retworkx, cachetools, semantic-version, scipy, pennylane-lightning, networkx, numpy, toml, appdirs, autograd
Required-by: PennyLane-Lightning
Platform info: Linux-4.18.0-348.7.1.el8_5.x86_64-x86_64-with-glibc2.28
Python version: 3.9.7
Numpy version: 1.22.2
Scipy version: 1.8.0
Installed devices:
- default.gaussian (PennyLane-0.21.0)
- default.mixed (PennyLane-0.21.0)
- default.qubit (PennyLane-0.21.0)
- default.qubit.autograd (PennyLane-0.21.0)
- default.qubit.jax (PennyLane-0.21.0)
- default.qubit.tf (PennyLane-0.21.0)
- default.qubit.torch (PennyLane-0.21.0)
- lightning.qubit (PennyLane-Lightning-0.21.0)
Existing GitHub issues
- I have searched existing GitHub issues to make sure the issue does not already exist.

Hi @dominicpasquali, we had more of a look into this and it looks like a fix could take time, as it would require explicit support for torch.nn.DataParallel.

The issue is that torch.nn.DataParallel will attempt to access the state in the PennyLane device from multiple GPUs in parallel. When using default.qubit with diff_method="backprop", we are using the native Torch device default.qubit.torch internally, and this device assumes that device executions happen sequentially.

The execute method of the device handles the transition between Torch devices, if necessary. It infers which Torch device to use by checking which Torch device the input parameters to the gates are on.

The steps in execute can be summarized roughly as: infer the Torch device from the gate parameters, move the device state onto that Torch device if needed (step 2.a), and then run the circuit via super().execute(circuit, **kwargs) (step 3). The error that we see comes from the fact that step 2.a) may be executed more than once before step 3 is executed once.
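Since default.qubit.torch assumes that executions happen sequentially, one rough user-side idea (an untested sketch, not an official fix, and it gives up most of the parallel speedup for the quantum part) would be to serialize the quantum layer's forward pass across the DataParallel replicas. SerializedQLayer below is a hypothetical helper, not a PennyLane API; it relies on the fact that DataParallel runs its replicas in Python threads, so a lock stored on the class is shared by all of them:

import threading
import torch

class SerializedQLayer(torch.nn.Module):
    # Hypothetical wrapper: the class-level lock is shared by every
    # DataParallel replica, so only one replica at a time evaluates the
    # TorchLayer and touches the shared simulator state.
    _lock = threading.Lock()

    def __init__(self, qlayer):
        super().__init__()
        self.qlayer = qlayer

    def forward(self, x):
        with SerializedQLayer._lock:
            return self.qlayer(x)

# Usage in the script above:
# layers = [clayer_1, SerializedQLayer(qlayer), clayer_2, softmax]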
Basic logging was carried out by modifying step 2.a) and adding a try-except block around the execution. What the raw log seems to tell us is that the device of the state has already been changed to cuda:2 by the time super().execute(circuit, **kwargs) is called with parameters on cuda:0.

Potential solutions could include:
- Making sure execute can only be run by a single GPU at once. This does, however, seem to defeat the advantage of parallelization.
- Could torch.nn.parallel.DistributedDataParallel be a solution? PyTorch seems to recommend using it.

I can confirm that DistributedDataParallel works!
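For reference, a rough sketch of how the script above could be moved to DistributedDataParallel, with one process per GPU so that each process builds its own copy of the hybrid model (and therefore its own PennyLane simulator state). build_model and make_data_loader are hypothetical placeholders for the model construction and data loading already shown in the issue:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run_worker(rank, world_size):
    # One process per GPU; nothing quantum is shared between processes.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}")
    model = build_model().to(device)            # placeholder: the Sequential from the issue
    ddp_model = DDP(model, device_ids=[rank])

    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.2)
    loss_fn = torch.nn.L1Loss()

    for xs, ys in make_data_loader(rank, world_size):  # placeholder: DataLoader + DistributedSampler
        opt.zero_grad()
        xs, ys = xs.to(device), ys.to(device)
        loss_fn(ddp_model(xs), ys).backward()
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    torch.multiprocessing.spawn(run_worker, args=(world_size,), nprocs=world_size)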