`tf.vectorized_map` doesn't work with the parameter-shift rule
Hi, I'm trying to parallelize my quantum circuit execution on GPU using `tf.vectorized_map`, following the thread on this link. This function allows the execution of each input to be parallelized across GPU (or CPU) cores, and it seems to work as expected if I just compute the result of the circuit. But I realized that taking the gradient of the circuit causes some issues. Below I prepared some sample code.
```python
import tensorflow as tf
import pennylane as qml

dev1 = qml.device("qiskit.aer", wires=2, shots=10, backend='qasm_simulator')
dev2 = qml.device("default.qubit.tf", wires=2, shots=None)

@qml.qnode(dev2, diff_method="parameter-shift", interface="tf")
def circuit2(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

@qml.qnode(dev1, diff_method="parameter-shift", interface="tf")
def circuit1(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10, 2), dtype=tf.float64)
y_truth = tf.random.stateless_binomial((10, 2), [10, 11], 1, 0.5)
```
Above I prepared two simple circuits, one using the pure-TensorFlow device and the other the Qiskit QASM simulator. Using the batched execution proposed in this link, I can produce the expected results for both circuits.
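Since `batch_input_tf` is defined in the linked thread, here is a minimal hypothetical sketch of what such a wrapper could look like, only so the snippets below are self-contained (the real helper may differ):

```python
# Hypothetical stand-in for the batch_input_tf helper from the linked thread:
# evaluate the circuit row by row in a Python loop and stack the results.
def batch_input_tf(circuit):
    def batched(inputs, weights):
        return tf.stack([circuit(x, weights) for x in tf.unstack(inputs)])
    return batched
```

With the wrapper in hand: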
```python
batched_circuit1 = batch_input_tf(circuit1)
batched_circuit2 = batch_input_tf(circuit2)

with tf.GradientTape() as tape:
    yhat = batched_circuit1(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.15988095,  0.15464286])>

with tf.GradientTape() as tape:
    yhat = batched_circuit2(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.07629922,  0.01617766])>
```
I tested this function in a more realistic example and it works perfectly. The problem is that it is not parallelized, so the execution is extremely slow, and it only gets slower with a larger number of shots, as expected. Hence I wanted to parallelize the circuit execution using `tf.vectorized_map`:
```python
circ = tf.function(circuit2)  # circuit1 or circuit2: both give the same result with parameter-shift
contract = lambda ins, ws: tf.vectorized_map(lambda vec: circ(vec, ws), ins)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([0., 0.])>
```
This version executes each input on a different CPU/GPU core and is therefore much, much faster than the execution above. However, I realised that my gradients are always zero with parameter-shift, and I'm getting the following warning:
```
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
```
If I instead use backprop for dev2 it seems to work, so I believe the problem is specific to parameter-shift. Hence I was wondering if there is a better way to parallelize the circuit execution, or whether I'm making a mistake in my workflow. Any suggestions are highly appreciated.
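For completeness, this is the kind of change meant by "use backprop" above; the names `circuit2_bp`, `circ_bp`, and `contract_bp` are hypothetical, not from the original snippets:

```python
# Same circuit as circuit2, but differentiated with backprop instead of
# parameter-shift; this is the variant that returns non-zero gradients
# under tf.vectorized_map.
@qml.qnode(dev2, diff_method="backprop", interface="tf")
def circuit2_bp(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

circ_bp = tf.function(circuit2_bp)
contract_bp = lambda ins, ws: tf.vectorized_map(lambda vec: circ_bp(vec, ws), ins)
# Re-running the GradientTape block above with contract_bp gives non-zero gradients.
```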
System Settings:
- pennylane v0.20.0
- tensorflow v2.7.0
- PennyLane-qiskit v0.20.0
Thanks, Jack
Hi @jackaraz, thanks 🙂
Josh managed to recreate our use case fully in TensorFlow and we’ve opened an issue: https://github.com/tensorflow/tensorflow/issues/53726
As this is not an issue directly related to PennyLane and there are workarounds to this, I’ll lift the bug label from here.
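For readers who don't want to follow the link: PennyLane registers the parameter-shift rule with TensorFlow via `@tf.custom_gradient` (the warning above names it), and the TF issue is that `tf.vectorized_map` does not propagate gradients for variables captured by such a function. The following is a hedged minimal sketch of that kind of repro, not the exact reproducer from the linked issue:

```python
import tensorflow as tf

w = tf.Variable(1.0, dtype=tf.float64)

@tf.custom_gradient
def f(x):
    # The forward pass captures the variable w, mimicking how a QNode
    # captures its trainable weights.
    y = tf.sin(w * x)
    def grad(dy, variables=None):
        # Hand-written gradients, standing in for parameter-shift.
        dx = dy * w * tf.cos(w * x)
        dvars = [tf.reduce_sum(dy * x * tf.cos(w * x))]
        return dx, dvars
    return y, grad

xs = tf.constant([0.1, 0.2, 0.3], dtype=tf.float64)
with tf.GradientTape() as tape:
    ys = tf.vectorized_map(f, xs)
    loss = tf.reduce_sum(ys)
# In affected TF versions this triggers the same warning and the
# gradient for w comes back as zero/None.
print(tape.gradient(loss, w))
```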
Hi @antalszava & @josh146, thanks a lot for all the answers.
Yes definitely, `vectorized_map` is just a mapping over the first axis of the sample: if the input shape is `(Nt, nqubit)`, the output shape will be `(Nt, outdim)`, where `Nt` is the number of examples you provide. Note that with the gradient included in the `vectorized_map`, it might be essential to declare the axis over which to apply the reduce-sum.

I can confirm that `map_fn` works nicely across all PennyLane platforms that I have tried so far, in a much more complex setting, but it is not as efficient as `vectorized_map`. I believe this is because `vectorized_map` traces and batches the execution rather than running it eagerly, so tensors are just memory maps, i.e. you cannot access the value of a tensor during execution; `map_fn` does not do the same. I observed an order-of-magnitude difference in speed between `map_fn` and `vectorized_map`, both on CPU and GPU. So I guess there is no easy way to use `vectorized_map` given the current status of TensorFlow and PennyLane, but yes, `map_fn` is definitely a good alternative. However, I wouldn't use it to execute on the `ibmq` backend, since it would submit jobs one by one to the quantum computer.
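For concreteness, a `map_fn` version of the earlier `contract` function might look like the sketch below; `contract_seq` is a hypothetical name, and `fn_output_signature` is assumed here because the circuit's output shape/dtype, probabilities of shape `(2,)`, differs from the per-example input:

```python
# map_fn-based alternative to tf.vectorized_map: per-example execution,
# so gradients survive, at the cost of parallelism.
contract_seq = lambda ins, ws: tf.map_fn(
    lambda vec: circ(vec, ws),
    ins,
    fn_output_signature=tf.TensorSpec(shape=(2,), dtype=tf.float64),
)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract_seq(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)  # non-zero, unlike the vectorized_map version
```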