`tf.vectorized_map` doesn't work with the parameter-shift rule
Hi, I'm trying to parallelize my quantum circuit execution on GPU using `tf.vectorized_map`, following the thread on this link. This function allows the execution of each input to be parallelized across GPU (or CPU) cores, and it seems to work as expected if I just compute the result of the circuit. But I realized that taking the gradient of the circuit causes some issues. Below I prepared some sample code.
```python
import tensorflow as tf
import pennylane as qml

dev1 = qml.device("qiskit.aer", wires=2, shots=10, backend='qasm_simulator')
dev2 = qml.device("default.qubit.tf", wires=2, shots=None)

@qml.qnode(dev2, diff_method="parameter-shift", interface="tf")
def circuit2(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

@qml.qnode(dev1, diff_method="parameter-shift", interface="tf")
def circuit1(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10, 2), dtype=tf.float64)
y_truth = tf.random.stateless_binomial((10, 2), [10, 11], 1, 0.5)
```
Above I prepared two simple circuits, one using the pure-TensorFlow device and the other the Qiskit QASM simulator. Using the batched execution proposed in this link, I can produce the expected results for both circuits.
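Since `batch_input_tf` is defined in the linked thread, here is a minimal hypothetical sketch of what such a wrapper could look like, only so the snippets below are self-contained (the real helper may differ):

```python
# Hypothetical stand-in for the batch_input_tf helper from the linked thread:
# evaluate the circuit row by row in a Python loop and stack the results.
def batch_input_tf(circuit):
    def batched(inputs, weights):
        return tf.stack([circuit(x, weights) for x in tf.unstack(inputs)])
    return batched
```

With the wrapper in hand: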
```python
batched_circuit1 = batch_input_tf(circuit1)
batched_circuit2 = batch_input_tf(circuit2)

with tf.GradientTape() as tape:
    yhat = batched_circuit1(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.15988095,  0.15464286])>

with tf.GradientTape() as tape:
    yhat = batched_circuit2(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.07629922,  0.01617766])>
```
I tested this function in a more realistic example and it works perfectly. The problem is that it is not parallelized, so the execution is extremely slow, and it only gets slower with a larger number of shots, as expected. Hence I wanted to parallelize the circuit execution using `tf.vectorized_map`:
```python
circ = tf.function(circuit2)  # circuit1 or circuit2: both give the same result with parameter-shift
contract = lambda ins, ws: tf.vectorized_map(lambda vec: circ(vec, ws), ins)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([0., 0.])>
```
This version executes each input on a different CPU/GPU core and is therefore much, much faster than the execution above. However, I realised that my gradients are always zero with parameter-shift, and I'm getting the following warning:
```
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
```
If I instead use backprop for dev2 it seems to work, so I believe the problem is specific to parameter-shift. Hence I was wondering if there is a better way to parallelize the circuit execution, or whether I'm making a mistake in my workflow. Any suggestions are highly appreciated.
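For completeness, this is the kind of change meant by "use backprop" above; the names `circuit2_bp`, `circ_bp`, and `contract_bp` are hypothetical, not from the original snippets:

```python
# Same circuit as circuit2, but differentiated with backprop instead of
# parameter-shift; this is the variant that returns non-zero gradients
# under tf.vectorized_map.
@qml.qnode(dev2, diff_method="backprop", interface="tf")
def circuit2_bp(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

circ_bp = tf.function(circuit2_bp)
contract_bp = lambda ins, ws: tf.vectorized_map(lambda vec: circ_bp(vec, ws), ins)
# Re-running the GradientTape block above with contract_bp gives non-zero gradients.
```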
System Settings:
- pennylane v0.20.0
- tensorflow v2.7.0
- PennyLane-qiskit v0.20.0
Thanks, Jack
Hi @jackaraz, thanks 🙂
Josh managed to recreate our use case fully in TensorFlow and we’ve opened an issue: https://github.com/tensorflow/tensorflow/issues/53726
As this is not an issue directly related to PennyLane and there are workarounds to this, I’ll lift the bug label from here.
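For readers who don't want to follow the link: PennyLane registers the parameter-shift rule with TensorFlow via `@tf.custom_gradient` (the warning above names it), and the TF issue is that `tf.vectorized_map` does not propagate gradients for variables captured by such a function. The following is a hedged minimal sketch of that kind of repro, not the exact reproducer from the linked issue:

```python
import tensorflow as tf

w = tf.Variable(1.0, dtype=tf.float64)

@tf.custom_gradient
def f(x):
    # The forward pass captures the variable w, mimicking how a QNode
    # captures its trainable weights.
    y = tf.sin(w * x)
    def grad(dy, variables=None):
        # Hand-written gradients, standing in for parameter-shift.
        dx = dy * w * tf.cos(w * x)
        dvars = [tf.reduce_sum(dy * x * tf.cos(w * x))]
        return dx, dvars
    return y, grad

xs = tf.constant([0.1, 0.2, 0.3], dtype=tf.float64)
with tf.GradientTape() as tape:
    ys = tf.vectorized_map(f, xs)
    loss = tf.reduce_sum(ys)
# In affected TF versions this triggers the same warning and the
# gradient for w comes back as zero/None.
print(tape.gradient(loss, w))
```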
Hi @antalszava & @josh146, thanks a lot for all the answers.
Yes definitely, `vectorized_map` is just a mapping over the first axis of the sample: if the input shape is `(Nt, nqubit)`, the output shape will be `(Nt, outdim)`, where `Nt` is the number of examples you provide. Note that with the gradient included in the `vectorized_map`, it might be essential to declare the axis over which to apply the reduce-sum.

I can confirm that `map_fn` works nicely across all PennyLane platforms that I have tried so far, in a much more complex setting, but it is not as efficient as `vectorized_map`. I believe this is because `vectorized_map` traces and batches the execution rather than running it eagerly, so tensors are just memory maps, i.e. you cannot access the value of a tensor during execution; `map_fn` does not do the same. I observed an order-of-magnitude difference in speed between `map_fn` and `vectorized_map`, both on CPU and GPU. So I guess there is no easy way to use `vectorized_map` given the current status of TensorFlow and PennyLane, but yes, `map_fn` is definitely a good alternative. However, I wouldn't use it to execute on the `ibmq` backend, since it would submit jobs one by one to the quantum computer.
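For concreteness, a `map_fn` version of the earlier `contract` function might look like the sketch below; `contract_seq` is a hypothetical name, and `fn_output_signature` is assumed here because the circuit's output shape/dtype, probabilities of shape `(2,)`, differs from the per-example input:

```python
# map_fn-based alternative to tf.vectorized_map: per-example execution,
# so gradients survive, at the cost of parallelism.
contract_seq = lambda ins, ws: tf.map_fn(
    lambda vec: circ(vec, ws),
    ins,
    fn_output_signature=tf.TensorSpec(shape=(2,), dtype=tf.float64),
)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract_seq(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)  # non-zero, unlike the vectorized_map version
```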