parallelization / depth reduction pass

What is the expected enhancement?

The following circuit has depth 10:

OPENQASM 2.0;
include "qelib1.inc";
qreg q[4];
cx q[0],q[1];
cx q[1],q[0];
cx q[2],q[1];
cx q[0],q[1];
cx q[2],q[3];
cx q[3],q[2];
cx q[1],q[2];
cx q[2],q[3];
cx q[2],q[1];
cx q[1],q[2];
cx q[3],q[2];

          ┌───┐                                        
q_0: ──■──┤ X ├───────■────────────────────────────────
     ┌─┴─┐└─┬─┘┌───┐┌─┴─┐               ┌───┐          
q_1: ┤ X ├──■──┤ X ├┤ X ├───────■───────┤ X ├──■───────
     └───┘     └─┬─┘└───┘┌───┐┌─┴─┐     └─┬─┘┌─┴─┐┌───┐
q_2: ────────────■────■──┤ X ├┤ X ├──■────■──┤ X ├┤ X ├
                    ┌─┴─┐└─┬─┘└───┘┌─┴─┐     └───┘└─┬─┘
q_3: ───────────────┤ X ├──■───────┤ X ├────────────■──
                    └───┘          └───┘

However, by commuting one of the CNOTs to the left, the depth can be reduced to 9. I don’t think there’s a pass that currently does this.

OPENQASM 2.0;
include "qelib1.inc";
qreg q[4];
cx q[0],q[1];
cx q[1],q[0];
cx q[2],q[3];
cx q[2],q[1];
cx q[0],q[1];
cx q[3],q[2];
cx q[1],q[2];
cx q[2],q[3];
cx q[2],q[1];
cx q[1],q[2];
cx q[3],q[2];

          ┌───┐                                   
q_0: ──■──┤ X ├───────■───────────────────────────
     ┌─┴─┐└─┬─┘┌───┐┌─┴─┐          ┌───┐          
q_1: ┤ X ├──■──┤ X ├┤ X ├──■───────┤ X ├──■───────
     └───┘     └─┬─┘├───┤┌─┴─┐     └─┬─┘┌─┴─┐┌───┐
q_2: ──■─────────■──┤ X ├┤ X ├──■────■──┤ X ├┤ X ├
     ┌─┴─┐          └─┬─┘└───┘┌─┴─┐     └───┘└─┬─┘
q_3: ┤ X ├────────────■───────┤ X ├────────────■──
     └───┘                    └───┘
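
The two depths quoted above can be checked without Qiskit. The following is a minimal sketch, assuming each CNOT is written as a `(control, target)` pair and scheduled as early as its two qubits allow; `circuit_depth` is a hypothetical helper name, not a Qiskit function.

```python
def circuit_depth(gates, num_qubits):
    """Depth when every gate is placed in the earliest layer its qubits allow."""
    busy_until = [0] * num_qubits  # last occupied layer per qubit
    for control, target in gates:
        layer = max(busy_until[control], busy_until[target]) + 1
        busy_until[control] = busy_until[target] = layer
    return max(busy_until, default=0)


# CNOTs as (control, target), in the order of the two QASM listings above.
original = [(0, 1), (1, 0), (2, 1), (0, 1), (2, 3), (3, 2),
            (1, 2), (2, 3), (2, 1), (1, 2), (3, 2)]
reordered = [(0, 1), (1, 0), (2, 3), (2, 1), (0, 1), (3, 2),
             (1, 2), (2, 3), (2, 1), (1, 2), (3, 2)]

print(circuit_depth(original, 4))   # 10
print(circuit_depth(reordered, 4))  # 9
```

The only difference between the two lists is that `cx q[2],q[3]` moves ahead of `cx q[2],q[1]` and `cx q[0],q[1]`, which lets it run in the first layer.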

I think this should be easy to write using the DAGDependency since the depth of that DAG is the shortest possible (taking into account all commutation relations). See the template matching passes for examples of how to write a pass utilizing DAGDependency.
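
To illustrate the idea, here is a pure-Python sketch of a commutation-aware dependency depth for CNOT-only circuits. It is not Qiskit's DAGDependency: `cnots_commute` and `dependency_depth` are hypothetical helpers, and the sketch assumes only the standard rule that two CNOTs commute unless the control of one is the target of the other. The longest dependency chain is best read as a lower bound on depth, since commuting gates that share a qubit still cannot sit in the same layer.

```python
def cnots_commute(a, b):
    """CNOTs (control, target) commute unless one's control is the other's target."""
    return a[0] != b[1] and a[1] != b[0]


def dependency_depth(gates):
    """Length of the longest chain of pairwise non-commuting gates."""
    levels = []
    for i, gate in enumerate(gates):
        # A gate must come after every earlier gate it does not commute with.
        blockers = [levels[j] for j in range(i)
                    if not cnots_commute(gates[j], gate)]
        levels.append(1 + max(blockers, default=0))
    return max(levels, default=0)


original = [(0, 1), (1, 0), (2, 1), (0, 1), (2, 3), (3, 2),
            (1, 2), (2, 3), (2, 1), (1, 2), (3, 2)]
reordered = [(0, 1), (1, 0), (2, 3), (2, 1), (0, 1), (3, 2),
             (1, 2), (2, 3), (2, 1), (1, 2), (3, 2)]

print(dependency_depth(original))   # 6
print(dependency_depth(reordered))  # 6
```

Both orderings give the same value because the reordering only swaps commuting gates, so the dependency structure is unchanged. A real pass would still need to turn that structure into layers in which no two gates share a qubit.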

Issue Analytics

Created 2 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

TheGupta2012 commented, Jan 20, 2022

I was interested in learning and working on this issue. Would it be okay if I took it up?

mtreinish commented, Jan 20, 2022

(This may actually be in some cases a bug, as I don’t think https://qiskit.org/documentation/retworkx/apiref/retworkx.PyDAG.nodes.html#retworkx-pydag-nodes , which underlies DAGDependency.get_nodes(), guarantees anything about node order.)

There’s no explicit order guarantee there, but the implementation does return a fixed order and that likely won’t ever change (and I’d probably be concerned about backwards compatibility if we did need to change it for some reason). It will always be in order of node indices in the graph. This is nominally insertion order unless there are deletions. If nodes are deleted and new nodes are subsequently added, the original node indices are reused. So for example if you did:

from retworkx import PyDAG

graph = PyDAG()
graph.add_nodes_from(["A", "B", "C", "D"])  # indices 0..3
graph.remove_node(1)                        # frees index 1 ("B")
graph.add_node("E")                         # reuses index 1
print(graph.nodes())

would return: ["A", "E", "C", "D"]

But without any node removals it will just be insertion order.
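
The index reuse described above can be modelled in a few lines of plain Python. This is a toy illustration only, not retworkx's implementation: `SlotGraph` and its free list are assumptions made for the example, and the order in which several freed indices would be reused is not something this thread specifies.

```python
_EMPTY = object()  # marks a slot whose node has been removed


class SlotGraph:
    """Toy node store: nodes live in indexed slots, freed slots get reused."""

    def __init__(self):
        self._slots = []  # slot i holds the payload of node index i
        self._free = []   # indices released by remove_node

    def add_node(self, payload):
        if self._free:
            index = self._free.pop()       # reuse a freed index first
            self._slots[index] = payload
        else:
            index = len(self._slots)       # otherwise grow at the end
            self._slots.append(payload)
        return index

    def remove_node(self, index):
        self._slots[index] = _EMPTY
        self._free.append(index)

    def nodes(self):
        # Payloads in order of node index, skipping empty slots.
        return [p for p in self._slots if p is not _EMPTY]


graph = SlotGraph()
for name in ["A", "B", "C", "D"]:
    graph.add_node(name)
graph.remove_node(1)
graph.add_node("E")
print(graph.nodes())  # ['A', 'E', 'C', 'D']
```

Code that needs true insertion order after removals should track it separately instead of relying on the order of `nodes()`.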
