Performance regression caused by #7087

Environment

Qiskit Terra version: main (after https://github.com/Qiskit/qiskit-terra/commit/05b60a5e50ff0a6979f0a7bee1408bd32c9be23c )
Python version: 3.8
Operating system: linux

What is happening?

The nightly benchmark runs have flagged several run time performance regressions after #7087 merged:

None are huge in absolute time (on the order of ms), and they likely won’t be noticeable in a larger transpile() call or application, but we should try to fix them because #7087 really shouldn’t have had any performance impact.

How can we reproduce the issue?

Run any of the transpiler passes identified in the regressions linked

What should happen?

The addition of a new abstract class defining the interface for a circuit operation shouldn’t cause a noticeable performance regression

Any suggestions?

Identify where the passes are spending more time after the addition of the Operation class and fix the bottleneck.


Top GitHub Comments

kdk commented, Jan 21, 2022

I think this needs some more investigation before picking a direction here, especially to determine if we would expect this slow-down to grow as we increase either the number of operations in a circuit, the number of operation-types or something else.
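One way to probe the scaling question above is to time isinstance against an ABC while varying the number of distinct concrete types. The sketch below is a minimal, standard-library-only illustration: AbstractBase, make_instances, and ns_per_check are hypothetical names rather than Qiskit code, and the numbers it prints will depend on the machine and Python version.

```python
import timeit
from abc import ABC


class AbstractBase(ABC):
    """Hypothetical stand-in for an ABC-based interface (not Qiskit's Operation)."""


def make_instances(n_types):
    """Create one instance each of n_types distinct subclasses of AbstractBase."""
    return [type(f"Op{i}", (AbstractBase,), {})() for i in range(n_types)]


def ns_per_check(instances, repeat=3, loops=200):
    """Return the best observed time, in nanoseconds, per isinstance call."""

    def run():
        for obj in instances:
            isinstance(obj, AbstractBase)

    best = min(timeit.repeat(run, repeat=repeat, number=loops))
    return best / (loops * len(instances)) * 1e9


if __name__ == "__main__":
    # Vary the number of distinct types to see whether the per-call cost changes.
    for n_types in (1, 10, 100, 1000):
        instances = make_instances(n_types)
        print(f"{n_types:>5} types: {ns_per_check(instances):.0f} ns per isinstance")
```

The timing includes loop overhead, so it is better suited to comparing rows against each other than to reading off absolute costs.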

From my best-effort reading of https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Modules/_abc.c#L586 , it seems like the isinstance calculation should be cached per type, so I naively wouldn’t expect much slowdown after the first call. It looks like this cache is at least being populated, but in my timing isinstance for a type with an ABC base class is still 4x slower (~100 ns to ~400 ns) than for a type with a concrete base class.

>>> ABCMeta._dump_registry(Gate)
Class: qiskit.circuit.gate.Gate
Inv. counter: 60
_abc_registry: set()
_abc_cache: {<weakref at 0x13f23fdd0; to 'ABCMeta' at 0x7feff5cf5be0 (XGate)>}
_abc_negative_cache: set()
_abc_negative_cache_version: 60
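A comparison of this kind can be sketched with timeit alone. The classes below are hypothetical stand-ins rather than Qiskit types, and the helper name time_isinstance_ns is an assumption for illustration. The measured ratio will vary by machine and Python version, so no particular result is claimed here.

```python
import timeit
from abc import ABC


class AbstractBase(ABC):
    """Base class whose metaclass is ABCMeta."""


class ConcreteBase:
    """Plain base class whose metaclass is type."""


class AbstractChild(AbstractBase):
    pass


class ConcreteChild(ConcreteBase):
    pass


def time_isinstance_ns(obj, cls, repeat=3, number=200_000):
    """Return the best observed time, in nanoseconds, for one isinstance(obj, cls)."""
    timer = timeit.Timer(lambda: isinstance(obj, cls))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    # Child instances are used so that neither check is an exact-type match.
    abc_ns = time_isinstance_ns(AbstractChild(), AbstractBase)
    plain_ns = time_isinstance_ns(ConcreteChild(), ConcreteBase)
    print(f"ABC base:      {abc_ns:.0f} ns")
    print(f"concrete base: {plain_ns:.0f} ns")
```

Both figures include the cost of calling the lambda, so the difference between them is more informative than either value on its own.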

It sounds like there was some investigation into the origin of the regression for CollectMultiQBlocks, but it’s less clear to me where the regressions around the BasisTranslator are originating. That pass doesn’t contain any obvious isinstance calls, and none of the DAGCircuit methods it uses (.op_nodes and .substitute_node_with_dag) do either. My best guess would be that this is coming from its use of the circuit_to_dag and dag_to_circuit converters and their use of QuantumCircuit._append. If that were the case, though, I would’ve expected this commit to show up as a regression in some of our circuit building benchmarks, and those are all noticeably flat, even with 100k+ gates:

https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_construction?commits=05b60a5e
https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_copy?commits=05b60a5e
https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_dag_to_circuit?commits=05b60a5e
https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_circuit_to_dag?commits=05b60a5e
https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderConstruction.time_build_ripple_adder?commits=05b60a5e
https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder?commits=05b60a5e

@jakelishman’s suggestion seems like a reasonable way to solve the regression, but I am a bit hesitant to immediately jump to making Operation a non-standard ABC without a better understanding of the origin of the problem and its likelihood to increase in magnitude. If we do go forward with that approach, we should make sure that when someone someday tries Operation.register(MyCustomOp) they see something like raise NotImplementedError("see GH-7528").
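The original suggestion is not quoted in this thread, so the following is only one possible reading of a "non-standard ABC": an interface class that does not use ABCMeta, so isinstance takes the ordinary type-based path, with a register method that fails loudly. OperationSketch and MyCustomOp are hypothetical names, and the single name property is an illustrative placeholder rather than the real interface.

```python
class OperationSketch:
    """Hypothetical interface class that deliberately avoids ABCMeta."""

    __slots__ = ()

    @property
    def name(self):
        # Subclasses are expected to override this; nothing enforces it at
        # instantiation time, unlike @abstractmethod under ABCMeta.
        raise NotImplementedError

    @classmethod
    def register(cls, subclass):
        # Virtual subclass registration is an ABCMeta feature; reject it explicitly.
        raise NotImplementedError("see GH-7528")


class MyCustomOp(OperationSketch):
    """Example subclass that implements the interface by inheritance."""

    @property
    def name(self):
        return "my_custom_op"


if __name__ == "__main__":
    op = MyCustomOp()
    print(isinstance(op, OperationSketch))  # resolved through the normal MRO
    try:
        OperationSketch.register(MyCustomOp)
    except NotImplementedError as exc:
        print(f"register() rejected: {exc}")
```

The trade-off in this sketch is that missing overrides are only detected when the property is accessed, not when the subclass is instantiated.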

Possibly helpful reference: https://stackoverflow.com/questions/42378726/why-is-checking-isinstancesomething-mapping-so-slow

jakelishman commented, Jan 17, 2022

The sort key here is specific to CollectMultiQBlocks; it’s not the default sort key for DAGCircuit.topological_sort, which does use object polymorphism (sort of) by accessing x.sort_key.

edit: Matthew just said something pretty similar haha.

I don’t see why we should have a gate Boolean instead of a Gate interface. That’s kind of the point of #7087 in the first place - having those features be a defined interface means you can attach extra functionality, and you get static type-checking that the operations you want to perform are well defined. It’s also something we explicitly removed from DAGNode.
