Performance regression caused by #7087
Environment
- Qiskit Terra version: main (after https://github.com/Qiskit/qiskit-terra/commit/05b60a5e50ff0a6979f0a7bee1408bd32c9be23c )
- Python version: 3.8
- Operating system: linux
What is happening?
The nightly benchmark runs have flagged several run time performance regressions after #7087 merged:
- https://qiskit.github.io/qiskit/#passes.MultiQBlockPassBenchmarks.time_collect_multiq_block?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=20&p-depth=1024&p-max_block_size=2&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=14&p-depth=1024&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_remove_reset_in_zero_state?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=20&p-depth=1024&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_optimize_swap_before_measure?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=20&p-depth=1024&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.PassBenchmarks.time_collect_2q_blocks?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=20&p-depth=1024&commits=05b60a5e
- https://qiskit.github.io/qiskit/#assembler.AssemblerBenchmarks.time_assemble_circuit?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=1&p-depth=4096&p-number of circuits=1&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.MultiQBlockPassBenchmarks.time_collect_multiq_block?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=14&p-depth=1024&p-max_block_size=1&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.MultiQBlockPassBenchmarks.time_collect_multiq_block?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=20&p-depth=1024&p-max_block_size=3&commits=05b60a5e
- https://qiskit.github.io/qiskit/#passes.MultipleBasisPassBenchmarks.time_basis_translator?cpu=Intel(R) Xeon(R) E-2174G CPU %40 3.80GHz&machine=qiskit-benchmarking01&num_cpu=8&os=Ubuntu 20.04&ram=64GB&python=3.8&p-n_qubits=5&p-depth=1024&p-basis_gates=[‘rx’%2C ‘ry’%2C ‘rz’%2C ‘r’%2C ‘rxx’%2C ‘id’]&commits=05b60a5e
None are huge in absolute time (on the order of ms) and they likely won’t be noticeable in a larger `transpile()` call or application, but we should try to fix these because #7087 really shouldn’t have had any performance impact.
How can we reproduce the issue?
Run any of the transpiler passes identified in the regressions linked above.
What should happen?
The addition of a new abstract class defining the interface for a circuit operation shouldn’t cause a noticeable performance regression
Any suggestions?
Identify where the passes are spending more time after the addition of the `Operation` class and fix the bottleneck.
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (11 by maintainers)
Top GitHub Comments
I think this needs some more investigation before picking a direction here, especially to determine if we would expect this slow-down to grow as we increase either the number of operations in a circuit, the number of operation-types or something else.
The `isinstance` calculation should be cached per-type, so I naively wouldn’t expect much slowdown after the first call. It looks like this cache is at least being populated, but `isinstance` for a type with an ABC base class is still, in my timing, 4x slower (~100 ns to ~400 ns) than for a type with a concrete base class.
That could account for `CollectMultiQBlocks`, but it’s less clear to me where the regressions around the `BasisTranslator` are originating. That pass doesn’t contain any obvious `isinstance` calls, and none of the `DAGCircuit` methods it uses (`.op_nodes` and `.substitute_node_with_dag`) do either. My best guess would be that this is coming from its use of the `circuit_to_dag` and `dag_to_circuit` converters and their use of `QuantumCircuit._append`. If that were the case, though, I would’ve expected this commit to show up as a regression in some of our circuit building benchmarks, and those are all noticeably flat, even with 100k+ gates:
- https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_construction?commits=05b60a5e
- https://qiskit.github.io/qiskit/#circuit_construction.CircuitConstructionBench.time_circuit_copy?commits=05b60a5e
- https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_dag_to_circuit?commits=05b60a5e
- https://qiskit.github.io/qiskit/#converters.ConverterBenchmarks.time_circuit_to_dag?commits=05b60a5e
- https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderConstruction.time_build_ripple_adder?commits=05b60a5e
- https://qiskit.github.io/qiskit/#ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder?commits=05b60a5e
I would be hesitant to make `Operation` a non-standard `ABC` without a better understanding of the origin of the problem and its likelihood to increase in magnitude. If we do go forward with that approach, we should make sure that when someone someday tries `Operation.register(MyCustomOp)` they see something like `raise NotImplementedError("see GH-7528")`.
Possibly helpful reference: https://stackoverflow.com/questions/42378726/why-is-checking-isinstancesomething-mapping-so-slow
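To make the `register` suggestion concrete, here is a rough sketch of what such a guard could look like. This is only one possible shape of a "non-standard ABC" (a plain class that does not use `ABCMeta`), chosen for illustration. The `Operation` and `MyCustomOp` classes below are hypothetical stand-ins and not Qiskit's implementation.

```python
class Operation:
    """Hypothetical stand-in for the interface class, not Qiskit's code.

    It deliberately does not use ABCMeta, so isinstance() takes the ordinary
    fast path and virtual subclasses cannot be supported.
    """

    @classmethod
    def register(cls, subclass):
        # Fail loudly instead of silently ignoring a virtual-subclass request.
        raise NotImplementedError("see GH-7528")


class MyCustomOp:
    """Hypothetical user-defined operation that does not inherit from Operation."""


try:
    Operation.register(MyCustomOp)
except NotImplementedError as exc:
    message = str(exc)
    print(f"register() refused: {message}")
```

With this shape, a user who expects standard `ABC.register` behaviour gets an immediate error that points back to this issue.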
The sort key here is specific to `CollectMultiQBlocks`; it’s not the default sort key for `DAGCircuit.topological_sort`, which does use object polymorphism (sort of) by accessing `x.sort_key`.
edit: Matthew just said something pretty similar haha.
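As a rough illustration of the difference between the two kinds of sort key, here is a small sketch. Every name in it (`Node`, `Gate`, `Measure`, `pass_specific_key`) is a hypothetical stand-in; it does not reproduce the real key used by `CollectMultiQBlocks` or the real `DAGCircuit` node classes. It only shows a key that reads a precomputed attribute next to a key that runs a type check on every node.

```python
from dataclasses import dataclass


class Gate:
    """Hypothetical stand-in for a gate-like operation."""


class Measure:
    """Hypothetical stand-in for a non-gate operation."""


@dataclass
class Node:
    op: object
    sort_key: str  # precomputed once, then simply read when sorting


def pass_specific_key(node):
    # A pass-specific key that performs an isinstance() check per node,
    # so its cost depends on how fast that check is.
    group = "a" if isinstance(node.op, Gate) else "b"
    return (group, node.sort_key)


nodes = [Node(Measure(), "q0"), Node(Gate(), "q1"), Node(Gate(), "q0")]

# Attribute-based ordering: no type checks involved.
by_attribute = sorted(nodes, key=lambda n: n.sort_key)

# Pass-specific ordering: gates first, then everything else.
by_pass_key = sorted(nodes, key=pass_specific_key)
```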
I don’t see why we should have a `gate` Boolean instead of a `Gate` interface. That’s kind of the point of #7087 in the first place: having those features be a defined interface means you can attach extra functionality, and you get static type-checking that the operations you want to perform are well defined. It’s also something we explicitly removed from `DAGNode`.
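A small sketch of the interface argument, under stated assumptions: the classes below are simplified, hypothetical stand-ins and do not match Qiskit's real `Operation` or `Gate` definitions. The point is only that a function annotated to take `Gate` objects can rely on gate-specific methods, which a Boolean flag on a generic object would not guarantee to a static type checker.

```python
from abc import ABC, abstractmethod
from typing import List


class Operation(ABC):
    """Hypothetical minimal interface for anything placed in a circuit."""

    @property
    @abstractmethod
    def num_qubits(self) -> int:
        ...


class Gate(Operation):
    """Hypothetical interface adding functionality only gates have."""

    @abstractmethod
    def inverse(self) -> "Gate":
        ...


class XGate(Gate):
    @property
    def num_qubits(self) -> int:
        return 1

    def inverse(self) -> "Gate":
        # X is its own inverse.
        return XGate()


class Barrier(Operation):
    def __init__(self, num_qubits: int) -> None:
        self._num_qubits = num_qubits

    @property
    def num_qubits(self) -> int:
        return self._num_qubits


def invert_all(gates: List[Gate]) -> List[Gate]:
    # The annotation tells a type checker that .inverse() exists on every item.
    return [gate.inverse() for gate in gates]


inverted = invert_all([XGate()])
```

Passing a `Barrier` to `invert_all` would be flagged by a static type checker, whereas a `gate` Boolean would only be checked at run time.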