
This issue provides information on, and a place to discuss, the NumFOCUS small development grant to improve benchmarks for pandas (and possibly other projects in the ecosystem). The content of this issue is a work in progress and will be updated as needed.

List of some of the tasks identified to improve asv itself:

  • Evaluate the need to fork asv, and fork if needed. The project looks abandoned; the maintainers have been contacted, but there has been no answer yet.
  • Update the style of the asv codebase (it does not currently follow PEP 8), and remove the Python 2 compatibility code (and the six dependency)
  • Evaluate whether it makes sense to allow asv to be imported in benchmarks, and possibly implement it
  • Evaluate and possibly implement API changes to asv, to improve readability, usability and code clarity. For example, would it be a good idea to provide an abstract base class for benchmarks? (See the sketch after this list.)
  • Evaluate using conbench or codespeed as the UI for benchmarks, possibly make asv compatible with one or more external UIs, and drop the asv UI
  • Find code that is no longer in use, and remove it
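
On the abstract-base-class question, here is a minimal sketch of what such an API could look like. This is purely hypothetical: asv currently relies on naming conventions (`time_*`, `setup`, `params`) rather than a base class, and all names below are made up.

```python
# Hypothetical sketch only: asv does not ship a base class like this today.
# It illustrates what an importable, explicit benchmark API might look like.
from abc import ABC, abstractmethod

import pandas as pd


class Benchmark(ABC):
    """Base class a benchmark author would subclass."""

    params: list = []
    param_names: list = []

    def setup(self, *params):
        """Prepare inputs; run before each timed call."""

    @abstractmethod
    def run(self, *params):
        """The code being measured."""


class ConcatFrames(Benchmark):
    params = [10, 1_000]
    param_names = ["nrows"]

    def setup(self, nrows):
        self.df = pd.DataFrame({"a": range(nrows)})

    def run(self, nrows):
        pd.concat([self.df] * 10)
```

An explicit base class would make attributes like `params` and `timeout` discoverable and documentable, at the cost of requiring benchmarks to import asv; whether that trade-off is worth it is exactly what the task above would evaluate.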

Work on the pandas benchmarks:

  • Make benchmarks run faster, by analyzing the slowest ones, using less data where it makes sense, and avoiding unnecessary parametrizations (#16803, https://github.com/pandas-dev/pandas/issues/44450#issuecomment-969016669); see the sketch after this list
  • Improve the CI for the pandas benchmarks, making it more reliable (we are currently using a grep to see whether the build has failed); we can also check whether the logs can be improved
  • Review the benchmark structure, see if it can be improved so benchmarks are easier to find, and remove duplicate benchmarks if it makes sense
  • Review existing benchmark issues, discuss with the pandas team which ones should be prioritized, and work on the important ones
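
To make the point about data size and parametrization concrete, here is a hedged sketch of the kind of change involved, using asv's naming conventions. The benchmark, sizes and dtypes are made up and not taken from the actual pandas suite.

```python
import numpy as np
import pandas as pd


class GroupBySum:
    # Before: a wide parameter grid and very large frames make the benchmark
    # slow without making the timings more informative.
    # params = [[10**4, 10**6, 10**8], ["int64", "float64", "object"]]

    # After: keep only the sizes and dtypes that exercise distinct code paths.
    params = [[10**5], ["int64", "float64"]]
    param_names = ["nrows", "dtype"]

    def setup(self, nrows, dtype):
        rng = np.random.default_rng(42)
        self.df = pd.DataFrame(
            {
                "key": rng.integers(0, 100, nrows),
                "value": rng.integers(0, 100, nrows).astype(dtype),
            }
        )

    def time_groupby_sum(self, nrows, dtype):
        self.df.groupby("key")["value"].sum()
```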

Work on infrastructure:

  • Evaluate options to replace the current pandas benchmarks infrastructure and coordinate with NumFOCUS
  • Add whatever hooks and notifications make sense to detect performance regressions as soon as possible (see the sketch after this list)
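
One possible shape for such a hook is sketched below: run `asv continuous` against the base branch and fail the CI job on regressions, rather than grepping the raw log. The flags, the `upstream/main` reference and the marker string in asv's output are assumptions that need to be checked against the asv documentation and the version in use.

```python
"""Sketch of a CI step that fails the job when asv detects a regression.
Flags and the output marker are assumptions; verify against the asv docs."""
import subprocess
import sys

# `asv continuous` benchmarks two revisions and reports benchmarks whose
# before/after ratio exceeds the factor passed with -f.
result = subprocess.run(
    ["asv", "continuous", "-f", "1.1", "upstream/main", "HEAD"],
    capture_output=True,
    text=True,
)

print(result.stdout)
print(result.stderr, file=sys.stderr)

# The marker string is an assumption about asv's output format.
regression_detected = "PERFORMANCE DECREASED" in result.stdout

if result.returncode != 0 or regression_detected:
    sys.exit("Benchmark failure or performance regression detected.")
```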

List of some other projects using asv:

Feedback very welcome

CC: @dorothykiz1 @LucyJimenez @pandas-dev/pandas-core @rgommers


Top GitHub Comments

mdhaber commented on Dec 25, 2021

This sounds great. If you decide to fork asv, I’d be interested in testing the fork.

In case you are interested in suggestions that would help maintainers of other projects, I’ll include thoughts from my experience with ASV below.

Please let me know if either of these should be filed as issues at https://github.com/airspeed-velocity/asv.

  • I haven't checked recently, but in the past I've been unable to use conda-installed ASV on my Windows machine (see the specific error at https://github.com/scipy/scipy/pull/12732#issuecomment-674595399). The only way I've been able to get ASV to work in my conda environment on Windows is to install it with pip. Not good practice, I know, but that's what works.
  • Sometimes the author of a benchmark wants to time code execution _and_ track something about that code execution without executing that code twice. It sounds like there is not a simple way to do this (see https://github.com/scipy/scipy/pull/10762#issuecomment-567322968); a sketch of the double execution this currently requires follows this list.
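
To make the second point concrete, here is a sketch of how it currently plays out with asv's `time_`/`track_` conventions: the same work ends up running twice, once to be timed and once to compute the tracked quantity. The names and the solver call are illustrative, loosely based on the SciPy use case, not taken from the actual SciPy suite.

```python
# Sketch of the double-execution issue described above.
import numpy as np
from scipy import optimize


def _solve():
    # Hypothetical expensive call whose runtime we want to time and whose
    # result (e.g. the iteration count) we also want to track.
    return optimize.minimize(lambda x: np.sum((x - 3.0) ** 2), x0=[0.0])


class MinimizeBenchmark:
    def time_minimize(self):
        _solve()              # first execution: timed

    def track_nit(self):
        return _solve().nit   # second execution: asv records the return value
```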

If you decide to fork, please consider changing the following behaviors.

  • When a benchmark exceeds the default time limit, ASV reports it as having “FAILED”. I think it would be useful to distinguish timeouts from other failures. (I also think having a default time limit is surprising; a sketch of the per-benchmark timeout attribute follows this list.)
  • I think the current default is to silence stderr, so if a test fails, benchmarks need to be re-run to see what went wrong. Should --show-stderr be the default? If not, could it be logged by default?
  • I think the default is to run benchmarks against master instead of HEAD of the current branch. I understand that there are arguments both ways, but defaulting to master seems surprising to me (and at least some others - see scipy/scipy#11998).
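
As a reference for the first bullet above, individual benchmarks can raise the limit via a `timeout` class attribute. Here is a minimal sketch; the attribute and its 60-second default are stated from memory and should be verified against the asv documentation, and raising the limit does not change how a timeout is reported.

```python
class SlowBenchmark:
    # asv supports a per-benchmark timeout in seconds (default believed to be
    # 60 s -- verify against the asv docs). Raising it avoids hitting the
    # limit, but a timeout is still reported as a generic failure.
    timeout = 300

    def time_expensive_operation(self):
        sum(i * i for i in range(10**7))
```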

The rest of these may be due to user error. Nonetheless, they might indicate that there is something to be improved.

  • I use a site.cfg so that BLAS/LAPACK can be found when I build SciPy, but BLAS/LAPACK still can’t be found when ASV tries to build SciPy. I get around this by manually copying openblas.a into the ASV environment’s Lib folder. Is it possible for ASV to use the information in a project’s site.cfg?
  • What is the recommended workflow for debugging and fixing a benchmark that is failing? (Is it to use --quick --show-stderr? Sometimes fixing bugs requires iteration, and even with the --quick option, there is still a long delay before the benchmarks start running. It is often faster for me to manually instantiate the class representing the benchmark and call its methods; a sketch of this follows the list. That way I can interact with the stack trace and even run in debug mode. Should there be a simpler way to execute a benchmark within the Python prompt?)
  • Perhaps I need to read the documentation more carefully, but I am confused about how many times benchmarks are run to generate timing and tracking results. I see from the asv documentation that “The timing itself is based on the Python standard library’s timeit module, with some extensions for automatic heuristics shamelessly stolen from IPython’s %timeit magic function. This means that in most cases the benchmark function itself will be run many times to achieve accurate timing.” Is there a way to control this other than the --quick option, which runs the test only once? Are track_ benchmarks run multiple times, too?
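
To illustrate the manual workflow mentioned in the debugging bullet above, this is roughly what it looks like at a Python prompt. The import path, class name and parameter values are illustrative only, not from an actual benchmark suite.

```python
# Running one benchmark by hand gives a normal stack trace and works under a
# debugger, without waiting for asv's environment setup.
from benchmarks.groupby import GroupBySum  # hypothetical benchmark module

bench = GroupBySum()
bench.setup(10**5, "int64")              # one parameter combination
bench.time_groupby_sum(10**5, "int64")   # call the timed method directly
```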

datapythonista commented on Feb 16, 2022

Thanks @JulianWgs for sharing those. Very interesting. Not in scope for the current grant, but very useful for the future.
