
[testing] when to @slow and when not to? (huge models download)

See original GitHub issue

Looking at the CI logs, we do have huge models being downloaded (i.e. in tests that are not @slow):

Downloading: 100% 1.16G/1.16G [00:52<00:00, 22.3MB/s]
Downloading: 100% 433M/433M [00:08<00:00, 48.4MB/s]
Downloading:  43% 369M/863M [00:08<00:10, 45.4MB/s]

so it’s very inconsistent. If we are downloading huge files anyway, why not make a whole bunch more tests non-@slow? A lot of those tests are very fast, aside from the download overhead. Or, perhaps, the tests currently doing huge downloads should have been @slow in the first place?

I’m asking since I was told not to run any fsmt tests with the full model unless it’s @slow (size ~1.1GB). So it’s unclear when it’s OK to include huge models in the non-slow test suite and when not to.
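For context, transformers gates such tests with a `slow` decorator from `transformers.testing_utils`, which skips the test unless `RUN_SLOW=1` is set in the environment. The following is a minimal, self-contained re-implementation of that idea (so it does not require transformers to be installed); the test class and method names are illustrative only:

```python
import os
import unittest

# Minimal sketch of an env-gated "slow" marker, mirroring the behavior of
# transformers.testing_utils.slow: the decorated test is skipped unless
# RUN_SLOW=1 is set, so the default CI run stays fast.
slow = unittest.skipUnless(
    os.environ.get("RUN_SLOW", "0") == "1",
    "test is slow; set RUN_SLOW=1 to run it",
)

class ExampleModelTest(unittest.TestCase):
    @slow
    def test_full_model(self):
        # would download ~1.1GB of weights; only runs on the scheduled slow CI
        pass

    def test_tiny_model(self):
        # would use a tiny fixture checkpoint; runs on every CI push
        pass
```

With this gating, a plain `pytest` run executes only `test_tiny_model`, while the nightly job exports `RUN_SLOW=1` and executes both.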

Also, here is an alternative approach to think about: why not download the large weights while the other tests that don’t need them are running? I.e., fork a process early on in CI, after the pip installs are done, and let it cache the models - then the weights will be ready by the time the tests that need them get to run. This is an unpolished idea, since one would need to figure out how to re-order the tests so that the large-model tests aren’t run first…

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction
LysandreJik commented, Nov 23, 2020

Hi, sorry for getting back to you so late. I believe this was due to the pipeline tests, but that should not be the case anymore since the refactor of the pipeline tests by Thom.

If some tests still download large files, then that’s an error which we should resolve.

0 reactions
stas00 commented, Nov 27, 2020

Thank you for reading my ideas and following up, @LysandreJik.

I made a tentative 50MB suggestion in https://github.com/huggingface/transformers/pull/8824

We can tweak it down the road if it turns out not to be right.
