[testing] making network tests more reliable

We have a group of tests that require a reliable network. Since the network is never 100% reliable, these tests have been failing for many months.

I propose that those tests be rewritten with an unstable network in mind and include:

time.sleep(3)
retry 3-5 times

e.g. one of the candidates is:

tests/test_hf_api.py::HfApiEndpointsTest::test_list_repos_objs

but also recent tests that push to hub.

Perhaps a simple retry context manager can be added to testing_utils.py, which would trap exceptions and retry after a pause. Then we could simply wrap the body of each existing test in that context manager, e.g.:

with RetryAfterSleepTest():
    # normal test code

It could accept the number of retries and the sleep time between retries as optional arguments.

Of course, it’s probably even better to also make it available as a decorator, e.g. @unreliable_network_retry
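
As a rough illustration of the decorator form, here is a minimal sketch. It is not existing transformers code. The name unreliable_network_retry comes from the proposal above, while the parameters max_attempts, wait_seconds and exceptions are hypothetical names chosen for this example. One caveat: a plain Python context manager cannot re-run the body of its with block, so a decorator (or a loop around the with block) is the form that can actually repeat the test code.

```python
import functools
import time


def unreliable_network_retry(max_attempts=3, wait_seconds=3.0, exceptions=(Exception,)):
    """Retry the decorated test if it raises one of `exceptions`.

    All three parameter names are hypothetical, picked for this sketch.
    """

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: re-raise so the failure stays visible.
                    if attempt == max_attempts:
                        raise
                    # Pause before the next attempt, as the proposal suggests.
                    time.sleep(wait_seconds)

        return wrapper

    return decorator


# Hypothetical usage on a test method:
#
#     @unreliable_network_retry(max_attempts=5, wait_seconds=3)
#     def test_list_repos_objs(self):
#         ...
```

In practice it is probably wise to pass a narrow exceptions tuple (for example only connection or HTTP errors), so that genuine assertion failures are not retried and hidden.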

@LysandreJik

Issue Analytics

State:
Created 2 years ago
Comments: 24 (24 by maintainers)

Top GitHub Comments

LysandreJik commented, Nov 19, 2021

That sounds good, even if I’m a bit afraid that retrying in quick succession won’t solve much. When one test fails because of a server error, other tests usually fail too. I’m still open to trying it out to see if it reduces these errors!

Would you like to give it a try? I’m guessing only this method needs to be modified: https://github.com/huggingface/transformers/blob/efea0f868bd381244e3cef51b388293e41a36d1e/src/transformers/file_utils.py#L1594

cc @julien-c as this is a safeguard against the server’s instabilities.

LysandreJik commented, Jan 17, 2022

Thank you for handling the GitHub bot. I would love to make time for this, either this week or next.