use `@unittest.skipIf` decorators inside tokenizer tests instead of `if ...: return`
Feature request
Currently, in tokenizer testing, many of the tests defined in `test_tokenization_common.py` are not relevant for every tokenizer. In most cases, when a test is not relevant, it still runs but performs no verification. See the snippet below for an example:
https://github.com/huggingface/transformers/blob/114295c010dd9c94d48add7a0f091ba6ebdf482b/tests/test_tokenization_common.py#L384-L396
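For reference, the linked snippet follows roughly the pattern sketched below (paraphrased, not a verbatim copy of the file): the whole body is guarded by an early `return`, so for tokenizers that do not set `test_sentencepiece` the test still runs, asserts nothing, and is counted as a pass.

```python
# Approximate sketch of the current early-return pattern (paraphrased):
def test_subword_regularization_tokenizer(self) -> None:
    if not self.test_sentencepiece:
        return  # counted as a passing test even though nothing was checked

    # Subword regularization is only available for the slow tokenizer.
    sp_model_kwargs = {"enable_sampling": True, "alpha": 0.1, "nbest_size": -1}
    tokenizer = self.get_tokenizer(sp_model_kwargs=sp_model_kwargs)

    self.assertTrue(hasattr(tokenizer, "sp_model_kwargs"))
    self.assertIsNotNone(tokenizer.sp_model_kwargs)
    self.assertTrue(isinstance(tokenizer.sp_model_kwargs, dict))
    self.assertEqual(tokenizer.sp_model_kwargs, sp_model_kwargs)
    self.check_subword_sampling(tokenizer)
```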
I would like to propose replacing these `if` checks inside the test methods with `@unittest.skipIf` decorators. Applied to the previous example, this would give:
```python
@unittest.skipIf(not test_sentencepiece, "Not testing sentencepiece")
def test_subword_regularization_tokenizer(self) -> None:
    # Subword regularization is only available for the slow tokenizer.
    sp_model_kwargs = {"enable_sampling": True, "alpha": 0.1, "nbest_size": -1}
    tokenizer = self.get_tokenizer(sp_model_kwargs=sp_model_kwargs)

    self.assertTrue(hasattr(tokenizer, "sp_model_kwargs"))
    self.assertIsNotNone(tokenizer.sp_model_kwargs)
    self.assertTrue(isinstance(tokenizer.sp_model_kwargs, dict))
    self.assertEqual(tokenizer.sp_model_kwargs, sp_model_kwargs)
    self.check_subword_sampling(tokenizer)
```
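One wiring detail worth noting (my reading, not stated in the issue): `unittest.skipIf` evaluates its condition once, when the class body is executed, so `test_sentencepiece` has to be resolvable at that point rather than on the test instance. If the flag only exists as a class attribute that concrete tokenizer test classes override, an equivalent runtime check could look like the hypothetical helper below (an illustrative sketch, not an existing transformers API):

```python
import functools
import unittest


def skip_unless_attr(attr_name, reason):
    """Hypothetical helper: skip a test based on an attribute of the test
    instance, evaluated when the test runs (unlike unittest.skipIf, whose
    condition is evaluated at class-definition time)."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, attr_name, False):
                raise unittest.SkipTest(reason)
            return test_func(self, *args, **kwargs)

        return wrapper

    return decorator


# Usage (hypothetical):
#
# class SomeTokenizerTest(TokenizerTesterMixin, unittest.TestCase):
#     test_sentencepiece = False
#
#     @skip_unless_attr("test_sentencepiece", "Not testing sentencepiece")
#     def test_subword_regularization_tokenizer(self):
#         ...
```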
Motivation
The problem with the current approach is that we have no view of the number of tests actually performed for each type of tokenizer. If errors are made in the configuration of the test classes, we can get a green check for all the tests while in reality nothing has been checked.
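As a concrete illustration of the difference (a standalone toy example, not transformers code): with an early `return`, `unittest` counts the test as passed, whereas a skipped test shows up explicitly in the test report, so the summary reflects what was actually verified.

```python
import unittest

test_sentencepiece = False  # stand-in for the tester-class flag


class ToyTokenizerTest(unittest.TestCase):
    def test_with_early_return(self):
        if not test_sentencepiece:
            return  # reported as "ok": looks like a verified pass
        self.fail("never reached")

    @unittest.skipIf(not test_sentencepiece, "Not testing sentencepiece")
    def test_with_skipif(self):
        self.fail("never reached; reported as skipped instead of passed")


if __name__ == "__main__":
    # With verbosity=2, one test is shown as "ok" and the other as
    # "skipped 'Not testing sentencepiece'", and the final summary
    # includes the skipped count.
    unittest.main(verbosity=2)
```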
Your contribution
If you find this relevant, I can make the changes, or leave them to someone else who is available to do it before me.
Top GitHub Comments
That works for me!
This is awesome, @SaulLu! Thank you 😃. I would love this new approach to skipping. I'll leave @sgugger and @LysandreJik to give the final confirmation.