use `@unittest.skipIf` decorators inside tokenizer tests instead of `if ...: return`
Feature request
Currently, in tokenizer testing, many of the tests defined in `test_tokenization_common.py` are not relevant for every tokenizer. In most cases, when a test is not relevant, it still runs but performs no verification. See the snippet below for an example:
https://github.com/huggingface/transformers/blob/114295c010dd9c94d48add7a0f091ba6ebdf482b/tests/test_tokenization_common.py#L384-L396
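For reference, the linked snippet follows roughly the pattern sketched below (paraphrased, not a verbatim copy of the file): the whole body is guarded by an early `return`, so for tokenizers that do not set `test_sentencepiece` the test still runs, asserts nothing, and is counted as a pass.

```python
# Approximate sketch of the current early-return pattern (paraphrased):
def test_subword_regularization_tokenizer(self) -> None:
    if not self.test_sentencepiece:
        return  # counted as a passing test even though nothing was checked

    # Subword regularization is only available for the slow tokenizer.
    sp_model_kwargs = {"enable_sampling": True, "alpha": 0.1, "nbest_size": -1}
    tokenizer = self.get_tokenizer(sp_model_kwargs=sp_model_kwargs)

    self.assertTrue(hasattr(tokenizer, "sp_model_kwargs"))
    self.assertIsNotNone(tokenizer.sp_model_kwargs)
    self.assertTrue(isinstance(tokenizer.sp_model_kwargs, dict))
    self.assertEqual(tokenizer.sp_model_kwargs, sp_model_kwargs)
    self.check_subword_sampling(tokenizer)
```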
I would like to propose replacing these `if` checks inside the test methods with `@unittest.skipIf` decorators. Applied to the previous example, this would give:
```python
@unittest.skipIf(not test_sentencepiece, "Not testing sentencepiece")
def test_subword_regularization_tokenizer(self) -> None:
    # Subword regularization is only available for the slow tokenizer.
    sp_model_kwargs = {"enable_sampling": True, "alpha": 0.1, "nbest_size": -1}
    tokenizer = self.get_tokenizer(sp_model_kwargs=sp_model_kwargs)

    self.assertTrue(hasattr(tokenizer, "sp_model_kwargs"))
    self.assertIsNotNone(tokenizer.sp_model_kwargs)
    self.assertTrue(isinstance(tokenizer.sp_model_kwargs, dict))
    self.assertEqual(tokenizer.sp_model_kwargs, sp_model_kwargs)
    self.check_subword_sampling(tokenizer)
```
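One wiring detail worth noting (my reading, not stated in the issue): `unittest.skipIf` evaluates its condition once, when the class body is executed, so `test_sentencepiece` has to be resolvable at that point rather than on the test instance. If the flag only exists as a class attribute that concrete tokenizer test classes override, an equivalent runtime check could look like the hypothetical helper below (an illustrative sketch, not an existing transformers API):

```python
import functools
import unittest


def skip_unless_attr(attr_name, reason):
    """Hypothetical helper: skip a test based on an attribute of the test
    instance, evaluated when the test runs (unlike unittest.skipIf, whose
    condition is evaluated at class-definition time)."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            if not getattr(self, attr_name, False):
                raise unittest.SkipTest(reason)
            return test_func(self, *args, **kwargs)

        return wrapper

    return decorator


# Usage (hypothetical):
#
# class SomeTokenizerTest(TokenizerTesterMixin, unittest.TestCase):
#     test_sentencepiece = False
#
#     @skip_unless_attr("test_sentencepiece", "Not testing sentencepiece")
#     def test_subword_regularization_tokenizer(self):
#         ...
```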
Motivation
The problem with the current approach is that we have no view of the number of tests actually performed for each type of tokenizer. If errors are made in the configuration of the test classes, we can get a green check for all the tests while in reality nothing has been checked.
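As a concrete illustration of the difference (a standalone toy example, not transformers code): with an early `return`, `unittest` counts the test as passed, whereas a skipped test shows up explicitly in the test report, so the summary reflects what was actually verified.

```python
import unittest

test_sentencepiece = False  # stand-in for the tester-class flag


class ToyTokenizerTest(unittest.TestCase):
    def test_with_early_return(self):
        if not test_sentencepiece:
            return  # reported as "ok": looks like a verified pass
        self.fail("never reached")

    @unittest.skipIf(not test_sentencepiece, "Not testing sentencepiece")
    def test_with_skipif(self):
        self.fail("never reached; reported as skipped instead of passed")


if __name__ == "__main__":
    # With verbosity=2, one test is shown as "ok" and the other as
    # "skipped 'Not testing sentencepiece'", and the final summary
    # includes the skipped count.
    unittest.main(verbosity=2)
```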
Your contribution
If you find this relevant, I can make the changes, or leave them to someone else who is available to do it before me.
Top GitHub Comments
That works for me!
This is awesome, @SaulLu! Thank you 😃. I would love this new approach to skipping. I'll leave @sgugger and @LysandreJik to give the final confirmation.