Refactor tests into small test suites
As suggested/discussed in some PR, I would like to clean up how unit tests are organized, parameterized, and run.
Motivation
Functionalities in torchaudio share many common test patterns, such as comparison against librosa, jit-ability, and batch consistency. So if a function has a nicely-organized test suite, adding the same type of function becomes easier. (E.g. “I added this new SoX filter which is very similar to A, and there is a suite of tests defined for A, so those are the tests I should add.”) That way, it becomes easier for new contributors to write tests.
Let’s take the ./test/test_functional.py module as an example. There are a lot of test methods under the TestFunctional class.
The difficulty I had with this structure when I worked on https://github.com/pytorch/audio/pull/448 was that it was not immediately clear what kinds of tests are typical to add.
The current organization of test_functional.py:

```
$ grep '\bclass\b\|def test' -r ./test/test_functional.py
class TestFunctional(unittest.TestCase):
    def test_torchscript_spectrogram(self):
    def test_torchscript_griffinlim(self):
    def test_griffinlim(self):
    def test_batch_griffinlim(self):
    def test_compute_deltas_onechannel(self):
    def test_compute_deltas_twochannel(self):
    def test_compute_deltas_randn(self):
    def test_batch_pitch(self):
    def test_jit_pitch(self):
    def test_istft_is_inverse_of_stft1(self):
    def test_istft_is_inverse_of_stft2(self):
    def test_istft_is_inverse_of_stft3(self):
    def test_istft_is_inverse_of_stft4(self):
    def test_istft_is_inverse_of_stft5(self):
    def test_istft_of_ones(self):
    def test_istft_of_zeros(self):
    def test_istft_requires_overlap_windows(self):
    def test_istft_requires_nola(self):
    def test_istft_requires_non_empty(self):
    def test_istft_of_sine(self):
    def test_linearity_of_istft1(self):
    def test_linearity_of_istft2(self):
    def test_linearity_of_istft3(self):
    def test_linearity_of_istft4(self):
    def test_batch_istft(self):
    def test_create_fb(self):
    def test_gain(self):
    def test_dither(self):
    def test_vctk_transform_pipeline(self):
    def test_pitch(self):
    def test_torchscript_create_fb_matrix(self):
    def test_torchscript_amplitude_to_DB(self):
    def test_torchscript_create_dct(self):
    def test_torchscript_mu_law_encoding(self):
    def test_torchscript_mu_law_decoding(self):
    def test_torchscript_complex_norm(self):
    def test_mask_along_axis(self):
    def test_mask_along_axis_iid(self):
    def test_torchscript_gain(self):
    def test_torchscript_dither(self):
def test_phase_vocoder(complex_specgrams, rate, hop_length):
def test_complex_norm(complex_tensor, power):
def test_mask_along_axis(specgram, mask_param, mask_value, axis):
def test_mask_along_axis_iid(specgrams, mask_param, mask_value, axis):
```
Pitch
Continuing with the test_functional.py example above, we can start by breaking down the test cases and composing test suites.
```python
# This is just an illustration
class TestSpectrogram:
    """Test suite for `spectrogram`"""
    def test_accuracy(self):
        """Produces expected results"""
    def test_jit_consistency(self):
        """Is jit-able and returns consistent results"""
    def test_batch_consistency(self):
        """Returns consistent results for batched input"""
    def test_comparison_against_librosa(self):
        """Should yield results very close to librosa's implementation"""
```
Alternatives
An alternative way to break down tests is by the type of test.
```python
class TestJit:
    """Test suite for jit-ability and consistency"""
    def test_spectrogram(self):
        """`spectrogram` should be jit-able"""
    def test_griffinlim(self):
        """`griffinlim` should be jit-able"""

class TestSoxConsistency:
    """Test suite for SoX compatibility of filters"""
    def test_allpass(self):
        """`allpass` produces results close to SoX"""
    def test_highpass(self):
        """`highpass` produces results close to SoX"""
```
Pro: With this approach, it is easy to answer questions like “Which functions are jit-able?” Con: The suite of tests for the same function is scattered across classes.
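As a rough sketch of the by-type organization, a jit-ability suite could script each functional and compare it against eager mode. The function picks below follow the grep listing above, but treating them as scriptable with these signatures is an assumption about the torchaudio version at hand.

```python
import unittest

import torch
import torchaudio.functional as F


class TestJit(unittest.TestCase):
    """One place to answer: which functionals are jit-able?"""

    def test_mu_law_encoding(self):
        """`mu_law_encoding` should script and match eager mode."""
        scripted = torch.jit.script(F.mu_law_encoding)
        waveform = torch.rand(2, 1000) * 2 - 1  # samples in [-1, 1]
        self.assertTrue(torch.equal(
            F.mu_law_encoding(waveform, 256),
            scripted(waveform, 256),
        ))

    def test_mu_law_decoding(self):
        """`mu_law_decoding` should script and match eager mode."""
        scripted = torch.jit.script(F.mu_law_decoding)
        encoded = torch.randint(0, 256, (2, 1000))
        self.assertTrue(torch.equal(
            F.mu_law_decoding(encoded, 256),
            scripted(encoded, 256),
        ))
```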
Related stuff
In addition to the above, when we discuss testing, we can also talk about:
- Should we run flake8 / black checks?
- Which test runner to use: pytest or unittest?
- Should we use PyTorch’s test utility? etc…
Top GitHub Comments
@vincentqb I created PR #480, which re-organizes tests by category in a small scope (only the test_functional module). Once we agree on the pros and cons of the re-org, I will proceed with the rest of the test modules.

Regarding git blame, it is true that the original author's information will be lost from the latest commit, but if the intention of a test is properly documented (either as a descriptive test function name or a docstring), then I think it's overall more beneficial to make the tests more readable and intuitively comprehensible while the whole code base is still somewhat small.

As a word of caution on code formatting: git blame can be very useful for quickly figuring out why something went wrong. Therefore, I would not just apply some formatter to all files and commit it for the sake of code formatting. Instead, we could set up a system that forces formatting on new lines, or at the very least on new files. In general, it makes sense to set a code formatting standard through an automated tool to, at the very least, avoid discussion and spending time on the topic. It’s easy to start bike-shedding whitespace.
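One way to force formatting only on new or changed code, without rewriting untouched history, is to format just the files that differ from the base branch. The helper below is a hypothetical sketch: the function name `format_changed_files`, the `master` base branch, and the availability of `git` and `black` on the PATH are all assumptions.

```python
# Hypothetical helper: run `black` only on Python files touched relative to
# the base branch, so `git blame` stays clean on unmodified code.
import subprocess


def format_changed_files(base: str = "master") -> None:
    # List Python files that differ from the base branch.
    diff = subprocess.run(
        ["git", "diff", "--name-only", base, "--", "*.py"],
        capture_output=True, text=True, check=True,
    )
    files = [line for line in diff.stdout.splitlines() if line]
    if files:
        # Format only those files; the rest of the tree is left untouched.
        subprocess.run(["black", *files], check=True)


if __name__ == "__main__":
    format_changed_files()
```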
On code coverage: it’s a good way to find gaping blind spots in your tests, but a bad way to feel good about them. There is also some code (for example, codegen) that won’t necessarily be covered and will therefore have to be explicitly excluded. I use code coverage for my own personal development, but having it as part of CI can cause confusion and add overhead.