Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Refactor test suite to be more readable?

See original GitHub issue

While working on #174, I also worked on the test suite. In there we have the ginormous tests that are hard to parse, because they do so many things at the same time:

https://github.com/pytorch/data/blob/c06066ae360fc6054fb826ae041b1cb0c09b2f3b/test/test_datapipe.py#L382-L426

I was wondering if there is a reason for that. Can’t we split this into multiple smaller ones? Utilizing pytest, placing the following class in the test module is equivalent to the test above:

class TestLineReader:
    @pytest.fixture
    def text1(self):
        return "Line1\nLine2"

    @pytest.fixture
    def text2(self):
        return "Line2,1\nLine2,2\nLine2,3"

    def test_functional_read_lines_correctly(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = source_dp.readlines()
        expected_result = [("file1", line) for line in text1.split("\n")] + [
            ("file2", line) for line in text2.split("\n")
        ]
        assert expected_result == list(line_reader_dp)

    def test_functional_strip_new_lines_for_bytes(self, text1, text2):
        source_dp = IterableWrapper(
            [("file1", io.BytesIO(text1.encode("utf-8"))), ("file2", io.BytesIO(text2.encode("utf-8")))]
        )
        line_reader_dp = source_dp.readlines()
        expected_result_bytes = [("file1", line.encode("utf-8")) for line in text1.split("\n")] + [
            ("file2", line.encode("utf-8")) for line in text2.split("\n")
        ]
        assert expected_result_bytes == list(line_reader_dp)

    def test_functional_do_not_strip_newlines(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = source_dp.readlines(strip_newline=False)
        expected_result = [
            ("file1", "Line1\n"),
            ("file1", "Line2"),
            ("file2", "Line2,1\n"),
            ("file2", "Line2,2\n"),
            ("file2", "Line2,3"),
        ]
        assert expected_result == list(line_reader_dp)

    def test_reset(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = LineReader(source_dp, strip_newline=False)
        expected_result = [
            ("file1", "Line1\n"),
            ("file1", "Line2"),
            ("file2", "Line2,1\n"),
            ("file2", "Line2,2\n"),
            ("file2", "Line2,3"),
        ]

        n_elements_before_reset = 2
        res_before_reset, res_after_reset = reset_after_n_next_calls(line_reader_dp, n_elements_before_reset)
        assert expected_result[:n_elements_before_reset] == res_before_reset
        assert expected_result == res_after_reset

    def test_len(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = LineReader(source_dp, strip_newline=False)

        with pytest.raises(TypeError, match="has no len"):
            len(line_reader_dp)

This is a lot more readable, since we now actually have 5 separate test cases that can individually fail. Plus, while writing this I also found that test_reset and test_len were somewhat dependent on test_functional_do_not_strip_newlines since they don’t neither define line_reader_dp nor expected_result themselves.

Issue Analytics

State:
Created 2 years ago
Reactions:1
Comments:6 (4 by maintainers)

Top GitHub Comments

1reaction

eripcommented, Jan 21, 2022

FWIW, we’ve started something similar in torchtext. See here if you’re interested.

1reaction

pmeiercommented, Jan 20, 2022

Or even more readable:

class TestLineReader:
    @pytest.fixture
    def files_with_text(self):
        return [
            ("file1", "Line1\nLine2"),
            ("file2", "Line2,1\nLine2,2\nLine2,3"),
        ]

    def make_str_dp(self, files_with_text):
        return IterableWrapper([(file, io.StringIO(text)) for file, text in files_with_text])

    def make_bytes_dp(self, files_with_text):
        return IterableWrapper([(file, io.BytesIO(text.encode("utf-8"))) for file, text in files_with_text])

    def test_functional_read_lines_correctly(self, files_with_text):
        line_reader_dp = self.make_str_dp(files_with_text).readlines()

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines())

        assert expected == list(line_reader_dp)

    def test_functional_strip_new_lines_for_bytes(self, files_with_text):
        line_reader_dp = self.make_bytes_dp(files_with_text).readlines()

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line.encode("utf-8")) for line in text.splitlines())

        assert expected == list(line_reader_dp)

    def test_functional_do_not_strip_newlines(self, files_with_text):
        line_reader_dp = self.make_str_dp(files_with_text).readlines(strip_newline=False)

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines(keepends=True))

        assert expected == list(line_reader_dp)

    def test_reset(self, files_with_text):
        line_reader_dp = LineReader(self.make_str_dp(files_with_text))

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines())

        n_elements_before_reset = 2
        res_before_reset, res_after_reset = reset_after_n_next_calls(line_reader_dp, n_elements_before_reset)

        assert expected[:n_elements_before_reset] == res_before_reset
        assert expected == res_after_reset

    def test_len(self, files_with_text):
        line_reader_dp = LineReader(self.make_str_dp(files_with_text))

        with pytest.raises(TypeError, match="has no len"):
            len(line_reader_dp)