Refactor test suite to be more readable?

While working on #174, I also worked on the test suite. It contains some very large tests that are hard to parse because they do so many things at once:

https://github.com/pytorch/data/blob/c06066ae360fc6054fb826ae041b1cb0c09b2f3b/test/test_datapipe.py#L382-L426

I was wondering if there is a reason for that. Can’t we split this into multiple smaller tests? Using pytest, the following class, placed in the test module, is equivalent to the test above:

class TestLineReader:
    @pytest.fixture
    def text1(self):
        return "Line1\nLine2"

    @pytest.fixture
    def text2(self):
        return "Line2,1\nLine2,2\nLine2,3"

    def test_functional_read_lines_correctly(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = source_dp.readlines()
        expected_result = [("file1", line) for line in text1.split("\n")] + [
            ("file2", line) for line in text2.split("\n")
        ]
        assert expected_result == list(line_reader_dp)

    def test_functional_strip_new_lines_for_bytes(self, text1, text2):
        source_dp = IterableWrapper(
            [("file1", io.BytesIO(text1.encode("utf-8"))), ("file2", io.BytesIO(text2.encode("utf-8")))]
        )
        line_reader_dp = source_dp.readlines()
        expected_result_bytes = [("file1", line.encode("utf-8")) for line in text1.split("\n")] + [
            ("file2", line.encode("utf-8")) for line in text2.split("\n")
        ]
        assert expected_result_bytes == list(line_reader_dp)

    def test_functional_do_not_strip_newlines(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = source_dp.readlines(strip_newline=False)
        expected_result = [
            ("file1", "Line1\n"),
            ("file1", "Line2"),
            ("file2", "Line2,1\n"),
            ("file2", "Line2,2\n"),
            ("file2", "Line2,3"),
        ]
        assert expected_result == list(line_reader_dp)

    def test_reset(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = LineReader(source_dp, strip_newline=False)
        expected_result = [
            ("file1", "Line1\n"),
            ("file1", "Line2"),
            ("file2", "Line2,1\n"),
            ("file2", "Line2,2\n"),
            ("file2", "Line2,3"),
        ]

        n_elements_before_reset = 2
        res_before_reset, res_after_reset = reset_after_n_next_calls(line_reader_dp, n_elements_before_reset)
        assert expected_result[:n_elements_before_reset] == res_before_reset
        assert expected_result == res_after_reset

    def test_len(self, text1, text2):
        source_dp = IterableWrapper([("file1", io.StringIO(text1)), ("file2", io.StringIO(text2))])
        line_reader_dp = LineReader(source_dp, strip_newline=False)

        with pytest.raises(TypeError, match="has no len"):
            len(line_reader_dp)

This is a lot more readable, since we now have 5 separate test cases that can fail individually. Plus, while writing this I found that test_reset and test_len were somewhat dependent on test_functional_do_not_strip_newlines, since they define neither line_reader_dp nor expected_result themselves.
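
For readers unfamiliar with reset_after_n_next_calls: it comes from the repository's test utilities and is not shown in this issue. The following is only a rough sketch of what such a helper might do, assuming it reads n items, restarts iteration, and then reads everything. LineSource is a hypothetical stand-in for a LineReader datapipe so that the snippet runs without torchdata.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reset_after_n_next_calls(datapipe: Iterable[T], n: int) -> Tuple[List[T], List[T]]:
    """Assumed behavior: read n items, then restart iteration and read everything."""
    it = iter(datapipe)
    # Partially consume the first iterator.
    res_before_reset = [next(it) for _ in range(n)]
    # Starting a fresh iteration is what "resets" the pipe in this sketch.
    res_after_reset = list(datapipe)
    return res_before_reset, res_after_reset


class LineSource:
    """Hypothetical stand-in for a LineReader datapipe: re-iterable, yields (file, line)."""

    def __init__(self, files_with_text):
        self.files_with_text = files_with_text

    def __iter__(self):
        for file, text in self.files_with_text:
            for line in text.splitlines(keepends=True):
                yield file, line


source = LineSource([("file1", "Line1\nLine2"), ("file2", "Line2,1\nLine2,2\nLine2,3")])
before, after = reset_after_n_next_calls(source, 2)
print(before)  # the first two (file, line) pairs
print(after)   # all five pairs, starting again from the beginning
```

Under these assumptions, test_reset only checks that a partially consumed pipe yields the full sequence again once iteration restarts.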

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

erip commented, Jan 21, 2022 (1 reaction)

FWIW, we’ve started something similar in torchtext. See here if you’re interested.

pmeier commented, Jan 20, 2022 (1 reaction)

Or even more readable:

class TestLineReader:
    @pytest.fixture
    def files_with_text(self):
        return [
            ("file1", "Line1\nLine2"),
            ("file2", "Line2,1\nLine2,2\nLine2,3"),
        ]

    def make_str_dp(self, files_with_text):
        return IterableWrapper([(file, io.StringIO(text)) for file, text in files_with_text])

    def make_bytes_dp(self, files_with_text):
        return IterableWrapper([(file, io.BytesIO(text.encode("utf-8"))) for file, text in files_with_text])

    def test_functional_read_lines_correctly(self, files_with_text):
        line_reader_dp = self.make_str_dp(files_with_text).readlines()

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines())

        assert expected == list(line_reader_dp)

    def test_functional_strip_new_lines_for_bytes(self, files_with_text):
        line_reader_dp = self.make_bytes_dp(files_with_text).readlines()

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line.encode("utf-8")) for line in text.splitlines())

        assert expected == list(line_reader_dp)

    def test_functional_do_not_strip_newlines(self, files_with_text):
        line_reader_dp = self.make_str_dp(files_with_text).readlines(strip_newline=False)

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines(keepends=True))

        assert expected == list(line_reader_dp)

    def test_reset(self, files_with_text):
        line_reader_dp = LineReader(self.make_str_dp(files_with_text))

        expected = []
        for file, text in files_with_text:
            expected.extend((file, line) for line in text.splitlines())

        n_elements_before_reset = 2
        res_before_reset, res_after_reset = reset_after_n_next_calls(line_reader_dp, n_elements_before_reset)

        assert expected[:n_elements_before_reset] == res_before_reset
        assert expected == res_after_reset

    def test_len(self, files_with_text):
        line_reader_dp = LineReader(self.make_str_dp(files_with_text))

        with pytest.raises(TypeError, match="has no len"):
            len(line_reader_dp)
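
The second version derives the expected values with str.splitlines instead of text.split("\n") or a hand-written list. As a small standard-library check (the expected_lines helper is hypothetical and exists only for this illustration), the sketch below shows that this reproduces the literal lists from the first version, and where split("\n") and splitlines() would diverge:

```python
files_with_text = [
    ("file1", "Line1\nLine2"),
    ("file2", "Line2,1\nLine2,2\nLine2,3"),
]


def expected_lines(files_with_text, keepends=False):
    """Hypothetical helper: build the (file, line) pairs the tests compare against."""
    return [
        (file, line)
        for file, text in files_with_text
        for line in text.splitlines(keepends=keepends)
    ]


stripped = expected_lines(files_with_text)
unstripped = expected_lines(files_with_text, keepends=True)

# Same values as the hard-coded list in the first version of the tests.
assert unstripped == [
    ("file1", "Line1\n"),
    ("file1", "Line2"),
    ("file2", "Line2,1\n"),
    ("file2", "Line2,2\n"),
    ("file2", "Line2,3"),
]

# For these inputs split("\n") and splitlines() agree ...
assert "Line1\nLine2".split("\n") == "Line1\nLine2".splitlines()

# ... but they differ once the text ends with a newline.
trailing_split = "Line1\n".split("\n")       # ['Line1', '']
trailing_splitlines = "Line1\n".splitlines()  # ['Line1']
```

If the fixture text ever gains a trailing newline, the two approaches would produce different expected values, so it may be worth picking one and using it consistently.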