Parallel (mpi4py) tests with coverage often lead to INTERNALERROR
See original GitHub issue

When running tests in parallel for a program which uses mpi4py, an INTERNALERROR is often raised. The error arises from a corrupted .coverage file. This appears to be caused by a race condition (which is why the error does not always occur), and more specifically by concurrent writes to the file from the different MPI processes.
The command which gave this error was:
mpirun -n 2 py.test pygyro/diagnostics/ -sxm parallel --cov=pygyro
The operating system is Linux (Ubuntu Bionic Beaver 18.04.1), and I am using Python 3.6.6.
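For reference, the tests in question are ordinary pytest tests that use mpi4py communicators. A minimal hypothetical example of that kind of test (not taken from the PyGyro suite, purely illustrative) might look like:

# test_parallel_example.py -- hypothetical illustration, not a PyGyro test
import pytest
from mpi4py import MPI

@pytest.mark.parallel
def test_allreduce_sum():
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    # every rank contributes its rank number, so the sum is 0 + 1 + ... + (size - 1)
    total = comm.allreduce(comm.Get_rank(), op=MPI.SUM)
    assert total == size * (size - 1) // 2

With the command above, every MPI rank runs its own complete pytest session with --cov=pygyro, so each process reads and writes the same .coverage file in the working directory; that is where the concurrent writes happen.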
The error generated is as follows (both MPI processes raise the same error, so the traceback is shown only once):
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 293, in read_file
INTERNALERROR>     self.read_fileobj(f)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 271, in read_fileobj
INTERNALERROR>     data = self._read_raw_data(file_obj)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 314, in _read_raw_data
INTERNALERROR>     return json.load(file_obj)
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 299, in load
INTERNALERROR>     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
INTERNALERROR>     return _default_decoder.decode(s)
INTERNALERROR>   File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
INTERNALERROR>     raise JSONDecodeError("Extra data", s, end)
INTERNALERROR> json.decoder.JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
INTERNALERROR>
INTERNALERROR> During handling of the above exception, another exception occurred:
INTERNALERROR>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 184, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 224, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 284, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 67, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 61, in <lambda>
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/callers.py", line 203, in _multicall
INTERNALERROR>     gen.send(outcome)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/plugin.py", line 228, in pytest_runtestloop
INTERNALERROR>     self.cov_controller.finish()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/engine.py", line 167, in finish
INTERNALERROR>     self.cov.stop()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/control.py", line 677, in load
INTERNALERROR>     self.data_files.read(self.data)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 653, in read
INTERNALERROR>     data.read_file(self.filename)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 297, in read_file
INTERNALERROR>     filename, exc.__class__.__name__, exc,
INTERNALERROR> coverage.misc.CoverageException: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage': JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
========================= 1731 passed in 17.66 seconds =========================
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[29764,1],0]
Exit code: 3
--------------------------------------------------------------------------
Top GitHub Comments
There is a very easy way to fix this: don't write to the same coverage file from multiple cores. This can be done by adding one line to the setup.cfg file (or whatever you use as the configuration file) and running your tests accordingly; a sketch of what this might look like is shown below.
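A minimal sketch of such a setup, assuming the option in question is coverage.py's parallel data-file mode (the section name follows coverage.py's setup.cfg convention; treat the exact lines as an assumption, not a quote from the original comment):

[coverage:run]
# write one data file per process (.coverage.<hostname>.<pid>.<random>)
# instead of having every process append to a single shared .coverage file
parallel = True

and then run the tests with something like:

mpirun -n 2 py.test pygyro/diagnostics/ -sxm parallel --cov=pygyro
coverage combine    # merge the per-process .coverage.* files into one .coverage
coverage report     # or "coverage html" for an HTML report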
This will create a separate .coverage file for every core that executed the tests, and combine them afterwards into a single report. Note that this also makes it easier to test a package (for example) both in serial and in parallel, and to combine all reports into a single one, with a sequence like the one sketched below.
Finally, also note that the console output when running pytest under MPI this way will probably be somewhat garbled, as both processes try to write to the console simultaneously. This has no effect at all on the actual testing process itself, but it may make it a bit harder to figure out which process printed what. As far as I know, there is no way to fix this without making pytest somehow MPI-aware.

This can be related to https://github.com/nedbat/coveragepy/issues/883#issuecomment-650562896