
Parallel (mpi4py) tests with coverage often lead to INTERNALERROR


When running tests in parallel for a program that uses mpi4py, an INTERNALERROR is often raised. The error arises because the .coverage file gets corrupted. This appears to be a race condition (which is why the error does not always occur); more specifically, I think it is caused by the processes writing to the file concurrently.

The command which gave this error was: mpirun -n 2 py.test pygyro/diagnostics/ -sxm parallel --cov=pygyro

The operating system is Linux (Ubuntu 18.04.1, Bionic Beaver). I am using Python 3.6.6.

The error generated is as follows:

INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 293, in read_file
INTERNALERROR>     self.read_fileobj(f)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 271, in read_fileobj
INTERNALERROR>     data = self._read_raw_data(file_obj)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 314, in _read_raw_data
INTERNALERROR>     return json.load(file_obj)
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 299, in load
INTERNALERROR>     parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
INTERNALERROR>   File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
INTERNALERROR>     return _default_decoder.decode(s)
INTERNALERROR>   File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
INTERNALERROR>     raise JSONDecodeError("Extra data", s, end)
INTERNALERROR> json.decoder.JSONDecodeError: Extra data: line 1 column 9466 (char 9465)
INTERNALERROR>
INTERNALERROR> During handling of the above exception, another exception occurred:
INTERNALERROR>
INTERNALERROR> Traceback (most recent call last):
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 184, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/_pytest/main.py", line 224, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/hooks.py", line 284, in __call__
INTERNALERROR>     return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 67, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/manager.py", line 61, in <lambda>
INTERNALERROR>     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pluggy/callers.py", line 203, in _multicall
INTERNALERROR>     gen.send(outcome)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/plugin.py", line 228, in pytest_runtestloop
INTERNALERROR>     self.cov_controller.finish()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/pytest_cov/engine.py", line 167, in finish
INTERNALERROR>     self.cov.stop()
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/control.py", line 677, in load
INTERNALERROR>     self.data_files.read(self.data)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 653, in read
INTERNALERROR>     data.read_file(self.filename)
INTERNALERROR>   File "/home/emily/.local/lib/python3.6/site-packages/coverage/data.py", line 297, in read_file
INTERNALERROR>     filename, exc.__class__.__name__, exc,
INTERNALERROR> coverage.misc.CoverageException: Couldn't read data from '/home/emily/Documents/Cours_TUM/projet/Code/PyGyro/.coverage': JSONDecodeError: Extra data: line 1 column 9466 (char 9465)


========================= 1731 passed in 17.66 seconds =========================
========================= 1731 passed in 17.66 seconds =========================
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[29764,1],0]
  Exit code:    3
--------------------------------------------------------------------------
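The "Extra data" message is what Python's json module reports when a complete JSON document is followed by leftover bytes. The coverage version in this traceback stores its data file as JSON. One plausible way to end up with such a file is for two ranks to open .coverage for writing at the same time. Each rank then writes from offset 0, and the shorter document overwrites only the beginning of the longer one. The sketch below simulates this with two file handles in a single process. The function name and data are hypothetical, and this is not coverage.py's actual write code.

```python
import json
import os
import tempfile


def simulate_unsynchronized_writes(path):
    """Hypothetical stand-in for two ranks saving to the same data file.

    Both handles are opened (and so truncated) before either writes, so each
    one writes from offset 0. The short document overwrites only the start
    of the long one, leaving the long document's tail behind it.
    """
    long_doc = json.dumps({"lines": {"mod.py": list(range(200))}})
    short_doc = json.dumps({"lines": {"mod.py": [1, 2, 3]}})
    with open(path, "w") as rank0, open(path, "w") as rank1:
        rank0.write(long_doc)
        rank0.flush()
        rank1.write(short_doc)
        rank1.flush()
    return short_doc


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), ".coverage")
    simulate_unsynchronized_writes(path)
    with open(path) as f:
        try:
            json.load(f)
        except json.JSONDecodeError as exc:
            print(exc)  # Extra data: line 1 column 33 (char 32)
```

The reported position is where the first valid document ends, which matches the shape of the "line 1 column 9466 (char 9465)" error above.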

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

4 reactions
1313e commented, Oct 30, 2019

There is a very easy way to fix this: don't write to the same coverage file from multiple processes. You can do this by putting the following lines in the setup.cfg file (or whichever file you use for configuration):

[coverage:run]
parallel = true

and run your tests with:

mpiexec -n 2 coverage run --rcfile=setup.cfg -m mpi4py -m pytest
coverage combine
coverage report -m

This creates a separate .coverage file for every process that executed the tests. Afterward, coverage combine merges them so that coverage report produces a single report.
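Below is a simplified sketch of the parallel-mode idea. It uses only the standard library and made-up data, and it is not coverage.py's implementation or file format. Each process saves to its own suffixed file, so no file is ever opened by two writers, and a combine step afterward merges the per-file line sets. coverage.py builds its suffix from the host name, the process ID and a random number. Because the "ranks" here run inside one process, the hypothetical data_file_name helper uses a rank number instead.

```python
import glob
import json
import os
import socket
import tempfile


def data_file_name(base, rank):
    # Hypothetical suffix: host name, PID and rank keep the names distinct.
    return "%s.%s.%d.%d" % (base, socket.gethostname(), os.getpid(), rank)


def save(path, measured):
    # Each "rank" writes only to its own file, so writes cannot collide.
    with open(path, "w") as f:
        json.dump({"lines": {name: sorted(lines)
                             for name, lines in measured.items()}}, f)


def combine(base):
    # Merge every per-process file into one mapping: source file -> lines.
    merged = {}
    for path in sorted(glob.glob(base + ".*")):
        with open(path) as f:
            for name, lines in json.load(f)["lines"].items():
                merged.setdefault(name, set()).update(lines)
    return merged


if __name__ == "__main__":
    base = os.path.join(tempfile.mkdtemp(), ".coverage")
    # Two simulated ranks that executed different lines of the same module.
    save(data_file_name(base, 0), {"pygyro/example.py": {1, 2, 3}})
    save(data_file_name(base, 1), {"pygyro/example.py": {3, 4}})
    print(combine(base))  # {'pygyro/example.py': {1, 2, 3, 4}}
```

Taking the union of executed lines is what makes the combined report reflect everything any rank ran. This is also why a serial run and an MPI run can be merged the same way.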

Note that this also makes it easier to test a package (for example) both in serial and in parallel, and combine all reports into a single one, by doing this:

coverage run --rcfile=setup.cfg -m pytest
mpiexec -n 2 coverage run --rcfile=setup.cfg -m mpi4py -m pytest
coverage combine
coverage report -m
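For CI, the same sequence can be scripted. The sketch below is only a thin wrapper, and its function names are hypothetical. First, it uses configparser to check that setup.cfg really sets parallel = true under [coverage:run] before anything is launched under MPI. Then it runs the commands from the comment above in order with subprocess, stopping at the first failure.

```python
import configparser
import subprocess


def parallel_mode_enabled(rcfile="setup.cfg"):
    """Return True if rcfile sets parallel = true under [coverage:run]."""
    parser = configparser.ConfigParser()
    parser.read(rcfile)  # a missing file is silently ignored
    return parser.getboolean("coverage:run", "parallel", fallback=False)


def coverage_commands(nprocs=2, rcfile="setup.cfg", serial=True):
    """The command sequence from the comment, as argument lists."""
    run_opts = ["coverage", "run", "--rcfile=" + rcfile]
    cmds = []
    if serial:
        cmds.append(run_opts + ["-m", "pytest"])
    cmds.append(["mpiexec", "-n", str(nprocs)] + run_opts
                + ["-m", "mpi4py", "-m", "pytest"])
    cmds.append(["coverage", "combine"])
    cmds.append(["coverage", "report", "-m"])
    return cmds


def run(nprocs=2, rcfile="setup.cfg", serial=True):
    """Run every step in order; check=True stops at the first failure."""
    if not parallel_mode_enabled(rcfile):
        raise SystemExit(rcfile + ": set parallel = true under [coverage:run]")
    for cmd in coverage_commands(nprocs, rcfile, serial):
        subprocess.run(cmd, check=True)
```

Calling run() from the project root performs the serial run, the 2-process MPI run, the combine and the report. Pass serial=False to run only the MPI tests.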

Finally, note that when pytest is run under MPI this way, the console output will probably be somewhat garbled, because both processes write to the console simultaneously. This has no effect at all on the testing itself, but it may make it harder to tell which process printed what. As far as I know, there is no way to fix this without making pytest MPI-aware in some way.

0 reactions
JulienPalard commented, Jun 27, 2020

