Reading backend while it is being written sometimes throws an error

General information:

  • emcee version: 3.0.2
  • platform: Ubuntu 18.04
  • installation method (pip/conda/source/other?): conda

Problem description:

Expected behavior:

The backend (HDF5 file) can be read with no errors while the chain is running and the backend is being written.

Actual behavior:

The process writing to the backend sometimes raises an error when another process tries to read the HDF5 file. The error, copied from the shell, is:

Traceback (most recent call last):
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/h5py/_hl/files.py", line 202, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 96, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mazzi/Documenti/DOTTORATO/Progetti/sfhchain/code/mcmc.py", line 741, in <module>
    mcmc(settings)
  File "/home/mazzi/Documenti/DOTTORATO/Progetti/sfhchain/code/mcmc.py", line 296, in mcmc
    for sample in sampler.sample(pos[region_idx, :, :], iterations=STEPS, skip_initial_state_check=True, progress=False):
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/emcee/ensemble.py", line 351, in sample
    self.backend.save_step(state, accepted)
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/emcee/backends/hdf.py", line 206, in save_step
    with self.open("a") as f:
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/emcee/backends/hdf.py", line 67, in open
    f = h5py.File(self.filename, mode)
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/h5py/_hl/files.py", line 424, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/mazzi/miniconda3/envs/pylegal/lib/python3.9/site-packages/h5py/_hl/files.py", line 204, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 116, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = 'results/DEBUG/chain-allstars_0000.hdf5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)

What have you tried so far?:

I tried setting read_only=True when instantiating the HDFBackend in the script that tries to read the backend, but the problem was not solved.

Minimal example:

Run a chain using writer.py and read multiple times with reader.py. After a few tries the error should appear.

  • writer.py
import time
import emcee
import numpy as np

def lnprob(x):
    time.sleep(0.01)
    return 0.

nwalkers = 100
nsteps = 10000

backend = emcee.backends.HDFBackend('backend.h5')
backend.reset(nwalkers,1)

sampler = emcee.EnsembleSampler(nwalkers,1,lnprob,backend=backend)

pos0 = np.ones(nwalkers) + ((np.random.random(nwalkers)-0.5)*2e-3)
print(pos0.shape)
sampler.run_mcmc(pos0[:, None],nsteps,progress=True)
  • reader.py
import emcee

backend = emcee.backends.HDFBackend('backend.h5',read_only=True)
chain = backend.get_chain()

Edit for the sake of completeness: while the example above does not use multiprocessing, my actual code does. I see the error both with and without multiprocessing.
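The first traceback shows the writer failing with errno 11 ('Resource temporarily unavailable') while the file is locked, which suggests a transient condition. One possible workaround, not confirmed anywhere in this thread, is to retry the failing call after a short pause. The sketch below is a generic helper only: `retry_on_oserror`, `attempts`, and `delay` are hypothetical names, and `flaky_open` is a stand-in that imitates the lock error rather than a real emcee or h5py call.

```python
import time


def retry_on_oserror(func, attempts=5, delay=0.1):
    """Call func() and retry on OSError, sleeping `delay` seconds between tries.

    Hypothetical helper, not part of emcee or h5py.
    The last OSError is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Stand-in for an open call that fails twice with a lock error, then succeeds.
calls = {"count": 0}


def flaky_open():
    calls["count"] += 1
    if calls["count"] < 3:
        # BlockingIOError is the OSError subclass Python uses for errno 11 (EAGAIN).
        raise BlockingIOError(11, "Resource temporarily unavailable")
    return "opened"


result = retry_on_oserror(flaky_open, attempts=5, delay=0.01)
print(result, calls["count"])
```

In the writer, the same idea could in principle be applied by subclassing HDFBackend and wrapping its open method, which the traceback shows is where h5py.File is called. That variant is untested here and would only hide the contention, not remove it.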

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

dfm commented, Jun 11, 2021 (1 reaction)

@Thalos12: great! I’d be happy to review such a PR!

axiezai commented, Jun 11, 2021 (1 reaction)

@dfm thank you for pointing this out, I totally missed it… I edited my code accordingly, and it turns out it was just not waiting for the master process to finish. I also had to define the backend and a few other things inside the pool block. The following code now works:

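import sys

import emcee
import numpy as np
from schwimmbad import MPIPool  # assumption: the comment does not show where MPIPool is imported from

# `parameters`, `sub_id`, and `nmm` are defined elsewhere in the commenter's code.

if __name__ == '__main__':
    # initialize parallel processing samplers:
    with MPIPool() as pool:
        # worker processes wait for tasks from the master, then exit
        if not pool.is_master():
            pool.wait()
            sys.exit(0)

        # mcmc setup
        pos = parameters + 1e-2*np.random.randn(28,7)
        nwalkers, ndim = pos.shape
        nsteps = 50000

        # backend (created inside the pool block, on the master process only):
        file_name = '../data/sub-{}_mcmc_fit.h5'.format(sub_id)
        backend = emcee.backends.HDFBackend(file_name)
        backend.reset(nwalkers, ndim)

        sampler = emcee.EnsembleSampler(nwalkers, ndim, nmm.log_probability, pool=pool, backend=backend)
        sampler.run_mcmc(pos, nsteps, progress=True)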

Just documenting this in case other new users run into the same problem with MPIPool.
