Problems with parallelization within a batch using `const` token

Hello Again,

So I’ve been trying to get parallelization working for this, and when I set n_cores_batch = 2 in the config.json file I keep getting the error below. I’m not sure what is causing it, and it persists for any value of n_cores_batch other than 1. Do you have any insight into why this might be?

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/software/anaconda/3/envs/dso-sw/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/software/anaconda/3/envs/dso-sw/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/nas0/tluchko/sandbox/deep-symbolic-optimization/dso/dso/train.py", line 28, in work
    optimized_constants = p.optimize()
  File "/nas0/tluchko/sandbox/deep-symbolic-optimization/dso/dso/program.py", line 393, in optimize
    optimized_constants = Program.const_optimizer(f, x0)
TypeError: 'NoneType' object is not callable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "rundso.py", line 9, in <module>
    model.train()
  File "/nas0/tluchko/sandbox/deep-symbolic-optimization/dso/dso/core.py", line 90, in train
    **self.config_training))
  File "/nas0/tluchko/sandbox/deep-symbolic-optimization/dso/dso/train.py", line 278, in learn
    results = pool.map(work, programs_to_optimize)
  File "/home/software/anaconda/3/envs/dso-sw/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/software/anaconda/3/envs/dso-sw/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: 'NoneType' object is not callable

Issue Analytics

Created 2 years ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

brendenpetersen commented, Aug 20, 2021

Hi @Sean-Reilly, as a temporary hack, you can add `pool = None` in train.py at the beginning of the learn() function. That will break some other use cases (namely, the control task when using PyBullet envs), but should be just fine for regression.

A real fix will be incoming.
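
For readers wondering how a worker can end up calling `None`, here is a minimal sketch of one way this class of error arises. It is an assumption, not a confirmed diagnosis of DSO: it supposes a class-level hook is assigned in the parent process after the pool's workers have already been forked, so the workers keep the old `None`. Only the names `Program`, `const_optimizer`, and `work` echo the traceback; `double` and `run_batch` are hypothetical, and the `fork` start method used here is only available on Unix-like systems.

```python
import multiprocessing as mp


class Program:
    # Class-level hook; the name mirrors the traceback, the rest is illustrative.
    const_optimizer = None


def double(x0):
    # Hypothetical stand-in for a real constant optimizer.
    return 2 * x0


def work(x0):
    # Runs inside a worker process and uses whatever hook that process holds.
    return Program.const_optimizer(x0)


def run_batch(set_hook_before_pool):
    Program.const_optimizer = None
    ctx = mp.get_context("fork")  # workers get a copy of the parent's state at fork time
    if set_hook_before_pool:
        Program.const_optimizer = double
    with ctx.Pool(processes=2) as pool:
        if not set_hook_before_pool:
            # Too late: the workers were forked when the pool was created.
            Program.const_optimizer = double
        try:
            return pool.map(work, [1.0, 2.0, 3.0])
        except TypeError as err:
            return f"TypeError: {err}"


if __name__ == "__main__":
    print(run_batch(set_hook_before_pool=True))
    print(run_batch(set_hook_before_pool=False))
```

If this assumption holds, running without a pool sidesteps the problem because the work then happens in the parent process, where the hook is set.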

brendenpetersen commented, Aug 20, 2021

Hi @Sean-Reilly, thanks for the config! I am able to reproduce the bug. Looks like an issue when using n_cores_batch > 1 and the `const` token. I can reproduce the bug with a simplified config:

{
    "task" : {
        "function_set" : ["add", "mul", "div", "sub", "const"]
    },
    "training" : {
        "n_samples" : 100,
        "batch_size" : 10,
        "n_cores_batch" : 2
    }
}

I will look into this and report back! Thanks.
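
Since the original report says the error only appears when n_cores_batch is anything other than 1, another stopgap is to force that value in the config before training. The sketch below uses the simplified config from this comment. The helper name `force_serial_batch` is hypothetical and not part of DSO; it only edits a plain dictionary.

```python
import json


def force_serial_batch(config):
    # Work on a deep copy so the caller's config is left untouched.
    patched = json.loads(json.dumps(config))
    patched.setdefault("training", {})["n_cores_batch"] = 1
    return patched


# The simplified config that reproduces the bug.
config = {
    "task": {
        "function_set": ["add", "mul", "div", "sub", "const"]
    },
    "training": {
        "n_samples": 100,
        "batch_size": 10,
        "n_cores_batch": 2
    }
}

patched = force_serial_batch(config)
print(json.dumps(patched, indent=4))
```

This gives up within-batch parallelism, so it trades speed for a run that completes.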