Issue loading Keras LSTM in subprocess
I’m trying to train an RL model in parallel that relies on the output of a pre-trained Keras model. I’m trying to construct the Keras model in each subprocess and then set the weights from a saved file. The model is initialized as below:
print("building...")
x = Input(shape=(None, self.z_dim+self.action_dim))
print('input done')
lstm = LSTM(self.hidden_units, return_sequences=True, return_state=True)
print('lstm done')
lstm_out, _, _ = lstm(x)
print('dense started')
mdn = Dense(self.n_mixtures * (3*self.z_dim))(lstm_out)
print('dense done')
rnn = Model(x, mdn)
However, execution hangs on the line lstm_out, _, _ = lstm(x). The model initializes without problems in the main module, and I am also able to initialize other models (ones without LSTM layers) in subprocesses without issue. Does anyone have any idea what the problem could be?
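For context, here is roughly the kind of minimal setup that triggers the hang; the layer sizes below are placeholders for my real values:

import multiprocessing
import keras  # the parent process touches Keras/TensorFlow before forking

def build_model():
    from keras.layers import Input, LSTM, Dense
    from keras.models import Model
    x = Input(shape=(None, 35))  # z_dim + action_dim (placeholder sizes)
    lstm_out, _, _ = LSTM(256, return_sequences=True, return_state=True)(x)
    mdn = Dense(5 * 3 * 32)(lstm_out)  # n_mixtures * (3 * z_dim)
    return Model(x, mdn)

if __name__ == '__main__':
    p = multiprocessing.Process(target=build_model)  # default start method
    p.start()
    p.join()  # never returns: the child blocks inside lstm(x)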
Let me know if there’s any other relevant info I can provide. I’m up to date on Keras and TF, and I’ve only tested this on macOS.
I’m actually having some positive results using multiprocessing and making sure I run

multiprocessing.set_start_method('spawn')

at the start of my main script! It seems to let me get past the TensorFlow locking issue. I’ll be curious to hear if it works for you as well.
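For anyone else hitting this, a minimal sketch of what I mean (the model path here is a placeholder):

import multiprocessing

def worker():
    # Build the model and load the pre-trained weights inside the child only.
    from keras.models import load_model
    model = load_model('mdn_rnn.h5')  # placeholder path to the saved model
    model.summary()

if __name__ == '__main__':
    # 'spawn' starts each worker as a fresh interpreter instead of forking,
    # so no TensorFlow graph/session state is inherited from the parent.
    multiprocessing.set_start_method('spawn')
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()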
I haven’t tried that one, I’ll give it a go tomorrow when I get back to the office! I was trying the fork and forkserver start methods, but that didn’t help; somehow I never thought about using the spawn method.

Since you are working on celery anyway, I think another (safer?) option for you would be to replace multiprocessing with billiard, which should be more amenable to celery tasks.
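Something like this should be all the change that’s needed, assuming billiard keeps its drop-in-compatible Process API (untested sketch):

import billiard as multiprocessing  # instead of: import multiprocessing

def worker():
    # Same idea as the spawn example above; the path is still a placeholder.
    from keras.models import load_model
    model = load_model('mdn_rnn.h5')
    model.summary()

if __name__ == '__main__':
    # billiard is the multiprocessing fork that celery itself ships with,
    # so the Process interface should work unchanged.
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()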