Calling crypten.nn.from_pytorch hangs
Bug
We want a parent process to call crypten.nn.from_pytorch and then have child processes do the same. However, the child processes seem to block at the first call to torch.onnx.export inside the crypten.nn.from_pytorch function.
Reproduce
The following example script reproduces the bug:
import torch
import crypten
import multiprocessing


def _get_model():
    class ExampleNet(torch.nn.Module):
        def __init__(self):
            super(ExampleNet, self).__init__()
            self.conv1 = torch.nn.Conv2d(1, 16, kernel_size=5, padding=0)
            self.fc1 = torch.nn.Linear(16 * 12 * 12, 100)
            self.fc2 = torch.nn.Linear(
                100, 2
            )  # For binary classification, final layer needs only 2 outputs

        def forward(self, x):
            out = self.conv1(x)
            out = torch.nn.functional.relu(out)
            out = torch.nn.functional.max_pool2d(out, 2)
            out = out.view(out.size(0), -1)
            out = self.fc1(out)
            out = torch.nn.functional.relu(out)
            out = self.fc2(out)
            return out

    dummy_input = torch.empty(1, 1, 28, 28)
    example_net = ExampleNet()
    model = crypten.nn.from_pytorch(example_net, dummy_input)
    return model


def proc():
    print("\tGetting model inside proc")
    # it blocks here only when we have called crypten.nn.from_pytorch in the parent process
    model = _get_model()
    print("\tGot model inside proc")
    return model


print("[+] Start")
# it doesn't block if we call this multiple times inside the same process
model = _get_model()
print("[+] Got model")
process = multiprocessing.Process(target=proc, args=())
print("[+] Starting process")
process.start()
print("[+] Waiting process")
process.join()
print("[+] End")
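Because the script ends with an unbounded process.join(), a blocked child also blocks the parent forever. While investigating, it can help to bound the wait. The following is a minimal sketch using only the standard library; run_with_timeout, its parameters, and the default timeout are hypothetical names and values chosen for illustration and are not part of CrypTen or PyTorch.

```python
import multiprocessing
import time


def run_with_timeout(target, args=(), timeout=30.0, start_method=None):
    """Run target(*args) in a child process and report whether it finished.

    Hypothetical helper, not part of CrypTen or PyTorch. Returns True if the
    child exited within `timeout` seconds. If it is still alive after that,
    it is treated as hung, terminated, and False is returned.
    """
    # start_method=None selects the platform default (fork on most Linux setups).
    ctx = multiprocessing.get_context(start_method)
    process = ctx.Process(target=target, args=args)
    process.start()
    process.join(timeout)  # wait at most `timeout` seconds
    if process.is_alive():
        process.terminate()  # the child is presumed stuck; stop it
        process.join()       # reap the terminated child
        return False
    return True
```

In the script above, replacing the Process/start/join lines with finished = run_with_timeout(proc, timeout=60.0) would let the parent print a diagnostic instead of waiting indefinitely. The start_method parameter is included only so that different multiprocessing start methods can be compared; whether any of them avoids the hang is not established in this issue.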
Environment
$ python collect_env.py
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Manjaro Linux
GCC version: (GCC) 9.2.0
CMake version: version 3.16.2
Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Versions of relevant libraries:
[pip3] numpy==1.18.0
[pip3] torch==1.3.1
[conda] torch 1.4.0 pypi_0 pypi
[conda] torchvision 0.5.0 pypi_0 pypi
$ pip freeze
absl-py==0.9.0
appdirs==1.4.3
astor==0.8.1
attrs==19.3.0
backcall==0.1.0
black==19.10b0
bleach==3.1.0
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
coverage==4.5
-e git+https://github.com/facebookresearch/CrypTen.git@68e0364c66df95ddbb98422fb641382c3f58734c#egg=crypten
cryptography==2.8
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
entrypoints==0.3
Flask==1.1.1
Flask-SocketIO==4.2.1
future==0.18.2
gast==0.2.2
google-pasta==0.1.8
grpcio==1.26.0
h5py==2.10.0
idna==2.8
importlib-metadata==1.3.0
ipykernel==5.1.3
ipython==7.10.2
ipython-genutils==0.2.0
ipywidgets==7.5.1
itsdangerous==1.1.0
jedi==0.15.1
Jinja2==2.10.3
joblib==0.14.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.0.0
jupyter-core==4.6.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
lz4==3.0.2
Markdown==3.1.1
MarkupSafe==1.1.1
mistune==0.8.4
more-itertools==8.0.2
msgpack==1.0.0
nbconvert==5.6.1
nbformat==4.4.0
notebook==6.0.2
numpy==1.18.1
onnx==1.6.0
opt-einsum==3.1.0
packaging==20.1
pandocfilters==1.4.2
parso==0.5.2
pathspec==0.7.0
pexpect==4.7.0
phe==1.4.0
pickleshare==0.7.5
Pillow==6.2.2
pluggy==0.13.1
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.11.2
ptyprocess==0.6.0
pudb==2019.2
py==1.8.1
pycparser==2.19
Pygments==2.5.2
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.6
pytest==5.3.4
pytest-cov==2.8.1
python-dateutil==2.8.1
python-engineio==3.11.1
python-socketio==4.4.0
PyYAML==5.2
pyzmq==18.1.0
qtconsole==4.6.0
regex==2020.2.20
requests==2.22.0
RestrictedPython==5.0
scikit-learn==0.22
scipy==1.4.1
Send2Trash==1.5.0
six==1.13.0
sklearn==0.0
-e git+git@github.com:youben11/PySyft.git@1faf4400ffcce224fde347333cbfe15a5ab12660#egg=syft
syft-proto==0.2.1a1.post2
tblib==1.6.0
tensorboard==1.15.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
tf-encrypted==0.5.9
toml==0.10.0
torch==1.4.0
torchvision==0.5.0
tornado==4.5.3
traitlets==4.3.3
typed-ast==1.4.1
typing-extensions==3.7.4.1
urllib3==1.25.7
urwid==2.1.0
wcwidth==0.1.7
webencodings==0.5.1
websocket-client==0.57.0
websockets==8.1
Werkzeug==0.16.0
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:7 (7 by maintainers)
Top GitHub Comments
@youben11 Adding torch.set_num_threads(1), as suggested in https://github.com/pytorch/pytorch/issues/36191#issuecomment-620956849, does the trick. I successfully ran the script on Linux with that additional setting. Feel free to reopen if you have trouble.
@youben11 I was able to reproduce this behavior on my end. Unfortunately, it seems that on some versions of Linux torch.onnx.export does not support spawning another child process for exporting 😦 I filed a bug with PyTorch: https://github.com/pytorch/pytorch/issues/36191#issue-596231442. Feel free to add any context I may have missed on that issue, or follow along.