Calling crypten.nn.from_pytorch hangs

Bug

We want a parent process to call crypten.nn.from_pytorch and then have child processes call it as well. However, the child processes appear to block at the first call to torch.onnx.export inside crypten.nn.from_pytorch.

Reproduce

The following script reproduces the bug:

import torch
import crypten
import multiprocessing


def _get_model():
    class ExampleNet(torch.nn.Module):
        def __init__(self):
            super(ExampleNet, self).__init__()
            self.conv1 = torch.nn.Conv2d(1, 16, kernel_size=5, padding=0)
            self.fc1 = torch.nn.Linear(16 * 12 * 12, 100)
            self.fc2 = torch.nn.Linear(
                100, 2
            )  # For binary classification, final layer needs only 2 outputs

        def forward(self, x):
            out = self.conv1(x)
            out = torch.nn.functional.relu(out)
            out = torch.nn.functional.max_pool2d(out, 2)
            out = out.view(out.size(0), -1)
            out = self.fc1(out)
            out = torch.nn.functional.relu(out)
            out = self.fc2(out)
            return out

    dummy_input = torch.empty(1, 1, 28, 28)
    example_net = ExampleNet()
    model = crypten.nn.from_pytorch(example_net, dummy_input)
    return model


def proc():
    print("\tGetting model inside proc")
    # it blocks here only when we have called crypten.nn.from_pytorch in the parent process
    model = _get_model()
    print("\tGot model inside proc")
    return model


print("[+] Start")
# it doesn't block if we call this multiple times inside the same process
model = _get_model()
print("[+] Got model")
process = multiprocessing.Process(target=proc, args=())
print("[+] Starting process")
process.start()
print("[+] Waiting process")
process.join()
print("[+] End")

Environment

$ python collect_env.py
Collecting environment information...
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Manjaro Linux
GCC version: (GCC) 9.2.0
CMake version: version 3.16.2

Python version: 3.7
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.18.0
[pip3] torch==1.3.1
[conda] torch                     1.4.0                    pypi_0    pypi
[conda] torchvision               0.5.0                    pypi_0    pypi

$ pip freeze
absl-py==0.9.0
appdirs==1.4.3
astor==0.8.1
attrs==19.3.0
backcall==0.1.0
black==19.10b0
bleach==3.1.0
certifi==2019.11.28
cffi==1.13.2
chardet==3.0.4
Click==7.0
coverage==4.5
-e git+https://github.com/facebookresearch/CrypTen.git@68e0364c66df95ddbb98422fb641382c3f58734c#egg=crypten
cryptography==2.8
decorator==4.4.1
defusedxml==0.6.0
dill==0.3.1.1
entrypoints==0.3
Flask==1.1.1
Flask-SocketIO==4.2.1
future==0.18.2
gast==0.2.2
google-pasta==0.1.8
grpcio==1.26.0
h5py==2.10.0
idna==2.8
importlib-metadata==1.3.0
ipykernel==5.1.3
ipython==7.10.2
ipython-genutils==0.2.0
ipywidgets==7.5.1
itsdangerous==1.1.0
jedi==0.15.1
Jinja2==2.10.3
joblib==0.14.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==5.3.4
jupyter-console==6.0.0
jupyter-core==4.6.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
lz4==3.0.2
Markdown==3.1.1
MarkupSafe==1.1.1
mistune==0.8.4
more-itertools==8.0.2
msgpack==1.0.0
nbconvert==5.6.1
nbformat==4.4.0
notebook==6.0.2
numpy==1.18.1
onnx==1.6.0
opt-einsum==3.1.0
packaging==20.1
pandocfilters==1.4.2
parso==0.5.2
pathspec==0.7.0
pexpect==4.7.0
phe==1.4.0
pickleshare==0.7.5
Pillow==6.2.2
pluggy==0.13.1
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.11.2
ptyprocess==0.6.0
pudb==2019.2
py==1.8.1
pycparser==2.19
Pygments==2.5.2
pyOpenSSL==19.1.0
pyparsing==2.4.6
pyrsistent==0.15.6
pytest==5.3.4
pytest-cov==2.8.1
python-dateutil==2.8.1
python-engineio==3.11.1
python-socketio==4.4.0
PyYAML==5.2
pyzmq==18.1.0
qtconsole==4.6.0
regex==2020.2.20
requests==2.22.0
RestrictedPython==5.0
scikit-learn==0.22
scipy==1.4.1
Send2Trash==1.5.0
six==1.13.0
sklearn==0.0
-e git+git@github.com:youben11/PySyft.git@1faf4400ffcce224fde347333cbfe15a5ab12660#egg=syft
syft-proto==0.2.1a1.post2
tblib==1.6.0
tensorboard==1.15.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
terminado==0.8.3
testpath==0.4.4
tf-encrypted==0.5.9
toml==0.10.0
torch==1.4.0
torchvision==0.5.0
tornado==4.5.3
traitlets==4.3.3
typed-ast==1.4.1
typing-extensions==3.7.4.1
urllib3==1.25.7
urwid==2.1.0
wcwidth==0.1.7
webencodings==0.5.1
websocket-client==0.57.0
websockets==8.1
Werkzeug==0.16.0
widgetsnbextension==3.5.1
wrapt==1.11.2
zipp==0.6.0

Top GitHub Comments

2 reactions

marksibrahim commented, Apr 29, 2020

@youben11 Adding torch.set_num_threads(1), as suggested in https://github.com/pytorch/pytorch/issues/36191#issuecomment-620956849, does the trick. With that additional setting, I successfully ran the script on Linux:

[+] Start
[+] Got model
[+] Starting process
[+] Waiting process
	Getting model inside proc
	Got model inside proc
[+] End

Feel free to reopen if you have trouble.
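
As an illustration of the workaround described in this comment, here is a minimal sketch of one way it might be applied. The thread does not say where the call was added, so placing torch.set_num_threads(1) before the parent's first export is an assumption, and run_parent_then_child is a hypothetical helper rather than part of CrypTen or PyTorch.

```python
import multiprocessing


def run_parent_then_child(build_model):
    """Call build_model in this process, then again in a forked child.

    Returns the child's exit code.
    """
    import torch  # local import so the helper can be pasted into any script

    # Assumption: limit PyTorch to one intra-op thread before the parent's
    # first export. The comment above does not state the exact placement.
    torch.set_num_threads(1)

    build_model()  # parent call, e.g. _get_model from the reproduction script

    # Assumption: "fork" mirrors the Linux default of the reported environment.
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=build_model)
    child.start()
    child.join()
    return child.exitcode
```

With the reproduction script, this would be invoked as run_parent_then_child(_get_model). Whether the setting is also needed inside the child is not stated in the thread.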

1 reaction

marksibrahim commented, Apr 8, 2020

@youben11 I was able to reproduce this behavior on my end. Unfortunately, it seems that on some versions of Linux, torch.onnx.export does not support spawning another child process for exporting 😦

I filed a bug with PyTorch: https://github.com/pytorch/pytorch/issues/36191#issue-596231442. Feel free to follow along there or add any context I may have missed.