Error while installing Caffe2 from PyTorch source code
I am trying to install PyTorch from source and hit an error while compiling:

    [ 86%] Building CXX object caffe2/CMakeFiles/torch.dir/__/torch/csrc/autograd/profiler_cuda.cpp.o
    [ 86%] Building CXX object caffe2/CMakeFiles/torch.dir/__/torch/csrc/autograd/functions/comm.cpp.o
    [ 86%] Building CXX object caffe2/CMakeFiles/torch.dir/__/torch/csrc/cuda/comm.cpp.o
    [ 86%] Linking CXX shared library …/lib/libtorch.so
    /usr/bin/ld: /home/wjfan/anaconda3/envs/video-lfb/lib/libmagma.a(error.cpp.o): unrecognized relocation (0x2a) in section `.text'
    /usr/bin/ld: final link failed: Bad value
    collect2: error: ld returned 1 exit status
    make[2]: *** [lib/libtorch.so] Error 1
    make[1]: *** [caffe2/CMakeFiles/torch.dir/all] Error 2
    make: *** [all] Error 2
    Traceback (most recent call last):
      File "setup.py", line 759, in <module>
        build_deps()
      File "setup.py", line 321, in build_deps
        cmake=cmake)
      File "/media/sdf/wjfan/Pytorch/pytorch/tools/build_pytorch_libs.py", line 63, in build_caffe2
        cmake.build(my_env)
      File "/media/sdf/wjfan/Pytorch/pytorch/tools/setup_helpers/cmake.py", line 329, in build
        self.run(build_args, my_env)
      File "/media/sdf/wjfan/Pytorch/pytorch/tools/setup_helpers/cmake.py", line 142, in run
        check_call(command, cwd=self.build_dir, env=env)
      File "/home/wjfan/anaconda3/envs/video-lfb/lib/python2.7/subprocess.py", line 190, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '56']' returned non-zero exit status 2
I have no idea how this error happens or how to fix it. My CUDA version is 10.0 and my cuDNN version is 7.30. Please help!
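For anyone triaging a similar log, the following is a minimal sketch (not something from the thread) that pulls the archive, object file, and relocation code out of the `unrecognized relocation` line. The helper name `parse_ld_reloc_error` and the lookup table are hypothetical. The relocation names are taken from the x86-64 psABI numbering, where 0x2a is `R_X86_64_REX_GOTPCRELX`. A common reading of this diagnostic is that the static archive (here `libmagma.a`) was produced by a newer toolchain than the `/usr/bin/ld` performing the final link; that is an assumption, not something confirmed in this issue.

```python
import re

# Matches GNU ld's "unrecognized relocation" diagnostic as it appears in the log.
LD_RELOC_RE = re.compile(
    r"(?P<archive>\S+)\((?P<member>[^)]+)\): "
    r"unrecognized relocation \((?P<code>0x[0-9a-fA-F]+)\) in section"
)

# Assumption: names follow the x86-64 psABI relocation numbering.
KNOWN_RELOCS = {
    0x29: "R_X86_64_GOTPCRELX",
    0x2A: "R_X86_64_REX_GOTPCRELX",
}


def parse_ld_reloc_error(line):
    """Return archive, member, numeric code, and psABI name, or None if no match."""
    match = LD_RELOC_RE.search(line)
    if match is None:
        return None
    code = int(match.group("code"), 16)
    return {
        "archive": match.group("archive"),
        "member": match.group("member"),
        "code": code,
        "name": KNOWN_RELOCS.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = (
        "/usr/bin/ld: /home/wjfan/anaconda3/envs/video-lfb/lib/libmagma.a"
        "(error.cpp.o): unrecognized relocation (0x2a) in section `.text'"
    )
    print(parse_ld_reloc_error(sample))
```

If the parsed archive lives inside a conda environment while the linker is the system one, comparing the output of `ld --version` with the toolchain that built the archive may be a reasonable next step.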
Issue Analytics
- Created 4 years ago
- Comments:5
Top GitHub Comments
I don’t really understand how this happened. I searched for it, and others said that it’s caused by the ‘avx’ module. So I recompiled PyTorch with an AVX CMake option added. The warning ‘avx is not compiled with caffe2’ is still raised, but the ETA dropped from 6 days to 3 days. That seems reasonable to me now. Thanks for your reply!
Hi @banshee1, glad that you solved the issue!
Do the two machines have the same disk and the same number of CPUs? I’d guess that it might be an IO issue (one machine has faster IO than the other). You could try using only 4 GPUs on the 8-GPU machine, make sure that you run exactly the same thing on both machines, and compare the results to help with debugging.
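One rough way to compare the two machines along the lines suggested above is to record the CPU count and a crude sequential-write throughput on each, then compare the numbers side by side. The sketch below is only an illustration: the function names, the 64 MB default size, and the choice of a sequential write as the IO probe are all assumptions, and a real data-loading workload may behave quite differently.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, size_mb=64):
    """Write size_mb megabytes to a temp file in `directory` and return MB/s."""
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        for _ in range(size_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # force the data to disk before stopping the clock
        elapsed = time.perf_counter() - start
    return size_mb / max(elapsed, 1e-9)


def machine_summary(directory=".", size_mb=64):
    """Collect the two quantities the comment asks about: CPU count and disk speed."""
    return {
        "cpus": os.cpu_count() or 0,
        "write_mb_per_s": sequential_write_mb_per_s(directory, size_mb),
    }


if __name__ == "__main__":
    print(machine_summary())
```

Run it on both machines with `directory` pointing at the disk that holds the training data, so that the measurement reflects the storage actually in use.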