Installing TorchRec in Nvidia PyTorch 22.07 container from NGC
Hello. I'm trying to install TorchRec inside the nvcr.io/nvidia/pytorch:22.07-py3 container, which ships with CUDA 11.7. The installation itself looks successful, but when I later run import torchrec in Python I get errors that appear to be related to the fbgemm_gpu package.
The simplest way to reproduce it is:
docker run nvcr.io/nvidia/pytorch:22.07-py3 bash -c 'pip install torchrec && python -c "import torchrec"'
The error message is:
libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/_ops.py", line 203, in __getattr__
    op, overload_names = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator fbgemm::jagged_2d_to_dense

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/torchrec/__init__.py", line 8, in <module>
    import torchrec.distributed # noqa
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/__init__.py", line 36, in <module>
    from torchrec.distributed.model_parallel import DistributedModelParallel # noqa
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/model_parallel.py", line 21, in <module>
    from torchrec.distributed.planner import (
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/planner/__init__.py", line 22, in <module>
    from torchrec.distributed.planner.planners import EmbeddingShardingPlanner # noqa
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/planner/planners.py", line 16, in <module>
    from torchrec.distributed.planner.constants import BATCH_SIZE, MAX_SIZE
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/planner/constants.py", line 10, in <module>
    from torchrec.distributed.embedding_types import EmbeddingComputeKernel
  File "/opt/conda/lib/python3.8/site-packages/torchrec/distributed/embedding_types.py", line 14, in <module>
    from fbgemm_gpu.split_table_batched_embeddings_ops import EmbeddingLocation
  File "/opt/conda/lib/python3.8/site-packages/fbgemm_gpu/__init__.py", line 22, in <module>
    from . import _fbgemm_gpu_docs
  File "/opt/conda/lib/python3.8/site-packages/fbgemm_gpu/_fbgemm_gpu_docs.py", line 18, in <module>
    torch.ops.fbgemm.jagged_2d_to_dense,
  File "/opt/conda/lib/python3.8/site-packages/torch/_ops.py", line 207, in __getattr__
    raise AttributeError(f"'_OpNamespace' object has no attribute '{op_name}'") from e
AttributeError: '_OpNamespace' object has no attribute 'jagged_2d_to_dense'
Some details on library versions (from pip freeze):
torch==1.13.0a0+08820cb
torchrec==0.3.1
fbgemm-gpu==0.3.0
Do you have any idea what is going wrong here?
@janekl I was able to reproduce the issue and resolve it. Run:
pip uninstall torch
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117
That should resolve the issue. All three (torch, fbgemm_gpu, and torchrec) should be nightly versions.
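For reference, here is a rough end-to-end sketch of that fix applied inside the same container as the reproduction above. The nightly package names torchrec-nightly and fbgemm-gpu-nightly are assumed here; check PyPI for the current nightly channels before copying this.

# Sketch only: swap the container's bundled torch for the cu117 nightly build,
# install nightly torchrec/fbgemm_gpu wheels to match, then re-test the import.
# (Add --gpus all to docker run if you want to exercise the GPU paths.)
docker run nvcr.io/nvidia/pytorch:22.07-py3 bash -c '
  pip uninstall -y torch &&
  pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu117 &&
  pip install torchrec-nightly fbgemm-gpu-nightly &&
  python -c "import torchrec"
'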
One side note: the latest change adding shuffle support to the Criteo data loading landed on 11.24, which is after the latest torchrec-nightly release, dated 2022.11.21. Checking out the prior commit in the facebookresearch/dlrm repo will resolve that; a sketch of that workaround follows.
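The exact commit to pin is not given in this thread, so it is left as a placeholder to pick from the log:

# Sketch: pin facebookresearch/dlrm to a commit that predates the 11.24 shuffle change.
git clone https://github.com/facebookresearch/dlrm.git
cd dlrm
git log --oneline --until=2022-11-23   # list commits made before the shuffle change landed
git checkout <commit-hash-chosen-from-the-list-above>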
Please install the nightly version of fbgemm-gpu if you're using torch-nightly; I think that should work.