integration TestDynUNetWithInstanceNorm3dNVFuser

Describe the bug

Logs using the PyTorch image 22.04: https://github.com/Project-MONAI/MONAI/runs/6306011370?check_suite_focus=true

(NVIDIA-SMI 495.29.05, Driver Version: 495.29.05, CUDA Version: 11.5)

ERROR: test_consistency_0 (tests.test_dynunet.TestDynUNetWithInstanceNorm3dNVFuser)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/parameterized/parameterized.py", line 533, in standalone_func
test_compute_0 (tests.test_prepare_batch_default_dist.DistributedPrepareBatchDefault) (10.5s)
test_compute_1 (tests.test_prepare_batch_default_dist.DistributedPrepareBatchDefault) (10.6s)
test_verify_0__tmp_tmp_HuvzKkaSgJ_tests_testing_data_metadata_json (tests.test_bundle_verify_net.TestVerifyNetwork) (11.2s)
    return func(*(a + p.args), **p.kwargs)
  File "/tmp/tmp.HuvzKkaSgJ/tests/test_dynunet.py", line 146, in test_consistency
    result_fuser = net_fuser(input_tensor)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/monai/networks/nets/dynunet.py", line 268, in forward
    out = self.skip_layers(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/monai/networks/nets/dynunet.py", line 46, in forward
    downout = self.downsample(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/monai/networks/blocks/dynunet_block.py", line 80, in forward
    out = self.norm1(out)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/apex/normalization/instance_norm.py", line 138, in forward
    out = InstanceNormNVFuserFunction.apply(
  File "/opt/conda/lib/python3.8/site-packages/apex/normalization/instance_norm.py", line 16, in forward
    instance_norm_nvfuser_cuda = importlib.import_module("instance_norm_nvfuser_cuda")
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /opt/conda/lib/python3.8/site-packages/instance_norm_nvfuser_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZTVN5torch3jit5fuser4cuda3kir6KernelE

----------------------------------------------------------------------
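
The missing symbol in the `ImportError` is an Itanium C++ ABI mangled name, and decoding it shows which library the apex extension expected to find it in. The sketch below is a deliberately tiny decoder that only handles this one shape (`_ZTV` followed by a nested name of length-prefixed identifiers); `demangle_simple_vtable` is a hypothetical helper written for illustration, not part of any library, and a tool such as `c++filt` is the general-purpose option.

```python
import re


def demangle_simple_vtable(symbol: str) -> str:
    """Decode symbols shaped like _ZTVN<len><name>...E.

    Hypothetical helper: it covers only vtable symbols for plain nested
    names (no templates, no substitutions), which is the shape seen in
    the traceback above.
    """
    prefix = "_ZTVN"
    if not (symbol.startswith(prefix) and symbol.endswith("E")):
        raise ValueError("unsupported symbol form: " + symbol)

    body = symbol[len(prefix):-1]
    parts = []
    pos = 0
    while pos < len(body):
        # Each component is a decimal length followed by that many characters.
        match = re.match(r"\d+", body[pos:])
        if match is None:
            raise ValueError("expected a length prefix at offset %d" % pos)
        length = int(match.group())
        start = pos + match.end()
        if start + length > len(body):
            raise ValueError("component runs past the end of the symbol")
        parts.append(body[start:start + length])
        pos = start + length

    return "vtable for " + "::".join(parts)


if __name__ == "__main__":
    print(demangle_simple_vtable("_ZTVN5torch3jit5fuser4cuda3kir6KernelE"))
```

The decoded name lives in the `torch::jit::fuser::cuda` namespace, so the prebuilt `instance_norm_nvfuser_cuda` extension is looking for a symbol that the installed PyTorch build does not export. That is consistent with a mismatch between the extension and the PyTorch version it is loaded against.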

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
wyli commented, May 11, 2022

Updates: I looked into this issue further and, with Peixin’s help, we found that it may not be a CUDA/driver/torch version/Docker issue: the same environment is used for the cron-docker test (https://github.com/Project-MONAI/MONAI/runs/6306012243?check_suite_focus=true), and that Docker image is also based on 22.04. So far the issue is avoided by adding more checks, but I’m not sure whether there is another issue in the environment-building part of cron-pip.

Thanks for looking into this. It’s now clear that reinstalling pytorch/torchvision inside the Docker container breaks the apex custom module (https://github.com/Project-MONAI/MONAI/blob/557c12940e3e31be3f01f8031afcd6c027a8870d/.github/workflows/cron.yml#L134, https://github.com/Project-MONAI/MONAI/blob/557c12940e3e31be3f01f8031afcd6c027a8870d/.github/workflows/cron.yml#L149), and Yiheng’s PR makes the call robust in this case: https://github.com/Project-MONAI/MONAI/pull/4241
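
One way to picture a more robust call is to probe the compiled extension before relying on it, and to fall back to a standard normalization layer when the import fails. The following is a minimal sketch under that assumption and is not necessarily what the PR does; `can_import` and `nvfuser_instance_norm_available` are hypothetical names, and the only detail taken from the traceback is the module name `instance_norm_nvfuser_cuda`.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError.

    A compiled extension with an unresolved symbol raises ImportError at
    import time, so it is treated the same as a missing module.
    """
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def nvfuser_instance_norm_available() -> bool:
    """Hypothetical guard for the apex NVFuser instance norm extension.

    torch is probed first because the extension links against it; if
    either import fails, the caller should use a regular norm layer.
    """
    return can_import("torch") and can_import("instance_norm_nvfuser_cuda")


if __name__ == "__main__":
    if nvfuser_instance_norm_available():
        print("NVFuser instance norm extension is usable")
    else:
        print("extension unavailable; fall back to a standard instance norm")
```

A guard like this only checks that the extension loads; it does not confirm that the fused kernel runs correctly on a given GPU or driver.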

2 reactions
yiheng-wang-nv commented, May 11, 2022

Hi @Nic-Ma and @wyli, my initial thought is that the issue is due to the driver version: with driver version 510.47.03, the 22.04 Docker image does not raise the same error. Is it possible to test the CI/CD machine with a different driver version?

Updates: I looked into this issue further and, with Peixin’s help, we found that it may not be a CUDA/driver/torch version/Docker issue: the same environment is used for the cron-docker test (https://github.com/Project-MONAI/MONAI/runs/6306012243?check_suite_focus=true), and that Docker image is also based on 22.04. So far the issue is avoided by adding more checks, but I’m not sure whether there is another issue in the environment-building part of cron-pip.
