Got Segmentation fault when running amc_search.py

Hi,

I get a segmentation fault when running amc_search.py.

After debugging with faulthandler, it seems that the error is caused by scipy:

Traceback (most recent call last):
  File "tools/amc_search.py", line 187, in <module>
    pruner.compress()
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/amc_pruner.py", line 210, in compress
    self.train(self.ddpg_args.train_episode, self.agent, self.env, self.output_dir)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/amc_pruner.py", line 229, in train
    action = agent.select_action(observation, episode = episode)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/lib/agent.py", line 186, in select_action
    action = self.sample_from_truncated_normal_distribution(lower = self.lbound, upper = self.rbound, mu = action, sigma = delta)
  File "/home/shared/nfs/acer-share/bushido/third_party/nni/nni/algorithms/compression/pytorch/pruning/amc/lib/agent.py", line 230, in sample_from_truncated_normal_distribution
    return stats.truncnorm.rvs((lower-mu)/sigma, (upper-mu)/sigma, loc = mu, scale = sigma, size = size)
  File "/home/acer/.pyenv/versions/pytorch/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 966, in rvs
    raise ValueError("Domain error in arguments.")
ValueError: Domain error in arguments.

My scipy version is 1.4.1 and my nni version is v2.3.

Any idea?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
twmht commented, Aug 20, 2021

@linbinskn

I switched from torch 1.8.0 to torch 1.7.0 and the error is gone. There might be an incompatibility between torch 1.8.0 and nni v2.3.

0 reactions
linbinskn commented, Aug 17, 2021

Exploding gradients will lead to NaN values.
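
One way to confirm whether NaN values are reaching the sampler is to check the arguments before they are standardized for truncnorm. The following is a minimal sketch, not part of nni or scipy: the helper name standardized_bounds is hypothetical, and it assumes lower, upper, mu and sigma are plain Python floats (array or tensor inputs would need a different check).

```python
import math


def standardized_bounds(lower, upper, mu, sigma):
    """Return the standardized bounds (a, b) for a truncated normal.

    Hypothetical helper for illustration only; it is not part of nni or scipy.
    Raises ValueError naming the offending argument instead of failing later
    with a generic domain error.
    """
    values = {"lower": lower, "upper": upper, "mu": mu, "sigma": sigma}

    # NaN or infinity in any argument makes the bounds meaningless.
    bad = [name for name, value in values.items() if not math.isfinite(value)]
    if bad:
        raise ValueError("non-finite truncnorm arguments: " + ", ".join(bad))

    # The scale must be strictly positive.
    if sigma <= 0:
        raise ValueError(f"sigma must be positive, got {sigma}")

    # The truncation interval must be non-empty.
    if lower >= upper:
        raise ValueError(f"lower ({lower}) must be less than upper ({upper})")

    # Same standardization as in the traceback: (bound - mu) / sigma.
    return (lower - mu) / sigma, (upper - mu) / sigma
```

The returned pair can then be passed to stats.truncnorm.rvs(a, b, loc = mu, scale = sigma, size = size) as in the traceback above. If the helper reports that mu is non-finite, the NaN is coming from the agent's action output rather than from scipy itself.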

