Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

parallelization results in signal(11): Segmentation fault

See original GitHub issue

I am hitting a segmentation fault (signal 11) when using joblib.Parallel to run a small Julia function across multiple workers.

I import the function into Python, modeled after this attempted solution:

from julia.api import Julia
jl = Julia(compiled_modules=False)  # disable the precompiled-module cache; the usual workaround when libpython is statically linked
from julia import Main
Main.include("fastsum.jl")          # load the Julia source defining greenfunction
from julia.Main import greenfunction

where the function itself is

using Tullio, LoopVectorization
function greenfunction(mu, wns, sigwns, energy, dosnorm)
    new_array = Array{ComplexF64}(undef, length(wns))
    # Implicit sum over j on the right-hand side; threads=false keeps Tullio single-threaded
    @tullio threads=false new_array[i] = 1 / (mu + wns[i] * 1im - energy[j] - sigwns[i]) * dosnorm[j]
    # Drop local references; note `nothing` is the value (`Nothing`, capitalized, is its type)
    dosnorm = nothing
    energy  = nothing
    sigwns  = nothing
    wns     = nothing

    return new_array
end
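
The issue doesn't show the driver loop, but a plausible reconstruction, assuming each joblib task re-initializes Julia (which would match "I call that import function several hundred times" below), looks like the sketch here; the grid values and n_jobs are made up for illustration:

import numpy as np
from joblib import Parallel, delayed

def task(mu, wns, sigwns, energy, dosnorm):
    # Hypothetical per-task setup: every task runs the full Julia import/initialization
    from julia.api import Julia
    Julia(compiled_modules=False)
    from julia import Main
    Main.include("fastsum.jl")
    return Main.greenfunction(mu, wns, sigwns, energy, dosnorm)

# Placeholder inputs, invented for this sketch
wns = np.arange(1, 65, dtype=np.float64) * np.pi
sigwns = 0.1j * np.ones(len(wns))
energy = np.linspace(-1.0, 1.0, 200)
dosnorm = np.ones(len(energy)) / len(energy)

results = Parallel(n_jobs=8)(
    delayed(task)(mu, wns, sigwns, energy, dosnorm)
    for mu in np.linspace(-2.0, 2.0, 400)
)

This is only a guess at the missing driver; what is certain from the traceback below is that joblib's loky worker processes are the ones dying.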

Clearing some of the variables seems to increase the number of processes I can run from around 10 to around 80, but eventually it still segfaults, with an error like:

signal (15): Terminated
in expression starting at none:0
mul_fast at ./fastmath.jl:167 [inlined]
mul_fast at ./fastmath.jl:219 [inlined]
...
... (a lot of PyCall directories)
...
unknown function (ip: (nil))
Allocations: 76861098 (Pool: 76845028; Big: 16070); GC: 116

signal (11): Segmentation fault
in expression starting at none:0
...
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The more detailed errors are here.

Is there a way to clear the Julia cache, since that seems to be the problem? Or to somehow initialize Julia less frequently? I call that import function several hundred times. (A per-worker initialization sketch appears after this post.)

I also attempted

from julia import Main
greenfunction = Main.eval("""
using Tullio, LoopVectorization
function greenfunction(mu, wns, sigwns, energy, dosnorm)
    new_array = Array{ComplexF64}(undef, length(wns))
    @tullio threads=false new_array[i] = 1 / (mu + wns[i] * 1im - energy[j] - sigwns[i]) * dosnorm[j]
    dosnorm = nothing
    energy  = nothing
    sigwns  = nothing
    wns     = nothing
    
    return new_array
end
""")

but that gives the same error.
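
On the "initialize Julia less frequently" question: a minimal sketch, assuming joblib's default loky backend (which reuses its worker processes across tasks), is to cache the initialized function in a module-level global so each worker pays the Julia startup cost at most once. The module name fastsum_worker.py and its helpers are hypothetical, not from the issue:

# fastsum_worker.py (hypothetical helper module)
_greenfunction = None

def _get_greenfunction():
    # Initialize Julia at most once per worker process, then reuse it
    global _greenfunction
    if _greenfunction is None:
        from julia.api import Julia
        Julia(compiled_modules=False)
        from julia import Main
        Main.include("fastsum.jl")
        _greenfunction = Main.greenfunction
    return _greenfunction

def run(mu, wns, sigwns, energy, dosnorm):
    return _get_greenfunction()(mu, wns, sigwns, energy, dosnorm)

The driver then dispatches delayed(fastsum_worker.run)(...) as before. Whether this avoids the segfault is untested here, but it at least removes the repeated Julia initialization the question describes.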

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
fchorney commented, Sep 22, 2021

I am having similar issues. I am trying to use pyjulia in a Flask web server, but it keeps segfaulting; I assume this is because the web server is multithreaded.
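
If the segfaults do come from multiple Flask threads calling into the embedded Julia runtime concurrently, one mitigation, sketched here as an assumption rather than a confirmed fix, is to confine every pyjulia call to a single dedicated thread and have request handlers hand work to it through a queue:

import queue
import threading

_requests = queue.Queue()

def _julia_worker():
    # Julia is initialized and called only on this one thread
    from julia.api import Julia
    Julia(compiled_modules=False)
    from julia import Main
    Main.include("fastsum.jl")
    while True:
        args, reply = _requests.get()
        try:
            reply.put(("ok", Main.greenfunction(*args)))
        except Exception as exc:
            reply.put(("err", exc))

threading.Thread(target=_julia_worker, daemon=True).start()

def greenfunction_threadsafe(*args):
    # Callable from any Flask request thread; blocks until the result is ready
    reply = queue.Queue(maxsize=1)
    _requests.put((args, reply))
    status, value = reply.get()
    if status == "err":
        raise value
    return value

The fastsum.jl / greenfunction names are reused from the question above; the queue indirection serializes all Julia calls, trading throughput for never sharing the runtime between threads.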

0 reactions
tkf commented, Dec 15, 2021

What subprocess.Popen uses is not very clear from the documentation. Somebody has to dig into the internals of subprocess.

Read more comments on GitHub >

Top Results From Across the Web

Segmentation fault 11: C with MPI - Stack Overflow
The program hangs if you run it on more than 2 processes because the sends and receives are hard-coded to only communicate between...

Thread: [BUGS] signal 11 segfaults with parallel workers
Core was generated by `postgres: bgworker: parallel worker f'. Program terminated with signal SIGSEGV, Segmentation fault. #0 MemoryContextAlloc ...

mpirun problems: exited on signal 11 (segmentation fault)
I installed OpenFOAM-1.6.x and something strange happened. If I launch a parallel running: Code: foamJob -p -s simpleFoam I obtain Code: ...

Segfault testing parallel HDF5 with Intel MPI
I get a segmentation fault when running the test suite for parallel HDF5 (1.10.8) when compiling with oneAPI 2021 update 4.

signal 11 segfaults with parallel workers - PostgreSQL
almost daily, with a signal 11 seg fault on a query as the triggering event: 2017-07-11 23:00:29.984 UTC LOG: worker process: parallel ......
