
Script does not finish with dask distributed

See original GitHub issue

I’m using a pipeline that reads text from files via Apache Tika, performs some pre-processing, and writes the results into MongoDB. The following is a truncated version of my script.

import dask.distributed
from pymongo import MongoClient
from streamz import Stream

# add_filesize, add_text, add_text_lengths and write_file are defined
# earlier in the full script (omitted here)

if __name__ == "__main__":
    mongo_client = MongoClient("mongodb://localhost:27017/")
    dask_client = dask.distributed.Client()
    file_stream_source = Stream()

    # scatter()/gather() move each item through the dask cluster;
    # buffer(16) lets up to 16 files be in flight at once
    file_stream = (
        file_stream_source.scatter()
        .map(add_filesize)
        .map(add_text)
        .map(add_text_lengths)
        .buffer(16)
        .gather()
    )

    file_stream.sink(write_file)

    # file_stream_source emit loop

Everything works well, but the last few documents are missing. It looks like the dask worker processes are killed before the remaining tasks have finished; the warnings and errors below support this. Is this behavior expected and I’m using the interface incorrectly, or is this a bug?

Update: This does not happen when the same code is run in a Jupyter notebook. Could this be related to the event loop?

distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-2, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-1, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-3, started daemon)>
distributed.process - WARNING - reaping stray process <ForkServerProcess(ForkServerProcess-4, started daemon)>
distributed.nanny - WARNING - Worker process 15143 was killed by signal 15
distributed.nanny - WARNING - Worker process 15141 was killed by signal 15
Traceback (most recent call last):
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/dario/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
distributed.nanny - WARNING - Worker process 15139 was killed by signal 15
distributed.nanny - WARNING - Worker process 15145 was killed by signal 15
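
A possible workaround (not part of the original report; file_paths, the polling interval, and the explicit shutdown below are assumptions): attach an extra sink_to_list() to the gathered stream and block the main thread until every emitted file has come out the other end, so the workers are not torn down while items are still sitting in the buffer.

import time

results = file_stream.sink_to_list()   # records each item that finishes the pipeline

for path in file_paths:                # file_paths: hypothetical list of input files
    file_stream_source.emit(path)

# buffer(16)/gather() run on a background event loop, so the emit loop can
# return while work is still in flight; poll until everything has arrived.
while len(results) < len(file_paths):
    time.sleep(0.1)

dask_client.close()                    # shut the cluster down cleanly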

Relevant package versions

streamz                   0.5.1                      py_0    conda-forge
dask                      1.2.2                      py_0  
dask-core                 1.2.2                      py_0  
tornado                   6.0.2            py37h7b6447c_0 

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments:9 (6 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Oct 9, 2019

What I mean is, your code doesn’t invoke distributed. But I now understand that you were providing a solution, not a new issue 😃

You should be able to achieve something similar with event loops, but your way may be simpler when none of the source nodes need an event loop anyway (but distributed always has one!). There may perhaps be a way to say “run until done” on a source (i.e., stop when all of the events have been processed), which in the case with no timing or backpressure would happen immediately.
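
For reference, a rough sketch of that event-loop route (not from the thread; it assumes streamz’s asynchronous mode, uses a toy inc function in place of the real processing steps, and adds an explicit drain loop, since even here the buffer may still hold items when the emit loop finishes):

import asyncio

from dask.distributed import Client
from streamz import Stream
from tornado.ioloop import IOLoop

def inc(x):                      # toy stand-in for the real processing steps
    return x + 1

async def main():
    # asynchronous=True keeps both the client and the stream on this event loop
    client = await Client(asynchronous=True, processes=False)
    source = Stream(asynchronous=True)
    results = source.scatter().map(inc).buffer(8).gather().sink_to_list()

    for x in range(100):
        await source.emit(x)     # awaiting the emit applies backpressure

    while len(results) < 100:    # wait for the buffer to drain before exiting
        await asyncio.sleep(0.1)

    await client.close()

IOLoop().run_sync(main)

The drain loop at the end is doing the “run until done” part by hand, which is exactly the gap discussed above.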

0 reactions
CJ-Wright commented, Oct 17, 2019

In the simpler case can the thread be joined? Does that thread respect backpressure?

