
ValueError: cannot find context for 'fork'

See original GitHub issue

I assume this example code should run without error on my system:

from data_integration.commands.bash import RunBash
from data_integration.pipelines import Pipeline, Task
from data_integration.ui.cli import run_pipeline, run_interactively

pipeline = Pipeline(
    id='demo',
    description='A small pipeline that demonstrates the interplay between pipelines, tasks and commands')

pipeline.add(Task(id='ping_localhost', description='Pings localhost',
                  commands=[RunBash('ping -c 3 localhost')]))

sub_pipeline = Pipeline(id='sub_pipeline', description='Pings a number of hosts')

for host in ['google', 'amazon', 'facebook']:
    sub_pipeline.add(Task(id=f'ping_{host}', description=f'Pings {host}',
                          commands=[RunBash(f'ping -c 3 {host}.com')]))

sub_pipeline.add_dependency('ping_amazon', 'ping_facebook')
sub_pipeline.add(Task(id='ping_foo', description='Pings foo',
                      commands=[RunBash('ping foo')]), ['ping_amazon'])

pipeline.add(sub_pipeline, ['ping_localhost'])

pipeline.add(Task(id='sleep', description='Sleeps for 2 seconds',
                  commands=[RunBash('sleep 2')]), ['sub_pipeline'])


run_pipeline(pipeline)

Here’s the output of the script:

$ python historical.py
Traceback (most recent call last):
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 190, in get_context
    ctx = _concrete_contexts[method]
KeyError: 'fork'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "historical.py", line 28, in <module>
    run_pipeline(pipeline)
  File "C:\Users\david\Anaconda3\lib\site-packages\data_integration\ui\cli.py", line 46, in run_pipeline
    for event in execution.run_pipeline(pipeline, nodes, with_upstreams, interactively_started=interactively_started):
  File "C:\Users\david\Anaconda3\lib\site-packages\data_integration\execution.py", line 48, in run_pipeline
    multiprocessing_context = multiprocessing.get_context('fork')
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 238, in get_context
    return super().get_context(method)
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 192, in get_context
    raise ValueError('cannot find context for %r' % method)
ValueError: cannot find context for 'fork'
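The root cause is visible in the traceback: Python's `multiprocessing` module only registers the `fork` start method on POSIX systems, so `multiprocessing.get_context('fork')` raises a `ValueError` on Windows. A minimal check (not part of the original report) shows which start methods the running interpreter supports:

```python
import multiprocessing

# List the start methods registered on this platform.
# On POSIX this typically includes 'fork', 'spawn', and 'forkserver';
# on Windows it is ['spawn'] only, which is why get_context('fork') fails there.
print(multiprocessing.get_all_start_methods())
```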

If it matters, I’ve also provisioned a PostgreSQL instance for mara:

import mara_db.auto_migration
import mara_db.config
import mara_db.dbs

mara_db.config.databases \
    = lambda: {'mara': mara_db.dbs.PostgreSQLDB(host='localhost', user='postgres', password = '', database='etl_mara')}

mara_db.auto_migration.auto_discover_models_and_migrate()

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
martin-loetzsch commented, Apr 26, 2020

Hi @dyerrington, forking is a central part of Mara data integration. We chose it over threads because it’s more robust and (more importantly) avoids problems of memory leaks and garbage collection. Each task runs in a forked version of the main process, so whenever the task finishes, all allocated resources automatically vanish with the termination of the sub process.

The alternative would be to use a task queue and worker processes (such as in Airflow), but I think that unnecessarily increases the number of moving parts.

Forking unfortunately only works on POSIX-style operating systems. If you want to run Mara on Windows, please use the Windows Subsystem for Linux (https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). I know quite a few people who successfully run Mara that way.
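As a general pattern (not something Mara itself does, since it depends on fork semantics), cross-platform code can probe for `fork` and fall back to `spawn`. The helper name `get_mp_context` below is hypothetical, for illustration only:

```python
import multiprocessing

def get_mp_context():
    """Return a 'fork' multiprocessing context where available,
    otherwise fall back to 'spawn' (the only method on Windows).

    Note: 'spawn' re-imports the main module in each child instead of
    inheriting the parent's memory, so it is not a drop-in replacement
    for code that relies on fork semantics.
    """
    if 'fork' in multiprocessing.get_all_start_methods():
        return multiprocessing.get_context('fork')
    return multiprocessing.get_context('spawn')
```

On Linux and macOS this returns a fork context; on Windows it degrades to spawn rather than raising `ValueError`.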

0 reactions
martin-loetzsch commented, Apr 29, 2020

In your original comment, the Python path was C:\Users\david\Anaconda3.

The second example seems to use /home/dave/anaconda3/lib/python3.7/.

Could it be that in the original comment you didn’t use the Python that you installed in WSL but instead were in a normal Windows shell?

And if not, can you try the normal python that you get with apt-get install python?


