
ValueError: cannot find context for 'fork'

See original GitHub issue

I assume this example code should run without error on my system:

from data_integration.commands.bash import RunBash
from data_integration.pipelines import Pipeline, Task
from data_integration.ui.cli import run_pipeline, run_interactively

pipeline = Pipeline(
    id='demo',
    description='A small pipeline that demonstrates the interplay between pipelines, tasks and commands')

pipeline.add(Task(id='ping_localhost', description='Pings localhost',
                  commands=[RunBash('ping -c 3 localhost')]))

sub_pipeline = Pipeline(id='sub_pipeline', description='Pings a number of hosts')

for host in ['google', 'amazon', 'facebook']:
    sub_pipeline.add(Task(id=f'ping_{host}', description=f'Pings {host}',
                          commands=[RunBash(f'ping -c 3 {host}.com')]))

sub_pipeline.add_dependency('ping_amazon', 'ping_facebook')
sub_pipeline.add(Task(id='ping_foo', description='Pings foo',
                      commands=[RunBash('ping foo')]), ['ping_amazon'])

pipeline.add(sub_pipeline, ['ping_localhost'])

pipeline.add(Task(id='sleep', description='Sleeps for 2 seconds',
                  commands=[RunBash('sleep 2')]), ['sub_pipeline'])


run_pipeline(pipeline)

Here’s the output of the script:

$ python historical.py
Traceback (most recent call last):
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 190, in get_context
    ctx = _concrete_contexts[method]
KeyError: 'fork'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "historical.py", line 28, in <module>
    run_pipeline(pipeline)
  File "C:\Users\david\Anaconda3\lib\site-packages\data_integration\ui\cli.py", line 46, in run_pipeline
    for event in execution.run_pipeline(pipeline, nodes, with_upstreams, interactively_started=interactively_started):
  File "C:\Users\david\Anaconda3\lib\site-packages\data_integration\execution.py", line 48, in run_pipeline
    multiprocessing_context = multiprocessing.get_context('fork')
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 238, in get_context
    return super().get_context(method)
  File "C:\Users\david\Anaconda3\lib\multiprocessing\context.py", line 192, in get_context
    raise ValueError('cannot find context for %r' % method)
ValueError: cannot find context for 'fork'
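The root cause is visible in the traceback: Python's `multiprocessing` module only registers the `fork` start method on POSIX systems, so `multiprocessing.get_context('fork')` raises a `ValueError` on Windows. A minimal check (not part of the original report) shows which start methods the running interpreter supports:

```python
import multiprocessing

# List the start methods registered on this platform.
# On POSIX this typically includes 'fork', 'spawn', and 'forkserver';
# on Windows it is ['spawn'] only, which is why get_context('fork') fails there.
print(multiprocessing.get_all_start_methods())
```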

If it matters, I’ve also provisioned a PostgreSQL instance for mara:

import mara_db.auto_migration
import mara_db.config
import mara_db.dbs

mara_db.config.databases \
    = lambda: {'mara': mara_db.dbs.PostgreSQLDB(host='localhost', user='postgres', password = '', database='etl_mara')}

mara_db.auto_migration.auto_discover_models_and_migrate()

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
martin-loetzsch commented, Apr 26, 2020

Hi @dyerrington, forking is a central part of Mara data integration. We chose it over threads because it’s more robust and (more importantly) avoids problems of memory leaks and garbage collection. Each task runs in a forked version of the main process, so whenever the task finishes, all allocated resources automatically vanish with the termination of the sub process.

The alternative would be to use a task queue and worker processes (such as in Airflow), but I think that unnecessarily increases the number of moving parts.

Forking unfortunately only works on POSIX-style operating systems. If you want to run Mara on Windows, please use the Windows Subsystem for Linux (https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). I know quite a few people who successfully run Mara that way.
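As a general pattern (not something Mara itself does, since it depends on fork semantics), cross-platform code can probe for `fork` and fall back to `spawn`. The helper name `get_mp_context` below is hypothetical, for illustration only:

```python
import multiprocessing

def get_mp_context():
    """Return a 'fork' multiprocessing context where available,
    otherwise fall back to 'spawn' (the only method on Windows).

    Note: 'spawn' re-imports the main module in each child instead of
    inheriting the parent's memory, so it is not a drop-in replacement
    for code that relies on fork semantics.
    """
    if 'fork' in multiprocessing.get_all_start_methods():
        return multiprocessing.get_context('fork')
    return multiprocessing.get_context('spawn')
```

On Linux and macOS this returns a fork context; on Windows it degrades to spawn rather than raising `ValueError`.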

0 reactions
martin-loetzsch commented, Apr 29, 2020

In your original comment, the Python path was C:\Users\david\Anaconda3.

The second example seems to use /home/dave/anaconda3/lib/python3.7/.

Could it be that in the original comment you didn’t use the Python that you installed in WSL but instead were in a normal Windows shell?

And if not, can you try the normal python that you get with apt-get install python?


