Investigate reports of failed mapped tasks returning None to downstream tasks
Description
I’ve heard from a contributor about an unstable mapping behavior. As I heard it, the report was:
- in a mapped pipeline
- a Dask worker unexpectedly dies
- the downstream task unexpectedly runs and receives `None` as input, causing a runtime error from the unexpected input (a sketch of that failure follows this list)
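For illustration only, here is a minimal, hypothetical sketch (not code from the report) of the kind of runtime error a downstream "reduce" step hits when one of its mapped upstream results arrives as `None`:

```python
# Hypothetical illustration of the reported symptom: a downstream task
# receives None where a mapped child's result should be.
mapped_results = [1, 2, None, 4]  # None stands in for a child lost when a worker died


def reduce_results(values):
    # Raises: TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
    return sum(values)


reduce_results(mapped_results)
```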
I also found reports of this in our Slack history (archived here: https://github.com/PrefectHQ/prefect/issues/2655) that suggested a link to specific deployment environments and to high-volume mapped pipelines.
Note: is this possibly related to https://github.com/PrefectHQ/prefect/issues/2430?
Expected Behavior
What did you expect to happen instead? The upstream mapped task should be marked Failed, and the downstream mapped task should not run.
Reproduction
A minimal example that exhibits the behavior.
I have not observed it myself yet, but based on the Slack thread, a high-volume mapped task running on an unstable network with DaskKubernetesEnvironment seems to be the best way to reproduce it.
Environment
Any additional information about your environment.
Optionally run `prefect diagnostics` from the command line and paste the information here.
Without a reproducible example, I’m not sure how to progress on this, especially since it may have been resolved by the mapping refactor. +0.5 on closing if others are ok with it, since we don’t have an immediate action plan or reproducer.
Good news everyone! I have a reproducible example of this behavior. @jcrist it’s for your favorite part of the codebase - results! It’s specific to the following situation:
It appears that the data produced by the successfully completed mapped children prior to the zombie-death is not properly rehydrated on the other end when the process is resurrected for a retry.
Here’s the flow I used locally to test:
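A minimal sketch of such a flow, assuming the Prefect 0.x API (`task`, `Flow`, `.map`, `max_retries`); this is an illustrative reconstruction, not the author’s exact code:

```python
import time
from datetime import timedelta

import prefect
from prefect import task, Flow


@task
def generate_numbers():
    return list(range(10))


@task(max_retries=1, retry_delay=timedelta(seconds=10))
def slow_increment(x):
    # Emit a "waiting" log, then sleep long enough to kill the flow-runner
    # and heartbeat processes for the task mid-run.
    prefect.context.get("logger").info("waiting...")
    time.sleep(120)
    return x + 1


@task
def reduce_all(values):
    # Fails if any mapped upstream result is rehydrated as None after the retry.
    return sum(values)


with Flow("mapped-zombie-repro") as flow:
    numbers = generate_numbers()
    incremented = slow_increment.map(numbers)
    total = reduce_all(incremented)

# In the described repro, the flow was run against Prefect Cloud so the zombie
# killer would reschedule the killed mapped child for a retry once its heartbeat stopped.
```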
Whenever I saw the waiting log I killed both the flow runner process as well as the heartbeat process for the task. After waiting for Cloud to do its thing, I then saw:
It appears that our `load_results` logic doesn’t quite work whenever the immediate upstream was a mapped task. I can resolve tomorrow 👍