Enhance healthcheck to detect runtime issues with package dependencies

Description

A dependency issue with urllib3 was not caught during the healthcheck.

The Flow runs successfully on a local machine, and it also builds and deploys to Prefect Cloud.

The problem is that when Prefect Cloud attempts to run this Flow, the Flow Run hits an error during scheduling. In our case, a Kubernetes Agent picks up that the Flow should run and spawns a Pod to run it. However, immediately after the Flow Run starts, it dumps these errors to the Pod console (the Flow Run is stuck in a starting state because the Pod cannot write logs back to Prefect Cloud):

[2020-07-01 13:40:32] DEBUG - prefect.CloudFlowRunner | Failed to retrieve flow state with error: AttributeError("'SSLSocket' object has no attribute 'connection'")
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud

For this specific failure, we had snowflake-connector-python==2.2.8 specified, which seems to have upgraded urllib3 to 1.25.9, and that version appears to have introduced a breaking API change. Rolling back to snowflake-connector-python==2.2.7 mitigates this specific problem.

But this version dependency was not caught during the healthcheck.
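
When comparing a working image against a broken one, it can help to print the versions that actually got installed. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_versions` is made up for illustration and is not part of Prefect, and a version listing alone would not have flagged this bug; it only makes the urllib3 change visible.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from this issue; adjust to your own image.
    packages = ["urllib3", "snowflake-connector-python", "requests"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside both the 2.2.7 and 2.2.8 images should show whether the urllib3 version differs between them.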

Expected Behavior

When deploying the Flow, I would expect the healthcheck to detect any problems with the code or its dependencies before deployment. In this specific example, the Flow builds and runs on a LocalEnvironment because it never has to reach out to Prefect Cloud to log anything. When it is run through Prefect Cloud, it attempts to write logs back and hits an exception caused by a dependency problem.
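
To illustrate the kind of check being asked for: a healthcheck that only imports modules would miss this failure, because the error only appears once the code path is exercised. Below is a rough sketch of a probe runner that calls small callables and reports any that raise. Everything here (`run_probes`, `probe_broken`, `_BrokenSocket`) is hypothetical and not Prefect's healthcheck; a real probe would presumably exercise the same HTTPS client path the Cloud logger uses.

```python
def run_probes(probes):
    """Call each probe; return {name: error message} for those that raised."""
    failures = {}
    for name, probe in probes.items():
        try:
            probe()
        except Exception as exc:  # a healthcheck should report, not crash
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class _BrokenSocket:
    """Stand-in for an object missing an attribute its caller expects."""


def probe_ok():
    # Trivial probe that always succeeds.
    return len("ok")


def probe_broken():
    # Simulates the kind of AttributeError seen in the Pod logs above.
    _BrokenSocket().connection


if __name__ == "__main__":
    results = run_probes({"ok": probe_ok, "broken": probe_broken})
    for name, error in results.items():
        print(f"probe {name!r} failed: {error}")
```

The design choice is to collect failures rather than stop at the first one, so a single build reports every broken dependency path at once.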

Reproduction

from prefect import Flow, task
from prefect.utilities.logging import get_logger
from prefect.environments import LocalEnvironment
from prefect.environments.storage import Docker
from prefect.engine.executors import LocalDaskExecutor
from prefect.engine.results import S3Result


@task
def sample():
    get_logger().info('Executing Task')


with Flow(
    name="Prefect Bug",
    storage=Docker(
        registry_url='containers.local/test_org',
        base_image='containers.local/test_org/prefect:0.12.1-python3.8',
        python_dependencies=[
            'ujson==3.0.0',
            'requests==2.23.0',
            'pandas==1.0.5',
            'numpy==1.19.0',
            'sqlalchemy==1.3.18',
            'hvac==0.10.4',
            'snowflake-connector-python==2.2.8',
            'snowflake-sqlalchemy==1.2.3',
            'pyarrow==0.17.1',
        ],
    ),
    environment=LocalEnvironment(
        executor=LocalDaskExecutor(
            scheduler='threads',
            num_workers=8
        ),
        labels=["test"]
    ),
    result=S3Result(
        bucket='prefect-flow-results',
        boto3_kwargs=dict(
            region_name='us-east-1',
            endpoint_url='https://minio.local/',
        )
    )
) as flow:
    result = sample()

if __name__ == "__main__":
    flow.register(
        project_name="test",
        build=True
    )

Environment

{
  "config_overrides": {
    "cloud": {
      "use_local_secrets": true
    },
    "context": {
      "secrets": false
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "macOS-10.15.5-x86_64-i386-64bit",
    "prefect_version": "0.12.1",
    "python_version": "3.8.1"
  }
}

Top GitHub Comments

joshmeek commented, Jul 10, 2020

@cicdw Going to enhance the healthcheck script in #2944 to account for this 👍

cicdw commented, Aug 16, 2020

Yeah, we could do the same thing we do with environments: they have an attribute specifying the additional dependencies they require, which we check in the healthchecks.
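
A minimal sketch of that idea, assuming an object exposes a list of module names it needs. The `dependencies` attribute, `ExampleStorage`, and `check_declared_imports` are illustrative names only, not taken from the Prefect codebase. The check tries to import each declared module and reports the ones that fail.

```python
import importlib


def check_declared_imports(module_names):
    """Try importing each module; return {module: error} for failures."""
    failures = {}
    for module_name in module_names:
        try:
            importlib.import_module(module_name)
        except Exception as exc:  # ImportError or errors raised at import time
            failures[module_name] = f"{type(exc).__name__}: {exc}"
    return failures


class ExampleStorage:
    """Hypothetical object that declares the modules it needs at runtime."""

    dependencies = ["json", "logging"]


if __name__ == "__main__":
    failures = check_declared_imports(ExampleStorage.dependencies)
    if failures:
        for module_name, error in failures.items():
            print(f"missing or broken dependency {module_name}: {error}")
    else:
        print("all declared dependencies import cleanly")
```

An import check like this catches missing packages, but not a break that only shows up at call time, such as the urllib3 error in this issue.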
