Enhance healthcheck to detect runtime issues with package dependencies

Description

Dependency issue with urllib3 not caught during healthcheck.

The Flow can run successfully on a local machine, and even build/deploy to Prefect Cloud.

The problem is that when Prefect Cloud attempts to run this Flow, the Flow Run hits an error during scheduling. In our case, a Kubernetes Agent picks up that the Flow should run and spawns a Pod to run it. However, immediately after the Flow Run starts, it dumps these errors to the Pod console (the Flow Run is stuck in a starting state because the Pod cannot write logs back to Prefect Cloud):

[2020-07-01 13:40:32] DEBUG - prefect.CloudFlowRunner | Failed to retrieve flow state with error: AttributeError("'SSLSocket' object has no attribute 'connection'")
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud

For this specific failure, we had snowflake-connector-python==2.2.8 specified, which seems to have upgraded the urllib3 version to 1.25.9, and that version appears to introduce a breaking API change. Rolling back to snowflake-connector-python==2.2.7 mitigates this specific problem.

But this version dependency was not caught during the healthcheck.

Expected Behavior

When deploying the Flow, I would expect the healthcheck to detect any problems with the code/dependencies before deploying it. In this specific example, the Flow would run and build on a LocalEnvironment because it never had to reach out to Prefect Cloud to log anything. But when deployed through Prefect Cloud, it would attempt to write the logs back and hit an exception (which was due to a dependency problem).
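
One way to read this request is as a healthcheck that exercises runtime code paths instead of only loading the Flow. The following is a minimal sketch under that assumption, not Prefect's actual healthcheck script; `run_probes` and `import_probe` are hypothetical helpers named here for illustration. An import-only probe would likely not have caught the urllib3 failure above, since that error surfaced on a TLS call; a probe that makes a real request to the API from inside the built image would be closer to what is needed.

```python
import importlib


def run_probes(probes):
    """Run each named probe and return {name: error message} for failures."""
    failures = {}
    for name, probe in probes.items():
        try:
            probe()
        except Exception as exc:  # report every failure instead of crashing
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def import_probe(module_name):
    """Build a probe that simply imports a module (hypothetical helper)."""
    def probe():
        importlib.import_module(module_name)
    return probe


if __name__ == "__main__":
    # Module names below are illustrative; a real check would list the
    # packages the Flow actually depends on.
    results = run_probes({
        "json": import_probe("json"),
        "missing": import_probe("no_such_module_xyz"),
    })
    for name, error in results.items():
        print(f"probe {name!r} failed: {error}")
```

Any callable can serve as a probe, so a check that opens an HTTPS connection the same way the log handler does could be added alongside the import probes.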

Reproduction

from prefect import Flow, task
from prefect.utilities.logging import get_logger
from prefect.environments import LocalEnvironment
from prefect.environments.storage import Docker
from prefect.engine.executors import LocalDaskExecutor
from prefect.engine.results import S3Result


@task
def sample():
    get_logger().info('Executing Task')


with Flow(
    name="Prefect Bug",
    storage=Docker(
        registry_url='containers.local/test_org',
        base_image='containers.local/test_org/prefect:0.12.1-python3.8',
        python_dependencies=[
            'ujson==3.0.0',
            'requests==2.23.0',
            'pandas==1.0.5',
            'numpy==1.19.0',
            'sqlalchemy==1.3.18',
            'hvac==0.10.4',
            'snowflake-connector-python==2.2.8',
            'snowflake-sqlalchemy==1.2.3',
            'pyarrow==0.17.1',
        ],
    ),
    environment=LocalEnvironment(
        executor=LocalDaskExecutor(
            scheduler='threads',
            num_workers=8
        ),
        labels=["test"]
    ),
    result=S3Result(
        bucket='prefect-flow-results',
        boto3_kwargs=dict(
            region_name='us-east-1',
            endpoint_url='https://minio.local/',
        )
    )
) as flow:
    result = sample()

if __name__ == "__main__":
    flow.register(
        project_name="test",
        build=True
    )

Environment

{
  "config_overrides": {
    "cloud": {
      "use_local_secrets": true
    },
    "context": {
      "secrets": false
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "macOS-10.15.5-x86_64-i386-64bit",
    "prefect_version": "0.12.1",
    "python_version": "3.8.1"
  }
}

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
joshmeek commented, Jul 10, 2020

@cicdw Going to enhance the healthcheck script in #2944 to account for this 👍

0 reactions
cicdw commented, Aug 16, 2020

Yea, we could do the same thing we do with environments, where they have an attribute specifying the additional dependencies they require that we check in the healthchecks.
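
A rough sketch of that idea, assuming a component exposes its requirements as a plain list of `name==version` strings. The function `find_pin_mismatches` and the list format are assumptions for illustration, not part of Prefect. It uses only `importlib.metadata` from the standard library (Python 3.8+) and does exact-version comparison, not full specifier matching.

```python
from importlib import metadata  # standard library in Python 3.8+


def find_pin_mismatches(pins):
    """Return readable problems for 'name==version' pins (hypothetical helper)."""
    problems = []
    for pin in pins:
        name, _, wanted = pin.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        # A bare name with no '==version' only checks that it is installed.
        if wanted and installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    # The package name here is deliberately fake to show the failure path.
    for problem in find_pin_mismatches(["definitely-not-a-real-package-xyz==1.0"]):
        print(problem)
```

Run inside the built image against a known-good urllib3 pin, a check like this would flag a transitive upgrade such as the one described in this issue, although it cannot tell whether a given version actually breaks at runtime.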
