Enhance healthcheck to detect runtime issues with package dependencies
Description
A dependency issue with urllib3 was not caught during the healthcheck.
The Flow runs successfully on a local machine, and it even builds and deploys to Prefect Cloud.
The problem is that when Prefect Cloud attempts to run this Flow, the Flow Run hits an error during scheduling. In our case, a Kubernetes Agent picks up the Flow and spawns a Pod to run it. However, immediately after the Flow Run starts, the Pod dumps the errors below to its console (the Flow Run stays stuck in a starting state, because the Pod cannot write logs back to Prefect Cloud):
[2020-07-01 13:40:32] DEBUG - prefect.CloudFlowRunner | Failed to retrieve flow state with error: AttributeError("'SSLSocket' object has no attribute 'connection'")
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Failed to write log with error: 'SSLSocket' object has no attribute 'connection'
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
[2020-07-01 13:40:32] CRITICAL - CloudHandler | Unable to write logs to Prefect Cloud
For this specific failure, we had snowflake-connector-python==2.2.8 specified, which seems to have upgraded urllib3 to version 1.25.9, and that version appears to introduce a breaking API change. Rolling back to snowflake-connector-python==2.2.7 mitigates this specific problem.
But this version dependency was not caught during the healthcheck.
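A first diagnostic step for this kind of drift is to record which versions actually ended up in the built image. The snippet below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is hypothetical and is not part of Prefect's healthcheck.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the report above; adjust as needed.
    names = ["urllib3", "requests", "snowflake-connector-python"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the image before and after changing a pin (for example 2.2.7 versus 2.2.8 of the Snowflake connector) would show whether a transitive dependency such as urllib3 moved.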
Expected Behavior
When deploying the Flow, I would expect the healthcheck to detect any problems with the code or its dependencies before deployment. In this specific example, the Flow would build and run on a LocalEnvironment because it never had to reach out to Prefect Cloud to log anything. When deployed through Prefect Cloud, however, it attempts to write logs back and hits an exception caused by the dependency problem.
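One way to think about the expected behavior is a healthcheck that runs small smoke checks exercising the same code paths the Flow Run will use, and reports every failure instead of stopping at the first. The following is a sketch under that assumption only; `run_smoke_checks` and the example checks are hypothetical names, and a real check would need to exercise the HTTPS client path that failed here.

```python
import importlib
import traceback


def run_smoke_checks(checks):
    """Run each named zero-argument callable; return {name: traceback} for failures."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception:
            # Keep the full traceback so the build output shows the root cause.
            failures[name] = traceback.format_exc()
    return failures


def import_check(module_name):
    """Build a check that simply imports the given module."""
    return lambda: importlib.import_module(module_name)


if __name__ == "__main__":
    # Hypothetical checks; a real healthcheck would add one that performs
    # a TLS request the same way the Cloud log handler does.
    failures = run_smoke_checks({
        "import ssl": import_check("ssl"),
        "import json": import_check("json"),
    })
    for name, tb in failures.items():
        print(f"FAILED {name}\n{tb}")
    raise SystemExit(1 if failures else 0)
```

Exiting non-zero when any check fails lets a Docker build step abort before the image is pushed.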
Reproduction
from prefect import Flow, task
from prefect.utilities.logging import get_logger
from prefect.environments import LocalEnvironment
from prefect.environments.storage import Docker
from prefect.engine.executors import LocalDaskExecutor
from prefect.engine.results import S3Result


@task
def sample():
    get_logger().info('Executing Task')


with Flow(
    name="Prefect Bug",
    storage=Docker(
        registry_url='containers.local/test_org',
        base_image='containers.local/test_org/prefect:0.12.1-python3.8',
        python_dependencies=[
            'ujson==3.0.0',
            'requests==2.23.0',
            'pandas==1.0.5',
            'numpy==1.19.0',
            'sqlalchemy==1.3.18',
            'hvac==0.10.4',
            'snowflake-connector-python==2.2.8',
            'snowflake-sqlalchemy==1.2.3',
            'pyarrow==0.17.1',
        ],
    ),
    environment=LocalEnvironment(
        executor=LocalDaskExecutor(
            scheduler='threads',
            num_workers=8
        ),
        labels=["test"]
    ),
    result=S3Result(
        bucket='prefect-flow-results',
        boto3_kwargs=dict(
            region_name='us-east-1',
            endpoint_url='https://minio.local/',
        )
    )
) as flow:
    result = sample()

if __name__ == "__main__":
    flow.register(
        project_name="test",
        build=True
    )
Environment
{
  "config_overrides": {
    "cloud": {
      "use_local_secrets": true
    },
    "context": {
      "secrets": false
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "macOS-10.15.5-x86_64-i386-64bit",
    "prefect_version": "0.12.1",
    "python_version": "3.8.1"
  }
}
Issue Analytics
- Created 3 years ago
- Comments: 6 (4 by maintainers)
@cicdw Going to enhance the healthcheck script in #2944 to account for this 👍
Yea, we could do the same thing we do with environments, where they have an attribute specifying the additional dependencies they require that we check in the healthchecks.
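The idea in the comment above, where an object declares the extra modules it needs and the healthcheck verifies them, could look roughly like the sketch below. The `ExampleStorage` class, its `dependencies` attribute, and `missing_dependencies` are illustrative assumptions, not the code that was added in the referenced change.

```python
import importlib.util


class ExampleStorage:
    """Hypothetical stand-in for an object that declares its extra dependencies."""

    # Top-level module names this object needs at runtime.
    dependencies = ["json", "module_that_is_not_installed"]


def missing_dependencies(obj):
    """Return declared top-level modules that cannot be found in this interpreter."""
    declared = getattr(obj, "dependencies", [])
    return [name for name in declared if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(ExampleStorage())
    if missing:
        print("Missing modules:", ", ".join(missing))
```

Note that a check like this only confirms that modules are importable. A version incompatibility such as the urllib3 one reported here would still need a check that exercises the affected code path.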