Problems when running model deployment via a custom component
Environment details
- OS type and version: Vertex AI Notebooks
- Python version: 3.8.2
- pip version: 2.1.1
- `google-cloud-aiplatform` version: 1.1.1
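(For reference, the environment details above can be captured programmatically; this is a hypothetical helper, not part of the reported pipeline, and it only inspects whatever is installed locally.)

```python
import platform
import importlib.metadata as md


def env_report():
    """Collect the version details requested by the issue template."""
    report = {"python": platform.python_version()}
    for pkg in ("pip", "google-cloud-aiplatform"):
        try:
            report[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            report[pkg] = "not installed"
    return report


print(env_report())
```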
Steps to reproduce
Here’s the notebook: https://colab.research.google.com/drive/18C6nct6m3puwm-PDDAfnljsp2o1DNvq6?usp=sharing.
I am trying to deploy a Vertex AI model to an Endpoint via a custom TFX component. The component looks like so (please refer to the above-mentioned notebook for the full snippet):
```python
import logging

from google.cloud import aiplatform as vertex_ai
from tfx.dsl.component.experimental.annotations import Parameter
from tfx.dsl.component.experimental.decorators import component


@component
def VertexDeployer(
    project: Parameter[str],
    region: Parameter[str],
    model_display_name: Parameter[str],
    deployed_model_display_name: Parameter[str],
):
    logging.info(f"Endpoint display: {deployed_model_display_name}")
    vertex_ai.init(project=project, location=region)

    # Reuse the endpoint if one with this display name already exists.
    endpoints = vertex_ai.Endpoint.list(
        filter=f'display_name={deployed_model_display_name}',
        order_by="update_time",
    )
    if len(endpoints) > 0:
        logging.info(f"Endpoint {deployed_model_display_name} already exists.")
        endpoint = endpoints[-1]
    else:
        endpoint = vertex_ai.Endpoint.create(deployed_model_display_name)

    # Pick the most recently updated model and endpoint.
    model = vertex_ai.Model.list(
        filter=f'display_name={model_display_name}',
        order_by="update_time",
    )[-1]
    endpoint = vertex_ai.Endpoint.list(
        filter=f'display_name={deployed_model_display_name}',
        order_by="update_time",
    )[-1]

    deployed_model = endpoint.deploy(
        model=model,
        # Syntax from here: https://git.io/JBQDP
        traffic_split={"0": 100},
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=1,
    )
    logging.info(f"Model deployed to: {deployed_model}")
```
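The component relies on `list(..., order_by="update_time")` returning results oldest-first, so that `[-1]` selects the most recent resource (an assumption about the SDK's ordering, not something the docs guarantee here). The same selection logic can be sketched stand-alone with plain dicts instead of Vertex AI resources:

```python
# Minimal stand-in for the "pick the most recent resource" logic in the
# component above, using plain dicts instead of Endpoint/Model objects.
def latest_by_update_time(resources):
    """Return the most recently updated resource, or None if the list is empty."""
    if not resources:
        return None
    # ISO-8601 timestamps sort lexicographically in chronological order.
    return sorted(resources, key=lambda r: r["update_time"])[-1]


endpoints = [
    {"display_name": "densenet_flowers", "update_time": "2021-08-01T10:00:00Z"},
    {"display_name": "densenet_flowers", "update_time": "2021-08-02T08:00:00Z"},
]
print(latest_by_update_time(endpoints)["update_time"])  # the newest entry
```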
As per the logs, things start fine:
```
2021-08-02 13:30:58.622 IST workerpool0-0
INFO:google.cloud.aiplatform.models:Creating Endpoint
2021-08-02 13:30:58.622 IST workerpool0-0
I0802 08:00:58.622227 139871630878528 base.py:74] Creating Endpoint
2021-08-02 13:30:58.622 IST workerpool0-0
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/29880397572/locations/us-central1/endpoints/3702996832675168256/operations/7134428861818732544
2021-08-02 13:30:58.623 IST workerpool0-0
I0802 08:00:58.622441 139871630878528 base.py:78] Create Endpoint backing LRO: projects/29880397572/locations/us-central1/endpoints/3702996832675168256/operations/7134428861818732544
2021-08-02 13:31:00.683 IST workerpool0-0
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/29880397572/locations/us-central1/endpoints/3702996832675168256
2021-08-02 13:31:00.683 IST workerpool0-0
I0802 08:01:00.682116 139871630878528 base.py:98] Endpoint created. Resource name: projects/29880397572/locations/us-central1/endpoints/3702996832675168256
2021-08-02 13:31:00.683 IST workerpool0-0
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
2021-08-02 13:31:00.683 IST workerpool0-0
I0802 08:01:00.682311 139871630878528 base.py:99] To use this Endpoint in another session:
2021-08-02 13:31:00.683 IST workerpool0-0
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/29880397572/locations/us-central1/endpoints/3702996832675168256')
2021-08-02 13:31:00.683 IST workerpool0-0
I0802 08:01:00.682386 139871630878528 base.py:101] endpoint = aiplatform.Endpoint('projects/29880397572/locations/us-central1/endpoints/3702996832675168256')
2021-08-02 13:31:01.048 IST workerpool0-0
INFO:google.cloud.aiplatform.models:Deploying Model projects/29880397572/locations/us-central1/models/4554203550527258624 to Endpoint : projects/29880397572/locations/us-central1/endpoints/3702996832675168256
2021-08-02 13:31:01.048 IST workerpool0-0
I0802 08:01:01.048401 139871630878528 base.py:139] Deploying Model projects/29880397572/locations/us-central1/models/4554203550527258624 to Endpoint : projects/29880397572/locations/us-central1/endpoints/3702996832675168256
2021-08-02 13:31:01.157 IST
```
But then, roughly 40 minutes later, the replica exits out of the blue:
```
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/29880397572/locations/us-central1/endpoints/3702996832675168256/operations/7217745454925086720
2021-08-02 14:13:41.232 IST workerpool0-0
I0802 08:43:41.232497 140248418977600 base.py:159] Deploy Endpoint model backing LRO: projects/29880397572/locations/us-central1/endpoints/3702996832675168256/operations/7217745454925086720
2021-08-02 14:14:01.059 IST service
The replica workerpool0-0 exited with a non-zero status of 1. Termination reason: Error. To find out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=29880397572&resource=ml_job%2Fjob_id%2F8657831634038423552&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%228657831634038423552%22
```
I have even tried to deploy it separately (code included in the notebook):
```python
vertex_ai.init(
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_REGION,
    staging_bucket="gs://" + GCS_BUCKET_NAME,
)

model_display_name = "densenet_flowers"
deployed_model_display_name = model_display_name + "_" + TIMESTAMP

endpoints = vertex_ai.Endpoint.list(
    filter=f'display_name={deployed_model_display_name}',
    order_by="update_time",
)
if len(endpoints) > 0:
    print(f"Endpoint {deployed_model_display_name} already exists.")
    endpoint = endpoints[-1]
else:
    endpoint = vertex_ai.Endpoint.create(deployed_model_display_name)

model = vertex_ai.Model.list(
    filter=f'display_name={model_display_name}',
    order_by="update_time",
)[-1]
endpoint = vertex_ai.Endpoint.list(
    filter=f'display_name={deployed_model_display_name}',
    order_by="update_time",
)[-1]

deployed_model = endpoint.deploy(
    model=model,
    # Syntax from here: https://git.io/JBQDP
    traffic_split={"0": 100},
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
```
It then leads to:
```
2021-08-02 13:44:07.857 IST
2021/08/02 08:14:07 No id provided.
{
  insertId: "1vfgsqqg11n9wnl"
  jsonPayload: {
    levelname: "ERROR"
    message: "2021/08/02 08:14:07 No id provided.
"
  }
  labels: {
    compute.googleapis.com/resource_id: "4165408107493202181"
    compute.googleapis.com/resource_name: "fluentd-caip-jrshv"
    compute.googleapis.com/zone: "us-central1-a"
  }
  logName: "projects/fast-ai-exploration/logs/aiplatform.googleapis.com%2Fprediction_container"
  receiveTimestamp: "2021-08-02T08:14:32.337946313Z"
  resource: {
    labels: {
      endpoint_id: "3702996832675168256"
      location: "us-central1"
      resource_container: "projects/29880397572"
    }
    type: "aiplatform.googleapis.com/Endpoint"
  }
  severity: "ERROR"
  timestamp: "2021-08-02T08:14:07.857787819Z"
}
```
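When triaging entries like the one above, the relevant fields can be pulled out programmatically. A small stand-alone sketch, operating on a trimmed, hard-coded copy of that structured log entry:

```python
# Trimmed copy of the prediction-container log entry shown above.
entry = {
    "jsonPayload": {
        "levelname": "ERROR",
        "message": "2021/08/02 08:14:07 No id provided.\n",
    },
    "resource": {
        "labels": {"endpoint_id": "3702996832675168256", "location": "us-central1"},
        "type": "aiplatform.googleapis.com/Endpoint",
    },
    "severity": "ERROR",
}


def summarize(e):
    """Return (severity, endpoint_id, message) from a structured log entry."""
    return (
        e["severity"],
        e["resource"]["labels"]["endpoint_id"],
        e["jsonPayload"]["message"].strip(),
    )


print(summarize(entry))
```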
I have tried all of this from a Vertex AI Notebook as well and the issue still persists.
Issue analytics: created 2 years ago; 12 comments (4 by maintainers).
I was able to complete the deployment using the standalone APIs by changing the serving image:
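A hedged sketch of what that serving-image change amounts to: re-uploading the SavedModel with an explicit prebuilt TF serving container. The bucket path and image URI below are illustrative placeholders, not the exact values from the original comment; the helper only builds the keyword arguments one would pass to `vertex_ai.Model.upload(...)`.

```python
# Hypothetical helper: assemble the kwargs for re-uploading the model with
# an explicit serving container image (all values below are placeholders).
def upload_kwargs(display_name, artifact_uri, serving_image):
    return {
        "display_name": display_name,
        "artifact_uri": artifact_uri,
        "serving_container_image_uri": serving_image,
    }


kwargs = upload_kwargs(
    "densenet_flowers",
    "gs://my-bucket/saved_model/",  # placeholder GCS path
    "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest",
)
# The actual call would then be: model = vertex_ai.Model.upload(**kwargs)
```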
But now, when I try to make prediction requests, I run into:
Here’s the Colab Notebook for reproducibility. Please note that the GCS Bucket has public read access. But this question still stands:
@andrewferlitsch sure, but I cannot host the model on GCS forever because my resources are limited. Could you help me create one?