[1.11.0 Release] `serve_failure` failing on master and 1.11.0 branches

E.g.
1.11.0: https://buildkite.com/ray-project/periodic-ci/builds/2524#5e259950-8965-4f40-a39a-bb0f2fbf2c86/138-427
master: https://buildkite.com/ray-project/periodic-ci/builds/2503#69d841c1-1c30-4ea4-8a2e-c586ffe142c8/139-429

I was able to reproduce the issue by starting the test from my laptop. The cluster can be kept running after the failure by passing --no-terminate. For example, get AWS credentials, then run:

ANYSCALE_HOST=https://console.anyscale.com ANYSCALE_CLI_TOKEN=... ANYSCALE_CLOUD_ID=... ANYSCALE_PROJECT=... RAY_WHEELS=... RAY_TEST_REPO=https://github.com/ray-project/ray.git RAY_TEST_BRANCH=... python ray/release/e2e.py --category=... --test-config ray/release/long_running_tests/long_running_tests.yaml --test-name serve_failure --keep-results-dir --smoke-test --no-terminate

The failures to reach /api/version in the tests are due to the dashboard being terminated. I checked the cluster, and GCS had also been terminated. The process terminations do not appear to be caused by a crash. However, because the dashboard was terminated, the test cannot submit its job and finish successfully.
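To confirm this symptom on a cluster kept alive with --no-terminate, a small probe of the dashboard's /api/version endpoint can help. The following is a minimal sketch, not part of the release test itself: the helper name `dashboard_reachable` is hypothetical, and the base URL you pass in is an assumption about where the dashboard is exposed on your cluster.

```python
import http.client
import urllib.error
import urllib.request


def dashboard_reachable(base_url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET <base_url>/api/version answers with HTTP 200.

    Hypothetical helper for illustration; base_url is whatever address
    the dashboard is exposed on (an assumption, not taken from the issue).
    """
    url = base_url.rstrip("/") + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status:
        # treat all of these as "dashboard not reachable".
        return False
```

Calling `dashboard_reachable("http://127.0.0.1:8265")` on the head node, assuming the dashboard listens on that address, should return False once the dashboard process has been terminated.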

Assigning to @simon-mo as serve oncall.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
mwtian commented, Jan 27, 2022

Since the issue has been identified to be related to test setup, lowering to P1.

0 reactions
mwtian commented, Jan 27, 2022

Sounds good Jiao.
