[1.11.0 Release] `serve_failure` failing on master and 1.11.0 branches
E.g.
1.11.0: https://buildkite.com/ray-project/periodic-ci/builds/2524#5e259950-8965-4f40-a39a-bb0f2fbf2c86/138-427
master: https://buildkite.com/ray-project/periodic-ci/builds/2503#69d841c1-1c30-4ea4-8a2e-c586ffe142c8/139-429
I was able to reproduce the issue by starting the test from my laptop. The cluster can be kept open after the failure with `--no-terminate`. E.g. get AWS creds, then run:
ANYSCALE_HOST=https://console.anyscale.com ANYSCALE_CLI_TOKEN=... ANYSCALE_CLOUD_ID=... ANYSCALE_PROJECT=... RAY_WHEELS=... RAY_TEST_REPO=https://github.com/ray-project/ray.git RAY_TEST_BRANCH=... python ray/release/e2e.py --category=... --test-config ray/release/long_running_tests/long_running_tests.yaml --test-name serve_failure --keep-results-dir --smoke-test --no-terminate
The failures to reach `/api/version` in the tests are due to the dashboard being terminated. I checked the cluster, and `gcs` was also terminated. It looks like the process terminations are not due to any crash. However, because the dashboard was terminated, the test cannot submit the job and finish successfully.
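For anyone re-checking a cluster kept alive with `--no-terminate`, a quick probe of the same endpoint can confirm whether the dashboard is still up. This is only a minimal sketch, not what the test itself does: the helper name `dashboard_reachable` and the injectable `opener` parameter are made up for illustration, and the caller has to supply the dashboard's base URL.

```python
import urllib.error
import urllib.request


def dashboard_reachable(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/api/version answers with HTTP 200.

    `dashboard_reachable` and `opener` are hypothetical names for this
    sketch; `opener` exists only so the check can be exercised offline.
    """
    url = base_url.rstrip("/") + "/api/version"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as unreachable.
        return False
```

Run from the head node, something like `dashboard_reachable("http://127.0.0.1:8265")` would be the call, assuming the dashboard listens on Ray's default port 8265. A `False` result would be consistent with the dashboard process having been terminated.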
Assigning to @simon-mo as serve oncall.
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (7 by maintainers)
Top GitHub Comments
Since the issue has been identified to be related to test setup, lowering to P1.
Sounds good Jiao.