[Release 1.11.0] job submission error

On the releases/1.11.0 branch, rte_ray_client and train_small fail with job submission errors:
https://buildkite.com/ray-project/periodic-ci/builds/2788#6a73aaf1-80f7-40bf-9b8b-0f21c91e6e57/136-545
https://buildkite.com/ray-project/periodic-ci/builds/2788#1bdebe61-370e-4d93-a979-402732826c34/136-542

These look like a mismatch between how commands are handled in the product and in the job client. Can anyone advise which commit to cherry-pick to fix this, e.g. #22011, #22209, or something else? cc @edoakes @simon-mo @krfricke. Assigning to @architkulkarni as Platform on-call.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

mwtian commented, Feb 16, 2022 (1 reaction)

Picking https://github.com/ray-project/ray/pull/22011 sounds good. One possibility is that rte_ray_client and train_small use Ray Client (use_connect: True), which takes a different code path in e2e.py. I will send out a PR.

mwtian commented, Feb 16, 2022 (1 reaction)

IIUC, the job command that runs before the wait_cluster.py call installs awscli and copies wait_cluster.py and other local files to the Anyscale session: https://github.com/ray-project/ray/blob/8b1bbfe8e438a06bf2f9fe2cbf65f163d64227dd/release/e2e.py#L506-L512. Because that job fails, the wait_cluster.py file is missing.
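
To illustrate the failure mode described in this comment (a later step breaking because an earlier step never copied its file), here is a minimal sketch. It is not the logic in e2e.py: the helper name run_step and its parameters are hypothetical, and it only uses the Python standard library. The idea is to check for prerequisite files before launching a command, so the error points at the failed setup step rather than at the missing script.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_step(command: Sequence[str], required_files: Sequence[str], workdir: str = ".") -> int:
    """Run `command` in `workdir`, but only if every required file is present.

    Hypothetical helper for illustration; not part of Ray or e2e.py.
    """
    # Collect the prerequisite files that an earlier step should have copied.
    missing = [name for name in required_files if not (Path(workdir) / name).is_file()]
    if missing:
        # Fail fast with a message that names the real cause.
        raise FileNotFoundError(
            "prerequisite step did not produce: " + ", ".join(missing)
        )
    # check=False: return the exit code and let the caller decide what to do.
    return subprocess.run(list(command), cwd=workdir, check=False).returncode
```

With a guard like this, a call such as run_step(["python", "wait_cluster.py"], ["wait_cluster.py"]) would raise a FileNotFoundError naming wait_cluster.py when the earlier copy step has failed, instead of surfacing as an unrelated-looking error from the later command.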
