[Feature][runtime env] `RuntimeEnvAgent` use MD5Sum as the key to cache package which download from URIs

Search before asking

  • I searched the issues and found no similar feature request.

Description

Currently, runtime_env_agent uses the parsed URI as the key for caching downloaded packages, but this causes two problems:

  1. Two different URIs with the same protocol can produce the same result from the function parse_uri:
In [1]: from ray._private.runtime_env.packaging import parse_uri

In [2]: parse_uri("s3://test/a_file.zip")
Out[2]: (<Protocol.S3: 's3'>, 's3_test_a_file.zip')

In [3]: parse_uri("s3://test/a/file.zip")
Out[3]: (<Protocol.S3: 's3'>, 's3_test_a_file.zip')

In [4]: parse_uri("s3://test/a/file.zip") == parse_uri("s3://test/a_file.zip")
Out[4]: True

As a result, workers may not run in the correct runtime_env.
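
The collision can be reproduced without Ray. The sketch below assumes a translation that simply replaces `://` and `/` with `_`; that matches the output shown above, but it is not necessarily how `parse_uri` is implemented. The names `flatten_uri` and `hashed_key` are hypothetical and exist only for this illustration.

```python
import hashlib


def flatten_uri(uri: str) -> str:
    """Hypothetical stand-in for the translation: separators become underscores."""
    return uri.replace("://", "_").replace("/", "_")


def hashed_key(uri: str) -> str:
    """Alternative key: a hex digest of the full URI string."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()


uri_a = "s3://test/a_file.zip"
uri_b = "s3://test/a/file.zip"

# Both URIs flatten to the same local name, so they would share a cache entry.
print(flatten_uri(uri_a) == flatten_uri(uri_b))  # True

# Hashing the full URI keeps the two keys distinct.
print(hashed_key(uri_a) == hashed_key(uri_b))  # False
```

A URI hash removes this collision, but it still maps a given URI to a single cache entry no matter what the URI currently points to.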

  2. If we use the hash of the URI to solve problem 1, we then face problem 2. Consider the following scenario:
  • Scenario: A user connects to a Ray cluster with Ray Client from a local Mac. The Python job reads a config file, test.yml. The user puts this file in the relative path “.” and uses the runtime_env parameter “working_dir” to package and upload it to the Ray cluster. The job finishes, but the result is not what the user expected, so the user edits ./test.yml and reruns the code without any other changes. Because the URI is unchanged, the cached package is reused, and the job submitted to the Ray cluster produces the same result as last time.
  • My point: The user can make the config change take effect by moving the config file to a different path and updating the local job-submission code to match, but this makes for a poor user experience and reduces productivity.
  • TL;DR: the following test case fails on master:
@pytest.mark.skipif(sys.platform == "win32", reason="Fail to create temp dir.")
def test_same_uri(start_cluster):
    cluster, address = start_cluster
    ray.init(address)
    with tempfile.TemporaryDirectory() as tmp_dir, chdir(tmp_dir):
        check_value1 = b"1"
        check_value2 = b"2"
        with zipfile.ZipFile("test.zip", "w") as zf:
            with zf.open("test_file", "w") as f:
                f.write(check_value1)
        with open("test.zip", "rb") as f:
            f1_bytest = f.read()
        os.remove("test.zip")

        with zipfile.ZipFile("test.zip", "w") as zf:
            with zf.open("test_file", "w") as f:
                f.write(check_value2)
        with open("test.zip", "rb") as f:
            f2_bytest = f.read()
        os.remove("test.zip")
        assert f1_bytest != f2_bytest

        gcs_uri = "gcs://test.zip"
        _internal_kv_put(gcs_uri, f1_bytest, overwrite=True)
        assert _internal_kv_get(gcs_uri) == f1_bytest

        @ray.remote(runtime_env={"working_dir": gcs_uri})
        def f1():
            with open("test_file", "rb") as f:
                value = f.read()
            assert value == check_value1
        ray.get(f1.remote())

        _internal_kv_put(gcs_uri, f2_bytest, overwrite=True)
        assert _internal_kv_get(gcs_uri) == f2_bytest

        @ray.remote(runtime_env={"working_dir": gcs_uri})
        def f2():
            with open("test_file", "rb") as f:
                value = f.read()
            assert value == check_value2
        ray.get(f2.remote())

I think problem 1 is a bug, and problem 2 is very important too. I’ve seen people overwrite a Python package with the same name and version on our internal PyPI source to save time, not to mention overwriting files on S3…
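
To make the proposal in the title concrete, here is a minimal sketch of keying a cache by the MD5 digest of the package contents instead of by URI. It is not Ray's implementation: `content_key` and both dictionaries are hypothetical, and the byte strings stand in for two different uploads to the same URI.

```python
import hashlib


def content_key(package_bytes: bytes) -> str:
    """Hypothetical cache key: MD5 hex digest of the package contents."""
    return hashlib.md5(package_bytes).hexdigest()


# Same URI, but the uploaded bytes change between the two runs.
uri = "gcs://test.zip"
first_upload = b"contents of the first test.zip"
second_upload = b"contents of the second test.zip"

uri_keyed = {}      # cache keyed by URI (the behaviour described in this issue)
content_keyed = {}  # cache keyed by content digest (the proposed behaviour)

for payload in (first_upload, second_upload):
    # setdefault only stores a value when the key is not cached yet.
    uri_keyed.setdefault(uri, payload)
    content_keyed.setdefault(content_key(payload), payload)

# The URI-keyed cache still serves the first upload after the overwrite.
print(uri_keyed[uri] == first_upload)  # True

# The content-keyed cache holds one entry per distinct package.
print(len(content_keyed))  # 2
```

One cost of this design is that the agent needs the package bytes, or a checksum published by the storage backend, before it can consult the cache.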

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
Catch-Bull commented, Feb 28, 2022

@architkulkarni @edoakes For 2, we will discuss it internally against our use cases and share the results later.

0 reactions
Catch-Bull commented, Feb 28, 2022

@architkulkarni I wonder if we could keep the current translation in parse_uri but also append a hash to the end of the file name. The prefix would still be human-readable, and the hash would prevent collision.

I think it is difficult to keep the local file names human-readable because, according to the AWS doc “Creating object key names”, S3 keys support some special characters, such as spaces. Handling them one by one would be a lot of work and would introduce more potential bugs…
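
For comparison, the prefix-plus-hash idea from this comment thread could be sketched as follows. This is only an illustration under assumptions: `readable_key` is a hypothetical helper, the allowed character set is an arbitrary conservative choice, and the digest length is picked for readability rather than taken from Ray.

```python
import hashlib
import re


def readable_key(uri: str, digest_chars: int = 12) -> str:
    """Hypothetical key: sanitized, human-readable prefix plus a short URI digest."""
    # Collapse every run of characters outside a conservative set into "_",
    # so special characters are handled in one rule instead of one by one.
    prefix = re.sub(r"[^A-Za-z0-9._-]+", "_", uri)
    # The digest is computed from the original URI, so URIs that sanitize
    # to the same prefix still get different keys.
    suffix = hashlib.sha256(uri.encode("utf-8")).hexdigest()[:digest_chars]
    return f"{prefix}_{suffix}"


uris = [
    "s3://test/a_file.zip",
    "s3://test/a/file.zip",
    "s3://test/a file.zip",  # S3 keys may contain spaces
]
keys = [readable_key(u) for u in uris]

# All three share the readable prefix "s3_test_a_file.zip" but differ in the suffix.
for key in keys:
    print(key)
```

The prefix is lossy by design, so uniqueness rests entirely on the digest suffix; the prefix only helps a person browsing the cache directory.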
