Opt-in for downloading without symlinks

This is possibly a niche use case.

I recently found that some libraries (coremltools, in this case) don't play nice with symlinks even on Unix platforms 😲. This led me to replace this one-liner, which was intended for user communication:

from huggingface_hub import snapshot_download

repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"

downloaded = snapshot_download(repo_id, allow_patterns=f"{variant}/*")

With this one (taken from the blog post):

from huggingface_hub import snapshot_download
from huggingface_hub.file_download import repo_folder_name
from pathlib import Path
import shutil

repo_id = "apple/coreml-stable-diffusion-v1-4"
variant = "original/packages"

def download_model(repo_id, variant, output_dir):
    destination = Path(output_dir) / (repo_id.split("/")[-1] + "_" + variant.replace("/", "_"))
    if destination.exists():
        raise Exception(f"Model already exists at {destination}")
    
    # Download and copy without symlinks
    downloaded = snapshot_download(repo_id, allow_patterns=f"{variant}/*", cache_dir=output_dir)
    downloaded_bundle = Path(downloaded) / variant
    shutil.copytree(downloaded_bundle, destination)

    # Remove all downloaded files
    cache_folder = Path(output_dir) / repo_folder_name(repo_id=repo_id, repo_type="model")
    shutil.rmtree(cache_folder)
    return destination

model_path = download_model(repo_id, variant, output_dir="./models")
print(f"Model downloaded at {model_path}")

It's not the end of the world, but in this case I really wanted to stress how easy it was to download Core ML checkpoints from the Hub and use them downstream for whatever purpose.

If this is something that only affects coremltools, then it's not worthwhile doing anything (I'll open a PR there when I look into the problem in more depth). I'm raising the issue in case somebody else has observed other use cases that could benefit from a flag to unconditionally use #1067 even if symlinks are supported by the underlying OS.

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
pcuenca commented, Dec 2, 2022

Yeah, I suppose there might be other scenarios where users need an exact copy of the file structure. Building a Docker container sounds like one of them (or deployment tasks, in general). snapshot_download is much better than git clone because you can specify branches or patterns (as in my example above) and don't have to download stuff you don't need or keep a humongous .git directory with all the LFS blobs. Admittedly, we tend to create heavily overloaded repos with multiple variants for different frameworks, floating-point precision, etc.

Happy to propose a PR if you do decide to go this way.
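
To make the pattern argument concrete, here is a minimal sketch of how a pattern such as f"{variant}/*" narrows a repo listing. It assumes allow_patterns follows fnmatch-style glob semantics, where * also matches path separators; the file names and the filter_paths helper are hypothetical and only serve the illustration.

```python
from fnmatch import fnmatch


def filter_paths(paths, allow_patterns):
    """Keep the paths that match at least one glob pattern (fnmatch semantics)."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in allow_patterns)]


variant = "original/packages"

# Hypothetical repo listing with several variants side by side.
repo_files = [
    "README.md",
    "original/packages/Unet.mlpackage/Manifest.json",
    "original/compiled/Unet.mlmodelc/model.mil",
    "split_einsum/packages/Unet.mlpackage/Manifest.json",
]

# With fnmatch, "*" also spans "/" so nested files under the variant are kept.
selected = filter_paths(repo_files, [f"{variant}/*"])
print(selected)
```

Only the files under the chosen variant survive the filter, which is why a single pattern is enough to skip the other variants in a heavily overloaded repo.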

2 reactions
Wauplin commented, Dec 2, 2022

Hi @pcuenca, thanks for opening the issue. As you said, this is indeed quite a niche situation. It reminds me of a discussion (internal link) triggered by @philschmid when he wanted to download a model from the Hub without the cache structure (e.g. the blobs and symlinks) in order to build Docker containers (cc @julien-c as well).

The solution you proposed is quite good. It just requires making sure the cache directory is not already populated before calling download_model, as it is completely erased by shutil.rmtree (i.e. users need to know what they are doing 😄).

About having a flag to disable symlinks (and activate https://github.com/huggingface/huggingface_hub/pull/1067), I'm not against it. I would just wait for more requests before making it a feature of hfh.
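
One way to sidestep the shutil.rmtree caveat is to download into a throwaway directory instead of a folder the user may already have populated. The sketch below is only an illustration of that idea, not something proposed in the thread: copy_without_symlinks and its fetch argument are hypothetical names, and fetch stands in for whatever call performs the download and returns the folder to copy.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def copy_without_symlinks(fetch: Callable[[Path], Path], destination: Path) -> Path:
    """Run `fetch` against a scratch cache dir and copy its result as regular files.

    `fetch` (hypothetical) receives the scratch directory and returns the
    folder that should end up at `destination`.
    """
    destination = Path(destination)
    if destination.exists():
        raise FileExistsError(f"Model already exists at {destination}")
    destination.parent.mkdir(parents=True, exist_ok=True)

    # A fresh temporary directory plays the role of cache_dir, so the cleanup
    # can never delete a cache that was populated earlier.
    with tempfile.TemporaryDirectory() as scratch:
        source = fetch(Path(scratch))
        # symlinks=False (the default) copies the contents of link targets
        # instead of recreating the links.
        shutil.copytree(source, destination, symlinks=False)
    return destination
```

With huggingface_hub installed, fetch could be something like `lambda cache: Path(snapshot_download(repo_id, allow_patterns=f"{variant}/*", cache_dir=cache)) / variant`, reusing the names from the snippet at the top of the issue. The trade-off is that nothing is cached between runs, so every call downloads the files again.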
