Model Zoo Revamp
Problem
As of now our model zoo doesn’t make it clear which models are available. docs/model_zoo.md is not well maintained; for example, it doesn’t include BERT or MMF. It also doesn’t show how users can submit a new model to the zoo. Making a PR is not enough, since the S3 bucket is not publicly available and we don’t give users instructions on how to upload things (we assume they know how to use the aws api).
This creates problems since:
- Submitted examples can’t have good test cases, because we can’t check in mar files
- It limits code reuse between teams in open source: they can’t share mar files with each other, and it’s not clear what has been done before
Solution
A better solution needs to make the zoo public, searchable and automatically updated. It should allow user submissions, and it needs to guard against abuse, such as users uploading unwanted or harmful objects to an S3 bucket we maintain.
Should we use something like PyTorch Hub, the Hugging Face Hub, or a basic homegrown S3 hub?
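To make the "automatically updated" requirement concrete, here is a minimal sketch of how a listing such as docs/model_zoo.md could be generated from the mar files themselves. It assumes a mar file is a zip archive containing MAR-INF/MANIFEST.json with a "model" object holding modelName, modelVersion and handler. Treat those key names, and the helper names read_manifest and build_zoo_table, as assumptions for illustration rather than a documented contract.

```python
import json
import zipfile
from pathlib import Path


def read_manifest(mar_path):
    """Read the manifest out of a .mar (zip) archive without extracting it."""
    with zipfile.ZipFile(mar_path) as archive:
        return json.loads(archive.read("MAR-INF/MANIFEST.json"))


def build_zoo_table(mar_dir):
    """Return a markdown table with one row per .mar file found in mar_dir."""
    rows = ["| Model | Version | Handler | File |", "| --- | --- | --- | --- |"]
    for mar_path in sorted(Path(mar_dir).glob("*.mar")):
        # The "model" object and its key names are assumed, so fall back to "?".
        model = read_manifest(mar_path).get("model", {})
        rows.append("| {} | {} | {} | {} |".format(
            model.get("modelName", "?"),
            model.get("modelVersion", "?"),
            model.get("handler", "?"),
            mar_path.name,
        ))
    return "\n".join(rows) + "\n"
```

A CI job could run build_zoo_table over the published archives and write the result into the docs, so the listing never drifts from what is actually hosted.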
Current Experience
The current experience is that the TorchServe team maintains an S3 bucket of common models users care about, and only the team has write access to it
Pros
- Curated models that work
Cons
- Doesn’t allow community contributions, which prevents a rich set of examples, higher-quality unit tests and overall growth
PyTorch Hub
Pros
- PyTorch brand, curated
Cons
- May require some work to support the mar file format
- Cannot host weights without code review, does not allow arbitrary files to be stored
HuggingFace Hub
Pros
- Can upload arbitrary files including mar files from either a web UI or CLI
- Model Hub discovery is good
- No code review process
Cons
- Anyone can submit (not sure how they deal with spam and harmful content)
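As a rough idea of what a submission to the Hugging Face Hub could look like from Python, the sketch below uploads a mar file plus a small model card using the huggingface_hub client (HfApi.create_repo and HfApi.upload_file). The repo id, tags and the helper names build_model_card and upload_mar are hypothetical, and the sketch assumes the user has already logged in with a token that has write access.

```python
import os


def build_model_card(model_name, tags):
    """Build README.md text: a YAML metadata block followed by a short description."""
    lines = ["---", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines += ["---", "", f"# {model_name}", "",
              f"TorchServe model archive (.mar) for {model_name}."]
    return "\n".join(lines) + "\n"


def upload_mar(mar_path, repo_id, model_name, tags):
    """Create the repo if needed, then upload the .mar file and its model card."""
    # Imported here so the card builder above works without the dependency.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the locally stored access token
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_file(path_or_fileobj=mar_path,
                    path_in_repo=os.path.basename(mar_path),
                    repo_id=repo_id)
    card = build_model_card(model_name, tags).encode("utf-8")
    api.upload_file(path_or_fileobj=card, path_in_repo="README.md", repo_id=repo_id)
```

A call such as upload_mar("bert.mar", "some-user/bert-torchserve", "bert", ["torchserve", "mar"]) would then publish both files; the repo id here is only a placeholder.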
Homegrown Hub
Create our own model hub, or maybe standardize the mar format more and revamp torch hub?
Pros
- Most flexible, can support any data format we like
Cons
- Need to host a service so community members can submit and inspect available models
- Need to deal with security, spam and harmful content: if users can submit anything, it’s a security risk to just unzip a random file from the internet
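Whichever hub is chosen, a submitted archive can be inspected before anything is unpacked. The sketch below is one possible pre-check using only the standard library, assuming a mar file is a zip archive with a MAR-INF/MANIFEST.json entry. The size cap and the function name check_mar are hypothetical, and this is nowhere near a full security review (it does not look at the handler code at all).

```python
import zipfile
from pathlib import PurePosixPath

MAX_TOTAL_BYTES = 2 * 1024**3  # hypothetical cap on uncompressed size (2 GiB)
MANIFEST = "MAR-INF/MANIFEST.json"


def check_mar(path, max_total_bytes=MAX_TOTAL_BYTES):
    """Return a list of problems found in a .mar archive without extracting it."""
    if not zipfile.is_zipfile(path):
        return ["not a zip archive"]
    problems = []
    with zipfile.ZipFile(path) as archive:
        infos = archive.infolist()
        if MANIFEST not in {info.filename for info in infos}:
            problems.append("missing " + MANIFEST)
        total = 0
        for info in infos:
            member = PurePosixPath(info.filename)
            # Reject absolute paths and ".." components (zip-slip style traversal).
            if member.is_absolute() or ".." in member.parts:
                problems.append("unsafe path: " + info.filename)
            total += info.file_size  # declared uncompressed size of this member
        if total > max_total_bytes:
            problems.append("uncompressed size exceeds limit")
    return problems
```

An empty list means the archive passed these basic checks; anything else can be rejected before the file is stored or served.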
Top GitHub Comments
Hey all, Omar from HF here 🤗
We’d love to support your use case on the Hugging Face Hub if it makes sense! Just for clarification, the Hub is not constrained to 🤗 transformers models (or models created with Trainer). The Hub uses git-based repositories that anyone can create and upload models to, we actually have integrations with different libraries, many of which are not transformers nor NLP-focused.
One thing that you might find useful is that model cards have metadata that allow reporting things such as the dataset, metrics, tags, etc. This can help with discoverability and even comparison of evaluation results.
There is also the community Inference API that enables widgets to try out the models directly in the browser (or through HTTP requests), or Spaces for fancier demos such as the ones at https://huggingface.co/pytorch.
Let us know if we can help 😄 🦙
cc @LysandreJik @julien-c
Hi @osanseviero, I think this makes sense. At least for the hosting and model card part, your hub is a good experience. I’m embarrassed to admit I couldn’t find instructions to upload directories or files and populate a simple model card on the hub directly, so if you can link me to one I can whip out a POC very quickly.
For everything else let’s talk more. My email is my first name and last name at fb.com