Trainer's create_model_card creates an invalid yaml metadata `datasets: - null`
Environment info
- any env
Who can help
- discussed with @julien-c @sgugger and @LysandreJik
Information
- The hub will soon reject pushes with invalid model card metadata.
- Only when `datasets`, `model-index`, or `license` are present does their content need to follow the specification; cf. https://github.com/huggingface/huggingface_hub/pull/342
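As a rough illustration of this kind of check, the sketch below flags a `datasets` field whose entries are not strings and treats a missing key as acceptable. The function name `find_invalid_datasets` and the rule it applies are assumptions made for illustration only; the actual specification is the one in the linked pull request.

```python
from typing import Any, Dict, List


def find_invalid_datasets(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the `datasets` field (hypothetical helper)."""
    problems: List[str] = []
    if "datasets" not in metadata:
        # A missing key is not treated as an error; only present keys are checked.
        return problems
    value = metadata["datasets"]
    if not isinstance(value, list):
        problems.append(f"datasets is {type(value).__name__}, expected a list")
        return problems
    for index, item in enumerate(value):
        if not isinstance(item, str):
            problems.append(f"datasets[{index}] is {item!r}, expected a string")
    return problems
```

With this sketch, metadata parsed from `datasets: - null` reports one problem, while metadata without a `datasets` key reports none.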
To reproduce
Steps to reproduce the behavior:
- Train a model
- Do not associate any datasets
- The trained model and the model card are rejected by the server
Expected behavior
trainer.py git push should be successful, even with the coming patch https://github.com/huggingface/transformers/pull/13514
Issue Analytics
- State:
- Created 2 years ago
- Comments:12 (9 by maintainers)
In the meantime I’ve suggested a fix for the problems for which this issue was created and for the incomplete results I mentioned. Both have their origin in the code of the `TrainingSummary`, so fixing them is not duplicate code 😃

We can think more about what validation we want to do where; personally I would see this more on the hf_hub side, in the function that adds metadata (which we will use in the Trainer once it’s merged and in a release of hf_hub).
Hmm, the `datasets: - null` issue is not about missing datasets, it’s about invalid data. In YAML, this is parsed as `datasets = [None]` (Python syntax), whereas it should be an array of strings.

In my opinion, we will not enforce rejections for missing data any time soon (especially for automatically generated model cards).
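One way to avoid emitting `datasets: - null` is to drop null entries before the metadata is serialized. The following is a minimal sketch of that idea, not the fix that was actually proposed for `TrainingSummary`; the helper name `drop_null_entries` is hypothetical.

```python
from typing import Any, Dict


def drop_null_entries(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the metadata without None values (hypothetical helper)."""
    cleaned: Dict[str, Any] = {}
    for key, value in metadata.items():
        if isinstance(value, list):
            # Remove None items; skip the key entirely if nothing is left.
            value = [item for item in value if item is not None]
            if not value:
                continue
        elif value is None:
            continue
        cleaned[key] = value
    return cleaned
```

Under this approach a model trained without an associated dataset produces a card with no `datasets` key at all, which counts as missing data rather than invalid data.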