Add revision to datasets
I'd propose adding the commit hash of the dataset revision to tasks:
from mteb import MTEB
from mteb.abstasks.AbsTaskReranking import AbsTaskReranking
from sentence_transformers import SentenceTransformer


class MindSmallReranking(AbsTaskReranking):
    @property
    def description(self):
        return {
            "name": "MindSmallReranking",
            "hf_hub_name": "mteb/mind_small",
            "description": "Microsoft News Dataset: A Large-Scale English Dataset for News Recommendation Research",
            "reference": "https://www.microsoft.com/en-us/research/uploads/prod/2019/03/nl4se18LinkSO.pdf",
            "type": "Reranking",
            "category": "s2s",
            "eval_splits": ["validation"],
            "eval_langs": ["en"],
            "main_score": "map",
            "revision": "75937953179...",
        }


model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=[MindSmallReranking()])
evaluation.run(model)
This is then fed into load_dataset via revision= and added to the results JSON file.
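A minimal sketch of that flow, assuming the task description is a plain dict like the one above. The helper names (load_dataset_kwargs, results_with_revision) and the "dataset_revision" key are hypothetical and chosen for illustration; they are not taken from the MTEB code base, and the revision string is a placeholder.

```python
import json


def load_dataset_kwargs(description):
    """Build keyword arguments for datasets.load_dataset from a task description."""
    kwargs = {"path": description["hf_hub_name"]}
    # Tasks without a "revision" key fall back to the latest dataset version.
    if description.get("revision") is not None:
        kwargs["revision"] = description["revision"]
    return kwargs


def results_with_revision(scores, description):
    """Return a copy of the scores with the dataset revision recorded alongside."""
    record = dict(scores)
    record["dataset_revision"] = description.get("revision")  # hypothetical key name
    return record


description = {
    "name": "MindSmallReranking",
    "hf_hub_name": "mteb/mind_small",
    "revision": "0123abc",  # placeholder, not a real commit hash
}

kwargs = load_dataset_kwargs(description)
record = results_with_revision({"validation": {"map": 0.0}}, description)
results_json = json.dumps(record, indent=2)
```

The resulting kwargs can be passed on as datasets.load_dataset(**kwargs), which accepts a revision argument to pin a dataset repository to a specific commit. The sketch leaves that call out so it runs without network access.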
This partly addresses https://github.com/embeddings-benchmark/mteb/issues/21
Issue Analytics
- State:
- Created a year ago
- Comments:5 (5 by maintainers)
Top GitHub Comments
As noted in https://github.com/embeddings-benchmark/mteb/pull/41, this issue can be closed once a revision field has been added to all MTEB datasets.
Yeah there are three versions:
If we don’t have dataset revisions, the current implementation automatically downloads the latest dataset, so it wouldn’t be versioned. As each change to a dataset revision is tracked via a git commit in this repo, using just the version of this library (e.g. a commit string) should be enough. We’ll know all dataset revisions at that commit from the code.
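As a rough illustration of recording the library version with the results, here is a sketch that looks up the installed package version through the standard library. The function names and the "library_version" key are hypothetical, and an installed release version is only a stand-in for the commit string mentioned above; a git checkout would need the commit hash read some other way.

```python
from importlib.metadata import PackageNotFoundError, version


def library_version(package):
    """Return the installed version of a package, or "unknown" if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "unknown"


def stamp_results(scores, package="mteb"):
    """Return a copy of the scores with the library version attached."""
    record = dict(scores)
    record["library_version"] = library_version(package)  # hypothetical key name
    return record


stamped = stamp_results({"map": 0.0})
```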