
Add revision to datasets

See original GitHub issue

I’d propose adding the commit hash of the dataset revision to each task:

from mteb import MTEB
from mteb.abstasks.AbsTaskReranking import AbsTaskReranking
from sentence_transformers import SentenceTransformer


class MindSmallReranking(AbsTaskReranking):
    @property
    def description(self):
        return {
            "name": "MindSmallReranking",
            "hf_hub_name": "mteb/mind_small",
            "description": "Microsoft News Dataset: A Large-Scale English Dataset for News Recommendation Research",
            "reference": "https://www.microsoft.com/en-us/research/uploads/prod/2019/03/nl4se18LinkSO.pdf",
            "type": "Reranking",
            "category": "s2s",
            "eval_splits": ["validation"],
            "eval_langs": ["en"],
            "main_score": "map",
            "revision": "75937953179...",
        }

model = SentenceTransformer("average_word_embeddings_komninos")
evaluation = MTEB(tasks=[MindSmallReranking()])
evaluation.run(model)

The revision is then passed to load_dataset via revision= and added to the results JSON file.
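As a minimal sketch of that flow, the task description's "revision" field could be turned into load_dataset keyword arguments. The build_load_kwargs helper is hypothetical (not part of mteb), and the hash is truncated as in the proposal above:

```python
# Sketch: forward a task's "revision" field to datasets.load_dataset
# so that evaluation results are reproducible.
description = {
    "hf_hub_name": "mteb/mind_small",
    "revision": "75937953179...",  # truncated here; a full commit hash in practice
}

def build_load_kwargs(description):
    """Hypothetical helper: turn a task description into load_dataset arguments."""
    kwargs = {"path": description["hf_hub_name"]}
    if description.get("revision"):
        kwargs["revision"] = description["revision"]
    return kwargs

kwargs = build_load_kwargs(description)
# dataset = datasets.load_dataset(**kwargs)  # requires the `datasets` library and network access
```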

This partly addresses https://github.com/embeddings-benchmark/mteb/issues/21

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
Muennighoff commented, Aug 10, 2022

As noted in https://github.com/embeddings-benchmark/mteb/pull/41, this issue can be closed once a revision field has been added to all MTEB datasets.

1 reaction
Muennighoff commented, Aug 9, 2022

Yeah, there are three versions to consider:

  • Revision of the model (probably entirely on the user side, as we don’t implement models here yet)
  • Revision of the dataset
  • Version of this library (and its dependencies)

Without dataset revisions, the current implementation automatically downloads the latest dataset, so results wouldn’t be versioned. Since each change to a dataset revision is tracked via a git commit in this repo, pinning just the version of this library (e.g. a commit string) should be enough: the code at that commit determines all dataset revisions.
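To make the comment concrete, here is a sketch of what recording all three versions alongside results in the JSON file could look like. The field names, the score, and the library version string are illustrative assumptions, not mteb's actual output format:

```python
# Sketch (hypothetical field names): store enough metadata next to the
# scores to reproduce a run - dataset revision, library version, model id.
import json

results = {"MindSmallReranking": {"map": 0.29}}  # illustrative score only

metadata = {
    "dataset_revision": "75937953179...",          # truncated commit hash from the task
    "mteb_version": "0.0.0",                       # assumed placeholder version string
    "model": "average_word_embeddings_komninos",
}

payload = {"metadata": metadata, "results": results}
serialized = json.dumps(payload, indent=2)
```

Anyone re-running the benchmark can then read the metadata block back and pin the same dataset and library state.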
