
GLUE test set predictions


🚀 Feature request

Motivation

The run_glue script is super helpful, but it doesn't currently produce predictions on the test sets for the GLUE tasks. I think this would be extremely helpful for a lot of people. I'm sure plenty of people have implemented this functionality themselves, but I haven't found any such implementation. Since transformers already provides train and dev handling for GLUE, it would be cool to complete the feature set by providing test set predictions.

Your contribution

I'm personally working on a branch that extends the glue_processors to support the test sets (which are already downloaded by the recommended download_glue.py script). I also updated the run_glue.py script to produce the *.tsv files required by the GLUE online submission interface.

I think I'm a couple days out from testing/completing my implementation. I'm also sure plenty of implementations of this already exist. If there are no other plans to support this in the works, I'm happy to submit a PR.
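For context, the GLUE submission interface expects one TSV file per task with an `index` column and a `prediction` column. Below is a minimal, framework-agnostic sketch of writing such a file from classifier logits; `write_glue_predictions` and the toy logits are illustrative assumptions, not code from the branch or PR discussed here:

```python
import csv

def write_glue_predictions(logits, label_list, output_path):
    """Write predictions in the TSV layout the GLUE submission site expects:
    a header row 'index<TAB>prediction', then one row per test example.
    (Hypothetical helper for illustration, not part of run_glue.py.)"""
    # argmax over each row of logits, in pure Python
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["index", "prediction"])
        for idx, pred_id in enumerate(preds):
            # map the class index back to the task's label string
            writer.writerow([idx, label_list[pred_id]])
    return preds

# Toy logits for three examples of a binary task (e.g. SST-2, labels "0"/"1")
logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.7]]
write_glue_predictions(logits, ["0", "1"], "SST-2.tsv")  # predictions: 1, 0, 1
```

Regression tasks like STS-B would write a float score instead of a class label, but the file layout is the same.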

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 6
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
julien-c commented, May 25, 2020

@AMChierici make sure you run from master; there's indeed a mode kwarg now.

@shoarora Thanks for this first PR. I did check yours while merging the other (to make sure that the indices in CSV parsing, etc. were correct).

1 reaction
shoarora commented, May 25, 2020

@AMChierici I didn't author #4463, which is what made it to master to enable this feature. I haven't played with it yet, so I'm sorry I can't be of more help.


