
Bad eval results on RTE and CoLA


I tried fine-tuning the ALBERT-base model on the two smallest GLUE tasks, but got only about 66% accuracy on both. I was running on a GPU (2080 Ti). The GLUE fine-tuning script has a bug in the evaluation part, which I tried to fix, but I am fairly new to TensorFlow, so I am not sure whether something is still wrong with the script. Below is the script I am using:

set -ex

OUTPUT_DIR="glue_baseline"

# To start from a custom pretrained checkpoint, set ALBERT_HUB_MODULE_HANDLE
# below to an empty string and set INIT_CHECKPOINT to your checkpoint path.
ALBERT_HUB_MODULE_HANDLE="https://tfhub.dev/google/albert_base/1"
INIT_CHECKPOINT=""

ALBERT_ROOT=pretrained/albert_base


# Args: $1=task name, $2=warmup steps, $3=learning rate,
#       $4=train steps, $5=checkpoint interval, $6=batch size
function run_task() {
  COMMON_ARGS="--output_dir=${OUTPUT_DIR}/$1 --data_dir=${ALBERT_ROOT}/glue --vocab_file=${ALBERT_ROOT}/vocab.txt --spm_model_file=${ALBERT_ROOT}/30k-clean.model --do_lower_case --max_seq_length=128 --optimizer=adamw --task_name=$1 --warmup_step=$2 --learning_rate=$3 --train_step=$4 --save_checkpoints_steps=$5 --train_batch_size=$6"
  # Training-only run
  python3 -m run_classifier \
      ${COMMON_ARGS} \
      --do_train \
      --nodo_eval \
      --nodo_predict \
      --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
      --init_checkpoint="${INIT_CHECKPOINT}"
  # Evaluation and prediction run
  python3 -m run_classifier \
      ${COMMON_ARGS} \
      --nodo_train \
      --do_eval \
      --albert_hub_module_handle="${ALBERT_HUB_MODULE_HANDLE}" \
      --do_predict
}

run_task RTE 200 3e-5 800 100 32

I tried printing the training loss, and it appears to converge, yet the eval results are nearly random. The eval accuracy differs across checkpoints, so I believe the checkpoints are being loaded.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 8

Top GitHub Comments

2 reactions
MichaelZhouwang commented, Mar 26, 2020

Got it, thanks! I just got some reasonable results with ALBERT-base on MRPC.

By the way, you must specify either --albert_config_file or --albert_hub_module_handle for the evaluation run, which the current version of the script does not do.

In addition, I am also curious about the CoLA dev result you got. I found this task to be very sensitive to the random seed. Looking forward to your reply. Thanks!
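The fix described above can be sketched as an eval-only invocation that passes the config explicitly. This is a minimal sketch, not the author's exact command: the flag names are taken from the script earlier in the thread plus the --albert_config_file flag mentioned here, and the albert_config.json path and glue_baseline/RTE output directory are assumptions for illustration.

```shell
# Hypothetical eval-only run for RTE that supplies --albert_config_file
# instead of the hub handle. Paths are placeholders for your checkout.
ALBERT_ROOT=pretrained/albert_base
EVAL_CMD="python3 -m run_classifier \
  --nodo_train --do_eval --nodo_predict \
  --task_name=RTE \
  --albert_config_file=${ALBERT_ROOT}/albert_config.json \
  --output_dir=glue_baseline/RTE"
# Printed here rather than executed, since the data and checkpoint
# paths depend on your local setup.
echo "${EVAL_CMD}"
```

Without one of the two flags, the eval run has no model definition to build the graph from, which matches the error the script version in the issue runs into.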

0 reactions
MichaelZhouwang commented, Mar 26, 2020

> Got it, thanks! I just got some reasonable results with ALBERT-base on MRPC.
>
> By the way, we must specify either --albert_config_file or --albert_hub_module_handle in the evaluation part, which is not included in the current version.

Hi, I am also fine-tuning ALBERT-base v2 on MRPC. Could you please share what dev acc_and_f1 result you got on the MRPC dataset? I'm not sure whether I tuned it well. Thanks!


