[LayoutLM] How to reproduce FUNSD result
Hello, I ran fine-tuning for the sequence labeling task on the FUNSD dataset, but I couldn't achieve the result reported in the paper (precision is only 40%). Here are the scripts and logs I used. Any idea what could be wrong? Thank you very much.
Training:
#!/bin/bash
python run_seq_labeling.py --data_dir ~/mnt/data \
--model_type layoutlm \
--model_name_or_path ~/mnt/model \
--do_lower_case \
--max_seq_length 512 \
--do_train \
--num_train_epochs 100.0 \
--logging_steps 10 \
--save_steps -1 \
--output_dir ~/mnt/output \
--labels ~/mnt/data/labels.txt \
--per_gpu_train_batch_size 16 \
--fp16
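A quick sanity check on these flags, assuming a single GPU and no gradient accumulation: 150 training examples at a per-GPU batch size of 16 gives ceil(150 / 16) = 10 steps per epoch, so 100 epochs comes to 1000 optimization steps, which matches the "Total optimization steps = 1000" line in the log below.
python -c "import math; print(math.ceil(150 / 16) * 100)"   # -> 1000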
Testing:
#!/bin/bash
python run_seq_labeling.py --do_predict \
--model_type layoutlm \
--model_name_or_path ~/mnt/model \
--data_dir ~/mnt/data \
--output_dir ~/mnt/output \
--labels ~/mnt/data/labels.txt
Some log output:
05/14/2020 09:40:45 - INFO - __main__ - ***** Running training *****
05/14/2020 09:40:45 - INFO - __main__ - Num examples = 150
05/14/2020 09:40:45 - INFO - __main__ - Num Epochs = 100
05/14/2020 09:40:45 - INFO - __main__ - Instantaneous batch size per GPU = 16
05/14/2020 09:40:45 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
05/14/2020 09:40:45 - INFO - __main__ - Gradient Accumulation steps = 1
05/14/2020 09:40:45 - INFO - __main__ - Total optimization steps = 1000
05/14/2020 09:53:00 - INFO - __main__ - global_step = 1000, average loss = 0.10387736940692412
05/14/2020 10:17:07 - INFO - __main__ - ***** Running evaluation *****
05/14/2020 10:17:07 - INFO - __main__ - Num examples = 52
05/14/2020 10:17:07 - INFO - __main__ - Batch size = 8
05/14/2020 10:17:07 - INFO - __main__ -
             precision    recall  f1-score   support
  QUESTION        0.41      0.70      0.52       771
  HEADER          0.00      0.00      0.00       108
  ANSWER          0.39      0.50      0.44       513
  micro avg       0.40      0.57      0.47      1392
  macro avg       0.37      0.57      0.45      1392
05/14/2020 10:17:07 - INFO - __main__ - ***** Eval results *****
05/14/2020 10:17:07 - INFO - __main__ - f1 = 0.472115668338743
05/14/2020 10:17:07 - INFO - __main__ - loss = 2.9291565077645436
05/14/2020 10:17:07 - INFO - __main__ - precision = 0.400600901352028
05/14/2020 10:17:07 - INFO - __main__ - recall = 0.5747126436781609
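The logged metrics are at least internally consistent: the micro-averaged F1 follows from the logged precision and recall as 2PR / (P + R), so the low score reflects the predictions themselves rather than a reporting bug. A quick check:
python -c "p, r = 0.400600901352028, 0.5747126436781609; print(2 * p * r / (p + r))"
# -> 0.4721156..., matching the logged f1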
Top GitHub Comments
Hi @nv-quan, it seems that you didn't set --max_seq_length during the evaluation stage. Please add --max_seq_length 512 to your testing command and try again.
@marythomaa98 Thanks a lot, it works when I add --do_lower_case to my test script and also remove the data/cached_test_model_512 file.
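Putting both fixes together, the corrected test script would look something like the sketch below (the cache path is an assumption: it takes the file name from the comment above and assumes the cached features live under --data_dir):
#!/bin/bash
# Remove stale cached features built without the flags below
# (path assumed from the comment above)
rm -f ~/mnt/data/cached_test_model_512
python run_seq_labeling.py --do_predict \
--model_type layoutlm \
--model_name_or_path ~/mnt/model \
--data_dir ~/mnt/data \
--output_dir ~/mnt/output \
--labels ~/mnt/data/labels.txt \
--max_seq_length 512 \
--do_lower_case
The underlying point is that preprocessing flags such as --max_seq_length and --do_lower_case must match between training and evaluation, and that run_seq_labeling.py caches preprocessed features, so a cache built with the old flags has to be deleted by hand before the new flags take effect.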