How do I create a dataset in FUNSD format?
Example format:
{
  "form": [
    {
      "id": 0,
      "text": "Registration No.",
      "box": [94, 169, 191, 186],
      "linking": [
        [0, 1]
      ],
      "label": "question",
      "words": [
        {
          "text": "Registration",
          "box": [94, 169, 168, 186]
        },
        {
          "text": "No.",
          "box": [170, 169, 191, 183]
        }
      ]
    },
    {
      "id": 1,
      "text": "533",
      "box": [209, 169, 236, 182],
      "label": "answer",
      "words": [
        {
          "box": [209, 169, 236, 182],
          "text": "533"
        }
      ],
      "linking": [
        [0, 1]
      ]
    }
  ]
}
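The structure above can be assembled programmatically. The following is a minimal sketch, not an official tool: the helper names `union_box` and `make_entity` are hypothetical, and it assumes each entity's root `box` is simply the smallest rectangle enclosing its word boxes, which matches the example values shown above.

```python
import json


def union_box(boxes):
    """Return the smallest [x0, y0, x1, y1] box enclosing all given boxes."""
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


def make_entity(entity_id, words, label, linking=None):
    """Build one FUNSD-style entity from a list of word dicts.

    Assumption: the entity text is the words joined by spaces and the
    root box is the union of the word boxes.
    """
    return {
        "id": entity_id,
        "text": " ".join(w["text"] for w in words),
        "box": union_box([w["box"] for w in words]),
        "linking": linking or [],
        "label": label,
        "words": words,
    }


question = make_entity(
    0,
    [
        {"text": "Registration", "box": [94, 169, 168, 186]},
        {"text": "No.", "box": [170, 169, 191, 183]},
    ],
    "question",
    linking=[[0, 1]],
)
answer = make_entity(
    1,
    [{"text": "533", "box": [209, 169, 236, 182]}],
    "answer",
    linking=[[0, 1]],
)

# One annotation file per form image, with all entities under "form".
annotation = {"form": [question, answer]}
funsd_json = json.dumps(annotation, indent=2)
```

Writing `funsd_json` to a file named after the form image would give one annotation file per form. Note that the same `[question_id, answer_id]` pair appears in the `linking` list of both linked entities, as in the example.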
They added a new feature for this: https://github.com/heartexlabs/label-studio-ml-backend/tree/master/label_studio_ml/examples/tesseract
Check this PR: https://github.com/heartexlabs/label-studio-converter/pull/127. Now you can export to FUNSD using this script.
Unfortunately, we can't make a 100% compatible conversion to FUNSD format, because FUNSD has both root bboxes and word bboxes, and Label Studio (LS) doesn't build root bboxes automatically. So this converter creates one root bbox per word, with that single word inside it.
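To illustrate what that limitation means for the output, here is a rough sketch of the "one root bbox with one word inside" shape. This is not the converter's actual code: the function name `words_to_single_word_entities` and the input layout (a list of dicts with `text` and `box`) are assumptions made for the example, and the default label `other` is just a placeholder.

```python
def words_to_single_word_entities(words, label="other"):
    """Wrap every word in its own FUNSD-style entity.

    Each entity's root box is identical to the box of its only word,
    and no links between entities are produced.
    """
    entities = []
    for idx, word in enumerate(words):
        box = list(word["box"])
        entities.append(
            {
                "id": idx,
                "text": word["text"],
                "box": box,
                "linking": [],
                "label": label,
                "words": [{"text": word["text"], "box": list(box)}],
            }
        )
    return {"form": entities}


# Hypothetical input: two words that FUNSD would group into one entity.
result = words_to_single_word_entities(
    [
        {"text": "Registration", "box": [94, 169, 168, 186]},
        {"text": "No.", "box": [170, 169, 191, 183]},
    ]
)
```

With this shape, "Registration" and "No." end up as two separate entities rather than one "Registration No." entity, so grouping words into multi-word entities would have to be done as a separate step.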