layoutlmv3-base-chinese tokenizer could not be loaded.
The tokenizer resource in the Chinese version of LayoutLMv3 is sentencepiece.bpe.model. But it seems we need vocab.json and merges.txt to load the LayoutLMv3Tokenizer.
So @Dod-o, could you provide a function to convert them, or confirm whether there is a difference between these two tokenizers?
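To make the mismatch concrete, here is a minimal sketch that checks which tokenizer files a locally downloaded checkpoint directory contains. The three file names are the ones mentioned above; the function name `detect_tokenizer_format` and the returned labels are hypothetical and exist only for illustration, they are not part of any library.

```python
from pathlib import Path

# File names taken from the issue text. The set names and the returned labels
# below are illustrative assumptions, not identifiers from any library.
BYTE_LEVEL_BPE_FILES = {"vocab.json", "merges.txt"}
SENTENCEPIECE_FILES = {"sentencepiece.bpe.model"}


def detect_tokenizer_format(checkpoint_dir):
    """Report which tokenizer file format a local checkpoint directory holds."""
    # Collect the names of the regular files directly inside the directory.
    present = {p.name for p in Path(checkpoint_dir).iterdir() if p.is_file()}
    # Both vocab.json and merges.txt must exist for the byte-level BPE format.
    if BYTE_LEVEL_BPE_FILES <= present:
        return "byte-level-bpe"
    # A lone SentencePiece model file indicates the other format.
    if SENTENCEPIECE_FILES <= present:
        return "sentencepiece"
    return "unknown"
```

If this returns "sentencepiece", the checkpoint does not carry the files LayoutLMv3Tokenizer expects. One option worth trying in that case is a tokenizer class that reads sentencepiece.bpe.model directly, such as XLMRobertaTokenizer in transformers. Whether that reproduces the tokenization the Chinese checkpoint was trained with is an assumption to confirm with the maintainers.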
Issue Analytics
- State:
- Created a year ago
- Comments:9 (1 by maintainers)
Top GitHub Comments
Actually, I created a new tokenizer from LayoutLMv3Tokenizer that uses the spm model to tokenize text, rather than the two files. Maybe you could check whether it's available, or just make some minor changes to LayoutLMv3Tokenizer. Then it could fit both languages? @Dod-o