Customize tokenizer in model card's widget
I trained a Chinese RoBERTa model. In the model card, the widget uses the tokenizer defined in config.json (RobertaTokenizer), but my model uses BertTokenizer. Can I customize the tokenizer in the model card's widget, just as I can choose any combination of model and tokenizer in a pipeline?
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Yes, this is possible. See https://github.com/huggingface/transformers/commit/ed71c21d6afcbfa2d8e5bb03acbb88ae0e0ea56a: you should add a tokenizer_class attribute to your config.json with the tokenizer class you want to use.
cc @sgugger @LysandreJik I have no idea if this is currently documented or just in the code 🤭
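As a rough sketch of the suggestion above, the config.json in a local copy of the model repository could be edited with a few lines of Python. The helper name set_tokenizer_class and the example path are hypothetical, chosen only for illustration; the only detail taken from the answer is the tokenizer_class key and the BertTokenizer value from the question.

```python
import json
from pathlib import Path


def set_tokenizer_class(config_path, tokenizer_class):
    """Add or overwrite the `tokenizer_class` key in a config.json file.

    `set_tokenizer_class` is a hypothetical helper, not part of any library.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # The key name comes from the answer above; all other keys are left untouched.
    config["tokenizer_class"] = tokenizer_class
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


# Example call (the path is an assumption about where the model files live):
# set_tokenizer_class("my-chinese-roberta/config.json", "BertTokenizer")
```

After the edited config.json is uploaded back to the model repository, the widget should pick up the new tokenizer class, assuming it reads the tokenizer from config.json as described in the question.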
arg, who does that guy think he is? 😂