Add in-layer TF Tokenizer to BPE tokenizers
Feature request
Similar to what we have with TFBertTokenizer, but for models that use Byte Pair Encoding (e.g. TFT5Tokenizer, TFClipTokenizer, etc.).
They were implemented in keras-nlp (https://github.com/keras-team/keras-nlp/pull/389), and we can now bring them here.
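For context, a minimal sketch of what in-graph BPE tokenization looks like with the keras-nlp layer from that PR (the vocab.json/merges.txt paths are hypothetical placeholders, and the exact BytePairTokenizer arguments should be checked against the keras-nlp release):

```python
import tensorflow as tf
import keras_nlp

# Hypothetical files; in practice these would be the BPE vocabulary and
# merge rules shipped with the model on the Hub.
tokenizer = keras_nlp.tokenizers.BytePairTokenizer(
    vocabulary="vocab.json",
    merges="merges.txt",
)

# Tokenization runs entirely as TF ops, so it can be saved inside a
# SavedModel graph rather than done in Python preprocessing code.
token_ids = tokenizer(tf.constant(["Hello world!"]))
```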
Motivation
With that feature we will be able to serve almost every model with TF Serving, which will make deployment much easier, as we won't have to write custom handlers and servers.
Having TF BPE tokenizers is (I think) the last barrier to making transformers fully TF Serving-compliant.
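As an illustration (a sketch, not the actual implementation), an end-to-end export could then look roughly like this, assuming a hypothetical TFT5Tokenizer with the same calling convention as the existing TFBertTokenizer:

```python
import tensorflow as tf

class ServingModel(tf.Module):
    """Bundles an in-graph tokenizer and a TF model into one SavedModel."""

    def __init__(self, tokenizer, model):
        super().__init__()
        self.tokenizer = tokenizer  # e.g. the hypothetical TFT5Tokenizer
        self.model = model          # e.g. a TF T5 model

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def serve(self, texts):
        # Tokenization happens inside the graph, so TF Serving can accept
        # raw strings directly, with no custom preprocessing server.
        tokenized = self.tokenizer(texts)
        return self.model(tokenized)

# With `tokenizer` and `model` instantiated (both placeholders here):
# serving_model = ServingModel(tokenizer, model)
# tf.saved_model.save(
#     serving_model, "export/1",
#     signatures={"serving_default": serving_model.serve},
# )
```

This is essentially the pattern already possible with TFBertTokenizer for WordPiece models; the BPE equivalent is the missing piece.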
Your contribution
I can submit a PR, but there are many models for which we would need to do this, so I expect a large number of subtasks if you decide to go for it.
Also, since keras-nlp implemented it (https://github.com/keras-team/keras-nlp/pull/389), should we copy-paste the code for each tokenizer or import from keras-nlp, while keeping the reference to their repo?
Top GitHub Comments
Don’t apologize at all - this is something we were struggling with too!
Yeah, I'm exploring it and, guess what, it's not as easy as I thought, haha.