Training distiluse with TSDAE
I'd like to further train distiluse-base-multilingual-cased-v1 on a custom dataset using the provided example train_tsdae_from_file.py. I've been able to use it to train both bert-base-uncased and stsb-xlm-r-multilingual, and I'm actually getting good results with the latter. I'd like to do the same with distiluse, since it gives me a better result as a pretrained model and will hopefully improve further with TSDAE. But I'm getting the following error:
tf-docker /root > python scripts/train_tsdae_from_file.py data/job_text_59k/jobtitle_59k-test.txt
2021-05-28 00:57:12.412477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Read file: 2000it [00:00, 1297341.17it/s]
2021-05-28 00:57:14 - 1926 train sentences
Traceback (most recent call last):
File "scripts/train_tsdae_from_file.py", line 59, in <module>
word_embedding_model = models.Transformer(model_name)
File "/usr/local/lib/python3.6/dist-packages/sentence_transformers/models/Transformer.py", line 28, in __init__
self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
File "/usr/local/lib/python3.6/dist-packages/transformers/models/auto/auto_factory.py", line 381, in from_pretrained
return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py", line 1103, in from_pretrained
f"Error no file named {[WEIGHTS_NAME, TF2_WEIGHTS_NAME, TF_WEIGHTS_NAME + '.index', FLAX_WEIGHTS_NAME]} found in "
OSError: Error no file named ['pytorch_model.bin', 'tf_model.h5', 'model.ckpt.index', 'flax_model.msgpack'] found in directory saved_models/distiluse-base-multilingual-cased-v1/ or `from_tf` and `from_flax` set to False.
Looking at the unzipped files, it seems the save format is different: the two former models have the PyTorch weights file in the same directory as the config, while distiluse seems to be composed of several modules. Is it possible to train distiluse with TSDAE?
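For context, the script builds the encoder with models.Transformer(model_name), which expects a plain Hugging Face checkpoint; a condensed sketch of the documented sentence-transformers TSDAE recipe it follows (the model name and data path are placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Placeholder model name: works for plain HF checkpoints like bert-base-uncased,
# but fails when pointed at the distiluse folder, because that folder is a full
# SentenceTransformer (Transformer + Pooling + Dense modules), not a bare HF model.
model_name = 'bert-base-uncased'
train_sentences = open('data/sentences.txt').read().splitlines()

# Encoder: transformer + CLS pooling
word_embedding_model = models.Transformer(model_name)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), 'cls')
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# TSDAE: noisy sentences in, original sentences reconstructed by a tied decoder
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name,
                                             tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler='constantlr',
    optimizer_params={'lr': 3e-5},
    show_progress_bar=True,
)
```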
Thank you guys, all the comments are worth keeping in mind! The reason I wanted to start with distiluse is that it is already giving pretty good results (although a little worse than the original USE-M-Lv3, which I really don't know how I could fine-tune). I think I will take your advice and run some tests; since I need a multilingual language model, I may start with some version of BERT. But I may also try your approach, @kwang2049, and see what happens. Hope I can share something useful later!
Hi @eduardofv, starting from PLM models like bert-base-uncased and xlm-roberta-base makes more sense than starting from SBERT models, which are already fine-tuned on sentence-embedding tasks. Actually, in our own results, we found bert-base-uncased -> TSDAE -> stsb/nli is usually much better than bert-base-uncased -> stsb/nli -> TSDAE.

For your question, as @nreimers said, it could be kind of tricky to start from this checkpoint because (1) DistilBERT from Hugging Face (HF) has not officially been extended to support an LM head, and (2) TSDAE builds the decoder by inspecting the encoder config (from HF), so the different pooling sizes can be an issue.
To solve this: (1) one can first extend DistilBERT to support an LM head. For DistilBERT, I have personally done this in this Gist; one can download it and import the included modeling_distilbert.py file to get LM-head support. (2) For the size issue, one can either pop the dense layer or add a new dense layer (mapping 512 to 768); a sketch of both options follows.
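A minimal sketch of those two options for the pooling-size mismatch (the module order and the 768/512 dimensions follow the distiluse checkpoint layout; treat this as illustrative rather than the exact code from the Gist):

```python
from sentence_transformers import SentenceTransformer, models

# Load the SBERT checkpoint as a full SentenceTransformer
# (module order in this checkpoint: Transformer -> Pooling -> Dense(768 -> 512)).
model = SentenceTransformer('distiluse-base-multilingual-cased-v1')
modules = list(model)

# Option 1: pop the trailing Dense layer so the encoder outputs the 768-dim pooled vectors.
encoder_a = SentenceTransformer(modules=modules[:-1])

# Option 2: keep the Dense layer and append another Dense mapping 512 back to 768,
# so the decoder sees the hidden size it expects.
encoder_b = SentenceTransformer(modules=modules + [models.Dense(in_features=512, out_features=768)])
```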
Putting it all together in code, it will look like this (also, thanks to @ScottishFold007 for the hint: to build an SBERT model from an SBERT checkpoint, one needs to use SentenceTransformer('checkpoint-name') rather than the other way around):
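A minimal end-to-end sketch of that combined setup, assuming the LM-head-enabled modeling_distilbert.py from the Gist above has been saved locally; the decoder checkpoint name and training hyperparameters are illustrative:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, datasets, losses

# Patched DistilBERT from the Gist, assumed saved next to this script; imported
# to get the LM-head support described above.
import modeling_distilbert  # noqa: F401

# Build the encoder from the SBERT checkpoint (not via models.Transformer).
model = SentenceTransformer('distiluse-base-multilingual-cased-v1')

# Option 1 from above: drop the trailing Dense(768 -> 512) module so the pooled
# sentence embedding matches the 768-dim hidden size the DistilBERT decoder expects.
model = SentenceTransformer(modules=list(model)[:-1])

train_sentences = open('data/train_sentences.txt').read().splitlines()
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True, drop_last=True)

# The decoder name is illustrative; it must resolve to a DistilBERT checkpoint
# that the patched modeling code can load with an LM head.
train_loss = losses.DenoisingAutoEncoderLoss(
    model,
    decoder_name_or_path='distilbert-base-multilingual-cased',
    tie_encoder_decoder=False,
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler='constantlr',
    optimizer_params={'lr': 3e-5},
    show_progress_bar=True,
)
model.save('saved_models/distiluse-tsdae')
```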