MiniLM: releasing all models?
See original GitHub issue

Hi there,
First of all: great work on distilling a strong teacher into a well-performing student and on eliminating the parameter-size discrepancy issue in teacher-student models! I am always happy to see smaller, usable models.
I was wondering whether you plan to release the small MiniLM model (L6xH384). It says "We release the uncased 12-layer and 6-layer MiniLM models with 384 hidden size [...]", but I can only find a link to the 12-layer model.
Thanks so much
Issue Analytics
- Created 3 years ago
- Comments: 5 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @WenhuiWang0824
Thank you for the great work and for releasing the model. The MiniLMv1 models work very well for bi-encoders and cross-encoders, so I'm eager to test the v2 models.
It would be great if the models could also be added to the huggingface model hub: https://huggingface.co/microsoft
This would make it easy to load and use the models.
Let me know if you need help putting the models on the hub.
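For reference, once a checkpoint is on the hub it can be loaded with a couple of lines. A minimal sketch, assuming the `transformers` library is installed and using the already-released 12-layer checkpoint `microsoft/MiniLM-L12-H384-uncased` (swap in whichever MiniLM variant you want to try):

```python
# Minimal sketch: loading a released MiniLM checkpoint from the Hugging Face hub.
from transformers import AutoModel, AutoTokenizer

model_id = "microsoft/MiniLM-L12-H384-uncased"  # 12-layer, 384-hidden MiniLM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("MiniLM is a compact distilled model.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, 384)
```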
Hi @volker42maru and @maksymbevza,
We have released the monolingual and multilingual MiniLMv2 models distilled from different teachers. Please find the model links in the MiniLM folder.
Thanks
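As a follow-up to the bi-encoder use case mentioned above, here is a hedged sketch of scoring sentence similarity with a MiniLM-based encoder. It assumes the `sentence-transformers` library and uses the community checkpoint `sentence-transformers/all-MiniLM-L6-v2`, which is built on a 6-layer, 384-hidden MiniLM; substitute whichever released checkpoint you want to evaluate:

```python
# Hedged sketch: MiniLM as a bi-encoder via sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode([
    "MiniLM is small and fast.",
    "MiniLM is a compact distilled model.",
])
# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```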