Pooling Strategy Question
In the original S-BERT paper, you mention:
“Researchers have started to input individual sentences into BERT and to derive fixed size sentence embeddings. The most commonly used approach is to average the BERT output layer (known as BERT embeddings) or by using the output of the first token (the [CLS] token). As we will show, this common practice yields rather bad sentence embeddings, often worse than averaging GloVe embeddings (Pennington et al., 2014).”
When you say "average the BERT output layer", do you mean mean pooling over the last layer's hidden states? If so, how is that different from what Sentence Transformers does? Doesn't Sentence Transformers by default apply mean pooling to the tokens of the last layer's hidden state (and optionally support max pooling and CLS pooling, if I am not wrong)?
So I am confused by the claim that, before Sentence Transformers, the most common approach researchers used to obtain fixed-size sentence embeddings was to average the BERT output layer (known as BERT embeddings).
Also, the BERT author hints (below) that average pooling of word (or token) embeddings may not yield a good sentence embedding. All along, I was under the impression that this is exactly what Sentence Transformers' mean pooling does to obtain sentence embeddings.
Could you please clarify?
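For reference, here is what I understand the default setup to be — a minimal sketch using the `sentence_transformers` models API (`bert-base-uncased` is just an example checkpoint):

```python
from sentence_transformers import SentenceTransformer, models

# Word-level BERT encoder: produces one hidden state per token (last layer)
word_embedding_model = models.Transformer("bert-base-uncased")

# Mean pooling over the token embeddings of the last layer;
# "cls" and "max" are also supported pooling modes
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
embeddings = model.encode(["This is an example sentence."])
print(embeddings.shape)  # (1, 768) for bert-base-uncased
```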
Not sure if I get the question. The output of BERT is averaged. But BERT needs fine-tuning on suitable data to produce meaningful text embeddings.
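For illustration, a minimal sketch of that averaging with the Hugging Face transformers API (`bert-base-uncased` as an example checkpoint; the attention mask is used so padding tokens are ignored):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["This is an example sentence."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    output = model(**encoded)

# Average the last-layer hidden states over the sequence dimension,
# masking out padding tokens via the attention mask
token_embeddings = output.last_hidden_state                # (batch, seq_len, hidden)
mask = encoded["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # torch.Size([1, 768])
```

Without fine-tuning on suitable data (e.g. NLI or STS pairs), these averaged vectors tend to be poor sentence embeddings, which is the point made in the paper.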
Understood, thanks.