
Filling more than 1 masked token at a time

See original GitHub issue

I am able to use hugging face’s mask filling pipeline to predict 1 masked token in a sentence using the below:

!pip install -q transformers

from transformers import pipeline

# The default fill-mask pipeline uses a RoBERTa-style checkpoint, so the mask token is <mask>.
nlp_fill = pipeline('fill-mask')
nlp_fill("I am going to guess <mask> in this sentence")

But what is the best way to do this if I want to predict 2 masked tokens, e.g. if the sentence is instead "I am going to <mask> <mask> in this sentence"?

If I try to put this exact sentence into nlp_fill I get the error “ValueError: only one element tensors can be converted to Python scalars”, so it doesn’t work automatically.

Any help would be much appreciated!
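One way to see what is happening under the hood (a sketch added for reference, not part of the original question) is to skip the pipeline and call a masked language model directly, then read off the top candidates for each mask position independently. The distilroberta-base checkpoint is an assumption here; any masked language model with a <mask> token behaves the same way.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# distilroberta-base is assumed for illustration; it uses <mask> as its mask token.
model_name = "distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

sentence = "I am going to <mask> <mask> in this sentence"
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find every <mask> position and print its top-5 candidate tokens.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    top_ids = logits[0, pos].topk(5).indices.tolist()
    print(f"position {pos.item()}: {tokenizer.convert_ids_to_tokens(top_ids)}")

Note that this fills each mask independently, which is exactly the limitation discussed in the comments below: the prediction for one mask does not condition on the token chosen for the other.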

Stack Overflow question link

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 5
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

2 reactions
mitramir55 commented, May 6, 2021

Hi, in one of my projects I’ve implemented right-to-left, left-to-right, and random mask filling in PyTorch, keeping the top-k token ids the model considers most probable for each masked position. In this implementation, each time we fill a mask the model looks at the previously generated sentences and decides what is most probable for the next masked position. So if we have 2 masks in a sentence and set top_k=5, we end up with 25 sentences (5 tokens for the first position, and for each of those 5 partially filled sentences another 5 tokens for the second mask). The output looks something like this (I used Persian models here, but I hope it’s clear how the masks are being filled): [image: example of progressively filled masks]. In the next step, we run a beam search over all these sentences to choose the most probable sequence.

I’d be glad to help HuggingFace with this issue; I can share my code or open a pull request.
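A minimal sketch of that left-to-right idea (not the commenter’s actual code; the distilroberta-base checkpoint and the fill_masks_left_to_right helper are illustrative assumptions): fill the leftmost mask, keep the top-k candidates, re-run the model on each partially filled sentence for the next mask, and prune the branches with a simple beam search as they grow.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "distilroberta-base"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

def fill_masks_left_to_right(sentence, top_k=5):
    # Each beam is (partially filled text, cumulative log-probability).
    beams = [(sentence, 0.0)]
    while tokenizer.mask_token in beams[0][0]:
        candidates = []
        for text, score in beams:
            inputs = tokenizer(text, return_tensors="pt")
            with torch.no_grad():
                logits = model(**inputs).logits
            # Position of the first remaining mask in this branch.
            mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
            probs = logits[0, mask_pos].softmax(dim=-1)
            top = probs.topk(top_k)
            for p, idx in zip(top.values, top.indices):
                token_str = tokenizer.decode([int(idx)]).strip()
                # Replace only the leftmost mask, so later masks condition on this choice.
                filled = text.replace(tokenizer.mask_token, token_str, 1)
                candidates.append((filled, score + torch.log(p).item()))
        # Simple beam search: keep only the most probable branches.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:top_k]
    return beams

for text, score in fill_masks_left_to_right("I am going to <mask> <mask> in this sentence"):
    print(round(score, 2), text)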

1 reaction
LysandreJik commented, Mar 31, 2021

Please see issue https://github.com/huggingface/transformers/issues/10158 and PR https://github.com/huggingface/transformers/pull/10222 for an attempt to tackle this.
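For context on where that went: later transformers releases do let the fill-mask pipeline accept several mask tokens in one input, returning one list of candidates per mask position, with each mask predicted independently of the others. A rough sketch of that usage follows; exact parameter names and output format may vary by version.

from transformers import pipeline

# Recent transformers versions accept multiple <mask> tokens in a single input.
nlp_fill = pipeline("fill-mask")
results = nlp_fill("I am going to <mask> <mask> in this sentence", top_k=3)

# With more than one mask, `results` holds one entry per mask position;
# each entry is a list of candidate dicts (score, token, token_str, sequence).
for i, mask_candidates in enumerate(results):
    print(f"mask {i}:", [c["token_str"] for c in mask_candidates])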


Top Results From Across the Web

Best way of using hugging face's Mask Filling for more than 1 masked token at a time

Multiple Mask Tokens - Transformers - Hugging Face Forums
Is there a way to retrieve the probabilities of the words retrieved in the multiple masks? Since experimental support for multi-masking was ...

Does BERT's Fill-Mask generate words with multiple tokens?
From tests I made using Hugging Face's pipeline, it seems that it only masks 1 token, despite a word being masked containing multiple...

Unmasking BERT: The Key to Transformer Model Performance
Mask 15% of input tokens: Masking in BERT doesn't just mask one token. Instead, it randomly chooses 15% of the input tokens and...

MASS: Masked Sequence to Sequence Pre-training for ... - arXiv
pre-training methods mentioned above, BERT is the most prominent one by ... randomly masks multiple tokens rather than just one token at a...
