fill-mask target for full words not enabled?

System Info

- `transformers` version: 4.19.2
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.6.0
- PyTorch version (GPU?): 1.11.0+cu113 (False)
- Tensorflow version (GPU?): 2.8.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

@Narsil and @LysandreJik (?) How can one use RoBERTa-large for fill-mask to get a full-word candidate and its “full” score? I'm open to workaround solutions.

My example: `sentence = f"Nitzsch argues against the doctrine of the annihilation of the wicked, regards the teaching of Scripture about eternal {nlp.tokenizer.mask_token} as hypothetical."`
Notebook here.

Using the pipeline, I get this output: The specified target token `damnation` does not exist in the model vocabulary. Replacing with `Ġdamn`.

Thanks.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

See notebook above.

Expected behavior

I expect to see "damnation" with its score.
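One possible workaround, sketched here under stated assumptions rather than as a `transformers` feature: judging by the warning above, `targets` can only score single vocabulary entries, so a word that the BPE tokenizer splits into several pieces is truncated to its first piece. A multi-piece candidate can instead be scored with a pseudo-log-likelihood (PLL). You substitute all of its pieces for the mask, re-mask each piece in turn, and sum the log-probabilities the model assigns to the true piece. The `log_probs` callable below is a hypothetical interface. With `transformers`, one could build it from `AutoModelForMaskedLM` by applying `log_softmax` to the logits at the masked position. The piece split in the usage lines is assumed for illustration, so check it against the actual tokenizer.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given a token list and the index of the masked
# position, return log-probabilities over vocabulary pieces at that position.
LogProbFn = Callable[[List[str], int], Dict[str, float]]

MASK = "<mask>"  # RoBERTa's mask token string


def span_pll(prefix: Sequence[str], pieces: Sequence[str],
             suffix: Sequence[str], log_probs: LogProbFn) -> float:
    """Pseudo-log-likelihood of a multi-piece word filling a single slot.

    The candidate's pieces replace the original mask. Each piece is then
    re-masked in turn and scored given every other token, and the
    per-piece log-probabilities are summed.
    """
    filled = list(prefix) + list(pieces) + list(suffix)
    start = len(prefix)
    total = 0.0
    for offset, piece in enumerate(pieces):
        masked = filled.copy()
        masked[start + offset] = MASK
        dist = log_probs(masked, start + offset)
        # A piece missing from the distribution gets probability zero.
        total += dist.get(piece, float("-inf"))
    return total


if __name__ == "__main__":
    # Toy stand-in for a real model: fixed probabilities per piece.
    def toy_log_probs(tokens: List[str], index: int) -> Dict[str, float]:
        assert tokens[index] == MASK
        return {"Ġdamn": math.log(0.5), "ation": math.log(0.2)}

    # Assumed BPE split of " damnation"; verify with the real tokenizer.
    score = span_pll(["Ġeternal"], ["Ġdamn", "ation"], ["Ġas"], toy_log_probs)
    print(f"log-score: {score:.4f}, prob: {math.exp(score):.4f}")
```

Each extra piece adds another negative term, so longer words are penalized. Dividing by `len(pieces)` is one common normalization when ranking candidates of different lengths, but that is a design choice, not something the pipeline does.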

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
i-am-neo commented, May 23, 2022

I hear you, @Narsil; it sure is non-trivial.

In my case, I would like a large-enough LM (for example, RoBERTa-large) to generate initial word candidates, using some regex patterns as hints/constraints, without knowing in advance what the best candidates are beyond those hints. My thinking is that the candidates the LM generates would more or less already fit the context given to the model. Multiple candidates would then be ranked by their scores after filling.
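The filter-then-rank step described above could look roughly like the sketch below. It assumes predictions shaped like fill-mask pipeline results (dicts with `token_str` and `score` keys, e.g. from a call with a large `top_k`) and treats the regex hint as a full match on the stripped word. `rank_candidates` is a hypothetical helper, not part of `transformers`, and the predictions in the usage lines are made up.

```python
import re
from typing import Iterable, List, Mapping, Tuple


def rank_candidates(predictions: Iterable[Mapping[str, object]],
                    pattern: str) -> List[Tuple[str, float]]:
    """Keep predictions whose word fully matches `pattern`, best score first."""
    regex = re.compile(pattern)
    kept = []
    for pred in predictions:
        # Decoded BPE tokens may carry a leading space, so strip it first.
        word = str(pred["token_str"]).strip()
        if regex.fullmatch(word):
            kept.append((word, float(pred["score"])))
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    preds = [
        {"token_str": " punishment", "score": 0.40},
        {"token_str": " life", "score": 0.30},
        {"token_str": " damn", "score": 0.10},
        {"token_str": " death", "score": 0.05},
    ]
    # Hint: a word starting with "d".
    print(rank_candidates(preds, r"d\w+"))
```

Because the pipeline proposes single vocabulary entries, a regex that only a multi-piece word can satisfy will match nothing here; such candidates would have to be supplied and scored separately.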

Re zero-shot-classification: the trouble is that without knowing the correct/best candidates in advance, it's harder to work it in.

0 reactions
github-actions[bot] commented, Jun 20, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
