Inserting special tokens

Say I want to insert special tokens into a piece of text to help the model distinguish certain features.

With BERT, I can use tokens like [unused0], [unused1], .... Are there similar tokens I can use with DeCLUTR?

For context, I’m using special tokens to delineate the boundaries of named entity mentions. For example, inserting special tokens into

Jim bought 300 shares of Acme Corp. in 2006.

yields

Jim bought 300 shares of [unused0] Acme Corp. [unused1] in 2006.

How can I do this with DeCLUTR? Thanks!
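The marker insertion described in the question can be sketched in plain Python. This is a minimal illustration, not part of DeCLUTR: the helper name `insert_markers` is hypothetical, and it assumes entity mentions arrive as non-overlapping `(start, end)` character offsets.

```python
def insert_markers(text, spans, start_tok="[unused0]", end_tok="[unused1]"):
    """Wrap each (start, end) character span of `text` in marker tokens.

    Assumes the spans do not overlap (hypothetical helper for illustration).
    """
    # Process spans right to left so earlier offsets stay valid as the string grows.
    for start, end in sorted(spans, reverse=True):
        text = f"{text[:start]}{start_tok} {text[start:end]} {end_tok}{text[end:]}"
    return text


sentence = "Jim bought 300 shares of Acme Corp. in 2006."
mention = "Acme Corp."
start = sentence.index(mention)
marked = insert_markers(sentence, [(start, start + len(mention))])
print(marked)
# Jim bought 300 shares of [unused0] Acme Corp. [unused1] in 2006.
```

The marker strings are parameters, so the same helper works with whatever tokens the target tokenizer ends up supporting.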

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
JohnGiorgi commented, Mar 20, 2021

I can’t think of why it would, but I would double-check to be sure! Might be worth it to dig into the resize_token_embeddings function.

0 reactions
david-wb commented, Mar 20, 2021

Oh, so resizing the model's input embeddings would not force a re-train from scratch? If so, I will likely try that. Thank you.
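For reference, here is a rough sketch of the approach discussed in these comments, using the Hugging Face Transformers calls `add_special_tokens` and `resize_token_embeddings`. Several things here are assumptions rather than anything confirmed in the thread: the helper name `add_marker_tokens`, the marker strings `[ENT_START]` and `[ENT_END]`, and the checkpoint name `johngiorgi/declutr-small`. Resizing keeps the existing pretrained embedding rows and only initializes rows for the new tokens, so the new markers would still need fine-tuning to become meaningful, but the rest of the model should not need retraining from scratch.

```python
def add_marker_tokens(tokenizer, model, markers):
    """Register `markers` as special tokens and grow the embedding matrix to match.

    Hypothetical helper: expects a Hugging Face tokenizer and model.
    Returns the number of tokens that were actually new to the vocabulary.
    """
    num_added = tokenizer.add_special_tokens(
        {"additional_special_tokens": list(markers)}
    )
    if num_added > 0:
        # Existing rows keep their pretrained weights; only the added rows are new.
        model.resize_token_embeddings(len(tokenizer))
    return num_added


def demo():
    # Imported here so the helper above does not require the library just to be read.
    from transformers import AutoModel, AutoTokenizer

    name = "johngiorgi/declutr-small"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    # Marker strings are illustrative; any strings not already in the vocabulary work.
    add_marker_tokens(tokenizer, model, ["[ENT_START]", "[ENT_END]"])

    text = "Jim bought 300 shares of [ENT_START] Acme Corp. [ENT_END] in 2006."
    print(tokenizer.tokenize(text))
```

Calling `demo()` downloads the checkpoint and prints the tokenized sentence, where each marker should appear as a single token rather than being split into subwords.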
