IOB-Format

Hello, and thank you very much for the new TNER version. 😃

It seems the old syntax/libraries (e.g. “from tner import TrainTransformersNER”) have been discontinued in the new release? (By the way: the 4 Colab notebooks on your homepage still reference these, and they no longer work either.)

I tried calling the new GridSearcher syntax with my old local dataset in IOB format (train.txt & valid.txt), which worked fine using the previous TNER version.

This O
is O
the O
first O
Entity B-SOMETHING
. 0

This crashes with the error message “JSONDecodeError: Expecting value: line 1 column 1 (char 0)” because the program is looking for the label file, which isn't present. So is IOB (or BIO) no longer supported, and do I have to convert my data into your JSON format?

Thanks, Jan

from tner import GridSearcher

searcher = GridSearcher(
    checkpoint_dir='./ckpt_tner',
    dataset="data/iob",  # either of dataset (huggingface dataset) or local_dataset (custom dataset) should be given
    model="roberta-large",  # language model to fine-tune
    epoch=10,  # the total epoch (L in the figure)
    epoch_partial=5,  # the number of epoch at 1st stage (M in the figure)
    n_max_config=3,  # the number of models to pass to 2nd stage (K in the figure)
    batch_size=16,
    gradient_accumulation_steps=[4, 8],
    crf=[True, False],
    lr=[1e-4, 1e-5],
    weight_decay=[None, 1e-7],
    random_seed=[42, 442],
    lr_warmup_step_ratio=[None, 0.1],
    max_grad_norm=[None, 10]
)
searcher.train()
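
For readers unfamiliar with the file layout described in the post, here is a minimal sketch of reading an IOB file and deriving a label-to-id mapping from it. It assumes one whitespace-separated token and tag per line, with blank lines between sentences. The function names read_iob and build_label2id and the dictionary layout are hypothetical illustrations, not TNER's API or its expected JSON format.

```python
from pathlib import Path


def read_iob(path):
    """Parse an IOB file into a list of {"tokens": [...], "tags": [...]} dicts.

    Assumes one "token tag" pair per line (the tag is the last column)
    and a blank line between sentences.
    """
    sentences, tokens, tags = [], [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if not parts:
            # A blank line closes the current sentence, if any.
            if tokens:
                sentences.append({"tokens": tokens, "tags": tags})
                tokens, tags = [], []
            continue
        if len(parts) < 2:
            raise ValueError(f"expected 'token tag', got: {line!r}")
        tokens.append(parts[0])
        tags.append(parts[-1])
    if tokens:
        # The file may end without a trailing blank line.
        sentences.append({"tokens": tokens, "tags": tags})
    return sentences


def build_label2id(sentences):
    """Map every distinct tag to an integer id, in sorted order."""
    labels = sorted({tag for s in sentences for tag in s["tags"]})
    return {label: i for i, label in enumerate(labels)}
```

Calling build_label2id(read_iob("train.txt")) and printing the result lists every tag that occurs in the file, which is also a quick way to spot stray tags, such as a digit 0 typed in place of the letter O.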

Issue Analytics

State:
Created a year ago
Comments: 12 (8 by maintainers)

Top GitHub Comments

1 reaction

asahi417 commented, Aug 15, 2022

I noticed that the stopword (and the stoptags) were legacy code from the previous version and have nothing to do with the latest TNER, so I just removed them.

0 reactions

asahi417 commented, Sep 28, 2022

We confirmed that the IOB formatting issue was solved, but there is another issue that has nothing to do with the format, so I’ll close this.

