IOB-Format
Hello, and thank you very much for the new TNER version. 😃
It seems that the old syntax/imports (e.g. “from tner import TrainTransformersNER”) have been discontinued in the new release? (By the way, the 4 Colab notebooks on your homepage still reference these, and they no longer work either.)
I tried calling the new GridSearcher syntax with my old local dataset in IOB format (train.txt & valid.txt), which worked fine using the previous TNER version.
This O
is O
the O
first O
Entity B-SOMETHING
. 0
This crashes with the error message “JSONDecodeError: Expecting value: line 1 column 1 (char 0)” because the program is looking for the label file, which isn't present. So is IOB (or BIO) no longer supported, and do I have to convert my data into your JSON format?
Thanks, Jan
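For readers unfamiliar with the error quoted above: this exact message is what Python's standard `json` module raises when the text it is asked to parse is empty or does not begin with a valid JSON value. The sketch below only reproduces the message with the standard library; it makes no claim about which file TNER actually tries to read, and `describe_json_error` is a hypothetical helper name.

```python
import json


def describe_json_error(text):
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Empty content, e.g. the body of an empty file.
print(describe_json_error(""))
# Plain IOB text handed to a JSON parser.
print(describe_json_error("This O"))
# Valid JSON parses, so no error message is returned.
print(describe_json_error('{"O": 0}'))
```

The first two calls print `Expecting value: line 1 column 1 (char 0)` and the third prints `None`, so the message by itself does not say whether the file was missing, empty, or in a different format.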
from tner import GridSearcher

searcher = GridSearcher(
    checkpoint_dir='./ckpt_tner',
    dataset="data/iob",  # either of dataset (huggingface dataset) or local_dataset (custom dataset) should be given
    model="roberta-large",  # language model to fine-tune
    epoch=10,  # the total epoch (L in the figure)
    epoch_partial=5,  # the number of epoch at 1st stage (M in the figure)
    n_max_config=3,  # the number of models to pass to 2nd stage (K in the figure)
    batch_size=16,
    gradient_accumulation_steps=[4, 8],
    crf=[True, False],
    lr=[1e-4, 1e-5],
    weight_decay=[None, 1e-7],
    random_seed=[42, 442],
    lr_warmup_step_ratio=[None, 0.1],
    max_grad_norm=[None, 10]
)
searcher.train()
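Since the question is whether IOB files need converting and the crash involves a missing label file, here is a minimal sketch of reading an IOB file and deriving a label-to-id mapping from it. It assumes one whitespace-separated "token label" pair per line with blank lines between sentences. The function names, the strictness of the label check, and the JSON layout written by `write_label_file` are all hypothetical and are not taken from TNER.

```python
import json
import re
from pathlib import Path

# Labels accepted by this sketch: "O", or "B-"/"I-" followed by an entity type.
IOB_LABEL = re.compile(r"^(O|[BI]-\S+)$")


def read_iob(path):
    """Parse one 'token label' pair per line; blank lines separate sentences."""
    sentences, tokens, tags = [], [], []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:  # a blank line closes the current sentence
            if tokens:
                sentences.append({"tokens": tokens, "tags": tags})
                tokens, tags = [], []
            continue
        if len(parts) != 2 or not IOB_LABEL.match(parts[1]):
            raise ValueError(f"{path}:{line_no}: not a valid 'token label' line: {line!r}")
        tokens.append(parts[0])
        tags.append(parts[1])
    if tokens:  # the file may not end with a blank line
        sentences.append({"tokens": tokens, "tags": tags})
    return sentences


def build_label2id(sentences):
    """Assign an integer id to every label, in sorted order."""
    labels = sorted({tag for sentence in sentences for tag in sentence["tags"]})
    return {label: index for index, label in enumerate(labels)}


def write_label_file(sentences, out_path):
    """Write the label-to-id mapping as JSON (hypothetical file layout)."""
    label2id = build_label2id(sentences)
    Path(out_path).write_text(json.dumps(label2id, indent=2), encoding="utf-8")
    return label2id
```

The label check is deliberately strict, so a label such as the digit `0` typed in place of the letter `O` is reported with its line number instead of silently becoming an extra class.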
I noticed that the stopwords (and the stoptags) were legacy code from the previous version and had nothing to do with the latest TNER, so I just removed them.
We confirmed that the IOB formatting issue was solved. There is another issue, but it has nothing to do with the format, so I’ll close this.