Search-tag on labels, resets prior annotation on text classification hand labeling with multi_label=True

The label search in the hand-labeling view for text classification with multiple labels appears to clear all prior annotations when the search is closed. This is a major bug because it is not immediately apparent that the earlier labels have been reset: they are outside the visible area. The problem is even more pronounced when working with a large number of labels.

Steps to reproduce:

Create a DatasetForTextClassification from a list of records built as follows:


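import rubrix as rb  # assumed from the `rb` alias used in this issue

def make_record(row):
    # Build one multi-label record from a DataFrame row
    record = rb.TextClassificationRecord(
        text=row["text"],
        multi_label=True,
    )
    return record  # the original snippet returned `row` here

# `df` is assumed to be a pandas DataFrame with a "text" column
records = []
for idx, row in df.iterrows():
    records.append(make_record(row))
dataset_rb = rb.DatasetForTextClassification(records)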

Assign a large number of labels to the dataset:


settings = rb.TextClassificationSettings(label_schema=get_lots_of_labels())

# apply settings to new or already existing dataset
rb.configure_dataset("my_dataset_name", settings=settings)

# logging to the newly created dataset triggers the validation checks
rb.log(dataset_rb, "my_dataset_name")
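
The snippet above calls get_lots_of_labels() without defining it. A minimal stand-in might look like the following. The function body, its parameters, and the generated label names are assumptions made for illustration, not part of the original report; the returned list may need converting (for example with set()) to whatever collection type the settings class expects.

```python
def get_lots_of_labels(n=200, prefix="label"):
    """Hypothetical helper: return n distinct, zero-padded label names."""
    # Pad the index so the names sort in numeric order
    width = len(str(n - 1))
    return [f"{prefix}_{i:0{width}d}" for i in range(n)]
```

Any label count large enough that the label list cannot be scanned at a glance, so that the label search becomes necessary, should be enough to reproduce the behaviour.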

Switch to the web app and start hand labeling. Use the search box on the labels (not on the records) to find and toggle labels. Try a few search strings, clearing the search string after each selection. Only the most recently toggled labels keep their state; all earlier label toggles are reset.

Appears to be a state management issue.
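
To make the suspected failure mode concrete, here is a small model of it. This is a hypothetical illustration of the symptom described above, not the web app's actual code (which has not been inspected here); the function names and the set-based representation are assumptions.

```python
def close_search_buggy(selected, visible, toggled_on):
    """Model of the reported symptom: the selection is rebuilt from the
    filtered view alone, so labels hidden by the search are dropped."""
    return set(toggled_on)


def close_search_merged(selected, visible, toggled_on):
    """One possible fix: keep selections that were hidden by the search
    and apply only the choices made among the visible labels."""
    hidden_selection = set(selected) - set(visible)
    return hidden_selection | (set(toggled_on) & set(visible))


# "sports" was selected earlier; a search now shows only two labels
selected = {"sports"}
visible = {"politics", "police"}
toggled_on = {"politics"}

lost = close_search_buggy(selected, visible, toggled_on)
kept = close_search_merged(selected, visible, toggled_on)
```

In this model, `lost` contains only "politics", while `kept` contains both "sports" and "politics", which matches the behaviour an annotator would expect.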

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
frascuchon commented, Sep 13, 2022

Thanks for reporting, @dhruvsakalley.

We will take a look at this problem as soon as possible.

0 reactions
dhruvsakalley commented, Oct 24, 2022

Thanks for confirming. I would like to add that resetting prior annotations without confirmation can lead to lost work. It might be useful to have an undo for accidents like these. Some tools, like Prodigy, keep track of the last n actions in the session and commit as a separate step. I find that very useful as a quick way to go back and change a label based on a new observation, or to undo a mistake, and it makes the annotation flow faster.
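
The undo idea can be sketched as a bounded history plus an explicit commit step. This is a minimal illustration under stated assumptions: the class name, method names, and in-memory storage are hypothetical, and it does not describe how Prodigy or any other tool implements the feature.

```python
from collections import deque


class AnnotationSession:
    """Hypothetical session that remembers the last `max_undo` actions."""

    def __init__(self, max_undo=10):
        self.labels = {}  # record_id -> frozenset of label names
        self._history = deque(maxlen=max_undo)  # oldest entries fall off

    def annotate(self, record_id, labels):
        # Remember the previous value (None if the record was unlabeled)
        self._history.append((record_id, self.labels.get(record_id)))
        self.labels[record_id] = frozenset(labels)

    def undo(self):
        """Revert the most recent action; return False if nothing is left."""
        if not self._history:
            return False
        record_id, previous = self._history.pop()
        if previous is None:
            self.labels.pop(record_id, None)
        else:
            self.labels[record_id] = previous
        return True

    def commit(self):
        """Return a snapshot to persist and clear the undo history."""
        self._history.clear()
        return dict(self.labels)
```

Keeping the commit separate from the individual toggles means an accidental reset can still be reverted with undo() before anything is persisted.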
