[Advice/Seeking Help] Very Low LRAP Score

Hi,

I’m currently training a multi-label classifier with XLNet to map questions to concepts (for an online-learning platform). There are ~1.7k concepts and 15k+ questions.

Here’s the model: model = MultiLabelClassificationModel('xlnet', 'xlnet-base-cased', num_labels=len(lo_id_cols), args={'train_batch_size': 8, 'gradient_accumulation_steps': 16, 'learning_rate': 3e-5, 'num_train_epochs': 3, 'max_seq_length': 512, 'fp16': False, 'overwrite_output_dir': True})

This is my final output: {'LRAP': 0.004577569729953515, 'eval_loss': 0.27328102769863666}

I’m not sure what I’m doing wrong to get such a low LRAP. Is there a way to train the model so that the LRAP improves? Any help or advice would be appreciated.
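
For context on the number above, here is a minimal pure-Python sketch of how label ranking average precision is defined, plus a rough chance-level baseline. It is not the library's implementation. The function names `lrap` and `random_baseline` are made up for this illustration, and the baseline assumes exactly one true concept per question and random, tie-free scores, which may not match the real data.

```python
def lrap(y_true, y_score):
    """Label ranking average precision over a batch of samples.

    y_true:  list of 0/1 label rows, one row per sample.
    y_score: list of score rows with the same shape.
    """
    total = 0.0
    for truths, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(truths) if t]
        # Assumption of this sketch: samples with no true labels, or with
        # every label true, contribute a perfect score of 1.0.
        if not relevant or len(relevant) == len(truths):
            total += 1.0
            continue
        sample = 0.0
        for j in relevant:
            # Rank of label j: how many labels score at least as high.
            rank = sum(1 for s in scores if s >= scores[j])
            # How many of those are themselves true labels.
            hits = sum(1 for k in relevant if scores[k] >= scores[j])
            sample += hits / rank
        total += sample / len(relevant)
    return total / len(y_true)


def random_baseline(n_labels):
    """Expected LRAP with ONE true label per sample and random ranking.

    The true label is equally likely to land at any rank r, and it then
    contributes 1/r, so the expectation is the mean of 1/r over all ranks.
    """
    return sum(1.0 / r for r in range(1, n_labels + 1)) / n_labels


if __name__ == "__main__":
    print(lrap([[1, 0, 0], [0, 0, 1]], [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]))
    print(random_baseline(1700))
```

Under those assumptions the baseline for 1,700 labels comes out at about 0.0047, which is close to the reported 0.0046. That would suggest the ranking is near chance level, though questions with several concepts would shift the baseline.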

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:7 (1 by maintainers)

Top GitHub Comments

2 reactions
ThilinaRajapakse commented, Jul 20, 2020

Keep in mind that Transformer models are not good at extreme multiclass/multilabel classification. Even with weights, 1700 classes might just be too many. You can try to model the problem as a hierarchical classification to circumvent this problem. For example, starting with 10 broad categories which further divide into subcategories. This too might not be straightforward in multilabel classification.
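
One way to picture the hierarchical idea is a two-stage lookup: a coarse model picks broad categories, and only the fine-grained models for those categories are consulted. The sketch below is hypothetical throughout. `predict_hierarchical`, `keyword_scorer`, the category names, and the 0.5 threshold are all invented for illustration, and the toy keyword scorers merely stand in for trained classifiers.

```python
from typing import Callable, Dict, List

# A scorer maps a question to a score per label (stand-in for a model).
Scorer = Callable[[str], Dict[str, float]]


def predict_hierarchical(text: str, coarse: Scorer,
                         fine: Dict[str, Scorer],
                         threshold: float = 0.5) -> List[str]:
    """Return concepts whose broad category AND own score pass the threshold."""
    concepts = []
    for category, category_score in coarse(text).items():
        if category_score < threshold:
            continue  # skip every concept under an unlikely category
        for concept, concept_score in fine[category](text).items():
            if concept_score >= threshold:
                concepts.append(concept)
    return sorted(concepts)


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Toy scorer: 1.0 if any keyword for the label appears, else 0.0."""
    def score(text: str) -> Dict[str, float]:
        lowered = text.lower()
        return {label: 1.0 if any(w in lowered for w in words) else 0.0
                for label, words in keywords.items()}
    return score


coarse = keyword_scorer({"math": ["solve", "triangle"], "biology": ["cell"]})
fine = {
    "math": keyword_scorer({"algebra": ["solve"], "geometry": ["triangle"]}),
    "biology": keyword_scorer({"cell_biology": ["cell"]}),
}

if __name__ == "__main__":
    print(predict_hierarchical("Solve for x in the triangle", coarse, fine))
```

The multi-label case is handled here by letting more than one broad category pass the threshold, which is one possible design. As the comment notes, a real setup may not be this straightforward.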

There is a new paper by Amazon that seems to tackle the problem but I haven’t looked at it properly yet.

1 reaction
AbinayaM02 commented, Jul 20, 2020

For each class individually, count the datapoints tagged as 1 and work out the proportion. For example, if there are 4 classes with [100, 500, 1000, 10] datapoints each, then the class weights will be [10, 2, 1, 100], i.e. the largest count divided by each class's count. In your case, if a single question is tagged with n classes, it is counted once in each of those n classes.
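
The weighting rule in the example above could be sketched as follows. The helper names `label_counts` and `weights_from_counts` are made up for illustration, and mapping zero-count classes to a weight of 0.0 is an assumption of this sketch, not something the comment specifies.

```python
def label_counts(label_matrix):
    """Count the positives per class in a list of 0/1 label rows.

    A question tagged with n classes adds one to each of those n counts.
    """
    return [sum(column) for column in zip(*label_matrix)]


def weights_from_counts(counts):
    """Weight each class by (largest class count) / (its own count)."""
    largest = max(counts)
    # Assumption: a class with no positive examples gets weight 0.0 here
    # rather than raising a division error.
    return [largest / c if c else 0.0 for c in counts]


if __name__ == "__main__":
    rows = [[1, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
    print(label_counts(rows))
    print(weights_from_counts([100, 500, 1000, 10]))
```

Weights like these are typically passed as per-label positive weights to the loss, for instance the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. Whether and how the training library in use exposes that is worth checking in its documentation.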
