How to finetune on customized dataset?

Hi, thanks for this beautiful work!

I have a coreference dataset with the following format: Text, Pronoun, Pronoun-offset, A, A-offset, A-coref, B, B-offset, B-coref

Example:

Tom went out and bought a book. He found it very interesting.

Then, ‘Tom’ is the Pronoun, A is ‘He’, A-coref is True, B is ‘it’, B-coref is False, and the offsets indicate each token’s position.

Can I use HMTL to fine-tune on such a dataset? If so, how should I modify the json file and the dataset reader? If not, how can I use HMTL to predict the coreference pair of the Pronoun in each text, i.e. predict which word in the text corefers with the Pronoun?

I appreciate your help!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
VictorSanh commented, Feb 13, 2019

Hey!

Thanks for your interest!

I’ve been experimenting with transfer learning to fine-tune on another coreference dataset, so yes, it should be possible. I think you have two choices here:

1/ Convert your dataset to the CoNLL-2012 coreference format. This should be feasible if I understand correctly that the offsets are the token positions. Then it is pretty much just a matter of replacing the dataset paths in the config files and calling fine_tune.py (and probably adding some minor parameters, such as the pre-trained weights to fine-tune).

2/ Implement a DatasetReader (see hmtl/dataset_readers) that supports your dataset format, and modify the config file to call this specific dataset reader type. Then call fine_tune.py; it should be seamless once you add the new dataset reader.
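Either option starts by turning one row of the dataset into coreference clusters. The following is only a minimal sketch of that step, not code from the HMTL repository. The function name `row_to_clusters`, whitespace tokenization, and the reading of each offset as the index of the mention's first token are all assumptions made for illustration; the offsets 8 and 10 in the example row are likewise assumed values for the sentence from the question.

```python
from typing import Any, Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def row_to_clusters(row: Dict[str, Any]) -> List[List[Span]]:
    """Build coreference clusters from one row of the dataset.

    Assumption: each offset is the index of the mention's first token,
    and mentions are split on whitespace.
    """
    def span(text_key: str, offset_key: str) -> Span:
        start = int(row[offset_key])
        length = len(str(row[text_key]).split())
        return (start, start + length - 1)

    cluster = [span("Pronoun", "Pronoun-offset")]
    for name in ("A", "B"):
        if row[f"{name}-coref"]:
            cluster.append(span(name, f"{name}-offset"))

    # A single mention expresses no coreference link, so return no cluster.
    return [sorted(cluster)] if len(cluster) > 1 else []


# The example from the question, with punctuation split off as separate tokens.
example_row = {
    "Text": "Tom went out and bought a book . He found it very interesting .",
    "Pronoun": "Tom", "Pronoun-offset": 0,
    "A": "He", "A-offset": 8, "A-coref": True,
    "B": "it", "B-offset": 10, "B-coref": False,
}

clusters = row_to_clusters(example_row)
```

A custom dataset reader could pass such clusters on as gold clusters, while a conversion script could write them into the coreference column of a CoNLL-formatted file.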

Victor

0 reactions
VictorSanh commented, Feb 23, 2019

I am not sure I fully understand what your dataset format is, and more specifically what A-coref=False and B-coref=False mean… From what I see, when there are no coreference links, there should not be any cluster id in a CoNLL-formatted file, and thus canonical_clusters in the dataset reader should be an empty list, and then gold_clusters should also be an empty list. But naturally, you want to train your model on these “negative” examples so it is able to detect when there are no coreference links.
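To make the point about negative examples concrete, here is a small sketch of how the coreference column of a CoNLL-style file could be generated from a list of clusters. The helper name `coref_column` is hypothetical and only this one column is produced; the other CoNLL-2012 columns are left out. The notation follows the usual CoNLL-2012 convention: `(0)` for a single-token mention, `(0` and `0)` for the start and end of a longer mention, and `-` for a token that belongs to no cluster.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def coref_column(num_tokens: int, clusters: List[List[Span]]) -> List[str]:
    """Return one coreference label per token for a CoNLL-style file."""
    parts: List[List[str]] = [[] for _ in range(num_tokens)]
    for cluster_id, cluster in enumerate(clusters):
        for start, end in cluster:
            if start == end:
                # Single-token mention: opened and closed on the same token.
                parts[start].append(f"({cluster_id})")
            else:
                parts[start].append(f"({cluster_id}")
                parts[end].append(f"{cluster_id})")
    # Tokens outside every mention get the placeholder "-".
    return ["|".join(p) if p else "-" for p in parts]


# A "negative" example: no clusters, so no cluster id appears anywhere.
negative = coref_column(4, [])

# A positive example: token 0 corefers with the two-token mention at 2..3.
positive = coref_column(4, [[(0, 0), (2, 3)]])
```

With an empty cluster list every token is labelled `-`, which matches the description above: a reader parsing such a file would find no cluster ids and would end up with empty gold clusters for that document.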
