
How to finetune on customized dataset?


Hi, thanks for this beautiful work!

I have a coreference dataset in the following format: Text, Pronoun, Pronoun-offset, A, A-offset, A-coref, B, B-offset, B-coref

Example:

Tom went out and bought a book. He found it very interesting.

Here, ‘Tom’ is the Pronoun, A is ‘He’ (A-coref is True), B is ‘it’ (B-coref is False), and the offsets indicate each token’s position.

Can I use HMTL to fine-tune on such a dataset? If so, how should I modify the JSON file and the dataset reader? If not, how can I use HMTL to predict the coreference pair of the Pronoun in each text, i.e., predict which word in the text corefers with the Pronoun?

I appreciate your help!
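For reference, here is a minimal sketch of how one row of the format described above could be loaded. It assumes the file is tab-separated with the nine columns in the order listed, that the coref columns hold the strings True/False, and that offsets are indices into the whitespace-split text; `CorefRow` and `parse_row` are hypothetical names, not part of HMTL.

```python
from dataclasses import dataclass


@dataclass
class CorefRow:
    # One example in the Text/Pronoun/A/B format (field names are illustrative).
    text: str
    pronoun: str
    pronoun_offset: int
    a: str
    a_offset: int
    a_coref: bool
    b: str
    b_offset: int
    b_coref: bool


def parse_bool(value: str) -> bool:
    # Assumes coref labels are written as "True"/"False" (case-insensitive).
    return value.strip().lower() == "true"


def parse_row(line: str) -> CorefRow:
    # Assumes a tab-separated line with exactly nine columns.
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    text, pronoun, p_off, a, a_off, a_coref, b, b_off, b_coref = cols
    return CorefRow(text, pronoun, int(p_off), a, int(a_off), parse_bool(a_coref),
                    b, int(b_off), parse_bool(b_coref))


# The example from the question, with offsets counted over whitespace tokens.
line = "\t".join([
    "Tom went out and bought a book. He found it very interesting.",
    "Tom", "0", "He", "7", "True", "it", "9", "False",
])
row = parse_row(line)
tokens = row.text.split()
print(tokens[row.a_offset], row.a_coref, tokens[row.b_offset], row.b_coref)
```

If the offsets in the real data turn out to be character positions rather than token positions, they would need to be mapped to token indices before any conversion.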

Issue Analytics

Created 5 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction

VictorSanh commented, Feb 13, 2019

Hey!

Thanks for your interest!

I’ve been experimenting with transfer learning to fine-tune on another coreference dataset, so yes, it should be possible. I think you have two choices here:

1/ Convert your dataset to the CoNLL-2012 coreference format. This should be feasible if I understand correctly that the offsets are token positions. Then it is pretty much just a matter of replacing the dataset paths in the config files and calling fine_tune.py (and probably adding a few minor parameters, such as the pre-trained weights to fine-tune from).

2/ Implement a DatasetReader (see hmtl/dataset_readers) that supports your dataset format, and modify the config file to use this specific dataset reader type. Then call fine_tune.py; it should be seamless once you add the new dataset reader.

Victor
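As an illustration of option 1/, here is a minimal sketch of writing the coreference column of a CoNLL-2012-style file, where `(k` opens a mention of cluster k, `k)` closes it, `(k)` marks a one-token mention, `-` means no mention, and multiple tags on one token are joined with `|`. It is a simplification under stated assumptions: each example is written as one sentence block, tokens come from a plain whitespace split (so punctuation stays attached), and the POS, parse, lemma, frameset, sense, speaker and NER columns are filled with `-` placeholders. A real reader may require valid values there. `mention_span`, `coref_column` and `to_conll` are hypothetical helpers, not HMTL code.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (first_token, last_token) indices


def mention_span(mention: str, offset: int) -> Span:
    # Assumes `offset` is the index of the mention's first whitespace token.
    return (offset, offset + len(mention.split()) - 1)


def coref_column(num_tokens: int, clusters: List[List[Span]]) -> List[str]:
    # Build one coreference tag per token from the cluster spans.
    tags: List[List[str]] = [[] for _ in range(num_tokens)]
    for cluster_id, spans in enumerate(clusters):
        for start, end in spans:
            if start == end:
                tags[start].append(f"({cluster_id})")
            else:
                tags[start].append(f"({cluster_id}")
                tags[end].append(f"{cluster_id})")
    return ["|".join(t) if t else "-" for t in tags]


def to_conll(doc_id: str, tokens: List[str], clusters: List[List[Span]]) -> str:
    # 12 columns: doc id, part, word number, word, 7 placeholder columns, coreference.
    coref = coref_column(len(tokens), clusters)
    lines = [f"#begin document ({doc_id}); part 000"]
    for i, (word, tag) in enumerate(zip(tokens, coref)):
        cols = [doc_id, "0", str(i), word] + ["-"] * 7 + [tag]
        lines.append("\t".join(cols))
    lines += ["", "#end document"]
    return "\n".join(lines)


# The question's example: 'Tom' (token 0) and 'He' (token 7) corefer; 'it' does not.
text = "Tom went out and bought a book. He found it very interesting."
tokens = text.split()
clusters = [[mention_span("Tom", 0), mention_span("He", 7)]]
print(to_conll("example_0", tokens, clusters))
```

Whether HMTL's coreference reader accepts placeholder columns like these is untested here; if it does not, option 2/ (a dedicated DatasetReader) avoids fabricating the unused columns altogether.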

0 reactions

VictorSanh commented, Feb 23, 2019

I am not sure I fully understand your dataset format, and more specifically what A-coref=False and B-coref=False mean… From what I see, when there are no coreference links, there should not be any cluster id in a CoNLL-formatted file, so canonical_clusters in the dataset reader should be an empty list, and gold_clusters should then be an empty list as well. But naturally, you want to train your model on these “negative” examples so that it learns to detect when there are no coreference links.
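To make the empty-list case concrete, here is a minimal sketch of how a single row could be mapped to gold clusters, assuming token spans for the Pronoun, A and B are already known. `row_to_clusters` is a hypothetical helper, not HMTL's reader code; it only illustrates that a row with A-coref=False and B-coref=False produces no clusters at all, rather than a cluster containing only the pronoun.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (first_token, last_token) indices


def row_to_clusters(pronoun: Span, a: Span, a_coref: bool,
                    b: Span, b_coref: bool) -> List[List[Span]]:
    # Coreference is transitive, so every candidate linked to the pronoun
    # goes into the same cluster as the pronoun.
    linked = [span for span, is_coref in ((a, a_coref), (b, b_coref)) if is_coref]
    # No links at all means an empty cluster list (a "negative" example).
    return [[pronoun] + linked] if linked else []


print(row_to_clusters((0, 0), (7, 7), True, (9, 9), False))   # [[(0, 0), (7, 7)]]
print(row_to_clusters((0, 0), (7, 7), False, (9, 9), False))  # []
```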
