How to fine-tune on a custom dataset?
Hi, thanks for this beautiful work!
I have a coreference dataset with the following format:
Text, Pronoun, Pronoun-offset, A, A-offset, A-coref, B, B-offset, B-coref
Example:
Tom went out and bought a book. He found it very interesting.
Here, ‘He’ is the Pronoun, A is ‘Tom’ with A-coref True (since ‘He’ refers to Tom), B is ‘it’ with B-coref False, and the offsets give each mention’s position in the text.
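For concreteness, one such row could look like the sketch below (this format matches Google’s GAP coreference dataset; the offsets shown assume character positions, which is the GAP convention, and the exact values are hypothetical):

```python
# One hypothetical row of the dataset, shown as a Python dict. The offsets
# are the character positions of each mention in Text (the GAP convention).
row = {
    "Text": "Tom went out and bought a book. He found it very interesting.",
    "Pronoun": "He", "Pronoun-offset": 32,
    "A": "Tom", "A-offset": 0, "A-coref": True,
    "B": "it", "B-offset": 41, "B-coref": False,
}
```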
Can I use HMTL to fine-tune on such a dataset?
If so, how should I modify the JSON config file and the dataset reader?
If not, how can I use HMTL to predict the coreference pair for the Pronoun in each text, i.e. which word in the text corefers with the Pronoun?
I appreciate your help!
Top GitHub Comments
Hey!
Thanks for your interest!
I’ve been experimenting with transfer learning to fine-tune on another coreference dataset, so yes, it should be possible. I think you have two choices here:

1/ You convert your dataset to the CoNLL-2012 coreference format. It should be feasible if I understand correctly that the offsets are the token positions. Then it is pretty much just a matter of replacing the dataset paths in the config files and calling `fine_tune.py` (plus probably some minor parameters to add, such as the pre-trained weights to fine-tune).

2/ You implement a `DatasetReader` (see `hmtl/dataset_readers`) that supports your dataset format and modify the config file to call this specific dataset reader type (and then call `fine_tune.py`; it should be seamless once you add another dataset reader — a sketch of such a reader follows this comment).

Victor
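Below is a minimal sketch of what option 2/ could look like, for illustration only. None of this is code from the repository: the registered name `gap_coref`, the class name, and the `_token_index` helper are invented here, and the output field names (`text`, `spans`, `span_labels`, `metadata`) are assumptions modeled on the conventions of AllenNLP’s `ConllCorefReader`.

```python
import csv
from typing import Dict, Iterable, List

from allennlp.data.dataset_readers.dataset_reader import DatasetReader
from allennlp.data.fields import (ListField, MetadataField, SequenceLabelField,
                                  SpanField, TextField)
from allennlp.data.instance import Instance
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import Token


@DatasetReader.register("gap_coref")  # hypothetical name, referenced from the config
class GapCorefReader(DatasetReader):
    """Reads GAP-style TSV rows (see the question above) into coref instances."""

    def __init__(self, lazy: bool = False) -> None:
        super().__init__(lazy)
        self._token_indexers = {"tokens": SingleIdTokenIndexer()}

    def _read(self, file_path: str) -> Iterable[Instance]:
        with open(file_path) as data_file:
            for row in csv.DictReader(data_file, delimiter="\t"):
                yield self.text_to_instance(row)

    def text_to_instance(self, row: Dict[str, str]) -> Instance:
        # Naive whitespace tokenization; a real reader should use a proper
        # tokenizer and keep a character-offset -> token-index mapping.
        words = row["Text"].split()
        text_field = TextField([Token(w) for w in words], self._token_indexers)

        spans: List[SpanField] = []
        labels: List[int] = []
        has_antecedent = False
        for mention, coref_col in (("A", "A-coref"), ("B", "B-coref")):
            start = self._token_index(row["Text"], int(row[mention + "-offset"]))
            end = start + len(row[mention].split()) - 1  # SpanField is inclusive
            is_coref = row[coref_col].upper() == "TRUE"
            has_antecedent = has_antecedent or is_coref
            spans.append(SpanField(start, end, text_field))
            # Cluster id 0 groups the pronoun with the mention(s) marked True;
            # -1 marks a span that belongs to no cluster.
            labels.append(0 if is_coref else -1)

        pronoun_start = self._token_index(row["Text"], int(row["Pronoun-offset"]))
        spans.append(SpanField(pronoun_start, pronoun_start, text_field))
        # A pronoun with no true antecedent forms no cluster at all
        # (a "negative" example).
        labels.append(0 if has_antecedent else -1)

        span_field = ListField(spans)
        return Instance({
            "text": text_field,
            "spans": span_field,
            "span_labels": SequenceLabelField(labels, span_field),
            "metadata": MetadataField({"original_text": row["Text"]}),
        })

    @staticmethod
    def _token_index(text: str, char_offset: int) -> int:
        # Index of the whitespace token starting at this character offset.
        return len(text[:char_offset].split())
```

With such a reader registered, the config change from option 2/ is just pointing the `dataset_reader` type at `gap_coref` and the data paths at your TSV files before calling `fine_tune.py`.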
I am not sure I fully understand what your dataset format is, and more specifically what `A-coref=False` and `B-coref=False` mean… From what I see, when there are no coreference links, there should not be any cluster id in a CoNLL-formatted file, and thus `canonical_clusters` in the dataset reader should be an empty list, and then `gold_clusters` should also be an empty list. But naturally, you want to train your model on these “negative” examples so it is able to detect when there are no coreference links.
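To make the “negative example” point concrete, here is a small sketch (the helper name and the `(start, end)` span representation are invented for illustration) of how the `A-coref`/`B-coref` flags would map to gold clusters:

```python
def gold_clusters(row, spans):
    """Map one GAP-style row to a list of gold coreference clusters.

    `spans` is assumed to map "Pronoun", "A", and "B" to (start, end)
    token spans in the text.
    """
    cluster = [spans["Pronoun"]]
    for mention in ("A", "B"):
        if str(row[mention + "-coref"]).upper() == "TRUE":
            cluster.append(spans[mention])
    # A pronoun with no true antecedent is not a cluster: a row where both
    # A-coref and B-coref are False yields an empty list, matching the empty
    # canonical_clusters / gold_clusters described above.
    return [cluster] if len(cluster) > 1 else []


# For the example sentence, "He" corefers with "Tom", so we get one cluster;
# set A-coref to False as well and the result becomes [].
print(gold_clusters(
    {"A-coref": "True", "B-coref": "False"},
    {"Pronoun": (7, 7), "A": (0, 0), "B": (9, 9)},
))  # -> [[(7, 7), (0, 0)]]
```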