A different way of doing the similarity/comparison task?


Hey! Thanks for the awesome work. I was wondering if I could use and update finetune to do the following:

Instead of using (Start, Text1, Delim, Text2, Extract) and (Start, Text2, Delim, Text1, Extract) as in the paper, can we pass (Start, Text1, Extract) and (Start, Text2, Extract) through the transformer separately?
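To make the two input layouts concrete, here is a minimal sketch in plain Python. The special-token strings and the whitespace tokenizer are placeholders chosen for illustration; they are not finetune's actual vocabulary or preprocessing.

```python
# Hypothetical special-token strings; a real model would use learned token ids.
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"


def paired_inputs(text1, text2):
    """Paper-style layout: both texts in one sequence, once in each order."""
    t1, t2 = text1.split(), text2.split()
    return [
        [START, *t1, DELIM, *t2, EXTRACT],
        [START, *t2, DELIM, *t1, EXTRACT],
    ]


def separate_inputs(text1, text2):
    """Proposed layout: each text forms its own sequence, with no delimiter."""
    return [
        [START, *text1.split(), EXTRACT],
        [START, *text2.split(), EXTRACT],
    ]
```

In the paired layout the transformer can attend across both texts; in the separate layout each text is encoded without seeing the other.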

This could be thought of as obtaining sentence/document embeddings for Text1 and Text2 separately. Upon doing that, I would like to compare their similarity using a distance metric such as cosine distance. (i.e. train the transformer as a siamese network.)
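The siamese comparison described here can be sketched as follows. This is only an illustration under stated assumptions: `toy_encode` is a hypothetical stand-in for the transformer's sentence embedding, and none of these functions come from finetune.

```python
import numpy as np


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two 1-D embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def toy_encode(text, dim=64):
    """Hypothetical stand-in encoder: hashed bag-of-words counts."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        # Deterministic bucket per token (sum of character codes).
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec


def siamese_score(encode, text1, text2):
    """Embed both texts with the SAME encoder, then compare the embeddings."""
    return cosine_similarity(encode(text1), encode(text2))
```

Cosine distance is then `1 - siamese_score(...)`. In a trained siamese model the shared encoder's weights would be updated so that this score tracks the similarity label.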

Would you suggest I build such a model on top of a fork of finetune?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

madisonmay commented, Sep 17, 2018 (1 reaction)

Hey @chaitjo, glad you’ve been hacking away over the weekend! I’ve opened up a PR into development for you. Don’t worry if things don’t run yet; I’ll just use it as a space to leave code comments for now.

At a high level things look good; it seems like there are just a few things to clean up. Perhaps most importantly, I think we can find a way around overriding many of the BaseModel methods by structuring things more like the Comparison class (and maybe inheriting from that instead of the BaseModel class).

As far as data goes, a sampled version of the Quora similarity dataset might be a good place to start. We’ve got some scripts in there for training already, but note that you’ll probably have to modify those scripts to convert the pandas Series passed as inputs to numpy arrays – that’s on my backlog to patch up. E.g.:

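    import numpy as np
    from sklearn.model_selection import train_test_split
    from finetune import Comparison
    from finetune.datasets.quora_similarity import QuoraDuplicate  # import path assumed from the finetune repo layout

    dataset = QuoraDuplicate(nrows=100).dataframe
    model = Comparison(verbose=True, n_epochs=3, lm_warmup=0.1)
    trainX1, testX1, trainX2, testX2, trainY, testY = train_test_split(dataset.Text1, dataset.Text2, dataset.Target, test_size=0.3, random_state=42)
    # .values converts each pandas Series to a numpy array before it reaches the model
    model.fit(list(zip(trainX1.values, trainX2.values)), trainY.values)
    accuracy = np.mean(model.predict(list(zip(testX1.values, testX2.values))) == testY.values)
    # Fraction of positive labels in the test split, as a baseline for the accuracy
    class_balance = np.mean(testY.values)
    print('Test Accuracy: {:0.2f} for a {:0.2f} class balance'.format(accuracy, class_balance))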

chaitjo commented, Sep 5, 2018 (1 reaction)

Thanks for the response! Will keep you posted on the progress.

