Missing `fit` feature


In any ML task, the assumption is that the test data is not available during training and only becomes available in the prediction phase.

Assume someone wants to categorize reviews using tf-idf + Naive Bayes. The required steps would be:

  1. Split the data into train and test parts
  2. Fit tf-idf on the train part and generate the tf-idf values for it (the transform step in scikit-learn)
  3. Train the model
  4. Generate the tf-idf values on the test part, this time using the already fitted tf-idf model
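
For reference, these four steps can be sketched with scikit-learn, which the issue uses as its point of comparison. This is only an illustration of the fit/transform workflow, not part of the library under discussion; the review texts and labels are invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy data, invented for illustration only.
reviews = [
    "great product, works well",
    "terrible quality, broke fast",
    "really great value",
    "awful, terrible support",
    "works great, very happy",
    "broke after a day, awful",
]
labels = [1, 0, 1, 0, 1, 0]

# 1. Split train and test.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=2, random_state=0, stratify=labels
)

# 2. Fit tf-idf on the train part only and transform it.
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(X_train)

# 3. Train the model.
model = MultinomialNB()
model.fit(train_features, y_train)

# 4. Transform the test part with the already fitted vectorizer (no refit).
test_features = vectorizer.transform(X_test)
predictions = model.predict(test_features)
```

The key point is that `vectorizer` keeps the vocabulary and IDF weights learned in step 2, so the test features in step 4 have the same columns as the training features.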

The problem is that the current implementation keeps no state (which also brings many advantages, such as simplicity). The tfidf functions do not return a fitted model, only the already transformed values.
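
To make the consequence concrete, here is a minimal pure-Python sketch of the difference between a stateless tf-idf function and one that exposes its fitted state. The function names `fit_idf`, `transform` and `stateless_tfidf` are hypothetical and are not the library's API; the IDF formula is one common smoothed variant chosen for illustration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn the vocabulary and smoothed IDF weights from tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


def transform(docs, idf):
    """Compute tf-idf vectors using previously learned IDF weights."""
    vocab = sorted(idf)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] * idf[term] for term in vocab])
    return vectors


def stateless_tfidf(docs):
    """Fit and transform in one call; the fitted state is discarded."""
    return transform(docs, fit_idf(docs))


train = [["good", "movie"], ["bad", "movie"]]
test = [["good", "film"]]

# Stateless: each call learns its own vocabulary, so the columns do not line up.
train_stateless = stateless_tfidf(train)  # columns: bad, good, movie
test_stateless = stateless_tfidf(test)    # columns: film, good

# Stateful: keep the fitted weights and reuse them on the test part.
idf = fit_idf(train)
train_vectors = transform(train, idf)
test_vectors = transform(test, idf)       # same 3 columns; unseen "film" is ignored
```

With the stateless version, a classifier trained on `train_stateless` cannot consume `test_stateless`, because the two matrices describe different vocabularies.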

We need to take a clear position on this point. Following exactly the same approach as scikit-learn probably would not make sense; still, we need to consider this. Opinions?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
mk2510 commented, Sep 13, 2020

So far I haven't started thinking about the missing fit feature, as fit is in general called on models that can be fitted to your dataset (at least from my experience so far). In the cases where we work with models that can be fitted, like in 'pca' or 'kmeans', I think we can mention on the getting started page that we always call fit_transform and don't provide the option to store the fitted pca model. But the way the API is designed, the user probably wouldn't assume it anyway, I guess 😬 🙈

0 reactions
harshraj-wadhwani commented, May 19, 2022

@mk2510 Any plans to add a fit method? Or is there any way I can fit on train data, save to a pickle file, and transform on test data?
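
The thread does not record an answer to this question. One possible workaround, assuming it is acceptable to use scikit-learn's `TfidfVectorizer` for the tf-idf step instead of the stateless function, is sketched below. The texts are toy data; `pickle.dumps` is used here to avoid touching the file system, and `pickle.dump` to an open file works the same way.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["good movie", "bad movie"]  # toy data
test_texts = ["good film"]

# Fit on the train data only.
vectorizer = TfidfVectorizer().fit(train_texts)

# Serialize the fitted vectorizer (use pickle.dump(vectorizer, f) for a file).
payload = pickle.dumps(vectorizer)

# Later, possibly in another process: restore it and transform the test data.
restored = pickle.loads(payload)
test_features = restored.transform(test_texts)
```

Only unpickle data you trust, since loading a pickle can execute arbitrary code.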


