Model's _fit should accept Dataset also, not just BatchVectorizer



It seems more natural for a model to fit on a Dataset. Maybe it would be better to accept Union[artm.BatchVectorizer, topicnet.cooking_machine.Dataset] instead of just artm.BatchVectorizer (a Union keeps existing callers compatible)?
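To make the proposal concrete, here is a minimal sketch of how such a signature could work. It does not use the real libraries: `BatchVectorizer` and `Dataset` below are stand-in classes, the `to_batch_vectorizer` helper and the toy `Model` are hypothetical, and the sketch assumes Dataset exposes a `get_batch_vectorizer()` accessor. It is not the actual TopicNet implementation.

```python
from typing import Optional, Union


class BatchVectorizer:
    """Stand-in for artm.BatchVectorizer (illustration only)."""


class Dataset:
    """Stand-in for topicnet.cooking_machine.Dataset (illustration only)."""

    def __init__(self) -> None:
        self._batch_vectorizer = BatchVectorizer()

    def get_batch_vectorizer(self) -> BatchVectorizer:
        # Assumed accessor: returns the batches built from this dataset.
        return self._batch_vectorizer


def to_batch_vectorizer(data: Union[BatchVectorizer, Dataset]) -> BatchVectorizer:
    """Hypothetical helper: normalize either accepted input to a BatchVectorizer."""
    if isinstance(data, BatchVectorizer):
        return data
    if isinstance(data, Dataset):
        return data.get_batch_vectorizer()
    raise TypeError(
        f"Expected BatchVectorizer or Dataset, got {type(data).__name__}"
    )


class Model:
    """Toy model whose _fit accepts both input types."""

    def __init__(self) -> None:
        self.fitted_on: Optional[BatchVectorizer] = None

    def _fit(
        self,
        dataset_trainable: Union[BatchVectorizer, Dataset],
        num_iterations: int,
    ) -> None:
        batch_vectorizer = to_batch_vectorizer(dataset_trainable)
        # A real model would run num_iterations training passes here;
        # the sketch only records which batches it was given.
        self.fitted_on = batch_vectorizer
```

Existing callers that pass a BatchVectorizer keep working unchanged, while new callers can pass a Dataset directly, which is the compatibility argument for a Union.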

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
bt2901 commented, May 25, 2020

I think you are moving the goalposts. We do not provide guarantees on _fit, but that does not forbid the user from using it. Making this method a bit more flexible does not change that.

Also, training a model without Cubes + Experiment overhead is exactly why one would consider using the method (e.g. for very dirty prototyping or perhaps for cases not covered by Cubes + Experiment yet).

0 reactions
Alvant commented, May 25, 2020

"First, _fit is a “protected” method, meaning we do not guarantee that it will work nicely and easily for the user or that everything will work"

Ok, but it doesn’t mean that we shouldn’t think about how to make the method better 🙂


