CountVectorizer implementation

The existing CountVectorizer code uses torch.jit constructs, such as this annotation in the forward function:

doc_ids = torch.jit.annotate(List[Tensor], [])  # noqa: F821

We need a workaround for this so that it doesn’t fail at

  File "/root/hummingbird/hummingbird/ml/_container.py", line 63, in forward
    raise RuntimeError("Inputer tensor {} of not supported type {}".format(input_name, type(inputs[i])))

because the input is not a tensor.
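
To make the failure mode concrete, here is a minimal stand-alone sketch of the kind of per-input type check that produces the error above. It is not Hummingbird's code: `FakeTensor` and `check_inputs` are hypothetical stand-ins so the example runs without PyTorch, and only the error message format is taken from the traceback. It also assumes the rejected input is raw text, which the issue does not state explicitly.

```python
from typing import Dict, List, Sequence


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, values: Sequence[float]) -> None:
        self.values = list(values)


def check_inputs(input_names: List[str], inputs: Sequence[object]) -> Dict[str, FakeTensor]:
    """Map each input name to its value, rejecting anything that is not a tensor."""
    checked = {}
    for i, input_name in enumerate(input_names):
        if not isinstance(inputs[i], FakeTensor):
            # Same message format as the traceback quoted above.
            raise RuntimeError(
                "Inputer tensor {} of not supported type {}".format(input_name, type(inputs[i]))
            )
        checked[input_name] = inputs[i]
    return checked


# A tensor-like input passes the check.
ok = check_inputs(["input_0"], [FakeTensor([1.0, 2.0])])

# Assumption: the vectorizer is handed raw text documents, which are not tensors.
try:
    check_inputs(["input_0"], [["a document", "another document"]])
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

Under that assumption, any workaround has to either convert the text into tensors before it reaches the check or let the check accept non-tensor inputs for this operator. Which of the two the linked branch does is not stated here.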

See this branch

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6

Top GitHub Comments

interesaaat commented, Sep 14, 2020 (1 reaction)

I made no changes, let me delete mine then since it is not used.

ksaur commented, Sep 14, 2020 (1 reaction)

Hi @hemantr05,

For issue #164, there are two parts:

  • In #293, you are working on the first half, tf-idf. This issue is challenging and non-trivial for sure!!
  • For the second part there is CountVectorizer (this issue). As mentioned in #164, we have some internal code already for CountVectorizer that was a bit more time-consuming to integrate, which I can definitely post in the future!

We really appreciate your enthusiasm!! If you finish your current two issues (#293 and #273) you can get started on this third one! 😃 Let me know if you have questions or would like to change which issue you focus on! Thanks again!

