CountVectorizer implementation

The existing CountVectorizer code uses TorchScript (JIT) constructs, for example this line in the forward function:

doc_ids = torch.jit.annotate(List[Tensor], [])  # noqa: F821

We need a workaround for this so that it doesn’t fail at

  File "/root/hummingbird/hummingbird/ml/_container.py", line 63, in forward
    raise RuntimeError("Inputer tensor {} of not supported type {}".format(input_name, type(inputs[i])))

because the input is not a tensor.

See this branch
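
For context on why the input is not a tensor: CountVectorizer takes raw text documents, and only its output (the token counts) is numeric. The pure-Python sketch below illustrates that mismatch. It is not Hummingbird's or scikit-learn's implementation; the helper names `fit_vocabulary` and `count_vectorize` and the simple whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def fit_vocabulary(docs: List[str]) -> Dict[str, int]:
    """Map each lowercased, whitespace-separated token to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vectorize(docs: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Return a dense document-term count matrix as nested lists."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        # Dicts preserve insertion order, so columns follow the vocabulary indices.
        rows.append([counts.get(tok, 0) for tok in vocab])
    return rows


# The inputs are plain strings, which a tensor-only input check would reject.
docs = ["the cat sat", "the cat saw the dog"]
vocab = fit_vocabulary(docs)
matrix = count_vectorize(docs, vocab)
```

Only `matrix` is numeric here, so any tensor-based version has to accept or encode the string input before the tensor check is reached.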


Top GitHub Comments

interesaaat commented, Sep 14, 2020

I made no changes, so let me delete mine since it is not used.

ksaur commented, Sep 14, 2020

Hi @hemantr05,

For issue #164, there are two parts:

In #293, you are working on the first half, tf-idf. This issue is challenging and non-trivial for sure!!
For the second part, there is CountVectorizer (this issue). As mentioned in #164, we have some internal code already for CountVectorizer that was a bit more time-consuming to integrate, which I can definitely post in the future!

We really appreciate your enthusiasm!! If you finish your current two issues (#293 and #273) you can get started on this third one! 😃 Let me know if you have questions or would like to change which issue you focus on! Thanks again!
