Converted version outputs index of class instead of class
If I train a classifier with non-consecutive numbers for classes, the resulting converted code (C in my case) will not output the classes but the index of the class. In my case, the training data does not always contain an example of class 1, so the classifier does not know this class exists. This creates discrepancies between Python and C.
import numpy as np
import m2cgen
from sklearn.ensemble import RandomForestClassifier
# linear mapping: x->x
# NB: my goal is not regression, this is just an example
x_train = np.repeat([0,1,2,3,4,5], 100).reshape([-1,1])
y_train = np.repeat([0,1,2,3,4,5], 100)
# however, class 1 is missing in training!
x_train = x_train[y_train!=1]
y_train = y_train[y_train!=1]
clf = RandomForestClassifier().fit(x_train, y_train)
# convert it
code = m2cgen.export_to_c(clf)
result = clf.predict(np.atleast_2d([0,1,2,3,4,5]).T)
# result = [0,0,2,3,4,5]
Calling it in C gives different results:
// Pseudocode for C
// for each x in {0,1,2,3,4,5}: call score() on x, then take the argmax of the 5 class probabilities
// result = [0,0,1,2,3,4]
Do you think there is any feasible way to keep the original class labels?
(see also https://github.com/nok/sklearn-porter/issues/37 having the same problem)
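For what it's worth, the mismatch looks like an index-versus-label issue: scikit-learn keeps the sorted labels it saw during training in `clf.classes_` (here that would be `[0, 2, 3, 4, 5]`), while the exported code only knows positions in that array. A minimal sketch of the mapping in plain Python, with the label list hard-coded as an assumption instead of read from a fitted model, and `indices_to_labels` being a hypothetical helper name:

```python
# Labels seen during training; with a fitted model this would be clf.classes_.tolist().
# Hard-coded here (assumption) so the sketch runs without scikit-learn.
classes = [0, 2, 3, 4, 5]


def indices_to_labels(indices, classes):
    """Map class indexes (positions in `classes`) back to the original labels."""
    return [classes[i] for i in indices]


# Indexes reported by the C code in the example above
c_result = [0, 0, 1, 2, 3, 4]
labels = indices_to_labels(c_result, classes)
print(labels)  # [0, 0, 2, 3, 4, 5], matching clf.predict in the Python example
```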
Issue Analytics
- State:
- Created 5 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
By the way, here is the final code wrapper that I came up with:
Wrapper to keep class labels
@skjerns thanks for reporting this issue! This is indeed an interesting use case: the code generated by m2cgen in this scenario produces an array of class probabilities in which classes are represented by their corresponding indexes in the original model object. We should think about how to address this properly. Meanwhile, I can suggest the following steps as a workaround:
- [...] extern to link that constant: [...]
- [...] score function to access the corresponding labels, e.g. [...]
Please let me know if the proposed solution worked for you.
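One possible way to wire up a workaround of this kind is to generate a small C wrapper from Python and append it to the exported code. This is only a sketch and not necessarily the exact steps proposed above: it keeps the label table as a `static const` array in the same file instead of linking it via `extern`. It assumes integer labels and that the exported classifier code defines `void score(double *input, double *output)`, filling one probability per class. `build_label_wrapper`, `CLASS_LABELS` and `predict_label` are hypothetical names.

```python
def build_label_wrapper(classes, func_name="predict_label"):
    """Return C source that maps the argmax of score() to an original label.

    Assumption: the exported code defines
    `void score(double *input, double *output)` and `output` holds one
    probability per entry in `classes`, in the same order.
    """
    n = len(classes)
    labels = ", ".join(str(int(c)) for c in classes)
    return (
        f"\n/* Original class labels, ordered like clf.classes_ */\n"
        f"static const int CLASS_LABELS[{n}] = {{{labels}}};\n\n"
        f"int {func_name}(double *input) {{\n"
        f"    double probs[{n}];\n"
        f"    int best = 0;\n"
        f"    score(input, probs);\n"
        f"    for (int i = 1; i < {n}; ++i) {{\n"
        f"        if (probs[i] > probs[best]) best = i;\n"
        f"    }}\n"
        f"    return CLASS_LABELS[best];\n"
        f"}}\n"
    )


# Labels hard-coded for illustration; normally clf.classes_.tolist()
wrapper = build_label_wrapper([0, 2, 3, 4, 5])
print(wrapper)
```

With a fitted model this could be used roughly as `code = m2cgen.export_to_c(clf) + build_label_wrapper(clf.classes_.tolist())`, so that calling `predict_label` from C returns the original label instead of its index.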