(OneVsOneClassifier) Not able to convert sklearn model using pipeline to ONNX format for real time inferencing


I have a multi-class classification model built with sklearn. It uses OneVsOneClassifier to train on and predict 150 intents.

Data:

text          intents

text1         int1
text2         int2

I convert these intents into integer labels using:

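from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = le.fit_transform(y_train)
# Reuse the mapping learned on y_train; refitting on y_test could assign different integers.
y_test = le.transform(y_test)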
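A note on the encoding step: a LabelEncoder assigns integers by sorting the labels it was fitted on, so fitting it separately on two label sets can produce two different mappings. Below is a minimal sketch of that effect. The intent names (int_a, int_b, int_c) are made up for illustration and are not taken from the data above.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical intent labels, for illustration only.
train_intents = ["int_a", "int_b", "int_c", "int_a"]
test_intents = ["int_c", "int_b"]

le = LabelEncoder()
y_train_enc = le.fit_transform(train_intents)  # learns the mapping from the training labels
y_test_enc = le.transform(test_intents)        # reuses that same mapping

# Fitting a fresh encoder on the test labels alone learns a different mapping.
y_test_refit = LabelEncoder().fit_transform(test_intents)

print(list(y_test_enc))
print(list(y_test_refit))
# inverse_transform turns predicted integers back into intent names.
print(list(le.inverse_transform(y_test_enc)))
```

With the shared encoder, the same intent always maps to the same integer in both splits, which is what the classification report and any later label lookup rely on.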

Expectation:

Without changing the training pipeline or its parameters, I want to reduce the inference time. Currently it is slow, about 1 second per inference. So I want to convert the pipeline to ONNX format and then use it for inference on one example at a time.
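To put a number on the per-example latency, both before and after any conversion, a small timing helper may be useful. This is only a sketch: time_single_predictions and dummy_predict are hypothetical names introduced here, and dummy_predict is a stand-in so the snippet runs on its own. In practice the callable would wrap a single-row call to the fitted pipeline.

```python
import time

def time_single_predictions(predict_fn, examples):
    """Call predict_fn once per example; return (results, latencies in seconds)."""
    results, latencies = [], []
    for example in examples:
        start = time.perf_counter()
        results.append(predict_fn(example))
        latencies.append(time.perf_counter() - start)
    return results, latencies

# Stand-in predictor (assumption): replace with a function that calls the real model.
def dummy_predict(text):
    return len(text)

results, latencies = time_single_predictions(dummy_predict, ["text1", "text22"])
print(results)
print(f"mean latency: {sum(latencies) / len(latencies):.6f} s")
```

Timing each call separately, rather than the whole loop, keeps one slow example from hiding behind the average.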

Code:

import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    # Each pipeline uses the same column transformer.
    column_trans = ColumnTransformer(
        [('Text', TfidfVectorizer(), 'text')],
        remainder='drop')

    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])

    return pipeline

def fit_and_print(pipeline):
    # Fit on the training split, then report per-intent metrics on the test split.
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)

    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)

# convert input to df

def create_test_data(x):
    d = {'text' : x}
    df = pd.DataFrame(d, index=[0])
    return df

revs=[]
for idx in [948, 5717, 458]:
     cur = test.loc[idx, 'text']
     revs.append(cur)
print(revs) 

revs=sam['text'].values

%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])

ONNX conversion code:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType

initial_type = [('UTTERANCE', StringTensorType([None, 2]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)

Error:

MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

How can I resolve this? Also, how do I run prediction after converting to ONNX format?

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 21

Top GitHub Comments

1 reaction
xiaowuhu commented, Sep 9, 2022

@pratikchhapolika done. please check.

0 reactions
pratikchhapolika commented, Sep 9, 2022

@pratikchhapolika

Sorry to ask, but what is the data format of X_train and the others? Does it look like a pandas DataFrame, or is it just a Python list?

My goal is to make the code below work:

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
column_trans = ColumnTransformer([('Text', TfidfVectorizer(), 'UTTERANCE')], remainder='drop')
pipeline = Pipeline([('prep', column_trans), ('clf', clf)])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred)

Hi @xiaowuhu, could you please delete the X_train data from the comment above? It might violate my company’s privacy rules.
