(OneVsOneClassifier) Not able to convert sklearn model using pipeline to ONNX format for real-time inference
It is a multi-class classification model built with scikit-learn. I am using a OneVsOneClassifier model to train on and predict 150 intents.
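For context on the latency: OneVsOneClassifier fits one binary estimator per pair of classes, i.e. n(n-1)/2 estimators, and predict evaluates all of them. With 150 intents that is 150 * 149 / 2 = 11,175 LinearSVC models. This is a plausible contributor to the ~1 second per prediction, not a measured breakdown. A minimal check on a synthetic 5-class problem (the make_classification parameters are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC


def n_pairwise_estimators(n_classes: int) -> int:
    # One binary classifier per unordered pair of classes.
    return n_classes * (n_classes - 1) // 2


# Small synthetic 5-class problem (parameters chosen arbitrarily).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=5, random_state=0)
ovo = OneVsOneClassifier(LinearSVC(random_state=42)).fit(X, y)

print(len(ovo.estimators_))        # 10, i.e. n_pairwise_estimators(5)
print(n_pairwise_estimators(150))  # 11175 binary models for 150 intents
```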
Data:
text intents
text1 int1
text2 int2
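I convert these intents into integer labels using:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = le.fit_transform(y_train)
# transform (not fit_transform) so the test labels reuse the training mapping
y_test = le.transform(y_test)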
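Why the test split should use transform rather than fit_transform: re-fitting the encoder on y_test builds a new mapping from whatever intents happen to appear in the test split, so the same intent can end up with a different integer than in training. A small sketch with hypothetical intent names:

```python
from sklearn.preprocessing import LabelEncoder

# Toy intent labels (hypothetical, for illustration only).
y_train = ["greet", "bye", "refund", "greet"]
y_test = ["refund", "greet"]

le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)  # learn the intent -> integer mapping
y_test_enc = le.transform(y_test)        # reuse that mapping on the test split

print(le.classes_.tolist())    # ['bye', 'greet', 'refund']
print(y_train_enc.tolist())    # [1, 0, 2, 1]
print(y_test_enc.tolist())     # [2, 1]

# Re-fitting on y_test builds a different mapping from the test labels alone:
refit = LabelEncoder().fit_transform(y_test)
print(refit.tolist())          # [1, 0]  ('refund' is now 1, 'greet' is now 0)

# Map encoded predictions back to intent names:
print(le.inverse_transform(y_test_enc).tolist())  # ['refund', 'greet']
```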
Expectation:
Without changing the training pipeline or its parameters, improve the inference time: currently it is slow, about 1 second per inference. The plan is to convert the pipeline to ONNX format and then use it to run inference on a single example.
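To compare the current pipeline with any converted model, it helps to measure latency the same way each time. A sketch using only the standard library (the helper name median_latency_ms is hypothetical) that reports the median over several calls, which is less sensitive to a slow first call than a single timing:

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 20) -> float:
    """Call fn() `repeats` times; return the median wall-clock time in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload; with the pipeline from this post one would time e.g.
#   median_latency_ms(lambda: pipeline.predict(create_test_data(rev)))
print(median_latency_ms(lambda: time.sleep(0.01), repeats=5))  # at least ~10 ms
```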
Code:
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

def create_pipe(clf):
    # Each pipeline uses the same column transformer:
    # TF-IDF on the 'text' column, all other columns dropped.
    column_trans = ColumnTransformer(
        [('Text', TfidfVectorizer(), 'text')],
        remainder='drop')
    pipeline = Pipeline([('prep', column_trans),
                         ('clf', clf)])
    return pipeline

def fit_and_print(pipeline):
    # X_train/X_test: DataFrames with a 'text' column; y_train/y_test: encoded labels
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    print(metrics.classification_report(y_test, y_pred,
                                        target_names=le.classes_,
                                        digits=3))

clf = OneVsOneClassifier(LinearSVC(random_state=42, class_weight='balanced'))
pipeline = create_pipe(clf)
%time fit_and_print(pipeline)
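The snippet above depends on X_train, y_train and le from the surrounding notebook. A self-contained reproduction of the same pipeline layout on a few made-up sentences (the data and labels are hypothetical; only the pipeline structure comes from the post) can be handy when reporting conversion problems:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the real 150-intent dataset.
train = pd.DataFrame({
    "text": ["hello there", "hi friend", "goodbye now", "see you later",
             "refund my order", "i want my money back"],
    "label": [0, 0, 1, 1, 2, 2],
})

column_trans = ColumnTransformer(
    [("Text", TfidfVectorizer(), "text")],  # string selector passes a 1-D column
    remainder="drop")
pipeline = Pipeline([
    ("prep", column_trans),
    ("clf", OneVsOneClassifier(LinearSVC(random_state=42, class_weight="balanced"))),
])
pipeline.fit(train[["text"]], train["label"])

# Predict on a single new text, wrapped in a one-row DataFrame.
print(pipeline.predict(pd.DataFrame({"text": ["hi there"]})))
```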
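import pandas as pd

# Convert a single input string to a one-row DataFrame with a 'text' column
def create_test_data(x):
    d = {'text': x}
    df = pd.DataFrame(d, index=[0])
    return df

# test, sam and labels are assumed to be defined elsewhere in the notebook
revs = []
for idx in [948, 5717, 458]:
    cur = test.loc[idx, 'text']
    revs.append(cur)
print(revs)
revs = sam['text'].values
# (the timing loop below is a separate notebook cell; %%time must be its first line)

%%time
for rev in revs:
    c_res = pipeline.predict(create_test_data(rev))
    print(rev, '=', labels[c_res[0]])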
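The loop above builds a one-row DataFrame and calls predict once per text, so the fixed per-call overhead (DataFrame construction, ColumnTransformer dispatch, and evaluating every pairwise estimator) is paid for each example. When several texts are available at once, one alternative is to put them in a single DataFrame and call predict once; the model itself is unchanged. A sketch on toy data (the data and the predict_batch helper are hypothetical):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def create_test_data(x):
    # One-row DataFrame, as in the post.
    return pd.DataFrame({"text": x}, index=[0])


def predict_batch(pipeline, texts):
    # Hypothetical helper: one DataFrame row per text, a single predict call.
    return pipeline.predict(pd.DataFrame({"text": list(texts)}))


pipeline = Pipeline([
    ("prep", ColumnTransformer([("Text", TfidfVectorizer(), "text")], remainder="drop")),
    ("clf", OneVsOneClassifier(LinearSVC(random_state=42))),
])
pipeline.fit(pd.DataFrame({"text": ["hello there", "hi friend", "goodbye now",
                                     "see you later", "refund my order",
                                     "money back please"]}),
             [0, 0, 1, 1, 2, 2])

revs = ["hello friend", "see you", "refund please"]
one_by_one = [pipeline.predict(create_test_data(rev))[0] for rev in revs]
batched = predict_batch(pipeline, revs)
print(list(batched) == one_by_one)  # True: same predictions, one call instead of three
```

Each row is scored independently (TF-IDF normalization and the OvO voting are per sample), so batching changes only how often the fixed overhead is paid, not the predictions.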
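ONNX conversion code:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, StringTensorType

# One string input per DataFrame column used by the ColumnTransformer;
# the input name must match the column name ('text'), with shape [None, 1]
initial_type = [('text', StringTensorType([None, 1]))]
model_onnx = convert_sklearn(pipeline, initial_types=initial_type)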
Error
MissingShapeCalculator: Unable to find a shape calculator for type '<class 'sklearn.multiclass.OneVsOneClassifier'>'.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.
How can I resolve this? Also, how do I run predictions after converting to ONNX format?
Issue Analytics
- Created a year ago
- Comments: 21
@pratikchhapolika Done, please check.
Hi @xiaowuhu, could you please delete the X_train data from the comment above? It might violate my company's privacy policy.