Sklearn-Pandas trained ML Pipeline - online prediction on GCP ?
Hi Team,
I have a very basic requirement for an sklearn-pandas ML pipeline, but I cannot find a clear answer to this problem anywhere in the documentation.
Let's say I have an sklearn pipeline built with a pandas DataFrame as input and pandas functions as transformations. This pipeline will obviously expect a pandas DataFrame at prediction time. Does Google Cloud ML Engine (CMLE) already have the capability to convert the key-value JSON payload into a pandas DataFrame in online prediction mode?
Example code:
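# Imports needed by the snippet below
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import FeatureHasher
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Transformer functions
def select_col_df(df, cols, iscatego):
    # Categorical columns come back as a Series, numerical ones as a one-column DataFrame
    if iscatego:
        return df[cols]
    else:
        return df[[cols]]

def calc_grminusirbyvpatd(df):
    df['grminusirbyvpatd'] = (df['TOTGRQTY'] - df['TOTIRQTY']) / df['VPATD']
    return df[['grminusirbyvpatd']]

def calc_difgrirdbytotgrqty(df):
    def apply_trans(row):
        # Applied row by row (axis=1); guards against division by zero
        if not row['TOTGRQTY'] == 0:
            return row['DIFGRIRD'] / row['TOTGRQTY']
        else:
            return 0
    df['difgrirdbytotgrqty'] = df.apply(apply_trans, axis=1)
    return df[['difgrirdbytotgrqty']]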

# 1. Hash convert categorical columns
col_pipe = {}
for c_ in X_train.columns:
    if X_train[c_].dtype == 'object':
        # col_pipe[c_] = Pipeline([
        #     ('column_selector', CatColSelector(key=c_)),
        #     ('column_oh', CustomLabelBinarizer())
        # ])
        col_pipe[c_] = Pipeline([
            ('col_sel', FunctionTransformer(select_col_df,
                                            kw_args={'cols': c_, 'iscatego': True},
                                            validate=False)),
            ('col_hash', FeatureHasher(n_features=10, input_type='string'))
        ])

# 2. Also add numerical columns into pipeline
for c_ in X_train.columns:
    if not X_train[c_].dtype == 'object':
        col_pipe[c_] = Pipeline([
            ('col_sel', FunctionTransformer(select_col_df,
                                            kw_args={'cols': c_, 'iscatego': False},
                                            validate=False)),
            ('std_scaler', StandardScaler())
        ])

# 3. Create a few new columns: "grminusirbyvpatd, difgrirdbytotgrqty"
col_pipe['grminusirbyvpatd'] = Pipeline([
    ('col_new_1', FunctionTransformer(calc_grminusirbyvpatd, validate=False))
])
col_pipe['difgrirdbytotgrqty'] = Pipeline([
    ('col_new_2', FunctionTransformer(calc_difgrirdbytotgrqty, validate=False))
])

# 4. Combine all features in col_pipe{} with FeatureUnion
feats = FeatureUnion([
    (col_, col_pipe[col_]) for col_ in list(col_pipe.keys())
])

# 5. Add ML algorithm in Pipeline
final_pipeline = Pipeline([
    ('features', feats),
    ('classifier', RandomForestClassifier(random_state=42, n_estimators=1000,
                                          oob_score=True, n_jobs=-1, verbose=1)),
])

# 6. Train ML model
print("Training ML model")
final_pipeline.fit(X_train, y_train)

This seems like a very basic requirement for ML on the cloud. Any pointers?
Issue Analytics
- State:
- Created 5 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
@rafiqhasan
It's been a while, and I am not sure whether you have already found a way around the potential issues of running pandas functions in an sklearn pipeline at prediction time. AI Platform Prediction now offers more flexible ways of deploying sklearn pipelines for prediction: https://cloud.google.com/ml-engine/docs/scikit/custom-prediction-routines
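For illustration, a custom prediction routine along these lines might look like the sketch below. It follows the predictor interface described on the linked page (a `predict(instances, **kwargs)` method and a `from_path(model_dir)` class method). The class name `PandasPipelinePredictor`, the file name `model.pkl`, and the assumption that each instance arrives as a dict of column name to value are all hypothetical choices for this example, not something the original post or the service prescribes.

```python
import os
import pickle

import pandas as pd


class PandasPipelinePredictor(object):
    """Hypothetical predictor that rebuilds a DataFrame from JSON instances."""

    def __init__(self, pipeline):
        self._pipeline = pipeline

    def predict(self, instances, **kwargs):
        # Assumption: each instance is a dict such as
        # {"TOTGRQTY": 5, "TOTIRQTY": 2, ...}, one dict per row.
        frame = pd.DataFrame.from_records(instances)
        predictions = self._pipeline.predict(frame)
        # Turn NumPy scalars into plain Python values so the
        # response can be serialized to JSON.
        return [p.item() if hasattr(p, "item") else p for p in predictions]

    @classmethod
    def from_path(cls, model_dir):
        # Assumption: the trained pipeline was pickled as "model.pkl".
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            return cls(pickle.load(f))
```

Because the pipeline in the question references plain functions such as `select_col_df` through `FunctionTransformer`, those functions would need to be importable from the code packaged with the predictor, since pickle stores them by reference rather than by value.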
There has been no response from the submitter for half a year.