How to handle a pipeline for a hierarchical setup?

Hi,

I’m currently dealing with the following problem. I have a hierarchy of labels of the form:

labels = {
    "parent1": {"child1", "child2"},
    "parent2": {"child1", "child2", "child3"},
}

My pipeline consists of a first model that classifies the parent label, followed by one classifier per parent for that parent’s child labels. I will put it as simply as I can:

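import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


class ParentLabelsModel(BaseEstimator, TransformerMixin):
    def __init__(self, base_estimator):
        TransformerMixin.__init__(self)
        BaseEstimator.__init__(self)
        self.base_estimator = base_estimator()

    def fit(self, X, y, *args):
        self.base_estimator.fit(X, y)
        return self  # scikit-learn expects fit() to return the estimator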

    def fit_transform(self, X, y, *args):
        self.base_estimator.fit(X, y)
        return X

    def transform(self, X, *args):
        y = self.base_estimator.predict(X)
        return X, y


class ChildLabelsModel(BaseEstimator, ClassifierMixin):
    def __init__(self, base_estimator, unique_parent_labels):
        BaseEstimator.__init__(self)
        self.unique_parent_labels = unique_parent_labels
        self.child_estimators = {p: base_estimator()
                                 for p in unique_parent_labels}

    def fit(self, X, y, y_, *args):
        X = X.toarray()
        for p in self.unique_parent_labels:
            child_indices = np.argwhere(np.array(y) == p).flatten()
            x_ = X[child_indices, :]
            y__ = y_[child_indices]
            self.child_estimators[p].fit(x_, y__)
        return self

    def predict(self, X, *args):
        parent_y = X[1][0]
        X = X[0].toarray()
        return (np.array([parent_y]),
                np.array([self.child_estimators[parent_y].predict(X)]))


pipeline = Pipeline([
    ("tfidf1", TfidfVectorizer()),
    ("plm", ParentLabelsModel(base_estimator=SGDClassifier)),
    ("clm", ChildLabelsModel(
        base_estimator=SGDClassifier,
        unique_parent_labels=np.unique(ytrain))
    )
])

As you can see, the first step, ParentLabelsModel, is a transformer, so that I can pass the predicted parent labels along and use them to select the specific child_estimator (there is one classifier per parent label).

I’ve been trying to convert both ParentLabelsModel and ChildLabelsModel to ONNX, so far without success. Before pasting the code I’ve written for the update_registered_converter functions, I would like to check whether you have any opinion or guidelines on how to approach this conversion.

Thanks!
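
The two-stage routing described above can also be folded into a single estimator, which avoids returning an (X, y) tuple from a transformer in the middle of a Pipeline. The following is only a minimal sketch under stated assumptions, not the approach from the issue and not a statement about what converts to ONNX: the class name HierarchicalClassifier is hypothetical, y is assumed to be passed as (parent, child) pairs, and the toy documents are made up for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


class HierarchicalClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: one parent classifier plus one child classifier per parent."""

    def __init__(self, base_estimator):
        # Keep only the unfitted template; fitted clones are created in fit().
        self.base_estimator = base_estimator

    def fit(self, X, y):
        # Assumption: y has shape (n_samples, 2), column 0 = parent, column 1 = child.
        y = np.asarray(y)
        parents, children = y[:, 0], y[:, 1]
        self.parent_estimator_ = clone(self.base_estimator).fit(X, parents)
        self.child_estimators_ = {}
        for p in np.unique(parents):
            rows = np.flatnonzero(parents == p)
            # Each child classifier only sees the rows of its own parent.
            self.child_estimators_[p] = clone(self.base_estimator).fit(
                X[rows], children[rows]
            )
        return self

    def predict(self, X):
        parents = self.parent_estimator_.predict(X)
        children = np.empty(parents.shape[0], dtype=object)
        for p in np.unique(parents):
            rows = np.flatnonzero(parents == p)
            # Route each row to the child classifier of its predicted parent.
            children[rows] = self.child_estimators_[p].predict(X[rows])
        return np.column_stack([parents, children])


# Made-up toy data: every parent has at least two child labels.
docs = [
    "red apple fruit", "green apple fruit",
    "yellow banana fruit", "ripe banana fruit",
    "fast car vehicle", "sports car vehicle",
    "city bus vehicle", "school bus vehicle",
]
y = [
    ("fruit", "apple"), ("fruit", "apple"),
    ("fruit", "banana"), ("fruit", "banana"),
    ("vehicle", "car"), ("vehicle", "car"),
    ("vehicle", "bus"), ("vehicle", "bus"),
]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("hier", HierarchicalClassifier(SGDClassifier(random_state=0))),
])
model.fit(docs, y)
pred = model.predict(["green apple", "school bus"])  # one (parent, child) row per document
```

Because each child classifier is trained only on the rows of its own parent, a predicted child label always belongs to the predicted parent, even when child names repeat across parents as in the labels dictionary above.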

Top GitHub Comments

xadupre commented, Apr 19, 2021

I made a proof of concept of a numpy API for ONNX. It works with decorators, and I also wrote a tutorial about it. Here is a transformer written that way: the transform method is implemented with ONNX operators, but the syntax is similar to numpy.

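# Imports assumed from context: the decorator belongs to mlprodict's numpy API for ONNX.
from mlprodict.npy import onnxsklearn_class
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA


@onnxsklearn_class("onnx_transform", op_version=13)
class DecorrelateTransformerOnnx(TransformerMixin, BaseEstimator):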
    def __init__(self, alpha=0.):
        BaseEstimator.__init__(self)
        TransformerMixin.__init__(self)
        self.alpha = alpha

    def fit(self, X, y=None, sample_weights=None):
        self.pca_ = PCA(X.shape[1])  # pylint: disable=W0201
        self.pca_.fit(X)
        return self

    def onnx_transform(self, X):
        if X.dtype is None:
            raise AssertionError("X.dtype cannot be None.")
        mean = self.pca_.mean_.astype(X.dtype)
        cmp = self.pca_.components_.T.astype(X.dtype)
        return (X - mean) @ cmp
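
For intuition, the arithmetic inside onnx_transform is the same centering and projection that scikit-learn's PCA.transform applies when whitening is off. The check below is a plain numpy sketch on made-up random data (an assumption for illustration); it does not involve ONNX or the decorator at all.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 20 samples, 3 features, in float32 like a typical ONNX input.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)).astype(np.float32)

# Same fit as in DecorrelateTransformerOnnx.fit: keep all components.
pca = PCA(X.shape[1]).fit(X)

# Same expression as in onnx_transform, evaluated with numpy.
mean = pca.mean_.astype(X.dtype)
cmp = pca.components_.T.astype(X.dtype)
manual = (X - mean) @ cmp

# Compare against scikit-learn's own transform, up to float32 rounding.
matches = np.allclose(manual, pca.transform(X), atol=1e-4)
print(matches)
```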

xadupre commented, Nov 24, 2022

Closing the issue, feel free to reopen it.
