How to handle a pipeline for a hierarchical setup?

Hi,

I’m currently dealing with the following problem. I have a hierarchy of labels of the form:

labels = {
    "parent1": {"child1", "child2"},
    "parent2": {"child1", "child2", "child3"},
}

My pipeline consists of a first classifier for the parent labels, followed by one classifier per parent for that parent’s child labels. Here it is, as simply as I can put it:

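import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline


class ParentLabelsModel(BaseEstimator, TransformerMixin):
    def __init__(self, base_estimator):
        TransformerMixin.__init__(self)
        BaseEstimator.__init__(self)
        # base_estimator is a class (e.g. SGDClassifier), instantiated here.
        self.base_estimator = base_estimator()

    def fit(self, X, y, *args):
        self.base_estimator.fit(X, y)
        return self

    def fit_transform(self, X, y, *args):
        # Fit the parent classifier but pass the features through unchanged.
        self.base_estimator.fit(X, y)
        return X

    def transform(self, X, *args):
        # Return the features together with the predicted parent labels.
        y = self.base_estimator.predict(X)
        return X, y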


class ChildLabelsModel(BaseEstimator, ClassifierMixin):
    def __init__(self, base_estimator, unique_parent_labels):
        BaseEstimator.__init__(self)
        self.unique_parent_labels = unique_parent_labels
        self.child_estimators = {p: base_estimator()
                                 for p in unique_parent_labels}

    def fit(self, X, y, y_, *args):
        X = X.toarray()
        for p in self.unique_parent_labels:
            child_indices = np.argwhere(np.array(y) == p).flatten()
            x_ = X[child_indices, :]
            y__ = y_[child_indices]
            self.child_estimators[p].fit(x_, y__)
        return self

    def predict(self, X, *args):
        parent_y = X[1][0]
        X = X[0].toarray()
        return (np.array([parent_y]),
                np.array([self.child_estimators[parent_y].predict(X)]))


pipeline = Pipeline([
    ("tfidf1", TfidfVectorizer()),
    ("plm", ParentLabelsModel(base_estimator=SGDClassifier)),
    ("clm", ChildLabelsModel(
        base_estimator=SGDClassifier,
        unique_parent_labels=np.unique(ytrain))
    )
])

As you can see, ParentLabelsModel is a transformer, so that it can pass the predicted parent labels on to ChildLabelsModel, which uses them to select the matching child estimator (there is one classifier per parent label).
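
For readers who want to see the routing in isolation, here is a minimal scikit-learn sketch of the same idea: a parent classifier whose prediction picks the child classifier. It is not an ONNX conversion and not the code from the issue. `HierarchicalClassifier`, the toy texts and the labels are hypothetical names made up for illustration, and the sketch assumes every parent has at least two distinct child labels in the training data.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier


class HierarchicalClassifier(BaseEstimator):
    """Hypothetical wrapper: one parent model plus one child model per parent."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y_parent, y_child):
        y_parent = np.asarray(y_parent)
        y_child = np.asarray(y_child)
        self.parent_model_ = clone(self.base_estimator).fit(X, y_parent)
        self.child_models_ = {}
        for p in np.unique(y_parent):
            rows = np.flatnonzero(y_parent == p)
            # Assumption: each parent has >= 2 child classes in its rows.
            self.child_models_[p] = clone(self.base_estimator).fit(
                X[rows], y_child[rows])
        return self

    def predict(self, X):
        parents = self.parent_model_.predict(X)
        children = np.empty(len(parents), dtype=object)
        # Route each group of rows to the child model of its predicted parent.
        for p in np.unique(parents):
            rows = np.flatnonzero(parents == p)
            children[rows] = self.child_models_[p].predict(X[rows])
        return parents, children


# Toy data, invented for this sketch.
texts = [
    "red apple fruit", "green apple fruit",
    "yellow banana fruit", "ripe banana fruit",
    "fast car vehicle", "red car vehicle",
    "city bus vehicle", "school bus vehicle",
]
y_parent = ["fruit"] * 4 + ["vehicle"] * 4
y_child = ["apple", "apple", "banana", "banana", "car", "car", "bus", "bus"]

X = TfidfVectorizer().fit_transform(texts)
model = HierarchicalClassifier(SGDClassifier(random_state=0))
model.fit(X, y_parent, y_child)
pred_parents, pred_children = model.predict(X)
```

Keeping both stages inside one estimator avoids having a transformer return an `(X, y)` tuple. Whether a single wrapper like this is also easier to register with a converter is an open question that this sketch does not address.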

I’ve been trying to convert both ParentLabelsModel and ChildLabelsModel to ONNX, so far without success. Before pasting the code I’ve written for the update_registered_converter functions, I would like to check whether you have any opinions or guidelines on how to approach this conversion.

Thanks!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8

Top GitHub Comments

xadupre commented, Apr 19, 2021 (2 reactions)

I made a proof of concept of a numpy-like API for ONNX. It works with decorators, and I also wrote a tutorial about it. Here is an example transformer. Its transform method (onnx_transform) is implemented with ONNX operators, but the syntax is similar to numpy.

@onnxsklearn_class("onnx_transform", op_version=13)
class DecorrelateTransformerOnnx(TransformerMixin, BaseEstimator):
    def __init__(self, alpha=0.):
        BaseEstimator.__init__(self)
        TransformerMixin.__init__(self)
        self.alpha = alpha

    def fit(self, X, y=None, sample_weights=None):
        self.pca_ = PCA(X.shape[1])  # pylint: disable=W0201
        self.pca_.fit(X)
        return self

    def onnx_transform(self, X):
        if X.dtype is None:
            raise AssertionError("X.dtype cannot be None.")
        mean = self.pca_.mean_.astype(X.dtype)
        cmp = self.pca_.components_.T.astype(X.dtype)
        return (X - mean) @ cmp
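
The arithmetic inside `onnx_transform` can be checked with plain numpy and no ONNX at all. The sketch below assumes random toy data and only verifies that `(X - mean) @ components.T` reproduces what scikit-learn's `PCA.transform` returns when whitening is off. It does not exercise the decorator or any ONNX runtime.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data, invented for this check.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3)).astype(np.float32)

# Same fit as in DecorrelateTransformerOnnx.fit: keep all components.
pca = PCA(X.shape[1]).fit(X)

# Same expression as in onnx_transform, evaluated with numpy arrays.
mean = pca.mean_.astype(X.dtype)
cmp = pca.components_.T.astype(X.dtype)
manual = (X - mean) @ cmp

# Reference result from scikit-learn (default whiten=False).
expected = pca.transform(X)
matches = np.allclose(manual, expected, atol=1e-4)
```

If `matches` is true, the numpy-style expression is a faithful rewrite of the PCA projection, which is the property the ONNX version relies on.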

xadupre commented, Nov 24, 2022 (0 reactions)

Closing the issue, feel free to reopen it.
