Plotting different imputation strategies

Note: The idea is inspired by a lecture by Andreas Muller.

Describe the solution you’d like

The idea is to visualize how closely a particular imputer fills in the missing values of the given feature columns.

Is your feature request related to a problem? Please describe.

It gives a quick, clear visual representation of how different imputation strategies behave on the given feature columns of the data.

Examples

In the image below, I took the iris data and added NaNs to it across various rows. I then wrote a function that plots how various imputation strategies impute the two given columns, col1 and col2 (for iris I used petal length and petal width). For iris I used the 3 different imputation strategies named in the image.

[Image: plot_imputation]

The code I used for this visualization is below (note: for now this code is just for demonstration purposes and can be improved):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.utils._testing import ignore_warnings  # private scikit-learn test helper


def get_full_and_nan_rows(X, col1, col2):
    """
    Return two lists:
    full_rows, the indices of rows with no NaN in the two given columns.
    nan_rows, the indices of rows with a NaN in either of the two given columns.
    """
    full_rows = []
    nan_rows = []

    for ind, row in enumerate(X):
        if any(np.isnan(row[[col1, col2]])):
            nan_rows.append(ind)
        else:
            full_rows.append(ind)

    return full_rows, nan_rows


@ignore_warnings(category=ConvergenceWarning)
def plot_2D_imputation(X, y, col1, col2, imputer, xlabel='', ylabel='', title='',
                       figsize=(5, 5), alpha=0.6, s=80):
    # Create the figure and axes to draw on.
    fig, ax = plt.subplots(figsize=figsize)

    full_rows, nan_rows = get_full_and_nan_rows(X, col1, col2)
    X_imp = imputer.fit_transform(X)

    # Circles: rows observed in both columns. y must hold integer class labels.
    ax.scatter(X_imp[full_rows, col1], X_imp[full_rows, col2], c=plt.cm.tab10(
        y[full_rows]), alpha=alpha, s=s, marker='o')
    # Squares: rows where at least one of the two values was imputed.
    ax.scatter(X_imp[nan_rows, col1], X_imp[nan_rows, col2], c=plt.cm.tab10(
        y[nan_rows]), alpha=alpha, s=s, marker='s')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(title)

    return ax
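
For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch of the iris experiment described above. Several details are assumptions rather than what was actually used for the image: the 20% missing rate, the random seed, the helper names `make_iris_with_nans` and `compare_imputers`, and the choice of mean, kNN, and iterative imputation as the three strategies (the issue text does not name them).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer


def make_iris_with_nans(missing_rate=0.2, seed=0):
    """Load iris and blank out a random share of the two petal columns.

    The missing rate and seed are assumptions made for this sketch.
    """
    X, y = load_iris(return_X_y=True)
    X = X.copy()
    rng = np.random.default_rng(seed)
    petal = X[:, 2:4]  # a view, so the assignment below modifies X
    petal[rng.random(petal.shape) < missing_rate] = np.nan
    return X, y


def compare_imputers(X, y, col1, col2, imputers):
    """Draw one scatter panel per imputer; squares mark rows that had a NaN."""
    nan_mask = np.isnan(X[:, [col1, col2]]).any(axis=1)
    fig, axes = plt.subplots(
        1, len(imputers), figsize=(5 * len(imputers), 5), squeeze=False
    )
    for ax, (name, imputer) in zip(axes[0], imputers.items()):
        X_imp = imputer.fit_transform(X)
        for mask, marker in ((~nan_mask, "o"), (nan_mask, "s")):
            ax.scatter(X_imp[mask, col1], X_imp[mask, col2],
                       c=plt.cm.tab10(y[mask]), alpha=0.6, s=80, marker=marker)
        ax.set_title(name)
    return fig, axes[0]


X, y = make_iris_with_nans()
# Hypothetical choice of three strategies to compare.
imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "kNN": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(random_state=0),
}
fig, axes = compare_imputers(X, y, col1=2, col2=3, imputers=imputers)
```

Calling `fig.savefig(...)` afterwards writes the comparison to disk. Computing the NaN mask with one vectorized `np.isnan(...).any(axis=1)` call gives the same row split as the row-by-row loop, just in a single step.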

Issue Analytics

Created 4 years ago
Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction

rebeccabilbro commented, Jan 27, 2020

Hello @greatsharma and thanks for checking out Yellowbrick! @bbengfort and I are both currently traveling, so it may take us a week or more to respond. We appreciate your patience and your feature suggestion!

0 reactions

bbengfort commented, Jun 10, 2020

Hi @greatsharma, sorry it’s taken me so long to respond; my GitHub emails got pretty buried. In principle, I’m fine with the approach that you mentioned. My only comment is to remove the plot_2d from the function name. So far we’ve chosen to pass 2d or 3d as a parameter to visualizers that do 2d or 3d visualization (see the PCA visualizer for an example). And if the 2d is removed, then plot becomes redundant.

We would be interested in seeing some prototypes of this suggestion as a next step!
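
As a rough illustration of that naming suggestion, a prototype could take the dimensionality as an argument instead of encoding it in the function name. The sketch below is only one possible shape: the function name `imputation`, the `cols` and `projection` parameters, and the use of `SimpleImputer` in the usage lines are all assumptions made here, not an existing Yellowbrick API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)
from sklearn.impute import SimpleImputer


def imputation(X, y, cols, imputer, projection=2, ax=None, alpha=0.6, s=80):
    """Hypothetical signature: scatter imputed data in 2D or 3D.

    cols holds one column index per plotted dimension; y holds integer labels.
    """
    if projection not in (2, 3):
        raise ValueError("projection must be 2 or 3")
    cols = list(cols)
    if len(cols) != projection:
        raise ValueError("need exactly one column index per plotted dimension")

    if ax is None:
        fig = plt.figure(figsize=(5, 5))
        ax = fig.add_subplot(1, 1, 1, projection="3d" if projection == 3 else None)

    # Rows with a NaN in any plotted column are drawn as squares.
    nan_mask = np.isnan(X[:, cols]).any(axis=1)
    X_imp = imputer.fit_transform(X)
    for mask, marker in ((~nan_mask, "o"), (nan_mask, "s")):
        coords = [X_imp[mask, c] for c in cols]
        ax.scatter(*coords, c=plt.cm.tab10(y[mask]), alpha=alpha, s=s, marker=marker)
    return ax


# Small synthetic data set, used only to exercise both projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
X[::5, 0] = np.nan
y = np.arange(30) % 3

ax2 = imputation(X, y, cols=(0, 1), imputer=SimpleImputer(), projection=2)
ax3 = imputation(X, y, cols=(0, 1, 2), imputer=SimpleImputer(), projection=3)
```

Accepting an optional `ax` lets callers place several imputers side by side on their own grid of axes, which is what the comparison image needs.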
