dealing with nans


For the PPCA demo, I recommend generating two datasets:

1.) First generate a well-structured covariance matrix:

    from scipy.linalg import toeplitz
    import numpy as np
    K = 10 - toeplitz(np.arange(10))

2.) Now generate a first dataset (a random walk with the given covariance matrix)

data1 = np.cumsum(np.random.multivariate_normal(np.zeros(10), K, 250), axis=0)

3.) Now copy the first dataset

    from copy import copy
    data2 = copy(data1)

4.) Set random entries of data2 to nan (choose some level of sparsity for this, e.g. 10% of the entries)

5.) Now plot data1 (solid line) and data2 (dashed line) and make sure they line up with each other
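Steps 1 through 4 can be sketched end to end as follows. This is only an illustration under a few assumptions that are not in the issue: the seed, the `sparsity` variable, and the use of a NumPy-only construction of `K` (which gives the same matrix as `10 - toeplitz(np.arange(10))`) are choices made here so the snippet runs without SciPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatability)

# Step 1: covariance matrix with entries 10 - |i - j|,
# the same matrix as 10 - toeplitz(np.arange(10))
idx = np.arange(10)
K = 10 - np.abs(idx[:, None] - idx[None, :])

# Step 2: random walk whose increments have covariance K
data1 = np.cumsum(rng.multivariate_normal(np.zeros(10), K, 250), axis=0)

# Step 3: independent copy, so that data1 stays complete
data2 = data1.copy()

# Step 4: set 10% of the entries, chosen without replacement, to nan
sparsity = 0.10  # hypothetical name for the fraction of missing entries
n_missing = int(round(sparsity * data2.size))
flat_idx = rng.choice(data2.size, size=n_missing, replace=False)
rows, cols = np.unravel_index(flat_idx, data2.shape)
data2[rows, cols] = np.nan

# Sanity check for step 5: wherever data2 is observed it must equal data1
observed = ~np.isnan(data2)
assert np.array_equal(data1[observed], data2[observed])
```

For step 5, plotting both arrays with matplotlib should show the dashed lines sitting on top of the solid ones, since matplotlib leaves a gap wherever a value is nan instead of connecting across it.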

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
jeremymanning commented, Dec 19, 2016

It looks like there’s still some interpolation going on in reduce.py

desired behavior:

1.) if no nans, use PCA to reduce to the specified number of dimensions

2.) if nans, use PPCA (instead of PCA) to reduce to the specified number of dimensions. some observations may *still* be nans after using PPCA. those should show up as breaks in the line (i.e. don’t explicitly remove them from the plot, but they just won’t be visible). not removing nans is important because the user may want the rows to match up across matrices, and we don’t want to mess with that.

in the matlab version the nans are removed before doing PCA, and then they are added back in prior to plotting. what i’m proposing for the python version is to be a little fancier by using PPCA when possible to reconstruct missing data. since we’re already making an assumption that the data covariance matters in applying PCA to the data, we can leverage the same assumption to fill in parts of missing observations. but for skipped observations (i.e. where no feature is observed for that row of the data matrix) we shouldn’t add in any additional assumptions about the timecourse (we can’t even assume that the user is giving us a timecourse).

in other words, we want the reduced data to have the same number of rows as the original data.
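The row-preserving behavior described above might look roughly like the sketch below. Everything named here is hypothetical and is not the code in reduce.py: `pca_reduce`, `reduce_keep_rows`, and the `fill_missing` argument are made up for illustration, and `column_mean_fill` is only a crude stand-in for a real PPCA reconstruction, used so the example runs with NumPy alone.

```python
import numpy as np

def pca_reduce(x, ndims):
    """Plain PCA via SVD on mean-centered data (requires no nans)."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :ndims] * s[:ndims]

def column_mean_fill(x):
    """Placeholder imputation (NOT PPCA): replace nans with column means."""
    return np.where(np.isnan(x), np.nanmean(x, axis=0), x)

def reduce_keep_rows(x, ndims, fill_missing=column_mean_fill):
    """Reduce x to ndims columns while keeping one output row per input row."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return pca_reduce(x, ndims)  # case 1: no nans, ordinary PCA
    # Rows with no observed feature are never reconstructed; they stay nan
    empty_rows = missing.all(axis=1)
    out = np.full((x.shape[0], ndims), np.nan)
    # Case 2: fill partially observed rows (PPCA would go here), then reduce
    out[~empty_rows] = pca_reduce(fill_missing(x[~empty_rows]), ndims)
    return out

# Small demonstration: one partially observed row, one fully skipped row
rng = np.random.default_rng(1)
example = rng.normal(size=(8, 5))
example[4, 1] = np.nan
example[2, :] = np.nan
reduced = reduce_keep_rows(example, 3)
```

Because the skipped row comes back as a row of nans, the output lines up row for row with the input, and a line plot of it simply shows a break at that position.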

1 reaction
jeremymanning commented, Dec 18, 2016

(This will help us determine if PPCA is correctly interpolating)


