
RANSAC - residual threshold calculation


When no residual_threshold is passed, the RANSAC implementation calculates one as the median absolute deviation (MAD) of y:

residual_threshold = np.median(np.abs(y - np.median(y)))
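A short NumPy check shows how this expression behaves. It assumes a y vector in which 7 of 10 values are equal. The helper name mad_threshold and the sample data are hypothetical and are not part of scikit-learn.

```python
import numpy as np

def mad_threshold(y):
    """Hypothetical helper: the default threshold expression quoted in the
    issue, i.e. the median absolute deviation of y around its median."""
    y = np.asarray(y, dtype=float)
    return np.median(np.abs(y - np.median(y)))

# More than half of the targets share one value (7 of 10 equal 5.0), so
# more than half of the absolute deviations are 0 and their median is 0.
y_mostly_constant = np.array([5.0] * 7 + [1.0, 8.0, 3.0])
print(mad_threshold(y_mostly_constant))  # 0.0

# With no dominant value the MAD is positive: deviations from the median 3.0
# are [2, 1, 0, 1, 7], whose median is 1.0.
y_spread = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(mad_threshold(y_spread))  # 1.0
```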

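If more than half of the values of y are equal to the median of y, more than half of the absolute deviations are 0, so their median (the residual threshold) is also 0. In that case, the line

inlier_mask_subset = residuals_subset < residual_threshold

never marks any sample as an inlier, because no absolute residual is strictly less than 0. Every trial is therefore skipped, inlier_mask_best stays None, and fit raises a ValueError.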
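The sketch below applies that strict comparison to hypothetical residuals from a candidate line that fits seven of ten points exactly. The data and the candidate line are illustrative assumptions, and only the comparison itself is taken from the issue.

```python
import numpy as np

# Hypothetical data: ten targets, seven of which equal 5.0.
y = np.array([5.0] * 7 + [1.0, 8.0, 3.0])
# MAD default threshold from the issue; it is 0.0 for this y.
residual_threshold = np.median(np.abs(y - np.median(y)))

# Hypothetical candidate model: the horizontal line y = 5.0, which fits the
# seven constant points exactly (absolute residual 0.0).
residuals_subset = np.abs(y - np.full_like(y, 5.0))

# The strict comparison from the issue: 0.0 < 0.0 is False for every point.
inlier_mask_subset = residuals_subset < residual_threshold
print(residual_threshold, int(inlier_mask_subset.sum()))  # 0.0 0
```

Even the seven perfectly fitted points fail the strict test. As a result, no candidate model can ever collect an inlier.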


Top GitHub Comments


jeffreywillert commented, Jan 23, 2020

@alexshacked I made the same change (from “<” to “<=”) in my local copy and have found it to work well in practice.

I would argue that this should be the default behavior. While trivial, it is a case that can be observed in practice. I encountered this while looping through client accounts in which the target quantity is constant (or near constant) for a small fraction of accounts. When the target quantity is constant, it’s easy to throw out the case, but it may still be of interest to run the regression when 51% of the observations have the same target value.


alexshacked commented, Jan 23, 2020

The error happens when the RANSACRegressor is created without passing a residual_threshold. In this case RANSAC needs to calculate the residual_threshold from the y vector. The error thrown during fit is:

  File "/Users/ashacked/dev/python/scikit-learn/sklearn/linear_model/_ransac.py", line 431, in fit
    "RANSAC could not find a valid consensus set. All"
ValueError: RANSAC could not find a valid consensus set. All `max_trials` iterations were skipped because each randomly chosen sub-sample failed the passing criteria. See estimator attributes for diagnostics (n_skips*).

The same error will be thrown if we create the RANSACRegressor with residual_threshold=0.0, and this behaviour is tested in test_ransac.test_ransac_resid_thresh_no_inliers().

Passing a y vector in which more than half of the values are the same (without passing residual_threshold=0.0) has the same effect, because residual_threshold is then computed from y and again comes out as zero. In short, this scenario raises the error because the input produces a residual_threshold of zero, which the current implementation does not tolerate, as the regression test test_ransac_resid_thresh_no_inliers() demonstrates.

One possible resolution is to accept the current behaviour, which appears to be “as designed”.

On the other hand, we could ask: should we refuse to handle the case where residual_threshold=0.0? This is exactly the case when the input to the regressor is a horizontal line, i.e. the gradient is zero.

Having more than 50 percent of the target values equal to the median is actually a reasonable (although trivial) case. Fitting a regression model to a horizontal line passes a y vector in which all values are the same. There are no outliers here and the gradient is 0, but maybe the RANSAC model should be able to cope with this trivial border case.

The line in RANSACRegressor.fit() that fails when residual_threshold is 0 is inlier_mask_subset = residuals_subset < residual_threshold. If it used "<=" instead of "<", fit() would have worked. With a zero threshold, that means accepting a sample whose predicted target value equals its actual target value, which does not seem far-fetched to me. In fact, on the Wikipedia page for RANSAC (https://en.wikipedia.org/wiki/Random_sample_consensus), the MATLAB implementation uses "<=" and not "<":

inlierIdx = find(abs(distance)<=threshDist)

I made this change, using inlier_mask_subset = residuals_subset <= residual_threshold, and ran test_ransac.py. All tests passed except test_ransac_resid_thresh_no_inliers(). That failure was expected, because the test specifically verifies that we don't accept residual_threshold=0.0, and with my change we did. @glemaitre, @jnothman, what do you think?
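The effect of switching from "<" to "<=" can be reproduced without scikit-learn. The following is a toy sketch, not scikit-learn's RANSACRegressor. The helper ransac_line, its inclusive flag, the two-point line model, the "at least one inlier" acceptance rule and the sample data are all hypothetical. The sketch uses the MAD default threshold from the issue and toggles between the strict and the inclusive comparison.

```python
import numpy as np

def ransac_line(x, y, residual_threshold=None, inclusive=False,
                max_trials=100, seed=0):
    """Hypothetical toy RANSAC for a line y = slope * x + intercept.
    inclusive=False uses the strict `<` comparison from the issue;
    inclusive=True uses the proposed `<=` comparison."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if residual_threshold is None:
        # Default threshold: median absolute deviation of y, as in the issue.
        residual_threshold = np.median(np.abs(y - np.median(y)))
    rng = np.random.default_rng(seed)
    best = None
    n_inliers_best = 0
    for _ in range(max_trials):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line; cannot be written as y = f(x)
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = np.abs(y - (slope * x + intercept))
        if inclusive:
            inlier_mask = residuals <= residual_threshold
        else:
            inlier_mask = residuals < residual_threshold
        n_inliers = int(inlier_mask.sum())
        # Skip trials with no inliers, or no more inliers than the best so far.
        if n_inliers <= n_inliers_best:
            continue
        best = (slope, intercept, inlier_mask)
        n_inliers_best = n_inliers
    if best is None:
        raise ValueError("no valid consensus set found in max_trials trials")
    return best

# Usage: seven of ten targets are 5.0, so the MAD threshold is 0.0.
x = np.arange(10, dtype=float)
y = np.array([5.0] * 7 + [1.0, 8.0, 3.0])

try:
    ransac_line(x, y)  # strict `<` with a zero threshold
except ValueError as exc:
    print("strict:", exc)

slope, intercept, inliers = ransac_line(x, y, inclusive=True)
print("inclusive:", intercept, int(inliers.sum()))  # 5.0 7
```

The two-point line fit is a deliberate choice for this sketch. For two points with equal y, it yields a slope of exactly 0 and residuals of exactly 0, so the "<=" comparison is not defeated by floating-point round-off. A least-squares fit may instead leave tiny nonzero residuals on constant data.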
