
Wrong definition of weights in numpy.polyfit


The documentation for numpy.polyfit linked below is incorrect or misleading regarding the definition of the optional input weights vector w.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html

In least-squares fitting one generally defines the weights vector in such a way that the fit minimizes the weighted squared error (in NumPy notation):

chi2 = np.sum(weights*(p(x) - y)**2)

In the common situation where the 1σ errors “sigma” are known, the weights are the reciprocal of the variance:

weights = 1/sigma**2

see e.g. http://en.wikipedia.org/wiki/Least_squares#Weighted_least_squares or http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd432.htm

However the numpy.polyfit documentation defines the weight as “weights to apply to the y-coordinates”. This definition is not correct. The weights apply to (=multiply) the fit residuals, not only to the y-coordinates.

More importantly, looking at the math in the NumPy (v1.9.1) code, the squared residual actually minimized by polyfit is the following, with the optional input weights vector w inside the parentheses, contrary to standard practice:

chi2 = np.sum((w*(p(x) - y))**2)

in such a way that the relation between w and the 1σ errors is

w = 1/sigma

which is different from what most users would expect.
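A quick numerical check of this claim, as a minimal sketch on made-up data (the quadratic model, the noise levels, and all variable names here are illustrative assumptions, not part of the NumPy code): polyfit called with w = 1/sigma should reproduce a standard weighted least-squares fit with weights = 1/sigma**2, while passing 1/sigma**2 directly as w should not.

```python
import numpy as np

# Synthetic data (assumed for illustration): quadratic with heteroscedastic noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
sigma = rng.uniform(0.5, 2.0, size=x.size)  # known 1-sigma errors
y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(0.0, sigma)
deg = 2

# polyfit with w = 1/sigma
coef_polyfit = np.polyfit(x, y, deg, w=1.0 / sigma)

# Reference: weighted normal equations with the standard weights = 1/sigma**2
weights = 1.0 / sigma**2
A = np.vander(x, deg + 1)  # highest power first, same ordering as polyfit
coef_wls = np.linalg.solve(A.T @ (weights[:, None] * A), A.T @ (weights * y))

# Passing the standard weights directly as w yields a different fit
coef_wrong = np.polyfit(x, y, deg, w=weights)

print(coef_polyfit)
print(coef_wls)
print(coef_wrong)
```

The first two printed coefficient vectors should agree to within rounding, and the third should differ, which is the mismatch a user following the standard definition would run into.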

The confusion in the documentation likely arises from the fact that the NumPy code solves the linear problem below in the least-squares sense, where the w vector does multiply the y-coordinates (here x denotes the unknown vector of polynomial coefficients, not the sample points):

(vander*w[:, np.newaxis]).dot(x) == y*w

Solving the above array expression in the least-squares sense is equivalent to minimizing the expression below, with w inside the parentheses:

np.sum((w*(vander.dot(x) - y))**2)
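The equivalence can be checked numerically. The sketch below uses made-up data and hypothetical names (coef_lstsq, chi2); it solves the row-scaled system with np.linalg.lstsq, compares the result with np.polyfit, and evaluates the w-inside-the-parentheses objective at and near the solution.

```python
import numpy as np

# Synthetic data (assumed for illustration)
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 - x + 2.0 * x**3 + rng.normal(0.0, 0.1, size=x.size)
w = rng.uniform(0.5, 2.0, size=x.size)
deg = 3

# Solve the row-scaled linear system in the least-squares sense
vander = np.vander(x, deg + 1)
coef_lstsq, *_ = np.linalg.lstsq(vander * w[:, np.newaxis], y * w, rcond=None)

# polyfit with the same w should return the same coefficients
coef_polyfit = np.polyfit(x, y, deg, w=w)

def chi2(c):
    """Objective with w inside the parentheses."""
    return np.sum((w * (vander.dot(c) - y))**2)

print(coef_lstsq)
print(coef_polyfit)
print(chi2(coef_lstsq), chi2(coef_lstsq + 1e-3))
```

The perturbed coefficients should give a larger chi2 than the lstsq solution, consistent with that solution being the minimizer of the expression above.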

A non-optimal solution, which maintains compatibility, would be to change the documentation and clearly define the weight w by including it in the equation for the “squared error” E given in the Notes. One should also make clear that the adopted definition differs from standard practice by stating the relation between weights and errors, w = 1/σ.

Even better would be to define a new optional keyword weights, which follows standard practice and satisfies weights = 1/sigma**2. In this case, the code would simply calculate w = np.sqrt(weights) from the input weights, and the rest of the code would apply unmodified.
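Until such a keyword exists, the same behaviour can be approximated on the user side. The wrapper below is only a sketch: polyfit_weights is a hypothetical helper name, not a NumPy function, and it simply forwards w = np.sqrt(weights) to np.polyfit.

```python
import numpy as np

def polyfit_weights(x, y, deg, weights=None, **kwargs):
    """Hypothetical wrapper: accepts standard weights (1/sigma**2)
    and forwards w = sqrt(weights) to np.polyfit."""
    w = None if weights is None else np.sqrt(np.asarray(weights, dtype=float))
    return np.polyfit(x, y, deg, w=w, **kwargs)

# Usage on made-up data: both calls should describe the same fit
rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 40)
sigma = rng.uniform(0.2, 1.0, size=x.size)
y = 3.0 - 0.5 * x + rng.normal(0.0, sigma)

coef_new = polyfit_weights(x, y, 1, weights=1.0 / sigma**2)
coef_old = np.polyfit(x, y, 1, w=1.0 / sigma)
print(coef_new, coef_old)
```

Extra keyword arguments such as full or cov are passed through unchanged, so the wrapper leaves the rest of the polyfit interface as it is.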

Issue Analytics

  • State: open
  • Created: 9 years ago
  • Reactions: 1
  • Comments: 17 (6 by maintainers)

Top GitHub Comments

iancrossfield commented, Jul 6, 2018 (2 reactions)

Just ran into this problem myself. I agree 100% with the suggestions to add an alternative “sigma” or “weights” option, and deprecating “w”.

charris commented, Feb 5, 2021 (1 reaction)

There are lots of uses for weights besides normalizing the variance, for instance, masking or robust least squares (IRLS).

