Average with negative weights

Problem description

In several contexts, we discussed how to compute weighted averages when the weights are negative. The prime use case for this is an average carbon price over several regions, weighted by the (possibly negative) emissions in each region.

The current implementation of aggregate_region computes a direct weighted average:

sum(value * weight) / sum(weight)

This can lead to very counter-intuitive results - see more below.

Illustration

Use the following snippet to explore the current behaviour:

import pandas as pd
import pyam


def test_aggregate(value, weight):
    # Two regions, each with a carbon price (the value) and CO2 emissions (the weight)
    TEST_DF = pd.DataFrame([
        ['reg_a', 'Price|Carbon', 'USD/t CO2', value[0]],
        ['reg_b', 'Price|Carbon', 'USD/t CO2', value[1]],
        ['reg_a', 'Emissions|CO2', 'Mt CO2', weight[0]],
        ['reg_b', 'Emissions|CO2', 'Mt CO2', weight[1]],
    ],
        columns=['region', 'variable', 'unit', 2010],
    )

    # Aggregate the price over both regions, weighted by emissions,
    # and return the single resulting value
    return pyam.IamDataFrame(TEST_DF, model='model_a', scenario='scen_a')\
        .aggregate_region('Price|Carbon', weight='Emissions|CO2')._data.iloc[0]

  1. If the sum of the weights is zero, the resulting average price is infinite
    test_aggregate(value=[1, 2], weight=[-1, 1])
    >>> inf
    
  2. Depending on the order of the weight vector, it can also be negative infinity.
    test_aggregate(value=[1, 2], weight=[1, -1])
    >>> -inf
    
  3. If sum(weight) == 0 and sum(value * weight) == 0, the returned value is nan, which is dropped when initializing the IamDataFrame.
    test_aggregate(value=[1, 1], weight=[-1, 1])
    >>> Error
    
  4. Other values can lead to situations where the “average” price is outside the range min(value), max(value), e.g.
    test_aggregate(value=[1, 2], weight=[-1, 2])
    >>> 3
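
The four cases above can also be reproduced without pyam. The following is a minimal sketch that assumes the aggregation reduces to sum(value * weight) / sum(weight) on floating-point arrays; weighted_average is an illustrative helper, not a pyam function.

```python
import numpy as np


def weighted_average(value, weight):
    """Direct weighted average: sum(value * weight) / sum(weight)."""
    value = np.asarray(value, dtype=float)
    weight = np.asarray(weight, dtype=float)
    # Suppress numpy's divide-by-zero warnings so inf / nan are returned quietly
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(value * weight) / np.sum(weight)


print(weighted_average([1, 2], [-1, 1]))  # inf: weights sum to zero
print(weighted_average([1, 2], [1, -1]))  # -inf: same weights, reversed order
print(weighted_average([1, 1], [-1, 1]))  # nan: 0 / 0
print(weighted_average([1, 2], [-1, 2]))  # 3.0: outside [min(value), max(value)]
```

Unlike case 3 in pyam, this sketch returns the nan directly, because no IamDataFrame is initialized from the result.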
    

Possible solutions

  1. Leave as is, because it’s up to the users to know this (including modeling teams uploading to the IIASA Scenario Explorer).

  2. Leave as is, but write a warning message when weights are negative (question is whether users just ignore this).

  3. Raise an error when any weight is negative, forcing users to explicitly choose an alternative approach (average, min, max).

  4. Use the absolute value of the weight vector. If there are three regions with emissions -1, 0 and 1, this would mean that the prices of regions 1 and 3 are given equal weight and region 2 is ignored. This seems wrong…

  5. Redefine the mathematical expression to

    value * weight / sum(abs(weight))

    However, this can also lead to strange outcomes: for example, the weighted average can differ from the regional values even when those are all identical.

  6. Apply normalization (recalculate the weight vector such that min(weight) == 0 and max(weight) == 1, leaving the relative distribution as is). This means that the price of the region with the lowest emissions is ignored (because its weight is 0).

  7. Apply some shift/offset to make all weights positive. Suggestion by @byersiiasa: weight = weight + 2 * abs(min(weight)). The vector weight=[-1, 2] would become [1, 4].

    test_aggregate(value=[1, 2], weight=[1, 4])
    >>> 1.8
    

    This has the drawback that the offset multiplier is arbitrary - using a multiplier of 3 instead of 2 gives different results…

    test_aggregate(value=[1, 2], weight=[2, 5])
    >>> 1.7142
    
  8. (Added) Write a warning when weights are negative and add a keyword argument that controls whether values aggregated with negative weights are dropped (default) or kept.

  9. (Added) Same as 8, but the default is to keep values aggregated with negative weights.

  10. Any other ideas…?
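
To compare options 4 to 7 side by side, here is a minimal numpy sketch. All function names are hypothetical and none of this is pyam API; the offset in option 7 is assumed to be multiplier * abs(min(weight)), which is what turns [-1, 2] into [1, 4].

```python
import numpy as np


def _as_arrays(value, weight):
    return np.asarray(value, dtype=float), np.asarray(weight, dtype=float)


def average_abs_weights(value, weight):
    """Option 4: weight each value by the absolute value of its weight."""
    v, w = _as_arrays(value, weight)
    return np.sum(v * np.abs(w)) / np.sum(np.abs(w))


def average_abs_denominator(value, weight):
    """Option 5: signed weights in the numerator, sum(abs(weight)) below."""
    v, w = _as_arrays(value, weight)
    return np.sum(v * w) / np.sum(np.abs(w))


def average_minmax_weights(value, weight):
    """Option 6: rescale weights so min(weight) == 0 and max(weight) == 1."""
    v, w = _as_arrays(value, weight)
    scaled = (w - w.min()) / (w.max() - w.min())
    return np.sum(v * scaled) / np.sum(scaled)


def average_offset_weights(value, weight, multiplier=2):
    """Option 7: shift all weights by multiplier * abs(min(weight))."""
    v, w = _as_arrays(value, weight)
    shifted = w + multiplier * abs(w.min())
    return np.sum(v * shifted) / np.sum(shifted)


# Option 4: regions 1 and 3 count equally, region 2 is ignored
print(average_abs_weights([1, 2, 3], [-1, 0, 1]))
# Option 5: identical prices of 5 give an "average" of about 1.67
print(average_abs_denominator([5, 5], [-1, 2]))
# Option 6: the region with the lowest emissions gets weight 0
print(average_minmax_weights([1, 2], [-1, 2]))
# Option 7: the result depends on the arbitrary multiplier
print(average_offset_weights([1, 2], [-1, 2], multiplier=2))
print(average_offset_weights([1, 2], [-1, 2], multiplier=3))
```

Note that the option 6 sketch divides by zero when all weights are equal, so a real implementation would need to handle that case separately.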

Note

The function numpy.average() faces the same problem. The devs decided to leave the unintuitive behaviour (solution 1 above), see https://github.com/numpy/numpy/issues/9825.
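
For reference, a short check of how numpy.average handles the same inputs. This is a sketch based on the behaviour of current NumPy releases; the variable names are illustrative.

```python
import numpy as np

# Negative weights are accepted, and the result can lie outside the value range
outside_range = np.average([1, 2], weights=[-1, 2])
print(outside_range)  # 3.0

# Weights that sum to zero are rejected instead of returning inf
zero_sum_rejected = False
try:
    np.average([1, 2], weights=[-1, 1])
except ZeroDivisionError as err:
    zero_sum_rejected = True
    print(f"ZeroDivisionError: {err}")
```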

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
khaeru commented, Oct 23, 2020

Principle of minimum surprise would suggest: don’t do something different than numpy, so certainly not (4) through (7). Definitely do not have different behaviours depending on the particular variable names in use, as suggested in the last two comments.

Of (1), (2), and (3), I would exclude (3), since the numpy thread points out this is mathematically valid, albeit perhaps without a common or obvious application in IA modeling. Between (1) and (2) it’s a toss-up and I think depends on the intended user base; if they’re likely to make this mistake and unlikely to read the docs carefully, then a warning could help.

ETA: on closer reading, the example above shows that ±Inf is returned when weights sum to zero, whereas numpy raises ZeroDivisionError. Maybe consider removing this discrepancy.
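
A minimal sketch of what option (2) combined with numpy-like handling of zero-sum weights could look like. The helper checked_weighted_average is hypothetical and is not part of pyam or numpy; the warning text is an assumption.

```python
import warnings


def checked_weighted_average(value, weight):
    """Hypothetical helper: warn on negative weights, raise if they sum to zero."""
    if len(value) != len(weight):
        raise ValueError("value and weight must have the same length")
    if any(w < 0 for w in weight):
        warnings.warn(
            "negative weights: the result may lie outside the range of the values"
        )
    total = sum(weight)
    if total == 0:
        # Mirror numpy.average instead of returning +/-inf or nan
        raise ZeroDivisionError("weights sum to zero, cannot be normalized")
    return sum(v * w for v, w in zip(value, weight)) / total


print(checked_weighted_average([1, 2], [1, 3]))   # 1.75, no warning
print(checked_weighted_average([1, 2], [-1, 2]))  # 3.0, with a UserWarning
```

Whether the warning is useful depends on the point raised above: users who do not read the docs may also ignore warnings.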

0 reactions
byersiiasa commented, Nov 4, 2020

I agree this all makes sense from a software perspective.

How and where should we discuss an actual implementation that resolves the problem of the desired weights (e.g., Emissions) going negative?

