
cutoff point calculation seems to be wrong in diagnostic notebook

See original GitHub issue

Hi there, I think there is an issue in the cutoff point date calculation, or maybe the description in the documentation is not accurate. Let me elaborate using the example in the notebook section.

import pandas as pd

df = pd.read_csv('./data/example_wp_log_peyton_manning.csv')
df['ds'] = pd.to_datetime(df['ds'], infer_datetime_format=True)
df.describe(include='all')

This shows that ds ranges from 2007-12-10 00:00:00 to 2016-01-20 00:00:00, with 2905 rows in total. You then run:

from fbprophet.diagnostics import cross_validation  # m is the fitted Prophet model from earlier
df_cv = cross_validation(m, horizon='365 days', initial='1825 days', period='365 days')

which generates one cutoff point at 2013-01-20, which is fine given these initial and period settings.
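(For reference, the cutoffs that cross_validation actually used can be read off the frame it returns; a quick check, assuming m is the fitted model from above and that the returned frame carries a cutoff column, as it does in current fbprophet versions:)

# Inspect which cutoff dates were used for the folds
print(df_cv['cutoff'].unique())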

Now, I was expecting the cutoff point to be at 2012-12-08:

from datetime import timedelta

cutoff_expected = df.ds.min() + timedelta(days=1825)  # Timestamp('2012-12-08 00:00:00')

So by my count we are 43 days over, and I can't figure out where they come from:

import datetime
datetime.datetime(2013, 1, 20) - datetime.datetime(2012, 12, 8)  # timedelta(days=43)

Let me know where my logic is failing. Cheers!
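For what it's worth, here is a minimal sketch of the cutoff spacing as I understand it from fbprophet's diagnostics module (treat the rule as an assumption rather than something confirmed in the visible thread): cutoffs are stepped backwards from the end of the history in increments of period, and only those that still leave at least initial of training data are kept, rather than being placed forwards from min(ds) + initial. Under that rule the earliest cutoff lands at 2013-01-20, which is 43 days after 2012-12-08 and would account for the gap:

import pandas as pd

df = pd.read_csv('./data/example_wp_log_peyton_manning.csv')
df['ds'] = pd.to_datetime(df['ds'])

horizon = pd.Timedelta('365 days')
period = pd.Timedelta('365 days')
initial = pd.Timedelta('1825 days')

# Step backwards from the last usable cutoff (end of history minus horizon),
# keeping only cutoffs that still leave at least `initial` of training data.
cutoff = df['ds'].max() - horizon
cutoffs = []
while cutoff >= df['ds'].min() + initial:
    cutoffs.append(cutoff)
    cutoff -= period

earliest = min(cutoffs)
print(earliest)                               # 2013-01-20
print(earliest - (df['ds'].min() + initial))  # 43 days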

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
robomotic commented, Nov 13, 2020

Hi again, this was a very productive conversation, thanks for taking the time to explain it. I feel much more confident now about how I am using those features! Closing this down; hopefully it will be useful to other users when searching. Thanks!

0 reactions
bletham commented, Nov 12, 2020

These are good questions.

  • The fitted parameters are not copied into each CV model, only the model structure (basically the settings that can be specified in the Prophet() constructor, plus any added seasonalities/regressors/holidays). So it would copy over the fact that I have added a monthly seasonality to the model, but not the actual parameters of that monthly seasonality (the Fourier coefficients); those are fit from the data inside each CV fold. I hope that clarifies it. Put otherwise, all of the parameters that are fit in Stan during model fitting are not copied but are re-fit in each CV fold (see the sketch after this list).

  • It does not do this automatically. There is a bit of a tension here: on the one hand, if I'm fitting a model with yearly seasonality but have a CV fold where it was auto-disabled, that fold probably wouldn't provide a useful estimate of the error of the model when it does have yearly seasonality enabled. On the other hand, as you note, if I leave yearly seasonality enabled and then try to fit it on a CV fold that covers only a few months, that also probably won't provide a very useful error estimate for a model that actually has enough data to fit the yearly seasonality. As you know, we have taken the cross validation in the second direction (keeping the seasonalities on if they are used in the final model), and to avoid this issue we just raise a warning if the CV fold has less data than the seasonality period: https://github.com/facebook/prophet/blob/ad3832bb1957da1ba3efb4f6b0196977fcd13f06/python/fbprophet/diagnostics.py#L152-L159 So if you are doing CV on a model that has yearly seasonality, it will print out this warning if the CV settings are such that there are folds with < 1 year of data.
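A minimal sketch tying both points together (an illustrative setup using the Peyton Manning example data, not code from the thread): the added monthly seasonality definition is carried into every CV fold, but its Fourier coefficients are re-fit on each fold's training window, and an initial window shorter than a seasonality's period is what triggers the warning linked above.

from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation
import pandas as pd

df = pd.read_csv('./data/example_wp_log_peyton_manning.csv')

m = Prophet()  # yearly and weekly seasonality are enabled automatically on this data
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)
m.fit(df)  # Fourier coefficients etc. are fit on the full history here

# Each fold rebuilds a Prophet with the same settings and the same 'monthly'
# seasonality definition, then re-fits all parameters from that fold's
# training data only -- nothing fitted above is reused.
df_cv = cross_validation(m, horizon='365 days', period='365 days',
                         initial='1825 days')

# With an initial window shorter than the yearly seasonality's period,
# the diagnostics module instead warns that the fold has less data
# than the seasonality (the check linked above).
df_cv_short = cross_validation(m, horizon='90 days', period='90 days',
                               initial='180 days')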

Read more comments on GitHub >
