Constant regressor error with simulated_historical_forecasts when an indicator variable is used as a regressor
The simulated_historical_forecasts function currently doesn't account for the fact that, when splitting an indicator variable across different cutoffs, you may run into a case where all of its values in a training window are zero or all are one.
This throws an error in initialize_scales_fn, because the uniqueness check on the regressor, given below, fails:
for (name in names(m$extra_regressors)) {
  n.vals <- length(unique(df[[name]]))
  if (n.vals < 2) {
    stop('Regressor ', name, ' is constant.')
  }
}
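To make the failure mode concrete, here is a minimal sketch of a reproduction (the data and the holiday_flag regressor name are made up for illustration): an indicator that only turns on late in the history is all zeros in the training window of early cutoffs, so the check above aborts.
library(prophet)

# An indicator that only "turns on" partway through the history.
df <- data.frame(ds = seq(as.Date('2017-01-01'), by = 'day', length.out = 200))
df$holiday_flag <- as.numeric(df$ds >= as.Date('2017-06-01'))  # all 0 before June
df$y <- rnorm(200) + 2 * df$holiday_flag

m <- prophet()
m <- add_regressor(m, 'holiday_flag')
m <- fit.prophet(m, df)

# Early cutoffs train only on ds <= cutoff, where holiday_flag is constant:
# Error: Regressor holiday_flag is constant.
df.cv <- simulated_historical_forecasts(m, horizon = 30, units = 'days', k = 10)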
I handle this by making the following changes in the function:
regressor_names <- names(model$extra_regressors)
# Check that the regressors we added are not entirely constant
if (!is.null(regressor_names)) {
  # Number of unique values for each regressor in history.c
  num_unique_by_regressor <- sapply(
    regressor_names, function(x) length(unique(history.c[[x]])))
  # Which regressors should we remove?
  regressors_to_remove <- names(which(num_unique_by_regressor < 2))
  if (length(regressors_to_remove) > 0) {
    # Remove the regressors from the model copy
    for (name in regressors_to_remove) {
      m$extra_regressors[[name]] <- NULL
    }
    # If no regressors remain, strip the leftover names attribute so that
    # m$extra_regressors is identical to the default empty list()
    if (length(m$extra_regressors) == 0) {
      attr(m$extra_regressors, which = 'names') <- NULL
    }
    # Remove the regressors from history.c
    history.c <- dplyr::select(history.c, -dplyr::one_of(regressors_to_remove))
  }
}
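A side note on the names-attribute step above: in R, deleting every element of a named list leaves an empty named list rather than a plain list(), which is why the attribute is stripped when nothing remains. A quick illustration:
l <- list(a = 1)
l[['a']] <- NULL
identical(l, list())   # FALSE: l still carries names = character(0)
attr(l, 'names') <- NULL
identical(l, list())   # TRUE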
The entire function then becomes:
simulated_historical_forecasts <- function(model, horizon, units, k,
                                           period = NULL) {
  df <- model$history
  horizon <- as.difftime(horizon, units = units)
  if (is.null(period)) {
    period <- horizon / 2
  } else {
    period <- as.difftime(period, units = units)
  }
  # Regressor names
  regressor_names <- names(model$extra_regressors)
  cutoffs <- generate_cutoffs(df, horizon, k, period)
  predicts <- data.frame()
  for (i in seq_along(cutoffs)) {
    cutoff <- cutoffs[i]
    # Copy the model
    m <- prophet_copy(model, cutoff)
    # Train model
    history.c <- dplyr::filter(df, ds <= cutoff)
    # Check that the regressors we added are not entirely constant
    if (!is.null(regressor_names)) {
      # Number of unique values for each regressor in history.c
      num_unique_by_regressor <- sapply(
        regressor_names, function(x) length(unique(history.c[[x]])))
      # Which regressors should we remove?
      regressors_to_remove <- names(which(num_unique_by_regressor < 2))
      if (length(regressors_to_remove) > 0) {
        # Remove the regressors from the model copy
        for (name in regressors_to_remove) {
          m$extra_regressors[[name]] <- NULL
        }
        # If no regressors remain, strip the leftover names attribute so that
        # m$extra_regressors is identical to the default empty list()
        if (length(m$extra_regressors) == 0) {
          attr(m$extra_regressors, which = 'names') <- NULL
        }
        # Remove the regressors from history.c
        history.c <- dplyr::select(
          history.c, -dplyr::one_of(regressors_to_remove))
      }
    }
    # Fit model
    m <- fit.prophet(m, history.c)
    # Calculate yhat
    df.predict <- dplyr::filter(df, ds > cutoff, ds <= cutoff + horizon)
    columns <- c('ds')
    if (m$growth == 'logistic') {
      columns <- c(columns, 'cap')
      if (m$logistic.floor) {
        columns <- c(columns, 'floor')
      }
    }
    # Only keep the regressors that survived the constant check
    columns <- c(columns, names(m$extra_regressors))
    future <- df.predict[columns]
    yhat <- stats::predict(m, future)
    # Merge yhat, y, and cutoff
    df.c <- dplyr::inner_join(df.predict, yhat, by = "ds")
    df.c <- dplyr::select(df.c, ds, y, yhat, yhat_lower, yhat_upper)
    df.c$cutoff <- cutoff
    predicts <- rbind(predicts, df.c)
  }
  return(predicts)
}
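With this patch in place, the reproduction above should run through: for early cutoffs the constant indicator is dropped from the model copy, and the returned data frame still covers every cutoff (same hypothetical setup as before):
df.cv <- simulated_historical_forecasts(m, horizon = 30, units = 'days', k = 10)
head(df.cv)  # columns: ds, y, yhat, yhat_lower, yhat_upper, cutoff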
This is a challenging issue. There are certainly ways to get around it, and the solution you post is one. But it isn't clear to me what the right thing to do is for making the cross-validation meaningful. Our goal is to estimate model generalization. If the external regressor is important, then removing it means we're now fitting a different model, whose performance is probably not indicative of the generalization performance of the full model.
It seems to me the more reasonable thing to do would be to not try to do cross-validation using segments of the history that do not contain all of the data needed by the model (like both levels of an indicator variable). Since the cross-validation uses histories of increasing length, we should really just start the cross validation at a point in the history that has everything we need. This might mean fewer samples to estimate performance, but like I said above, otherwise we are getting more samples of something that isn’t really the generalization we want to estimate.
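A hedged sketch of that alternative, assuming the internals shown above (df, model, and the cutoffs vector from generate_cutoffs; first_varying_ds is a hypothetical helper): rather than dropping the regressor, discard any cutoff whose training window has not yet seen two distinct values of every extra regressor.
# Hypothetical helper: earliest ds at which the regressor has taken
# at least two distinct values in the history.
first_varying_ds <- function(df, name) {
  vals <- df[[name]]
  df$ds[which(vals != vals[1])[1]]
}

# Keep only cutoffs late enough that every extra regressor is non-constant
# in the training window ds <= cutoff.
min_cutoff <- Reduce(max, lapply(names(model$extra_regressors),
                                 function(x) first_varying_ds(df, x)))
cutoffs <- cutoffs[cutoffs >= min_cutoff]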
@deniznoah That seems like a reasonable use case. We’ll then need to have a way to drop constant extra regressors in fitting.
As for whether or not they are normalized: non-binary extra regressors are standardized (subtract the mean, divide by the standard deviation) so they have mean 0 and standard deviation 1. Binary extra regressors are left as-is. This behavior can be overridden when adding them; see help(Prophet.add_regressor).
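For reference, a quick sketch of that override in the R API (the regressor names here are made up; standardize = 'auto' is the default, which standardizes numeric regressors and leaves binary ones as-is):
m <- prophet()
m <- add_regressor(m, 'temperature')                    # numeric: standardized
m <- add_regressor(m, 'is_promo', standardize = FALSE)  # left on its raw scale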