Performance of compute_residuals
See original GitHub issue.

@ybagdasa wrote:
@CamDavidsonPilon I’m trying to use compute_residuals on a dataframe with 5M observations, and after tens of minutes it is unclear whether it will ever finish. I suspect the dataframe is simply too large to do the computation as-is in a reasonable amount of time. I’d like to avoid scaling down significantly, since events constitute a small fraction of the observations and I need the statistics. Is there an existing solution for this?
@ybagdasa, to confirm, you were computing the Schoenfeld residuals?
Issue Analytics
- Created 3 years ago
- Comments:9 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@CamDavidsonPilon Yes that’s fitting.
I ended up doing a little workaround where I divvied the data up into 50k samples, ran those in parallel, and combined the coefficients and covariance matrices afterwards using some normal-approximation assumptions. It took about 6 hours to run. Not the most ideal, but it worked.

There were something like 6 covariates to fit.
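The combining step in the workaround above is not spelled out in the thread; one common way to pool per-chunk fits under a normal approximation is inverse-variance (fixed-effect) weighting. The sketch below is a hypothetical, simplified version that pools each covariate independently (i.e. it ignores off-diagonal covariance terms, which the original workaround may have handled differently); the function name and data layout are illustrative, not from the issue.

```python
def combine_estimates(betas, variances):
    """Pool per-chunk coefficient estimates with inverse-variance weights.

    betas     : list of per-chunk coefficient vectors, e.g. [[b1, b2], ...]
    variances : list of per-chunk variance vectors (the diagonal of each
                chunk's covariance matrix), same shape as `betas`.

    Returns (pooled_betas, pooled_variances). Each covariate is pooled
    independently: beta = sum(w_i * b_i) / sum(w_i) with w_i = 1 / var_i,
    and the pooled variance is 1 / sum(w_i).
    """
    n_cov = len(betas[0])
    pooled_beta, pooled_var = [], []
    for j in range(n_cov):
        weights = [1.0 / v[j] for v in variances]
        total_w = sum(weights)
        pooled_beta.append(sum(w * b[j] for w, b in zip(weights, betas)) / total_w)
        pooled_var.append(1.0 / total_w)
    return pooled_beta, pooled_var

# Toy example: two chunks, one covariate, equal precision.
betas = [[0.5], [0.7]]
variances = [[0.04], [0.04]]
beta, var = combine_estimates(betas, variances)
# equal weights -> pooled beta is the mean (0.6), pooled variance halves (0.02)
```

With equal chunk sizes the weights are roughly equal and this reduces to averaging the coefficients, but unequal chunks (or chunks with few events) get appropriately down-weighted.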