improve rdtools import time
RdTools is a little slow to import:
In [1]: %time import rdtools
Wall time: 1.04 s
For comparison:
In [1]: %time import pvlib
Wall time: 184 ms
Here’s a breakdown of where import rdtools is spending time:
Generated with:
(base) C:\Windows\Temp>python -X importtime -c "import rdtools" 2> rdtools.log
(base) C:\Windows\Temp>tuna rdtools.log
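For anyone who wants the same breakdown without a viewer, here is a rough sketch that parses the stderr output of python -X importtime and ranks modules by cumulative time. The function names (parse_importtime, slowest_imports) are made up for illustration, and the sample log uses invented numbers rather than measurements from rdtools.

```python
def parse_importtime(log_text):
    """Parse `python -X importtime` output into (self_us, cumulative_us, module) tuples."""
    rows = []
    prefix = "import time:"
    for line in log_text.splitlines():
        if not line.startswith(prefix):
            continue  # ignore anything else written to stderr
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us = int(parts[0])
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # the header row has text instead of numbers
        rows.append((self_us, cumulative_us, parts[2].strip()))
    return rows


def slowest_imports(log_text, n=10):
    """Return the n entries with the largest cumulative import time."""
    return sorted(parse_importtime(log_text), key=lambda r: r[1], reverse=True)[:n]


# Illustrative log only: these numbers are invented, not measured.
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       100 |        100 |   zipimport
import time:       500 |       1500 | pandas
"""

for self_us, cumulative_us, name in slowest_imports(SAMPLE_LOG, n=2):
    print(f"{name}: {cumulative_us / 1000:.1f} ms cumulative")
```

To use it on a real log, read the file written by the command above (for example, open("rdtools.log").read()) and pass the text to slowest_imports.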
We could speed up the import time by changing how we import packages. I notice that statsmodels.api, pkg_resources, and h5py together make up a large chunk of the total but aren’t actually used in the “primary” RdTools functions. What do people think about moving those imports into the functions that need them instead of importing at module scope?
Pros:
- It decreases import time to 600 ms (~40% speedup) on my machine
- It makes those dependencies optional for people that don’t plan on using those functions – e.g. our fleets pipeline doesn’t actually need the statsmodels package because it doesn’t use the OLS and classical decomp functions.
Cons:
- It’s nice to have a list of imports at the top of the module and hiding them in the functions is nonstandard and reduces code clarity
- It violates PEP 8 (https://www.python.org/dev/peps/pep-0008/#imports)
- It makes the first function invocation slower, and subsequent invocations very marginally slower
- 1 second to import isn’t that big a deal
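To make the proposal concrete, here is a minimal sketch of what a function-scope import could look like. Both require_optional and ols_fit_sketch are hypothetical names, not existing RdTools functions; the only statsmodels calls assumed are sm.OLS and sm.add_constant.

```python
import importlib


def require_optional(module_name, feature):
    """Import module_name on demand, or raise an ImportError naming the feature that needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'"
        ) from exc


def ols_fit_sketch(y, x):
    """Hypothetical example: fit y ~ x with OLS, importing statsmodels only when called."""
    # The import cost is paid on the first call; later calls only hit the
    # sys.modules cache, which is the "very marginally slower" case above.
    sm = require_optional("statsmodels.api", "ols_fit_sketch")
    return sm.OLS(y, sm.add_constant(x)).fit()
```

With this pattern, import rdtools would no longer pull in statsmodels, and users who never call the OLS code path would not need it installed. The trade-off is exactly the one listed in the cons: the dependency is no longer visible at the top of the module.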
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
My $0.02: unless importing statsmodels becomes a performance headache, I’d stick with it for the long term. There are regression functions in numpy (numpy.linalg.lstsq) but my sense is that the long-term intent is to provide regression and similar functions in statsmodels and scipy, with numpy providing the building blocks. And statsmodels provides options such as robust regression that may become desirable for RdTools applications.
This is almost a year old and the marginal gains probably don’t justify violating style guides to put imports in functions. I’ll close this for now.