Improve Memory Handling in Research Notebook
See original GitHub issue
Expected Behavior
The Research Environment has a garbage-collecting feature.
Actual Behavior
The Research Environment doesn’t have a garbage-collecting feature. If we use data from History requests, we can’t dispose of it to free up memory and continue running the same notebook.
import gc
import psutil
from datetime import datetime

# instantiate the QuantBook
qb = QuantBook()

# equity tickers (originally loaded from the object store); add them to the QuantBook
# (strip the newlines embedded in the multi-line string before splitting)
symbol_ids = '''AMD,NVDA,AMZN,META,GOOGL,GOOG,SHOP,OXY,
RBLX,MU,XOM,BABA,GM,PLUG,SQ,ORCL,PYPL,DVN,CVX,COIN,JPM,
SLB,AFRM,SBUX,LI,TSM,MRVL,DIS,UAL,MS,ROKU,NEE,U,APA,EQT,
CVNA,PDD,QCOM,AA,RCL,COP,FDX,BA,LVS,PENN,CRM,UPST,ABNB,
SCHW,GE,WMT,NKE,AMAT,AR,DOW,MOS,ON,RUN,DT,PG,MPC,MMM,DASH,
RTX,ASAN,DOCU,ZIM,JD,WDC,ARRY,FTNT,BTU,ENVX,AIG,OKTA,GIS,
BYND,TWLO,NET,TMUS,FISV,MET,MCHP,SE,LTHM,W,VLO,CVS,IBM,PM,
TJX,DDOG,RRC,BX,WYNN,CTVA,ZM,CZR,TTD,Z,CHWY,GME,RIO,PSX,BHP,
CTSH,MTCH,DD,EOG,PBF,ADI,OVV,APP,AXP,EXPE,CF,ADM,COF,YUMC,
LYB,OKE,FSLR,AEM,BLDR,ETSY,CNQ,DLTR,PVH,NOVA,EMR,NUE,AEP,
ICE,S,SPR,PGR,NTR,FANG,LAC,BBY,APH,MAR,STX,HLT,MP,BE,DUK,
ANET,HES,MNST,TTE,CPRI,LYV,BIDU,SPLK,CHK,PRU,SWKS,STLD,CC,SYY,RNG'''.replace('\n', '').split(',')
for symbol_id in symbol_ids:
    qb.AddEquity(symbol_id)

# define a function to perform the history request and discard the result
def history_function(qb):
    df_history = qb.History(
        qb.Securities.Keys,
        datetime(2010, 1, 1),
        datetime(2022, 1, 1),
        Resolution.Daily,
    )
    print(f"Dataframe size: {df_history.memory_usage().sum() / 1e6:.1f} MB")
    del df_history

print(
    f"Memory usage before calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# run the function with the history request and subsequently collect the garbage
history_function(qb)
gc.collect()
print(
    f"Memory usage after calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# delete the QuantBook and subsequently collect the garbage
del qb
gc.collect()
print(
    f"Memory usage after deleting the qb: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)
Result:
Memory usage before calling the function: 37.296 / 135.0 GB (28.6%)
Dataframe size: 15.2 MB
Memory usage after calling the function: 38.463 / 135.0 GB (29.5%)
Memory usage after deleting the qb: 38.463 / 135.0 GB (29.5%)
We can see there is little to no drop in memory consumption. This is somewhat expected, since Python’s gc.collect()
doesn’t collect garbage for C# objects.
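Because the Research Environment runs Python over .NET via pythonnet, gc.collect() only releases the Python-side wrapper objects; the underlying C# objects are reclaimed by the CLR’s own collector. A minimal sketch of triggering both collectors, assuming a pythonnet environment where the System namespace is importable (the except branch makes this a no-op on plain CPython):

```python
import gc

def collect_all():
    # drop Python-side wrappers first, so the CLR sees the
    # underlying .NET objects as unreferenced
    gc.collect()
    try:
        # System.GC is the .NET garbage collector; this import only
        # succeeds under pythonnet, as in the Research Environment
        from System import GC
        GC.Collect()
        GC.WaitForPendingFinalizers()
        GC.Collect()  # second pass to reclaim objects freed by finalizers
    except ImportError:
        pass  # plain CPython: nothing more to do

collect_all()
```

This is only a sketch of the two-collector interaction, not a documented LEAN API; whether the CLR actually returns the freed pages to the OS is a separate question.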
Potential Solution
N/A
Checklist
- I have completely filled out this template
- I have confirmed that this issue exists on the current master branch
- I have confirmed that this is not a duplicate issue by searching issues
- I have provided detailed steps to reproduce the issue
Issue Analytics
- State:
- Created a year ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Hey @Martin-Molinero, thanks for looking into this and implementing some improvements! I do start to see the memory usage decrease after running this, but it takes some time for it to do so, and not necessarily to the level it would be without restarting the kernel and loading the corresponding data from the object store. Nonetheless, this is an improvement 😃
I am curious, why do we need to collect the garbage 40 times?
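One plausible reason for multiple passes (a guess, not a description of LEAN’s actual code): a single gc.collect() only frees what is unreachable at that moment, and finalizers run during one pass can release references that make more objects collectible on the next. A common idiom is to loop until a pass frees nothing:

```python
import gc

def collect_until_stable(max_passes=40):
    """Run gc.collect() repeatedly until a pass finds nothing unreachable."""
    for i in range(1, max_passes + 1):
        if gc.collect() == 0:
            return i  # number of passes actually needed
    return max_passes
```

In pure Python, one or two passes usually suffice; a fixed high count like 40 would act as a safety margin when finalizer chains (or cross-runtime wrappers) keep revealing new garbage.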
Can’t reproduce a memory leak; memory is stable, even creating multiple QuantBooks. There is a built-in memory cache for daily/hour resolution data (see https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/TextSubscriptionDataSourceReader.cs#L43), which would explain the jump in memory usage once the daily history requests are performed: the usage no longer goes down, but neither does it continue to grow.