
Improve Memory Handling in Research Notebook

See original GitHub issue

Expected Behavior

The Research Environment provides a garbage-collection mechanism, so memory used by History request data can be released and the same notebook can keep running.

Actual Behavior

The Research Environment doesn’t have a garbage-collecting feature. If we use data from History requests, we can’t dispose of it to free up memory and continue running the same notebook.

import gc
import psutil
from datetime import datetime

# QuantBook and Resolution come from the research environment;
# in a LEAN notebook they are available via the standard import below
from AlgorithmImports import *

# instantiate the QuantBook
qb = QuantBook()

# equity tickers to add to the QuantBook (hard-coded here; in the
# original setup they were loaded from the object store)
symbol_ids = '''AMD,NVDA,AMZN,META,GOOGL,GOOG,SHOP,OXY,
RBLX,MU,XOM,BABA,GM,PLUG,SQ,ORCL,PYPL,DVN,CVX,COIN,JPM,
SLB,AFRM,SBUX,LI,TSM,MRVL,DIS,UAL,MS,ROKU,NEE,U,APA,EQT,
CVNA,PDD,QCOM,AA,RCL,COP,FDX,BA,LVS,PENN,CRM,UPST,ABNB,
SCHW,GE,WMT,NKE,AMAT,AR,DOW,MOS,ON,RUN,DT,PG,MPC,MMM,DASH,
RTX,ASAN,DOCU,ZIM,JD,WDC,ARRY,FTNT,BTU,ENVX,AIG,OKTA,GIS,
BYND,TWLO,NET,TMUS,FISV,MET,MCHP,SE,LTHM,W,VLO,CVS,IBM,PM,
TJX,DDOG,RRC,BX,WYNN,CTVA,ZM,CZR,TTD,Z,CHWY,GME,RIO,PSX,BHP,
CTSH,MTCH,DD,EOG,PBF,ADI,OVV,APP,AXP,EXPE,CF,ADM,COF,YUMC,
LYB,OKE,FSLR,AEM,BLDR,ETSY,CNQ,DLTR,PVH,NOVA,EMR,NUE,AEP,
ICE,S,SPR,PGR,NTR,FANG,LAC,BBY,APH,MAR,STX,HLT,MP,BE,DUK,
ANET,HES,MNST,TTE,CPRI,LYV,BIDU,SPLK,CHK,PRU,SWKS,STLD,CC,SYY,RNG'''.split(',')

for symbol_id in symbol_ids:
    # strip the newlines embedded in the multi-line string above
    qb.AddEquity(symbol_id.strip())

# define a function to scope the history request, so df_history can be freed when it returns
def history_function(qb):
    df_history = qb.History(
        qb.Securities.Keys,
        datetime(2010, 1, 1),
        datetime(2022, 1, 1),
        Resolution.Daily,
    )
    print(f"Dataframe size: {df_history.memory_usage().sum() / 1e6:.1f} MB")
    del df_history

print(
    f"Memory usage before calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# run the function with the history request and subsequently collect the garbage
history_function(qb)
gc.collect()

print(
    f"Memory usage after calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# delete the quantbook and subsequently collect the garbage
del qb
gc.collect()

print(
    f"Memory usage after deleting the qb: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

Result:

Memory usage before calling the function: 37.296 / 135.0 GB (28.6%)
Dataframe size: 15.2 MB
Memory usage after calling the function: 38.463 / 135.0 GB (29.5%)
Memory usage after deleting the qb: 38.463 / 135.0 GB (29.5%)

We can see there is little to no drop in memory consumption. This is somewhat expected, because Python’s gc.collect() doesn’t collect garbage for the underlying C# objects.
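
One workaround worth sketching: the research kernel is backed by pythonnet, so the .NET garbage collector can be invoked directly from Python. The snippet below is a minimal sketch under that assumption; the collect_all helper is illustrative, not part of the QuantConnect API.

import gc
from System import GC  # importable through pythonnet in the research kernel

def collect_all():
    # Collect Python-side garbage first, so wrapper objects release
    # their references to the underlying C# objects.
    gc.collect()
    # Then force a full .NET collection and run any pending finalizers.
    GC.Collect()
    GC.WaitForPendingFinalizers()
    # A second pass reclaims objects that only became unreachable
    # after their finalizers ran.
    GC.Collect()

collect_all()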

Potential Solution

N/A

Checklist

  • I have completely filled out this template
  • I have confirmed that this issue exists on the current master branch
  • I have confirmed that this is not a duplicate issue by searching issues
  • I have provided detailed steps to reproduce the issue

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
cperotlsq commented, Oct 21, 2022

Hey @Martin-Molinero, thanks for looking into this and implementing some improvements! I do start to see the memory usage decrease after running this, but it takes some time to do so, and it doesn’t necessarily drop to the level it would be at without restarting the kernel and loading the corresponding data from the object store. Nonetheless, this is an improvement 😃

I am curious, why do we need to collect the garbage 40 times?
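
One plausible answer, not confirmed in the thread: objects with finalizers, and Python wrappers that keep C# objects alive, often only become collectable after an earlier pass has already run, so a single gc.collect() call rarely frees everything at once. A hedged sketch that collects until memory stops dropping instead of hard-coding a count; collect_until_stable and its parameters are illustrative:

import gc
import psutil

def collect_until_stable(max_passes=40, tolerance_mb=1.0):
    # Repeat collection until a pass frees less than tolerance_mb.
    previous = psutil.virtual_memory().used
    for i in range(max_passes):
        gc.collect()
        current = psutil.virtual_memory().used
        if previous - current < tolerance_mb * 1e6:
            return i + 1  # number of passes that were needed
        previous = current
    return max_passes

passes = collect_until_stable()
print(f"Memory stabilized after {passes} collection passes")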

1 reaction
Martin-Molinero commented, Oct 12, 2022

Can’t reproduce a memory leak; memory is stable, even when creating multiple QuantBooks. There is a built-in memory cache for daily/hour resolution data (see https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/TextSubscriptionDataSourceReader.cs#L43), which would explain the jump in memory usage once the daily history requests are performed: usage no longer goes down, but neither does it continue to grow.
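
The cache behavior described above can be checked with a short experiment. This sketch assumes a standard research notebook where QuantBook is available via from AlgorithmImports import *; if the cache is doing its job, memory should jump on the first daily request and then stay roughly flat on repeats:

from AlgorithmImports import *  # standard research-notebook import
from datetime import datetime
import psutil

qb = QuantBook()
qb.AddEquity("SPY")

# Issue the same daily history request several times; the built-in
# cache should keep memory flat after the first call.
for i in range(3):
    df = qb.History(
        qb.Securities.Keys,
        datetime(2020, 1, 1),
        datetime(2021, 1, 1),
        Resolution.Daily,
    )
    del df
    print(f"Pass {i}: {psutil.virtual_memory().used / 1e9:.3f} GB used")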
