
Improve Memory Handling in Research Notebook

See original GitHub issue

Expected Behavior

The Research Environment provides a garbage-collection mechanism, so memory used by History request data can be released and the same notebook can keep running.

Actual Behavior

The Research Environment doesn’t have a garbage-collecting feature. If we use data from History requests, we can’t dispose of it to free up memory and continue running the same notebook.

import gc
import psutil
from datetime import datetime

# QuantBook and Resolution come from the research environment;
# in a LEAN notebook they are available via the standard import below
from AlgorithmImports import *

# instantiate the QuantBook
qb = QuantBook()

# equity tickers to add to the QuantBook (hard-coded here; in the
# original setup they were loaded from the object store)
symbol_ids = '''AMD,NVDA,AMZN,META,GOOGL,GOOG,SHOP,OXY,
RBLX,MU,XOM,BABA,GM,PLUG,SQ,ORCL,PYPL,DVN,CVX,COIN,JPM,
SLB,AFRM,SBUX,LI,TSM,MRVL,DIS,UAL,MS,ROKU,NEE,U,APA,EQT,
CVNA,PDD,QCOM,AA,RCL,COP,FDX,BA,LVS,PENN,CRM,UPST,ABNB,
SCHW,GE,WMT,NKE,AMAT,AR,DOW,MOS,ON,RUN,DT,PG,MPC,MMM,DASH,
RTX,ASAN,DOCU,ZIM,JD,WDC,ARRY,FTNT,BTU,ENVX,AIG,OKTA,GIS,
BYND,TWLO,NET,TMUS,FISV,MET,MCHP,SE,LTHM,W,VLO,CVS,IBM,PM,
TJX,DDOG,RRC,BX,WYNN,CTVA,ZM,CZR,TTD,Z,CHWY,GME,RIO,PSX,BHP,
CTSH,MTCH,DD,EOG,PBF,ADI,OVV,APP,AXP,EXPE,CF,ADM,COF,YUMC,
LYB,OKE,FSLR,AEM,BLDR,ETSY,CNQ,DLTR,PVH,NOVA,EMR,NUE,AEP,
ICE,S,SPR,PGR,NTR,FANG,LAC,BBY,APH,MAR,STX,HLT,MP,BE,DUK,
ANET,HES,MNST,TTE,CPRI,LYV,BIDU,SPLK,CHK,PRU,SWKS,STLD,CC,SYY,RNG'''.split(',')

for symbol_id in symbol_ids:
    # strip the newlines embedded in the multi-line string above
    qb.AddEquity(symbol_id.strip())

# define a function to scope the history request, so df_history can be freed when it returns
def history_function(qb):
    df_history = qb.History(
        qb.Securities.Keys,
        datetime(2010, 1, 1),
        datetime(2022, 1, 1),
        Resolution.Daily,
    )
    print(f"Dataframe size: {df_history.memory_usage().sum() / 1e6:.1f} MB")
    del df_history

print(
    f"Memory usage before calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# run the function with the history request and subsequently collect the garbage
history_function(qb)
gc.collect()

print(
    f"Memory usage after calling the function: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

# delete the quantbook and subsequently collect the garbage
del qb
gc.collect()

print(
    f"Memory usage after deleting the qb: {psutil.virtual_memory().used / 1e9:.3f} / "
    f"{psutil.virtual_memory().total / 1e9:.1f} GB "
    f"({psutil.virtual_memory().percent}%)"
)

Result:

Memory usage before calling the function: 37.296 / 135.0 GB (28.6%)
Dataframe size: 15.2 MB
Memory usage after calling the function: 38.463 / 135.0 GB (29.5%)
Memory usage after deleting the qb: 38.463 / 135.0 GB (29.5%)

We can see there is little to no drop in memory consumption. This is somewhat expected, because Python’s gc.collect() doesn’t collect garbage for the underlying C# objects.
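
One workaround worth sketching: the research kernel is backed by pythonnet, so the .NET garbage collector can be invoked directly from Python. The snippet below is a minimal sketch under that assumption; the collect_all helper is illustrative, not part of the QuantConnect API.

import gc
from System import GC  # importable through pythonnet in the research kernel

def collect_all():
    # Collect Python-side garbage first, so wrapper objects release
    # their references to the underlying C# objects.
    gc.collect()
    # Then force a full .NET collection and run any pending finalizers.
    GC.Collect()
    GC.WaitForPendingFinalizers()
    # A second pass reclaims objects that only became unreachable
    # after their finalizers ran.
    GC.Collect()

collect_all()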

Potential Solution

N/A

Checklist

  • I have completely filled out this template
  • I have confirmed that this issue exists on the current master branch
  • I have confirmed that this is not a duplicate issue by searching issues
  • I have provided detailed steps to reproduce the issue

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
cperotlsq commented, Oct 21, 2022

Hey @Martin-Molinero, thanks for looking into this and implementing some improvements! I do start to see the memory usage decrease after running this, but it takes some time to do so, and it doesn’t necessarily drop to the level it would be at without restarting the kernel and loading the corresponding data from the object store. Nonetheless, this is an improvement 😃

I am curious, why do we need to collect the garbage 40 times?
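
One plausible answer, not confirmed in the thread: objects with finalizers, and Python wrappers that keep C# objects alive, often only become collectable after an earlier pass has already run, so a single gc.collect() call rarely frees everything at once. A hedged sketch that collects until memory stops dropping instead of hard-coding a count; collect_until_stable and its parameters are illustrative:

import gc
import psutil

def collect_until_stable(max_passes=40, tolerance_mb=1.0):
    # Repeat collection until a pass frees less than tolerance_mb.
    previous = psutil.virtual_memory().used
    for i in range(max_passes):
        gc.collect()
        current = psutil.virtual_memory().used
        if previous - current < tolerance_mb * 1e6:
            return i + 1  # number of passes that were needed
        previous = current
    return max_passes

passes = collect_until_stable()
print(f"Memory stabilized after {passes} collection passes")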

1 reaction
Martin-Molinero commented, Oct 12, 2022

Can’t reproduce a memory leak; memory is stable, even when creating multiple QuantBooks. There is a built-in memory cache for daily/hour resolution data (see https://github.com/QuantConnect/Lean/blob/master/Engine/DataFeeds/TextSubscriptionDataSourceReader.cs#L43), which would explain the jump in memory usage once the daily history requests are performed: usage no longer goes down, but neither does it continue to grow.
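
The cache behavior described above can be checked with a short experiment. This sketch assumes a standard research notebook where QuantBook is available via from AlgorithmImports import *; if the cache is doing its job, memory should jump on the first daily request and then stay roughly flat on repeats:

from AlgorithmImports import *  # standard research-notebook import
from datetime import datetime
import psutil

qb = QuantBook()
qb.AddEquity("SPY")

# Issue the same daily history request several times; the built-in
# cache should keep memory flat after the first call.
for i in range(3):
    df = qb.History(
        qb.Securities.Keys,
        datetime(2020, 1, 1),
        datetime(2021, 1, 1),
        Resolution.Daily,
    )
    del df
    print(f"Pass {i}: {psutil.virtual_memory().used / 1e9:.3f} GB used")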
