json.Decoder memory leak?
Hello,
I’ve got a service written in Python that reads data from Elasticsearch frequently, 24/7. Recently I migrated it from orjson to msgspec.json. Since then, the service runs out of memory pretty quickly.
Following https://docs.python.org/3/library/tracemalloc.html#pretty-top, I was able to capture the top 10 lines that contribute the most memory usage. The largest turns out to be the decode(...) method of msgspec.json.Decoder:
Top 10 lines
#1: /app/elasticsearch_repository.py:69: 26821.3 KiB
    self.__decoder.decode(
#2: /app/prepare_data_service.py:280: 3649.2 KiB
    cache_info_rt = decoder.decode(cache_info_data)
#3: /usr/local/lib/python3.10/multiprocessing/reduction.py:51: 1568.1 KiB
    cls(buf, protocol).dump(obj)
#4: /usr/local/lib/python3.10/linecache.py:137: 628.2 KiB
    lines = fp.readlines()
#5: /usr/local/lib/python3.10/json/decoder.py:353: 82.7 KiB
    obj, end = self.scan_once(s, idx)
#6: /usr/local/lib/python3.10/multiprocessing/queues.py:122: 40.2 KiB
    return _ForkingPickler.loads(res)
#7: /usr/local/lib/python3.10/tracemalloc.py:67: 11.9 KiB
    return (self.size, self.count, self.traceback)
#8: /app/elasticsearch_repository.py:68: 3.7 KiB
    return [
#9: /usr/local/lib/python3.10/http/client.py:1293: 3.6 KiB
    self.putrequest(method, url, **skips)
#10: /app/venv/lib/python3.10/site-packages/elasticsearch/client/utils.py:347: 1.8 KiB
    return func(*args, params=params, headers=headers, **kwargs)
193 other: 70.2 KiB
Total allocated size: 32880.9 KiB
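For reference, a listing like the one above can be produced roughly as follows. This is a minimal sketch adapted from the pretty-top recipe in the tracemalloc docs; `top_allocations` and the stand-in `payload` are illustrative names, not code from the service.

```python
import linecache
import tracemalloc


def top_allocations(snapshot, limit=10):
    """Return (location, KiB, source line) for the largest allocation sites."""
    # Ignore import bookkeeping so it does not crowd out application code.
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    rows = []
    # statistics("lineno") groups allocations by file and line, largest first.
    for stat in snapshot.statistics("lineno")[:limit]:
        frame = stat.traceback[0]
        source = linecache.getline(frame.filename, frame.lineno).strip()
        rows.append((f"{frame.filename}:{frame.lineno}", stat.size / 1024, source))
    return rows


if __name__ == "__main__":
    tracemalloc.start()
    payload = [str(i) * 10 for i in range(50_000)]  # stand-in for decoded data
    for location, kib, source in top_allocations(tracemalloc.take_snapshot()):
        print(f"{location}: {kib:.1f} KiB\n    {source}")
```

Note that a single snapshot only shows where live memory was allocated at that moment; it does not by itself show whether that memory keeps growing over time.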
Here are the structs for decoding:
'''
structs for msgspec
'''
from typing import Dict, List, Optional

from msgspec import Struct


class Query(Struct):
    """
    Struct for Query
    """
    date: str
    depAirport: str
    arrAirport: str


class Request(Struct):
    """
    Struct for Request
    """
    supplier: str
    tripType: str
    fareClass: str
    adultAmount: int
    childAmount: int
    infantAmount: int
    queries: List[Query]
    timestamp: int


class Segment(Struct):
    """
    Struct for Segment
    """
    fareClass: str
    depDate: str
    depTime: str
    flightNo: str
    carrier: str
    orgAirport: str
    arriveDate: str
    arriveTime: str
    dstAirport: str


class Flight(Struct):
    """
    Struct for Flight
    """
    segments: List[Segment]


class Price(Struct):
    """
    Struct for Price
    """
    price: float
    tax: float
    totalPrice: float
    seatsStatus: Optional[str] = None
    currencyCode: Optional[str] = None


class Trip(Struct):
    """
    Struct for Trip
    """
    flights: List[Flight]
    prices: Dict[str, Price]
    extended = Dict[str, str]


class Result(Struct):
    """
    Struct of Result
    """
    trips: List[Trip]


class CacheInfo(Struct):
    """
    Struct of CacheInfo
    """
    request: Request
    result: Result
I read from https://jcristharif.com/msgspec/structs.html#struct-nogc that
structs referencing only scalar values (ints, strings, bools, …) won’t contribute to GC load, but structs referencing containers (lists, dicts, structs, …) will.
Is this related? What’s the recommendation to resolve this issue?
Thanks!
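One way to tell a real leak apart from garbage that is only waiting for the cyclic collector is to force a collection and then measure what is still allocated. The sketch below uses only the standard library; `retained_growth` is a hypothetical helper name, and it is a diagnostic aid under those assumptions, not a statement about what msgspec does internally.

```python
import gc
import tracemalloc


def retained_growth(workload, rounds=5):
    """Run `workload` repeatedly and return bytes still allocated afterwards.

    A result that grows with `rounds` points at retained objects (or a real
    leak). A result near zero suggests the memory was only uncollected garbage.
    """
    gc.collect()  # start from a clean slate
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(rounds):
        workload()
    gc.collect()  # free anything kept alive only by reference cycles
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

In a service like the one described, the workload would be a callable that decodes one representative payload and discards the result. If the returned value keeps climbing as `rounds` increases even after the forced collection, GC pressure alone is unlikely to be the explanation.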
Top GitHub Comments
msgspec 0.7.0 has been released. Wheels are available on PyPI now and should be up on conda-forge once the conda-forge bots detect the change.
As mentioned above, we should have a release out before the 21st.