Possible memory leak when using Google Sheets API
There seems to be a memory leak when using google-api-python-client with the Google Sheets API.
Environment:
$ python --version
Python 3.6.4
$ pip show google-api-python-client
Name: google-api-python-client
Version: 1.7.3
Here’s a simple reproducer (without a .client_secret.json):
#!/usr/bin/env python3
import httplib2
import os

from apiclient import discovery
from memory_profiler import profile
from oauth2client import client, tools
from oauth2client.file import Storage
from time import sleep

SCOPES = "https://www.googleapis.com/auth/spreadsheets.readonly"
# See https://cloud.google.com/docs/authentication/getting-started
CLIENT_SECRET_FILE = ".client_secret.json"
APPLICATION_NAME = "ClientDebug"
DISCOVERY_URL = "https://sheets.googleapis.com/$discovery/rest?version=v4"


def get_credentials():
    home_dir = os.path.expanduser("~")
    credential_dir = os.path.join(home_dir, ".credentials")
    flags = None

    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   "sheets.googleapis.com-clientdebug.json")

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        credentials = tools.run_flow(flow, store, flags)

    return credentials


@profile(precision=4)
def get_responses(creds):
    """Fetch spreadsheet data."""
    sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"

    http = creds.authorize(httplib2.Http())
    service = discovery.build("sheets", "v4", http=http,
                              discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    result = service.spreadsheets().values().get(
        spreadsheetId=sheet_id, range="A1:O").execute()
    values = result.get("values", [])

    print("Got {} rows".format(len(values)))


if __name__ == "__main__":
    creds = get_credentials()

    for i in range(0, 50):
        get_responses(creds)
        sleep(2)
For measurements I used the memory_profiler module, with the following results.
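(With the @profile decorator in place, memory_profiler prints a per-line report each time the decorated function returns, so it is enough to install the module and run the script directly:)

$ pip install memory_profiler
$ python3 main.py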
First and second iterations
Got 760 rows
Filename: ./main.py
Line #    Mem usage    Increment   Line Contents
================================================
    35  26.5195 MiB  26.5195 MiB   @profile(precision=4)
    36                             def get_responses(creds):
    37                                 """Fetch spreadsheet data."""
    38  26.5195 MiB   0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39
    40  26.5195 MiB   0.0000 MiB       http = creds.authorize(httplib2.Http())
    41  26.5195 MiB   0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42  29.2891 MiB   2.7695 MiB                                 discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43  49.5742 MiB  20.2852 MiB       result = service.spreadsheets().values().get(
    44  49.5742 MiB   0.0000 MiB           spreadsheetId=sheet_id, range="A1:O").execute()
    45  49.5742 MiB   0.0000 MiB       values = result.get("values", [])
    46
    47  49.5742 MiB   0.0000 MiB       print("Got {} rows".format(len(values)))
Got 760 rows
Filename: ./main.py
Line #    Mem usage    Increment   Line Contents
================================================
    35  49.5742 MiB  49.5742 MiB   @profile(precision=4)
    36                             def get_responses(creds):
    37                                 """Fetch spreadsheet data."""
    38  49.5742 MiB   0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39
    40  49.5742 MiB   0.0000 MiB       http = creds.authorize(httplib2.Http())
    41  49.5742 MiB   0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42  49.5742 MiB   0.0000 MiB                                 discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43  67.9922 MiB  18.4180 MiB       result = service.spreadsheets().values().get(
    44  67.9922 MiB   0.0000 MiB           spreadsheetId=sheet_id, range="A1:O").execute()
    45  67.9922 MiB   0.0000 MiB       values = result.get("values", [])
    46
    47  67.9922 MiB   0.0000 MiB       print("Got {} rows".format(len(values)))
Last iteration
Got 760 rows
Filename: ./main.py
Line #    Mem usage    Increment   Line Contents
================================================
    35  229.6055 MiB  229.6055 MiB   @profile(precision=4)
    36                               def get_responses(creds):
    37                                   """Fetch spreadsheet data."""
    38  229.6055 MiB    0.0000 MiB       sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    39
    40  229.6055 MiB    0.0000 MiB       http = creds.authorize(httplib2.Http())
    41  229.6055 MiB    0.0000 MiB       service = discovery.build("sheets", "v4", http=http,
    42  229.6055 MiB    0.0000 MiB                                 discoveryServiceUrl=(DISCOVERY_URL), cache_discovery=False)
    43  229.6055 MiB    0.0000 MiB       result = service.spreadsheets().values().get(
    44  229.6055 MiB    0.0000 MiB           spreadsheetId=sheet_id, range="A1:O").execute()
    45  229.6055 MiB    0.0000 MiB       values = result.get("values", [])
    46
    47  229.6055 MiB    0.0000 MiB       print("Got {} rows".format(len(values)))
There is clearly a memory leak: the reproducer fetches the same data over and over again, yet memory consumption keeps rising. The full log can be found here.
As a temporary workaround for one of my long-running applications, I use an explicit garbage-collector call, which mitigates the issue, at least for now:
...
import gc
...
result = service.spreadsheets().values().get(
    spreadsheetId=sheet_id, range="A1:O").execute()
values = result.get("values", [])
gc.collect()
...
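A related mitigation that comes up in the comments below is to build the service object only once and reuse it across iterations, so the expensive dynamic method creation runs a single time. A minimal sketch of the reproducer restructured that way, reusing get_credentials and the constants from above (note that one commenter reports this did not help in their case):

def get_responses(service):
    """Fetch spreadsheet data from an already-built service."""
    sheet_id = "1TowKJrFVbT4Bfp-HFcMh_CZ5anfH0CLfmoqCz9SUr9c"
    result = service.spreadsheets().values().get(
        spreadsheetId=sheet_id, range="A1:O").execute()
    values = result.get("values", [])
    print("Got {} rows".format(len(values)))

if __name__ == "__main__":
    creds = get_credentials()
    http = creds.authorize(httplib2.Http())
    # Build the service (and its dynamically generated methods) once.
    service = discovery.build("sheets", "v4", http=http,
                              discoveryServiceUrl=DISCOVERY_URL,
                              cache_discovery=False)
    for i in range(0, 50):
        get_responses(service)
        sleep(2)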
I went a little deeper, and the main culprit seems to be in the createMethod function, when the dynamic batchUpdate method is created:
Method 'batchUpdate', approx. __doc__ size: 2886834
<class 'function'>
Filename: /home/fsumsal/venv/googleapiclient/lib64/python3.6/site-packages/googleapiclient/discovery.py
Line #    Mem usage    Increment   Line Contents
================================================
  1064     48.7 MiB     48.7 MiB   @profile
  1065                             def _add_basic_methods(self, resourceDesc, rootDesc, schema):
  1066                               # If this is the root Resource, add a new_batch_http_request() method.
  1067     48.7 MiB      0.0 MiB     if resourceDesc == rootDesc:
  ...
  1086
  1087                               # Add basic methods to Resource
  1088     48.7 MiB      0.0 MiB     if 'methods' in resourceDesc:
  1089     66.8 MiB      0.0 MiB       for methodName, methodDesc in six.iteritems(resourceDesc['methods']):
  1090     56.0 MiB      0.0 MiB         fixedMethodName, method = createMethod(
  1091     66.8 MiB     18.1 MiB             methodName, methodDesc, rootDesc, schema)
  1092     66.8 MiB      0.0 MiB         print(type(method))
  1093     66.8 MiB      0.0 MiB         self._set_dynamic_attr(fixedMethodName,
  1094     66.8 MiB      0.0 MiB                                method.__get__(self, self.__class__))
  1095                                   # Add in _media methods. The functionality of the attached method will
  1096                                   # change when it sees that the method name ends in _media.
  1097     66.8 MiB      0.0 MiB         if methodDesc.get('supportsMediaDownload', False):
  1098                                     fixedMethodName, method = createMethod(
  1099                                         methodName + '_media', methodDesc, rootDesc, schema)
  1100                                     self._set_dynamic_attr(fixedMethodName,
  1101                                                            method.__get__(self, self.__class__))
(This method has a huge docstring.)
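To get a feel for how large those generated docstrings are, the dynamic methods of a built resource can be inspected directly (a quick sketch, assuming a service built as in the reproducer above):

spreadsheets = service.spreadsheets()
for name in dir(spreadsheets):
    if name.startswith("_"):
        continue
    method = getattr(spreadsheets, name)
    if callable(method) and method.__doc__:
        # e.g. "batchUpdate: 2886834 characters of __doc__"
        print("{}: {} characters of __doc__".format(name, len(method.__doc__)))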
Nevertheless, there is probably a reference cycle somewhere, as the gc.collect() call manages to collect all of those unreachable objects.
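One way to support this theory is to look at the return value of gc.collect(), which is the number of unreachable objects the cycle collector found; objects that are not part of a reference cycle are freed by reference counting alone and never show up there. A rough sketch, reusing get_responses and creds from the reproducer:

import gc

gc.collect()              # start from a clean slate
get_responses(creds)      # a single fetch, as in the loop above
collected = gc.collect()  # number of unreachable objects found
print("cycle collector reclaimed {} objects".format(collected))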
Top GitHub Comments
I have a cron job on Google App Engine that reads data in from a Google Sheet. I am noticing the same memory leak (or maybe a different one?). I tried the recommended workarounds: creating the “sheets” service object only once, and calling gc.collect(). Neither worked in my case. As a test, I changed the few lines of code that read from a Google Sheet to read from a database table instead, and the memory leak went away.
I never fixed it… In the short term, I used a high-memory App Engine instance so that it would take longer to hit the memory threshold, and then, as a long-term solution, I switched to Airtable instead of Google Sheets.