Performance
See original GitHub issueWe’re starting to use BigQuery heavily but becoming increasingly ‘bottlenecked’ with the performance of moving moderate amounts of data from BigQuery to python.
Here’s a few stats:
- 29.1s: Pulling 500k rows with 3 columns of data (with cached data) using pandas-gbq
- 36.5s: Pulling the same query with
google-cloud-bigquery- i.e.client.query(query)..to_dataframe() - 2.4s: Pulling very similar data - same types, same size, from our existing MSSQL box hosted in AWS (using
pd.read_sql). That’s on standard drivers, nothing liketurbodbcinvolved
…so using BigQuery with python is at least an order of magnitude slower than traditional DBs.
We’ve tried exporting tables to CSV on GCS and reading those in, which works fairly well for data processes, though not for exploration.
A few questions - feel free to jump in with partial replies:
- Are these results expected, or are we doing something very wrong?
- My prior is that a lot of this slowdown is caused by pulling in HTTP pages, converting to python objects, and then writing those into arrays. Is this approach really scalable? Should
pandas-gbqinvest resources into getting a format that’s query-able in exploratory workflows that can deal with more reasonable datasets? (or at least encourage Google to)
Issue Analytics
- State:
- Created 6 years ago
- Reactions:12
- Comments:45 (20 by maintainers)
Top Results From Across the Web
Performance Definition & Meaning - Merriam-Webster
1 · the execution of an action · something accomplished : deed, feat ; 3 · the action of representing a character in...
Read more >Performance Bicycle - Your Next Best Ride
Shop road, mountain & gravel bikes. Huge selection of parts, components & clothing from Specialized, Shimano & more!
Read more >Performance - Wikipedia
A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the...
Read more >86 Synonyms & Antonyms for PERFORMANCE - Thesaurus.com
Find 86 ways to say PERFORMANCE, along with antonyms, related words, and example sentences at Thesaurus.com, the world's most trusted free thesaurus.
Read more >Performance (1970) - IMDb
Rock superstar Mick Jagger and James Fox star in this stunning reality/fantasy trip set. Play trailer ...
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found

Released today (https://cloud.google.com/bigquery/docs/release-notes#february_22_2019) the BigQuery Storage API. It should make getting data into pandas a lot faster: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas
I’m thinking of adding a parameter:
use_bqstorage=Falsetoread_gbq()to optionally use this API.I decided to do some profiling today to see where all the time is spent, following this Python profiling guide.
pandas_gbq_bench.py:
We see very similar results from google-cloud-bigquery to_dataframe:
Most of the time is spent actually waiting on the BigQuery API. JSON parsing is a much smaller fraction.