document pandas-gbq vision and roadmap

Both pandas-gbq and google-cloud-bigquery do many of the same things, and increasingly so (e.g. .to_dataframe() in google-cloud-bigquery).

Are there different use cases? Can we define those?
Should we focus development on one and wrap the other? Even if not wholly, for a subset of functionality?
Is there some direction from Google? @tswast spends a lot of time on both libraries, so he is probably best placed to offer guidance.

Issue Analytics

Created 6 years ago
Reactions: 1
Comments: 12 (3 by maintainers)

Top GitHub Comments

2 reactions

tswast commented, Jul 19, 2021

In the interest of not keeping issues open forever, I’m going to treat this issue as a request to document the project vision/roadmap. That should be useful for contributors and also for understanding the purpose of this project compared to using the pandas connector in google-cloud-bigquery directly.

2 reactions

tswast commented, Jan 4, 2019

To make this task more concrete, I’d like to propose the following two sub-tasks:

- read_gbq calls google-cloud-bigquery’s to_dataframe under the covers. Now that pandas-gbq uses the same logic as pandas for null handling, I don’t expect any change in behavior.
  - I don’t know how we’d implement a progress bar for downloading the dataframe. We may want to upstream the progress bar features (using tqdm) to the google-cloud-bigquery library or add some sort of hook so that we can show a progress bar.
- to_gbq calls google-cloud-bigquery’s load_table_from_dataframe. load_table_from_dataframe uses Parquet rather than CSV but is otherwise quite similar. It may work better with struct and array columns.
  - Logic for overriding the schema will be trickier, as the schema is actually defined in the Parquet file.
  - Object columns (and thus nullable types) are not supported by to_parquet in pandas: "Non supported types [for pandas’s to_parquet] include Period and actual Python object types."
  - Perhaps we want to wait on implementing the to_gbq logic until we have a better way to handle nullable columns in load_table_from_dataframe?
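
The "hook" idea from the first sub-task could look roughly like the sketch below. It is only an illustration of the pattern: download_rows and on_progress are hypothetical names, not part of pandas-gbq or google-cloud-bigquery, and pages stands in for whatever paged result iterator the download actually uses.

```python
def download_rows(pages, on_progress=None):
    """Collect rows from an iterable of pages, reporting progress via a hook.

    `pages` is any iterable of row lists. `on_progress` is a hypothetical
    callback that receives the running row count after each page.
    """
    rows = []
    for page in pages:
        rows.extend(page)
        if on_progress is not None:
            # The caller decides how to display progress (for example,
            # by advancing a tqdm bar); this function only reports counts.
            on_progress(len(rows))
    return rows
```

With a hook like this, the downloading library would not need to depend on tqdm itself; a caller such as pandas-gbq could pass a callback that updates its own progress bar.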

With the exception of schema overriding, I think it should be possible to implement these subtasks without changing the public interface of pandas-gbq.
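
As a minimal sketch of what these two sub-tasks amount to, the functions below keep the read_gbq and to_gbq names but delegate to the client. This is not the actual pandas-gbq implementation: the simplified signatures are an assumption for illustration, and client is assumed to behave like google.cloud.bigquery.Client (its query, to_dataframe, load_table_from_dataframe, and result methods are the ones named in the comment above).

```python
# Illustrative only: simplified signatures, not pandas-gbq's real ones.

def read_gbq(query, client):
    """Run `query` and return the result as a DataFrame.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    """
    query_job = client.query(query)   # start the query job
    return query_job.to_dataframe()   # wait for the job and download the rows


def to_gbq(dataframe, destination_table, client, job_config=None):
    """Load `dataframe` into `destination_table` and wait for completion.

    `job_config` is passed through unchanged; schema overriding, which the
    comment above calls out as the tricky part, is not handled here.
    """
    load_job = client.load_table_from_dataframe(
        dataframe, destination_table, job_config=job_config
    )
    return load_job.result()          # block until the load job finishes
```

Because callers still only see read_gbq and to_gbq, switching the internals this way would leave the public interface unchanged, apart from the schema-override case noted above.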
