
ENH: option efficient memory use by downcasting


We use pandas-gbq a lot for our daily analyses. Memory consumption is a known pain point with pandas; see e.g. https://www.dataquest.io/blog/pandas-big-data/

I have started to write a patch, which could be integrated into an enhancement for read_gbq (rough idea, details TBD):

  • Provide a boolean optimize_memory option
  • If True, the source table is inspected with a query to get the min, max and presence of nulls for INTEGER columns, and the percentage of unique values for STRING columns
  • When calling to_dataframe, this information is passed to the dtypes option, downcasting integers to the appropriate numpy (u)int type and converting strings to the pandas category type below some threshold (less than 50% unique values)
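
To make the idea concrete, here is a minimal sketch of the dtype-selection step only. It is not the monkey-patch itself: the function names, the shape of the `column_stats` dict and the use of pandas nullable integer dtype names ("Int8", "UInt16", ...) for columns with nulls are all assumptions made for illustration. The statistics are assumed to have been collected already by the inspection query.

```python
from typing import Dict

# From the proposal: strings become categories below 50% unique values.
STRING_CATEGORY_THRESHOLD = 0.5


def integer_dtype(min_value: int, max_value: int, has_nulls: bool) -> str:
    """Return the name of the narrowest integer dtype covering the range."""
    for bits in (8, 16, 32, 64):
        if min_value >= 0 and max_value <= 2**bits - 1:
            signed = False
        elif min_value >= -(2 ** (bits - 1)) and max_value <= 2 ** (bits - 1) - 1:
            signed = True
        else:
            continue  # range does not fit at this width, try the next one
        if has_nulls:
            # Assumption: plain numpy ints cannot hold missing values, so this
            # sketch falls back to pandas nullable dtype names instead.
            return f"{'Int' if signed else 'UInt'}{bits}"
        return f"{'int' if signed else 'uint'}{bits}"
    raise ValueError("range does not fit in a 64-bit integer")


def string_dtype(unique_fraction: float) -> str:
    """Pick 'category' for low-cardinality string columns, else 'object'."""
    if unique_fraction < STRING_CATEGORY_THRESHOLD:
        return "category"
    return "object"


def build_dtypes(column_stats: Dict[str, dict]) -> Dict[str, str]:
    """Map column names to dtype names from pre-computed statistics."""
    dtypes = {}
    for name, stats in column_stats.items():
        if stats["type"] == "INTEGER":
            dtypes[name] = integer_dtype(
                stats["min"], stats["max"], stats["has_nulls"]
            )
        elif stats["type"] == "STRING":
            dtypes[name] = string_dtype(stats["unique_fraction"])
    return dtypes


# Hypothetical statistics, as the inspection query might return them.
column_stats = {
    "age": {"type": "INTEGER", "min": 0, "max": 120, "has_nulls": False},
    "delta": {"type": "INTEGER", "min": -40000, "max": 10, "has_nulls": True},
    "country": {"type": "STRING", "unique_fraction": 0.01},
    "comment": {"type": "STRING", "unique_fraction": 0.97},
}
dtypes = build_dtypes(column_stats)
```

The resulting mapping is what would be handed to the dtypes option mentioned above. Whether nullable dtype names are accepted there would need to be checked against the library version in use.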

I already have a working monkey-patch, which is still a bit rough. If there is enough interest I’d happily make it more robust and submit a PR. Would be my first significant contribution to an open source project, so some help and feedback would be appreciated.

Curious to hear your views on this.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 23 (6 by maintainers)

Top GitHub Comments

2 reactions
tswast commented, May 17, 2019

Oops, I should read more closely, I think I just proposed your user story. 😃

1 reaction
tswast commented, May 20, 2019

Please let me know if this is OK (assuming it can be fixed during a PR).

I can probably clean it up if you mail the PR and check the “allow edits from maintainers” box. Don’t worry about it too much.


Top Results From Across the Web

ENH: option efficient memory use by downcasting · Issue #275
reflects on existing table or query results to determine current SQL types; does calculation in standardSQL to determine optimal pandas dtypes ...
