Floats can lose precision when loading to BigQuery

The float precision is set here: https://github.com/pydata/pandas-gbq/blob/d251db03b159447331ac9ae63e13d295d75bad70/pandas_gbq/load.py#L22

This is insufficient to represent all 64-bit floats without losing precision. For example, 26/59 should be represented as 0.4406779661016949, but with this setting it is written as 0.440677966101695.
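
A minimal Python sketch of the precision loss described above. It assumes the load path formats floats with the printf-style `%.15g` spec; the exact call in `load.py` is not reproduced here.

```python
# Reproduce the 26/59 example from the issue.
x = 26 / 59

# 15 significant digits, the precision referenced in load.py (assumed printf-style).
fifteen = "%.15g" % x
print(fifteen)  # 0.440677966101695

# Parsing the 15-digit string back yields a different double.
print(float(fifteen) == x)  # False

# repr() produces the shortest string that round-trips exactly.
print(repr(x), float(repr(x)) == x)
```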

This was added intentionally to fix a different issue, but it causes problems for us because we need exact reconciliation between systems. It seems it should be possible to get the best of both worlds and output the correct number of digits in all cases.
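
One way to get "the correct number of digits in all cases" in Python is `repr()`, which since Python 3.1 returns the shortest decimal string that converts back to the same double. The helper `format_float` below is hypothetical and is not part of pandas-gbq; this is a sketch of the idea, not a proposed patch.

```python
import random


def format_float(value: float) -> str:
    """Hypothetical helper: shortest decimal string that parses back to `value`."""
    # Since Python 3.1, repr(float) picks the shortest round-tripping string.
    return repr(value)


# Random doubles spread over many orders of magnitude.
rng = random.Random(0)
samples = [rng.random() * 10 ** rng.randint(-10, 10) for _ in range(10_000)]

# Count how many values a fixed 15-digit format fails to preserve.
lossy = sum(float("%.15g" % v) != v for v in samples)
# Check that the shortest-repr strategy preserves every value.
exact = all(float(format_float(v)) == v for v in samples)

print(f"%.15g lost precision on {lossy} of {len(samples)} values")
print(f"repr() round-trips every value: {exact}")
```

Because `repr()` can emit exponent notation such as `1e-07` or special values such as `nan`, whether BigQuery's CSV loader accepts every such string would need to be checked separately.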

The original suggestion was to use %g, but this was changed to %.15g. It's not clear to me what the rationale for that is; %g seems strictly better, but I'm sure I'm missing something.
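
If the original suggestion meant a bare `%g`, note that in Python's printf-style formatting a bare `%g` uses a default precision of six significant digits, far fewer than `%.15g`. A quick comparison of the three specs on the same value:

```python
x = 26 / 59

for fmt in ("%g", "%.15g", "%.17g"):
    s = fmt % x
    # Does the formatted string parse back to the identical double?
    print(f"{fmt:6} -> {s:22} round-trips: {float(s) == x}")
```

Here `%g` prints `0.440678`, so it loses considerably more information than `%.15g`; only the 17-digit spec is guaranteed to round-trip.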

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
dkapitan commented, Sep 7, 2020

@danielchatfield

Not sure if this is helpful, but I think one of the issues, as explained here, is that a conservative choice was made for the number of significant digits.

If you do need higher precision, a possible solution is to use the .parquet format instead, as suggested here.
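
Part of why a binary format such as Parquet sidesteps the problem is that it stores each double as a binary IEEE 754 value, so no decimal text conversion happens at all. The sketch below uses the standard-library `struct` module as a stand-in for a binary encoding; it is not Parquet itself and does not call pandas-gbq. The function names are illustrative only.

```python
import struct


def binary_round_trip(value: float) -> float:
    # Pack to the 8-byte little-endian IEEE 754 representation and back.
    return struct.unpack("<d", struct.pack("<d", value))[0]


def text_round_trip(value: float, fmt: str = "%.15g") -> float:
    # Format to decimal text (as a CSV-based load would) and parse it again.
    return float(fmt % value)


x = 26 / 59
print(binary_round_trip(x) == x)  # True: the bits are copied unchanged
print(text_round_trip(x) == x)    # False: 15 digits drop information
```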

0 reactions
tswast commented, Nov 6, 2020

According to https://en.wikipedia.org/wiki/IEEE_754#Character_representation, 17 significant digits are required to preserve the original binary value of a 64-bit float. 16 digits was not enough in my testing of https://github.com/pydata/pandas-gbq/pull/336.
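
The 17-digit bound can be checked directly in Python. `0.1 + 0.2` is a convenient double that needs all 17 significant digits, even though 26/59 happens to need only 16; the sketch also confirms over random samples that `%.17g` round-trips.

```python
import random

x = 0.1 + 0.2  # stored as 0.30000000000000004...

print("%.16g" % x)  # 0.3 (trailing zeros stripped), which parses to a different double
print("%.17g" % x)  # 0.30000000000000004

# Random doubles across a wide range, plus values in [0, 1).
rng = random.Random(42)
values = [rng.uniform(-1e300, 1e300) for _ in range(10_000)]
values += [rng.random() for _ in range(10_000)]

# 17 significant digits always recover the original binary value.
assert all(float("%.17g" % v) == v for v in values)
```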
