pd.testing.assert_frame_equal doesn't do precision according to the doc
Code Sample, a copy-pastable example if possible
import pandas as pd
import pandas.testing
df1 = pd.DataFrame([0.00016, -0.154526, -0.20580199999999998])
df2 = pd.DataFrame([0.00015981824253685772, -0.15452557802200317, -0.20580188930034637])
pd.testing.assert_frame_equal(df1, df2, check_exact=False, check_less_precise=3)
Problem description
This raises an AssertionError, even though every pair of values is identical in the first 3 digits after the decimal point.
AssertionError: DataFrame.iloc[:, 0] are different
DataFrame.iloc[:, 0] values are different (33.33333 %)
[left]: [0.00016, -0.154526, -0.20580199999999998]
[right]: [0.00015981824253685772, -0.15452557802200317, -0.20580188930034637]
It doesn’t assert if check_less_precise=2 is used instead. So something is not right here. Is there some kind of rounding issue here?
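One possible explanation, which is only an assumption and is not confirmed anywhere in this report: the check might be relative, testing |1 - right/left| < 0.5 * 10**-digits, rather than comparing digits after the decimal point. The sketch below uses two hypothetical helpers (close_relative and close_absolute, neither of which is a pandas function) to show how the two readings differ on the values from the example.

```python
# Hypothetical helpers for illustration only; neither is part of pandas.

def close_relative(left, right, digits):
    """Assumed relative check: |1 - right/left| < 0.5 * 10**-digits."""
    return abs(1 - right / left) < 0.5 * 10 ** -digits


def close_absolute(left, right, digits):
    """Absolute check: |left - right| < 0.5 * 10**-digits."""
    return abs(left - right) < 0.5 * 10 ** -digits


left = [0.00016, -0.154526, -0.20580199999999998]
right = [0.00015981824253685772, -0.15452557802200317, -0.20580188930034637]

for digits in (2, 3):
    relative = [close_relative(a, b, digits) for a, b in zip(left, right)]
    absolute = [close_absolute(a, b, digits) for a, b in zip(left, right)]
    print(digits, "relative:", relative, "absolute:", absolute)
```

Under the relative assumption only the first pair fails at 3 digits and nothing fails at 2 digits, which would be consistent with the 33.33 % mismatch reported above. Under the absolute reading all three pairs pass at 3 digits.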
Doc:
check_less_precise : bool or int, default False
Specify comparison precision. Only used when check_exact is False. 5 digits (False) or 3 digits (True) after decimal points are compared. If int, then specify the digits to compare
I understand the doc says check_less_precise defines how many digits after the decimal point are compared.
Unrelated: The doc should probably say “decimal point” (singular), as there is only one. Also, “specify the digits to compare” is vague; perhaps “If int, then specify how many digits after the decimal point to compare”?
Here is a proposed updated doc entry:
Specify comparison precision. Only used when check_exact is False. int: How many digits after the decimal point to compare, False: 5 digits, True: 3 digits.
Expected Output
No assert for up to check_less_precise=4 in this example, since the numbers start to diverge at digit 5. It’s also still unclear whether rounding is performed or not.
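To make the rounding question concrete, here is a minimal sketch of the "digits after the decimal point" reading, done once with truncation and once with rounding. The helpers agree and max_digits are hypothetical names introduced for illustration, and the sketch uses Python's decimal module rather than anything from pandas.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def agree(left, right, digits, rounding):
    """True if both values match once reduced to `digits` decimal places."""
    step = Decimal(1).scaleb(-digits)  # e.g. digits=3 gives 0.001
    a = Decimal(repr(left)).quantize(step, rounding=rounding)
    b = Decimal(repr(right)).quantize(step, rounding=rounding)
    return a == b


def max_digits(pairs, rounding, limit=10):
    """Largest digit count (up to `limit`) at which every pair still agrees."""
    best = 0
    for digits in range(1, limit + 1):
        if not all(agree(a, b, digits, rounding) for a, b in pairs):
            break
        best = digits
    return best


pairs = list(zip(
    [0.00016, -0.154526, -0.20580199999999998],
    [0.00015981824253685772, -0.15452557802200317, -0.20580188930034637],
))

print("truncation:", max_digits(pairs, ROUND_DOWN))
print("rounding:  ", max_digits(pairs, ROUND_HALF_EVEN))
```

With plain truncation (ROUND_DOWN) the three pairs agree through 4 digits, which matches the expectation stated above. With rounding (ROUND_HALF_EVEN) they would agree through 6 digits, so the two interpretations give different answers and the doc would need to say which one applies.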
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_CA.UTF-8
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
pandas: 0.24.0
pytest: 4.0.2
pip: 19.0.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Issue Analytics
- Created 5 years ago
- Reactions: 4
- Comments: 13 (5 by maintainers)
Any updates? There have been several releases since, but the problem seems to persist.