DataFrame.to_json silently ignores index parameter for most orients.
See the original GitHub issue: https://github.com/pandas-dev/pandas/issues/25513#issuecomment-469264925
Code Sample, a copy-pastable example if possible
from __future__ import print_function
import pandas as pd

df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'}
    ],
    index='id'
)
print(df.to_json(orient='records', index=True))
# Prints out:
# [{"data":"qwerty"},{"data":"uiop"}]

series = df.squeeze()
print(series.to_json(orient='records', index=True))
# Prints out:
# ["qwerty","uiop"]
Problem description
When a DataFrame has two columns, one used as the index and the other holding the data, calling .to_json(orient='records') omits the index. I know that in theory I should be using a Series for this, but I'm using it to convert a CSV file into JSONL and I don't know what the CSV file is going to look like ahead of time.
In any case, squeezing the DataFrame into a Series doesn't work either. In fact, the bug in Series.to_json is even worse, as it produces an array of strings instead of an array of dictionaries.
This bug is present in master.
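For the CSV-to-JSONL use case described above, here is a minimal sketch of one way to keep the index in the output. It assumes the index can simply be turned back into an ordinary column with reset_index() before serializing. The CSV contents and the variable names (csv_text, jsonl, records) are made up for illustration and are not from the original report.

```python
import io
import json

import pandas as pd

# Hypothetical CSV contents; the real file layout is not known ahead of time.
csv_text = "id,data\nabc,qwerty\ndef,uiop\n"

# Read with the first column as the index, mirroring the DataFrame in the report.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# reset_index() moves the index back into a regular column, so
# orient='records' has a value to serialize for it.
jsonl = df.reset_index().to_json(orient='records', lines=True)

# Parse each JSONL line back into a dict to inspect the result.
records = [json.loads(line) for line in jsonl.splitlines() if line]
print(records)
```

If the index has no name, reset_index() names the new column "index", so it may be worth setting df.index.name first.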
Expected Output
Expected output is:
[{"id":"abc", "data":"qwerty"},{"id":"def","data":"uiop"}]
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.7.2.final.0 python-bits: 64 OS: Darwin OS-release: 17.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8
pandas: 0.24.1 pytest: 4.3.0 pip: 19.0.3 setuptools: 40.8.0 Cython: 0.29.6 numpy: 1.16.2 scipy: None pyarrow: 0.12.1 xarray: None IPython: None sphinx: None patsy: None dateutil: 2.8.0 pytz: 2018.9 blosc: None bottleneck: None tables: None numexpr: None feather: None matplotlib: None openpyxl: None xlrd: None xlwt: None xlsxwriter: None lxml.etree: None bs4: None html5lib: None sqlalchemy: None pymysql: None psycopg2: 2.7.7 (dt dec pq3 ext lo64) jinja2: None s3fs: 0.2.0 fastparquet: 0.2.1 pandas_gbq: None pandas_datareader: None gcsfs: None
FYI. For those looking for a solution, I got around this by doing a simple:
df['index_name'] = df.index
before calling df.to_json(...)
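Spelled out against the DataFrame from the report, the workaround might look like the sketch below. The reset_index() variant is an alternative I'm assuming is equivalent for this purpose; it keeps the index's own name ('id') instead of a hand-picked column name. The variable names (with_index, out_assign, out_reset, out_series) are illustrative only.

```python
import json

import pandas as pd

df = pd.DataFrame.from_records(
    [{'id': 'abc', 'data': 'qwerty'}, {'id': 'def', 'data': 'uiop'}],
    index='id',
)

# Workaround from the comment: copy the index into a regular column.
with_index = df.copy()
with_index['index_name'] = with_index.index
out_assign = json.loads(with_index.to_json(orient='records'))

# Alternative: reset_index() reuses the index name ('id') as the column name.
out_reset = json.loads(df.reset_index().to_json(orient='records'))

# The same idea applied to the squeezed Series: reset_index() turns it into
# a two-column DataFrame, so the output is a list of dicts, not bare strings.
series = df.squeeze()
out_series = json.loads(series.reset_index().to_json(orient='records'))

print(out_assign)
print(out_reset)
print(out_series)
```

Note that neither variant passes index=True to to_json; the index is already an ordinary column by the time it is serialized.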
I just want to chime in and say, the docs are very subtle w.r.t. this issue: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html
It is very easy to miss the line
Note that index labels are not preserved with this encoding.
The fact that orient=records, index=True is a LEGAL combination of parameters, yet it … doesn't do that, is the unexpected behavior. Pandas basically says “we heard you but nah”. I would prefer orient=records, index=True be an illegal combination and explicitly tell the user, “you can't do that” [for some reason] rather than just silently ignoring the intent.
Perhaps the solution above
df['index_name'] = df.index before calling df.to_json(...)
should be mentioned in those docs to help the reader understand what to do, other than googling and eventually finding this issue.
It would also be nice if index=True just did this for you, so that it “did” work.
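To make the "index=True just did this for you" idea concrete, here is a rough sketch of a wrapper with that behavior. to_json_records is a hypothetical helper name, not a pandas function, and this is only one possible reading of the suggestion, not how pandas implements or plans to implement it.

```python
import json

import pandas as pd


def to_json_records(obj, index=True, **kwargs):
    """Hypothetical helper (not part of pandas): serialize a DataFrame or
    Series with orient='records', including the index when index=True."""
    if index:
        # Move the index into regular column(s) so it appears in every record.
        # reset_index() raises ValueError if the index name collides with an
        # existing column name.
        obj = obj.reset_index()
    # The index argument is deliberately not forwarded to to_json.
    return obj.to_json(orient='records', **kwargs)


df = pd.DataFrame.from_records(
    [{'id': 'abc', 'data': 'qwerty'}, {'id': 'def', 'data': 'uiop'}],
    index='id',
)

with_idx = json.loads(to_json_records(df))
without_idx = json.loads(to_json_records(df, index=False))
print(with_idx)
print(without_idx)
```

Extra keyword arguments such as lines=True are passed straight through to to_json, so the same helper would cover the JSONL case.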