DataFrame.to_json silently ignores index parameter for most orients.

See https://github.com/pandas-dev/pandas/issues/25513#issuecomment-469264925

Code Sample, a copy-pastable example if possible

from __future__ import print_function

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'}
    ],
    index='id'
)

print(df.to_json(orient='records', index=True))

# Prints out:
# [{"data":"qwerty"},{"data":"uiop"}]

series = df.squeeze()
print(series.to_json(orient='records', index=True))

# Prints out:
# ["qwerty","uiop"]

Problem description

When a DataFrame has two columns, one used as the index and the other holding the data, calling .to_json(orient='records', index=True) omits the index from the output. I know that in theory I should be using a Series for this, but I’m converting a CSV file into JSONL and I don’t know ahead of time what the CSV file is going to look like.

In any case, squeezing the DataFrame into a Series doesn’t work either. In fact, the bug in Series.to_json is even worse, as it produces an array of strings instead of an array of dictionaries.
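For the CSV-to-JSONL use case described above, one possible sketch is to move the index back into a regular column before serializing. The CSV text below is made-up sample data, and treating the first column as the index is an assumption made only to mirror the report; this is an illustration, not a fix in pandas itself.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content standing in for a file whose layout
# is not known in advance.
csv_text = "id,data\nabc,qwerty\ndef,uiop\n"

# Use the first column as the index, as in the report above.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# reset_index() turns the index back into an ordinary column, so
# orient='records' has it available to serialize. lines=True writes
# one JSON object per line (JSONL).
jsonl = df.reset_index().to_json(orient="records", lines=True)

# Parse each non-empty line back to inspect what was written.
rows = [json.loads(line) for line in jsonl.splitlines() if line]
print(rows)
```

Because reset_index() returns a new DataFrame, the original df keeps its index.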

This bug is present in master.

Expected Output

Expected output is:

[{"id":"abc","data":"qwerty"},{"id":"def","data":"uiop"}]
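As a rough sketch, records of this shape can be obtained for both the DataFrame and the squeezed Series by calling reset_index() first. This sidesteps the index parameter rather than changing how it behaves, and it assumes the index has a name (here 'id') to use as the column label.

```python
import json

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {"id": "abc", "data": "qwerty"},
        {"id": "def", "data": "uiop"},
    ],
    index="id",
)

# DataFrame case: the named index becomes a regular 'id' column.
frame_json = df.reset_index().to_json(orient="records")

# Series case: squeeze() gives a Series named 'data'; reset_index()
# turns it back into a two-column DataFrame before serializing.
series = df.squeeze()
series_json = series.reset_index().to_json(orient="records")

print(json.loads(frame_json))
print(json.loads(series_json))
```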

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: None
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

Created: 5 years ago
Reactions: 2
Comments: 20 (14 by maintainers)

Top GitHub Comments

3 reactions

ARDivekar commented, Jan 16, 2020

FYI. For those looking for a solution, I got around this by doing a simple: df['index_name'] = df.index before calling df.to_json(...)
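The workaround in this comment could look roughly like the following. The fallback label 'index' for an unnamed index and the use of copy() to leave the original frame untouched are assumptions added for illustration; note that the new column is appended last, so the key order differs from the expected output in the issue.

```python
import json

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {"id": "abc", "data": "qwerty"},
        {"id": "def", "data": "uiop"},
    ],
    index="id",
)

# Work on a copy so the caller's DataFrame is not modified.
out = df.copy()

# Assumption: fall back to the label 'index' when the index is unnamed.
column_name = df.index.name or "index"

# Copy the index values into a regular column, as the comment suggests.
out[column_name] = out.index

records = json.loads(out.to_json(orient="records"))
print(records)
```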

2 reactions

tommyjcarpenter commented, Sep 10, 2021

I just want to chime in and say, the docs are very subtle w.r.t. this issue: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

It is very easy to miss the line “Note that index labels are not preserved with this encoding.”

The fact that orient=records, index=True is a LEGAL combination of parameters, yet it … doesn’t do that, is the unexpected behavior. Pandas basically says “we heard you but nah”. I would prefer that orient=records, index=True be an illegal combination that explicitly tells the user “you can’t do that” [for some reason], rather than silently ignoring the intent.
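To illustrate the explicit-failure behavior preferred here, a user-side wrapper might look like the sketch below. The name to_json_strict and the set ORIENTS_WITHOUT_INDEX are hypothetical and not part of pandas; the choice of 'records' and 'values' as the orients that cannot carry index labels is an assumption based on the behavior reported in this issue.

```python
import json

import pandas as pd

# Assumption: orients whose output has no place for index labels.
ORIENTS_WITHOUT_INDEX = {"records", "values"}


def to_json_strict(df, orient="columns", index=None, **kwargs):
    """Hypothetical helper: fail loudly instead of dropping the index."""
    if index and orient in ORIENTS_WITHOUT_INDEX:
        raise ValueError(
            f"index=True cannot be honored with orient={orient!r}; "
            "call reset_index() first to keep the index as a column"
        )
    if index is None or orient in ORIENTS_WITHOUT_INDEX:
        # Leave the index argument at pandas' own default.
        return df.to_json(orient=orient, **kwargs)
    return df.to_json(orient=orient, index=index, **kwargs)


df = pd.DataFrame.from_records(
    [{"id": "abc", "data": "qwerty"}, {"id": "def", "data": "uiop"}],
    index="id",
)

# The combination from the issue now fails explicitly.
try:
    to_json_strict(df, orient="records", index=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# An orient that does carry index labels passes through unchanged.
split_payload = json.loads(to_json_strict(df, orient="split", index=True))
print(split_payload["index"])
```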

Perhaps the solution above df['index_name'] = df.index before calling df.to_json(...) should be mentioned in those docs to help the reader understand what to do, other than googling and eventually finding this issue.

It would also be nice if index=True just did this for you, so that it “did” work.
