DataFrame.to_json silently ignores index parameter for most orients.

see https://github.com/pandas-dev/pandas/issues/25513#issuecomment-469264925

Code Sample, a copy-pastable example if possible

from __future__ import print_function

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'}
    ],
    index='id'
)

print(df.to_json(orient='records', index=True))

# Prints out:
# [{"data":"qwerty"},{"data":"uiop"}]

series = df.squeeze()
print(series.to_json(orient='records', index=True))

# Prints out:
# ["qwerty","uiop"]

Problem description

When creating a DataFrame that has two columns, one to be used as an index and another for the data, if you call .to_json(orient='records') the index is omitted. I know that in theory I should be using a Series for this, but I’m using it to convert a CSV file into JSONL and I don’t know what the CSV file is going to look like ahead of time.

In any case, squeezing the DataFrame into a Series doesn’t work either. In fact, the bug in Series.to_json is even worse, as it produces an array of strings instead of an array of dictionaries.
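For the CSV-to-JSONL use case described above, a minimal sketch of one possible approach is to move the index into an ordinary column with reset_index() before serializing, which works for both the DataFrame and the squeezed Series. The inline CSV text and index_col='id' are assumptions made for illustration only; a real file's layout may differ.

```python
import io
import json

import pandas as pd

# Illustrative CSV content (assumption: the first column serves as the index).
csv_text = "id,data\nabc,qwerty\ndef,uiop\n"
df = pd.read_csv(io.StringIO(csv_text), index_col='id')

# reset_index() turns the index into a regular 'id' column, so
# orient='records' has something to serialize. lines=True emits JSONL.
jsonl = df.reset_index().to_json(orient='records', lines=True)

# The same approach applies to a Series: reset_index() returns a
# two-column DataFrame (index name and Series name).
series = df.squeeze()
series_jsonl = series.reset_index().to_json(orient='records', lines=True)

# Parse each non-empty line back to check what was written.
rows = [json.loads(line) for line in jsonl.splitlines() if line]
print(rows)
```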

This bug is present in master.

Expected Output

Expected output is:

[{"id":"abc", "data":"qwerty"},{"id":"def","data":"uiop"}]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.1
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: None
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 20 (14 by maintainers)

Top GitHub Comments

3 reactions
ARDivekar commented, Jan 16, 2020

FYI. For those looking for a solution, I got around this by doing a simple: df['index_name'] = df.index before calling df.to_json(...)
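A minimal sketch of that workaround, using the DataFrame from the issue's code sample. The column is named after the index here ('id') rather than the placeholder 'index_name', and the copy is only there to leave the original frame untouched. reset_index() is shown as an alternative that should give the same records with the index column first.

```python
import json

import pandas as pd

df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'},
    ],
    index='id',
)

# Workaround from the comment: copy the index into a regular column.
with_col = df.copy()
with_col[df.index.name] = with_col.index
json_a = with_col.to_json(orient='records')

# Alternative: reset_index() moves the index into a column in one step.
json_b = df.reset_index().to_json(orient='records')

print(json_a)
print(json_b)
```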

2 reactions
tommyjcarpenter commented, Sep 10, 2021

I just want to chime in and say, the docs are very subtle w.r.t. this issue: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

It is very easy to miss the line “Note that index labels are not preserved with this encoding.”

The fact that orient=records, index=True is a LEGAL combination of parameters, yet it … doesn’t do that, is the unexpected behavior. Pandas basically says “we heard you but nah”. I would prefer that orient=records, index=True be an illegal combination that explicitly tells the user “you can’t do that” [for some reason], rather than silently ignoring the intent.

Perhaps the solution above df['index_name'] = df.index before calling df.to_json(...) should be mentioned in those docs to help the reader understand what to do, other than googling and eventually finding this issue.

It would also be nice if index=True just did this for you, so that it “did” work.
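Both wishes in this comment can be sketched as user-side wrappers. The names strict_to_json and records_with_index are hypothetical helpers, not part of pandas; the first rejects the combination explicitly, and the second makes "records plus index" actually include the index.

```python
import json

import pandas as pd


def strict_to_json(df, orient, index=True, **kwargs):
    """Hypothetical wrapper: fail loudly instead of dropping the index."""
    if orient == 'records':
        if index:
            raise ValueError(
                "orient='records' cannot carry index labels; "
                "call reset_index() first or pass index=False"
            )
        # Do not forward `index` here; 'records' never writes it anyway.
        return df.to_json(orient=orient, **kwargs)
    return df.to_json(orient=orient, index=index, **kwargs)


def records_with_index(df, **kwargs):
    """Hypothetical helper: emit records that include the index as a column."""
    return df.reset_index().to_json(orient='records', **kwargs)


df = pd.DataFrame.from_records(
    [
        {'id': 'abc', 'data': 'qwerty'},
        {'id': 'def', 'data': 'uiop'},
    ],
    index='id',
)

try:
    strict_to_json(df, orient='records', index=True)
except ValueError as exc:
    print('rejected:', exc)

print(records_with_index(df))
```

Keeping the two behaviours in separate functions leaves the choice explicit at the call site: either the caller is told the request cannot be honoured, or the index is deliberately promoted to a column.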
