to_json should make separators configurable (similar to json.dump)

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

print(pd.DataFrame([1]).to_json(indent=4))

outputs

{
    "0":{
        "0":1
    }
}

Problem description

Any JSON prettifier (including plain VS Code) wants to replace every ":" with ": ". The output of to_json should therefore behave as json.dump does, at least when indent is not None (in which case the user clearly does not care about file size):

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

https://docs.python.org/3/library/json.html#:~:text=To%20get%20the%20most%20compact%20JSON%20representation
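For reference, the json.dumps behaviour described in that quote can be checked with the standard library alone. This is a minimal sketch; the `data` dictionary is just a stand-in for the structure that `pd.DataFrame([1])` serializes to, not something taken from pandas itself.

```python
import json

# Stand-in for the nested structure produced by pd.DataFrame([1])
data = {"0": {"0": 1}}

# indent is None: the default separators are (', ', ': ')
compact_default = json.dumps(data)

# indent is set: the default separators become (',', ': '),
# so every key is followed by a colon and a space
indented = json.dumps(data, indent=4)

# Explicit separators without whitespace give the most compact form
most_compact = json.dumps(data, separators=(",", ":"))

print(compact_default)
print(indented)
print(most_compact)
```

The indented result is the layout requested under "Expected Output", while the most compact form matches what to_json emits when no indent is given.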

Expected Output

{
    "0": {
        "0": 1
    }
}
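Until to_json offers this, one possible workaround is to re-serialize its output with the standard library. The sketch below is an assumption-laden illustration, not part of pandas: `reformat_json` is a hypothetical helper name, and the round trip through json.loads may alter details such as float formatting that pandas' own encoder controls.

```python
import json


def reformat_json(text, indent=4, separators=None):
    """Parse JSON text and re-serialize it with json.dumps.

    With separators=None, json.dumps picks (',', ': ') whenever
    indent is not None, which yields the spacing requested above.
    """
    return json.dumps(json.loads(text), indent=indent, separators=separators)


# The exact string that to_json(indent=4) printed in the report above
pandas_output = '{\n    "0":{\n        "0":1\n    }\n}'

print(reformat_json(pandas_output))
```

In practice the argument would be the return value of a call such as `pd.DataFrame([1]).to_json()`. Parsing and re-encoding costs extra time on large frames, so this only makes sense when readability matters more than speed.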

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-lp151.28.32-default
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.8
numba : None

Top GitHub Comments

WillAyd commented, Apr 1, 2020

@ss-is-master-chief if you or whomever wants to work on this, I would suggest looking at the ujson source for the JSON_NO_EXTRA_WHITESPACE checks

https://github.com/pandas-dev/pandas/blob/25d893bb9a5d79afc6a88e84de89fa32b385ae09/pandas/_libs/src/ujson/lib/ultrajsonenc.c#L962

Right now this is a compile-time constant defined in ultrajson.h, but I think we can make a runtime check and just replace with if (enc->indent) > 0 checks.

This would just add spaces when indent is provided, which is a little different than the request but I still think solves the same problem
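To make the suggestion concrete, here is a toy Python encoder that chooses the key separator at run time from the indent value. It is only a sketch of the idea: the real change would live in the C source of ujson, and the names `encode` and `_wrap` are invented for this illustration.

```python
import json


def _wrap(opener, closer, items, indent, level):
    """Join already-encoded items, adding newlines and padding if indent > 0."""
    if indent > 0:
        pad = " " * (indent * (level + 1))
        end_pad = " " * (indent * level)
        body = ",\n".join(pad + item for item in items)
        return opener + "\n" + body + "\n" + end_pad + closer
    return opener + ",".join(items) + closer


def encode(obj, indent=0, _level=0):
    """Toy JSON encoder: emits ': ' after keys only when indent > 0."""
    # Runtime check in place of a compile-time whitespace constant
    key_sep = ": " if indent > 0 else ":"
    if isinstance(obj, dict):
        if not obj:
            return "{}"
        items = [
            json.dumps(str(key)) + key_sep + encode(value, indent, _level + 1)
            for key, value in obj.items()
        ]
        return _wrap("{", "}", items, indent, _level)
    if isinstance(obj, (list, tuple)):
        if not obj:
            return "[]"
        items = [encode(value, indent, _level + 1) for value in obj]
        return _wrap("[", "]", items, indent, _level)
    # Scalars (numbers, strings, booleans, None) are delegated to json.dumps
    return json.dumps(obj)


data = {"0": {"0": 1}}
print(encode(data))            # compact, no space after the colon
print(encode(data, indent=4))  # indented, with ': ' after each key
```

As the comment notes, this ties the extra space to indent rather than exposing a separators argument, so callers could not request indentation without the space.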

jpmckinney commented, May 3, 2022

I’d be happy to write a patch that passes through separators to json.dumps, the same as is done for indent.

Personally, it feels simpler to just pass any extra keyword arguments to json.dumps, but no problem with the explicit approach.

Edit: Oh, dang, ujson doesn’t support separators. I guess it’d have to be done like in https://github.com/pandas-dev/pandas/issues/33014#issuecomment-607494037