read_json ignores a dictionary passed as dtype
Code Sample, a copy-pastable example if possible
import pandas as pd

dtypes = {
    'created': 'int64',
    'eventType': 'category',
    'severity': 'category',
}
df = pd.read_json('dataset.json', lines=True, dtype=dtypes)
df.info()
This results in:
created int64
eventType object
severity object
Using .astype() instead converts types correctly:
df.astype(dtypes).info()
created int64
eventType category
severity category
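Until read_json honors the mapping, the astype route above can be wrapped so it happens at load time. This is a minimal sketch: `read_json_with_dtypes` is a hypothetical helper name, and the two-line `StringIO` sample stands in for `dataset.json`, which is not included in the issue.

```python
from io import StringIO

import pandas as pd


def read_json_with_dtypes(source, dtypes, **kwargs):
    # Hypothetical helper: let read_json infer types, then enforce the
    # requested {column: dtype} mapping with DataFrame.astype, which does
    # accept "category".
    df = pd.read_json(source, **kwargs)
    return df.astype(dtypes)


dtypes = {"created": "int64", "eventType": "category", "severity": "category"}

# Hypothetical line-delimited sample standing in for dataset.json
sample = StringIO(
    '{"created": 1, "eventType": "login", "severity": "low"}\n'
    '{"created": 2, "eventType": "logout", "severity": "high"}\n'
)
df = read_json_with_dtypes(sample, dtypes, lines=True)
print(df.dtypes)
```

Note that this still materializes the object columns first, so peak memory during loading is not reduced; it only fixes the final dtypes.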
Problem description
read_json should apply the requested data types while loading the DataFrame from disk, just as astype does afterwards.
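As a possible stopgap for large line-delimited files, the types can be applied chunk by chunk during loading, so that only one chunk of object columns exists at a time. The sketch below assumes every column in the mapping is present in every chunk and that integer columns contain no missing values; `read_json_lines_chunked` and the default `chunksize` are hypothetical choices. One subtlety: concatenating categorical columns whose categories differ between chunks falls back to object, so each chunk is first re-cast to the union of categories found by `pandas.api.types.union_categoricals`.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import union_categoricals


def read_json_lines_chunked(source, dtypes, chunksize=10_000):
    # With lines=True, passing chunksize makes read_json return an
    # iterator of DataFrames instead of a single DataFrame.
    reader = pd.read_json(source, lines=True, chunksize=chunksize)
    chunks = [chunk.astype(dtypes) for chunk in reader]

    cat_cols = [col for col, dt in dtypes.items() if dt == "category"]
    for col in cat_cols:
        # Collect every category seen across all chunks.
        categories = union_categoricals([c[col] for c in chunks]).categories
        common = pd.CategoricalDtype(categories)
        for c in chunks:
            # Identical categorical dtypes survive pd.concat as "category".
            c[col] = c[col].astype(common)

    return pd.concat(chunks, ignore_index=True)


# Hypothetical sample whose chunks see different eventType values
sample = StringIO(
    '{"created": 1, "eventType": "a"}\n'
    '{"created": 2, "eventType": "b"}\n'
    '{"created": 3, "eventType": "c"}\n'
)
df = read_json_lines_chunked(
    sample, {"created": "int64", "eventType": "category"}, chunksize=2
)
df.info()
```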
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
Comments
Try it out with the following JSON file:
When these dtypes are passed through the dtype keyword argument of read_json, pandas just ignores the setting: the resulting data types are “object”, not “category” as specified in the dtypes dictionary. If the same dtypes dictionary is passed to the DataFrame's astype method, the setting is applied and the data types are correct. This causes problems with large datasets, where reading the data with the correct types would reduce RAM usage drastically.
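To illustrate why the category dtype matters for RAM, the sketch below compares the deep memory footprint of a column of repeated labels stored with pandas' default string handling versus as a category. The labels and row count are hypothetical and not taken from the reporter's dataset; exact byte counts depend on the pandas version and platform, so none are quoted here.

```python
import pandas as pd

# Hypothetical column: 300,000 rows drawn from three repeated labels
labels = pd.Series(["INFO", "WARNING", "ERROR"] * 100_000)

# deep=True counts the string payloads, not just the references
as_strings = labels.memory_usage(deep=True)
as_category = labels.astype("category").memory_usage(deep=True)

print(f"strings:  {as_strings:,} bytes")
print(f"category: {as_category:,} bytes")
```

A categorical column stores each distinct label once plus a small integer code per row, which is why low-cardinality columns such as eventType and severity benefit most.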
Have there been any updates on this? I am experiencing this issue with pandas 1.2.4.
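To check whether a given pandas installation is affected, a small probe can call read_json on an in-memory sample and inspect the resulting dtype. This is a sketch; the function name and the one-column sample are hypothetical, and it makes no claim about which release, if any, changed the behavior.

```python
from io import StringIO

import pandas as pd


def read_json_honors_category() -> bool:
    # Two-line, line-delimited sample with a single string column "a"
    sample = StringIO('{"a": "x"}\n{"a": "y"}\n')
    df = pd.read_json(sample, lines=True, dtype={"a": "category"})
    # True only if read_json itself produced a categorical column
    return isinstance(df["a"].dtype, pd.CategoricalDtype)


print(pd.__version__, read_json_honors_category())
```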