pd.read_json yields: OSError: [Errno 22] Invalid argument

Code Sample, a copy-pastable example if possible

import pandas as pd

data = '/Users/davidleifer/Desktop/Geog500/thesis/data/merged-file.json'
df = pd.read_json(data, lines=True)

Problem description

The JSON file contains Twitter data scraped using their API. I’ve limited the files to 10,000 tweets per file. I clean the files using this process:

Merge the files in the directory using: cat * > merged-file.json
Remove blank lines in Sublime Text using Find and Replace with the regex ^\n
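
The two cleaning steps above can also be done in one pass without an editor. The following is only a sketch under assumptions that are not in the original report: the helper name `merge_json_lines` is hypothetical, the input files are assumed to end in `.json`, and they are assumed to be UTF-8 encoded. Unlike `cat`, it also adds a newline when an input file does not end with one, so two tweets never end up fused on a single line.

```python
from pathlib import Path


def merge_json_lines(src_dir, out_path):
    """Concatenate every *.json file in src_dir into out_path, skipping blank lines.

    Hypothetical helper: the name, the *.json pattern and the UTF-8
    encoding are assumptions, not part of the original report.
    Returns the number of lines written.
    """
    src_dir, out_path = Path(src_dir), Path(out_path)
    written = 0
    with out_path.open("w", encoding="utf-8") as out:
        for part in sorted(src_dir.glob("*.json")):
            if part.resolve() == out_path.resolve():
                continue  # do not read the output file back into itself
            with part.open("r", encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:  # drop blank lines
                        out.write(line + "\n")
                        written += 1
    return written
```

Calling `merge_json_lines('.', 'merged-file.json')` from inside the data directory would stand in for both manual steps.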

Here are two example tweets (one tweet per line in the file):

{“created_at”:“Thu Nov 02 08:08:01 +0000 2017”,“id”:925997914136002562,“id_str”:“925997914136002562”,“text”:“#RussianGate #FollowTheFacts #Resist #FakePresident #GOP #War #Vote #ClimateChange #Peace #Animals #Women https://t.co/xe7AEdod1Y”,“display_text_range”:[0,105],“source”:“\u003ca href="http://twitter.com" rel="nofollow"\u003eTwitter Web Client\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:760436942,“id_str”:“760436942”,“name”:“Athoughtz”,“screen_name”:“athoughtz”,“location”:“United States”,“url”:null,“description”:“#RussianGate #FollowTheFacts #Resist #FakePresident #GOP #War #Vote #ClimateChange #Peace #Animals #Women”,“translator_type”:“none”,“protected”:false,“verified”:false,“followers_count”:5063,“friends_count”:5064,“listed_count”:142,“favourites_count”:659,“statuses_count”:62057,“created_at”:“Thu Aug 16 00:11:12 +0000 2012”,“utc_offset”:-25200,“time_zone”:“Arizona”,“geo_enabled”:false,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“C0DEED”,“profile_background_image_url”:“http://abs.twimg.com/images/themes/theme1/bg.png”,“profile_background_image_url_https”:“https://abs.twimg.com/images/themes/theme1/bg.png”,“profile_background_tile”:false,“profile_link_color”:“1DA1F2”,“profile_sidebar_border_color”:“C0DEED”,“profile_sidebar_fill_color”:“DDEEF6”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/378800000835488491/565d1bd43c8b0a615b8a39887e52ef2c_normal.jpeg”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/378800000835488491/565d1bd43c8b0a615b8a39887e52ef2c_normal.jpeg”,“default_profile”:true,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“is_quote_status”:false,“quot
e_count”:0,“reply_count”:0,“retweet_count”:0,“favorite_count”:0,“entities”:{“hashtags”:[{“text”:“RussianGate”,“indices”:[0,12]},{“text”:“FollowTheFacts”,“indices”:[13,28]},{“text”:“Resist”,“indices”:[29,36]},{“text”:“FakePresident”,“indices”:[37,51]},{“text”:“GOP”,“indices”:[52,56]},{“text”:“War”,“indices”:[57,61]},{“text”:“Vote”,“indices”:[62,67]},{“text”:“ClimateChange”,“indices”:[68,82]},{“text”:“Peace”,“indices”:[83,89]},{“text”:“Animals”,“indices”:[90,98]},{“text”:“Women”,“indices”:[99,105]}],“urls”:[],“user_mentions”:[],“symbols”:[],“media”:[{“id”:925997885778378752,“id_str”:“925997885778378752”,“indices”:[106,129],“media_url”:“http://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“media_url_https”:“https://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“url”:“https://t.co/xe7AEdod1Y”,“display_url”:“pic.twitter.com/xe7AEdod1Y”,“expanded_url”:“https://twitter.com/athoughtz/status/925997914136002562/photo/1”,“type”:“photo”,“sizes”:{“medium”:{“w”:600,“h”:585,“resize”:“fit”},“small”:{“w”:600,“h”:585,“resize”:“fit”},“thumb”:{“w”:150,“h”:150,“resize”:“crop”},“large”:{“w”:600,“h”:585,“resize”:“fit”}}}]},“extended_entities”:{“media”:[{“id”:925997885778378752,“id_str”:“925997885778378752”,“indices”:[106,129],“media_url”:“http://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“media_url_https”:“https://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“url”:“https://t.co/xe7AEdod1Y”,“display_url”:“pic.twitter.com/xe7AEdod1Y”,“expanded_url”:“https://twitter.com/athoughtz/status/925997914136002562/photo/1”,“type”:“photo”,“sizes”:{“medium”:{“w”:600,“h”:585,“resize”:“fit”},“small”:{“w”:600,“h”:585,“resize”:“fit”},“thumb”:{“w”:150,“h”:150,“resize”:“crop”},“large”:{“w”:600,“h”:585,“resize”:“fit”}}}]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“und”,“timestamp_ms”:“1509610081596”} {“created_at”:“Thu Nov 02 08:08:02 +0000 2017”,“id”:925997918795866113,“id_str”:“925997918795866113”,“text”:“RT @CGTNOfficial: Survey released on Chinese public awareness of 
#climatechange https://t.co/q92jAnobmd”,“source”:“\u003ca href="http://nosudo.co" rel="nofollow"\u003eQxNews-python\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:1664059166,“id_str”:“1664059166”,“name”:“Question News”,“screen_name”:“QxNews”,“location”:“USA”,“url”:null,“description”:“Interrogare Semper | News bot/humans via retweets | 1 min per retweet”,“translator_type”:“none”,“protected”:false,“verified”:false,“followers_count”:3254,“friends_count”:271,“listed_count”:2786,“favourites_count”:38,“statuses_count”:1018592,“created_at”:“Mon Aug 12 03:35:37 +0000 2013”,“utc_offset”:-25200,“time_zone”:“Pacific Time (US & Canada)”,“geo_enabled”:false,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“000000”,“profile_background_image_url”:“http://pbs.twimg.com/profile_background_images/514662332492816384/TuhAkn7d.jpeg”,“profile_background_image_url_https”:“https://pbs.twimg.com/profile_background_images/514662332492816384/TuhAkn7d.jpeg”,“profile_background_tile”:false,“profile_link_color”:“000000”,“profile_sidebar_border_color”:“FFFFFF”,“profile_sidebar_fill_color”:“DDEEF6”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/597288578092240896/ePlmSYCH_normal.png”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/597288578092240896/ePlmSYCH_normal.png”,“profile_banner_url”:“https://pbs.twimg.com/profile_banners/1664059166/1484679111”,“default_profile”:false,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“retweeted_status”:{“created_at”:“Thu Nov 02 07:55:00 +0000 2017”,“id”:925994638019825664,“id_str”:“925994638019825664”,“text”:“Survey released on Chinese public awareness of #climatechange 
https://t.co/q92jAnobmd”,“source”:“\u003ca href="https://about.twitter.com/products/tweetdeck" rel="nofollow"\u003eTweetDeck\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:1115874631,“id_str”:“1115874631”,“name”:“CGTN”,“screen_name”:“CGTNOfficial”,“location”:“Beijing, China”,“url”:“http://www.CGTN.com”,“description”:“China Global Television Network, or CGTN, is a multi-language, multi-platform media grouping.”,“translator_type”:“none”,“protected”:false,“verified”:true,“followers_count”:4828619,“friends_count”:53,“listed_count”:4517,“favourites_count”:32,“statuses_count”:39079,“created_at”:“Thu Jan 24 03:18:59 +0000 2013”,“utc_offset”:28800,“time_zone”:“Beijing”,“geo_enabled”:true,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“131516”,“profile_background_image_url”:“http://pbs.twimg.com/profile_background_images/378800000169084583/SqpyvnvQ.jpeg”,“profile_background_image_url_https”:“https://pbs.twimg.com/profile_background_images/378800000169084583/SqpyvnvQ.jpeg”,“profile_background_tile”:true,“profile_link_color”:“009999”,“profile_sidebar_border_color”:“FFFFFF”,“profile_sidebar_fill_color”:“EFEFEF”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/815049165508112384/wJA8jWZh_normal.jpg”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/815049165508112384/wJA8jWZh_normal.jpg”,“profile_banner_url”:“https://pbs.twimg.com/profile_banners/1115874631/1483157766”,“default_profile”:false,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“is_quote_status”:false,“quote_count”:0,“reply_count”:0,“retweet_count”:10,“favorite_count”:25,“entities”:{“hashtags”:[{“text”:“climatechange”,“indices”:[47,
61]}],“urls”:[{“url”:“https://t.co/q92jAnobmd”,“expanded_url”:“https://news.cgtn.com/news/794d7a4e33597a6333566d54/share_p.html”,“display_url”:“news.cgtn.com/news/794d7a4e3\u2026”,“indices”:[62,85]}],“user_mentions”:[],“symbols”:[]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“en”},“is_quote_status”:false,“quote_count”:0,“reply_count”:0,“retweet_count”:0,“favorite_count”:0,“entities”:{“hashtags”:[{“text”:“climatechange”,“indices”:[65,79]}],“urls”:[{“url”:“https://t.co/q92jAnobmd”,“expanded_url”:“https://news.cgtn.com/news/794d7a4e33597a6333566d54/share_p.html”,“display_url”:“news.cgtn.com/news/794d7a4e3\u2026”,“indices”:[80,103]}],“user_mentions”:[{“screen_name”:“CGTNOfficial”,“name”:“CGTN”,“id”:1115874631,“id_str”:“1115874631”,“indices”:[3,16]}],“symbols”:[]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“en”,“timestamp_ms”:“1509610082707”}

I get this error:

OSError                                   Traceback (most recent call last)
<ipython-input-4-5322def5edd5> in <module>()
----> 1 df = pd.read_json(data, lines=True)

/Users/davidleifer/anaconda/lib/python3.5/site-packages/pandas/io/json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines)
    214     if exists:
    215         with _get_handle(filepath_or_buffer, 'r', encoding=encoding) as fh:
--> 216             json = fh.read()
    217     else:
    218         json = filepath_or_buffer

OSError: [Errno 22] Invalid argument

Expected Output

Loading the JSON into a pandas dataframe.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.24
numpy: 1.13.2
scipy: 0.19.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.48.0
pandas_datareader: None

Issue Analytics

State:
Created 6 years ago
Reactions: 2
Comments: 26 (7 by maintainers)

Top GitHub Comments

8 reactions

mariskaas commented, Aug 8, 2018

Same bug with pd.to_json from a CSV file. The CSV file is only 700 MB. I can in fact convert it to JSON the long way, but that gives a slightly different format than I would like. Pandas version is 0.23.4.

4 reactions

fercook commented, Aug 7, 2018

Hit the same bug with a proper JSON Lines file of 13 GB on macOS and Pandas 0.23.0. Please reopen the issue.
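
Both comments involve large files, and the traceback fails at a single `fh.read()` call that loads the whole file at once. Assuming the error is tied to that one large read (the thread does not confirm this), a possible workaround is to parse the file one line at a time so that no single read covers the whole file. The sketch below uses only the standard library; the helper name `iter_json_lines` is hypothetical.

```python
import json


def iter_json_lines(path, encoding="utf-8"):
    """Yield one parsed record per non-blank line without loading the whole file.

    Hypothetical helper, not part of pandas. The file is iterated lazily,
    so only one line is held in memory at a time.
    """
    with open(path, "r", encoding=encoding) as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate stray blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line is malformed instead of failing blindly
                raise ValueError("line {}: {}".format(line_number, exc)) from exc


# Possible use with pandas (not executed here; `data` is the path from the report):
#   import pandas as pd
#   df = pd.DataFrame(list(iter_json_lines(data)))
```

Building the list still needs enough memory for all records, so for files in the multi-gigabyte range it may be better to keep only the fields of interest from each record before constructing the DataFrame.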
