pd.read_json yields: OSError: [Errno 22] Invalid argument

Code Sample, a copy-pastable example if possible

import pandas as pd

data = '/Users/davidleifer/Desktop/Geog500/thesis/data/merged-file.json'
df = pd.read_json(data, lines=True)

Problem description

The JSON file contains Twitter data scraped using their API. I’ve limited the files to 10,000 tweets per file. I clean the files using this process:

  1. Merge files in directory using: cat * > merged-file.json
  2. Remove blank lines in Sublime Text using Find and Replace: ^\n.
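For readers who want to reproduce the two cleaning steps without a shell or an editor, here is a minimal Python sketch. It is not the reporter's actual process: the function name `merge_json_lines` and the `*.json` glob are assumptions made for illustration, and unlike `cat` it always ends each record with a newline.

```python
from pathlib import Path


def merge_json_lines(src_dir, dest_path):
    """Concatenate every *.json file in src_dir into dest_path, skipping blank lines.

    Hypothetical helper for illustration. Returns the number of lines written.
    """
    src_dir = Path(src_dir)
    dest_path = Path(dest_path)
    written = 0
    with dest_path.open("w", encoding="utf-8") as out:
        for part in sorted(src_dir.glob("*.json")):
            if part.resolve() == dest_path.resolve():
                continue  # never read the output file back into itself
            with part.open("r", encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():  # drop empty and whitespace-only lines
                        out.write(line.rstrip("\n") + "\n")
                        written += 1
    return written
```

Because the files are streamed line by line, memory use stays small no matter how large the merged file becomes.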

Here is an example Tweet (one tweet per line):

{“created_at”:“Thu Nov 02 08:08:01 +0000 2017”,“id”:925997914136002562,“id_str”:“925997914136002562”,“text”:“#RussianGate #FollowTheFacts #Resist #FakePresident #GOP #War #Vote #ClimateChange #Peace #Animals #Women https://t.co/xe7AEdod1Y”,“display_text_range”:[0,105],“source”:“\u003ca href="http://twitter.com" rel="nofollow"\u003eTwitter Web Client\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:760436942,“id_str”:“760436942”,“name”:“Athoughtz”,“screen_name”:“athoughtz”,“location”:“United States”,“url”:null,“description”:“#RussianGate #FollowTheFacts #Resist #FakePresident #GOP #War #Vote #ClimateChange #Peace #Animals #Women”,“translator_type”:“none”,“protected”:false,“verified”:false,“followers_count”:5063,“friends_count”:5064,“listed_count”:142,“favourites_count”:659,“statuses_count”:62057,“created_at”:“Thu Aug 16 00:11:12 +0000 2012”,“utc_offset”:-25200,“time_zone”:“Arizona”,“geo_enabled”:false,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“C0DEED”,“profile_background_image_url”:“http://abs.twimg.com/images/themes/theme1/bg.png”,“profile_background_image_url_https”:“https://abs.twimg.com/images/themes/theme1/bg.png”,“profile_background_tile”:false,“profile_link_color”:“1DA1F2”,“profile_sidebar_border_color”:“C0DEED”,“profile_sidebar_fill_color”:“DDEEF6”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/378800000835488491/565d1bd43c8b0a615b8a39887e52ef2c_normal.jpeg”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/378800000835488491/565d1bd43c8b0a615b8a39887e52ef2c_normal.jpeg”,“default_profile”:true,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“is_quote_status”:false,“quot
e_count”:0,“reply_count”:0,“retweet_count”:0,“favorite_count”:0,“entities”:{“hashtags”:[{“text”:“RussianGate”,“indices”:[0,12]},{“text”:“FollowTheFacts”,“indices”:[13,28]},{“text”:“Resist”,“indices”:[29,36]},{“text”:“FakePresident”,“indices”:[37,51]},{“text”:“GOP”,“indices”:[52,56]},{“text”:“War”,“indices”:[57,61]},{“text”:“Vote”,“indices”:[62,67]},{“text”:“ClimateChange”,“indices”:[68,82]},{“text”:“Peace”,“indices”:[83,89]},{“text”:“Animals”,“indices”:[90,98]},{“text”:“Women”,“indices”:[99,105]}],“urls”:[],“user_mentions”:[],“symbols”:[],“media”:[{“id”:925997885778378752,“id_str”:“925997885778378752”,“indices”:[106,129],“media_url”:“http://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“media_url_https”:“https://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“url”:“https://t.co/xe7AEdod1Y”,“display_url”:“pic.twitter.com/xe7AEdod1Y”,“expanded_url”:“https://twitter.com/athoughtz/status/925997914136002562/photo/1”,“type”:“photo”,“sizes”:{“medium”:{“w”:600,“h”:585,“resize”:“fit”},“small”:{“w”:600,“h”:585,“resize”:“fit”},“thumb”:{“w”:150,“h”:150,“resize”:“crop”},“large”:{“w”:600,“h”:585,“resize”:“fit”}}}]},“extended_entities”:{“media”:[{“id”:925997885778378752,“id_str”:“925997885778378752”,“indices”:[106,129],“media_url”:“http://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“media_url_https”:“https://pbs.twimg.com/media/DNnOK8SVQAAUS6Z.jpg”,“url”:“https://t.co/xe7AEdod1Y”,“display_url”:“pic.twitter.com/xe7AEdod1Y”,“expanded_url”:“https://twitter.com/athoughtz/status/925997914136002562/photo/1”,“type”:“photo”,“sizes”:{“medium”:{“w”:600,“h”:585,“resize”:“fit”},“small”:{“w”:600,“h”:585,“resize”:“fit”},“thumb”:{“w”:150,“h”:150,“resize”:“crop”},“large”:{“w”:600,“h”:585,“resize”:“fit”}}}]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“und”,“timestamp_ms”:“1509610081596”} {“created_at”:“Thu Nov 02 08:08:02 +0000 2017”,“id”:925997918795866113,“id_str”:“925997918795866113”,“text”:“RT @CGTNOfficial: Survey released on Chinese public awareness of 
#climatechange https://t.co/q92jAnobmd”,“source”:“\u003ca href="http://nosudo.co" rel="nofollow"\u003eQxNews-python\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:1664059166,“id_str”:“1664059166”,“name”:“Question News”,“screen_name”:“QxNews”,“location”:“USA”,“url”:null,“description”:“Interrogare Semper | News bot/humans via retweets | 1 min per retweet”,“translator_type”:“none”,“protected”:false,“verified”:false,“followers_count”:3254,“friends_count”:271,“listed_count”:2786,“favourites_count”:38,“statuses_count”:1018592,“created_at”:“Mon Aug 12 03:35:37 +0000 2013”,“utc_offset”:-25200,“time_zone”:“Pacific Time (US & Canada)”,“geo_enabled”:false,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“000000”,“profile_background_image_url”:“http://pbs.twimg.com/profile_background_images/514662332492816384/TuhAkn7d.jpeg”,“profile_background_image_url_https”:“https://pbs.twimg.com/profile_background_images/514662332492816384/TuhAkn7d.jpeg”,“profile_background_tile”:false,“profile_link_color”:“000000”,“profile_sidebar_border_color”:“FFFFFF”,“profile_sidebar_fill_color”:“DDEEF6”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/597288578092240896/ePlmSYCH_normal.png”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/597288578092240896/ePlmSYCH_normal.png”,“profile_banner_url”:“https://pbs.twimg.com/profile_banners/1664059166/1484679111”,“default_profile”:false,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“retweeted_status”:{“created_at”:“Thu Nov 02 07:55:00 +0000 2017”,“id”:925994638019825664,“id_str”:“925994638019825664”,“text”:“Survey released on Chinese public awareness of #climatechange 
https://t.co/q92jAnobmd”,“source”:“\u003ca href="https://about.twitter.com/products/tweetdeck" rel="nofollow"\u003eTweetDeck\u003c/a\u003e”,“truncated”:false,“in_reply_to_status_id”:null,“in_reply_to_status_id_str”:null,“in_reply_to_user_id”:null,“in_reply_to_user_id_str”:null,“in_reply_to_screen_name”:null,“user”:{“id”:1115874631,“id_str”:“1115874631”,“name”:“CGTN”,“screen_name”:“CGTNOfficial”,“location”:“Beijing, China”,“url”:“http://www.CGTN.com”,“description”:“China Global Television Network, or CGTN, is a multi-language, multi-platform media grouping.”,“translator_type”:“none”,“protected”:false,“verified”:true,“followers_count”:4828619,“friends_count”:53,“listed_count”:4517,“favourites_count”:32,“statuses_count”:39079,“created_at”:“Thu Jan 24 03:18:59 +0000 2013”,“utc_offset”:28800,“time_zone”:“Beijing”,“geo_enabled”:true,“lang”:“en”,“contributors_enabled”:false,“is_translator”:false,“profile_background_color”:“131516”,“profile_background_image_url”:“http://pbs.twimg.com/profile_background_images/378800000169084583/SqpyvnvQ.jpeg”,“profile_background_image_url_https”:“https://pbs.twimg.com/profile_background_images/378800000169084583/SqpyvnvQ.jpeg”,“profile_background_tile”:true,“profile_link_color”:“009999”,“profile_sidebar_border_color”:“FFFFFF”,“profile_sidebar_fill_color”:“EFEFEF”,“profile_text_color”:“333333”,“profile_use_background_image”:true,“profile_image_url”:“http://pbs.twimg.com/profile_images/815049165508112384/wJA8jWZh_normal.jpg”,“profile_image_url_https”:“https://pbs.twimg.com/profile_images/815049165508112384/wJA8jWZh_normal.jpg”,“profile_banner_url”:“https://pbs.twimg.com/profile_banners/1115874631/1483157766”,“default_profile”:false,“default_profile_image”:false,“following”:null,“follow_request_sent”:null,“notifications”:null},“geo”:null,“coordinates”:null,“place”:null,“contributors”:null,“is_quote_status”:false,“quote_count”:0,“reply_count”:0,“retweet_count”:10,“favorite_count”:25,“entities”:{“hashtags”:[{“text”:“climatechange”,“indices”:[47,
61]}],“urls”:[{“url”:“https://t.co/q92jAnobmd”,“expanded_url”:“https://news.cgtn.com/news/794d7a4e33597a6333566d54/share_p.html”,“display_url”:“news.cgtn.com/news/794d7a4e3\u2026”,“indices”:[62,85]}],“user_mentions”:[],“symbols”:[]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“en”},“is_quote_status”:false,“quote_count”:0,“reply_count”:0,“retweet_count”:0,“favorite_count”:0,“entities”:{“hashtags”:[{“text”:“climatechange”,“indices”:[65,79]}],“urls”:[{“url”:“https://t.co/q92jAnobmd”,“expanded_url”:“https://news.cgtn.com/news/794d7a4e33597a6333566d54/share_p.html”,“display_url”:“news.cgtn.com/news/794d7a4e3\u2026”,“indices”:[80,103]}],“user_mentions”:[{“screen_name”:“CGTNOfficial”,“name”:“CGTN”,“id”:1115874631,“id_str”:“1115874631”,“indices”:[3,16]}],“symbols”:[]},“favorited”:false,“retweeted”:false,“possibly_sensitive”:false,“filter_level”:“low”,“lang”:“en”,“timestamp_ms”:“1509610082707”}

I get this error:


OSError                                   Traceback (most recent call last)
<ipython-input-4-5322def5edd5> in <module>()
----> 1 df = pd.read_json(data, lines=True)

/Users/davidleifer/anaconda/lib/python3.5/site-packages/pandas/io/json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines)
    214     if exists:
    215         with _get_handle(filepath_or_buffer, 'r', encoding=encoding) as fh:
--> 216             json = fh.read()
    217     else:
    218         json = filepath_or_buffer

OSError: [Errno 22] Invalid argument

Expected Output

Loading the JSON into a pandas dataframe.
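One possible workaround, offered as a sketch and not a confirmed fix for this issue: the traceback shows the whole file being pulled in with a single `fh.read()`, so reading it in pieces may sidestep the failing call. This assumes a pandas version in which `read_json` accepts `chunksize` together with `lines=True` (the 0.19.0 release in the report predates that parameter). The helper name `read_json_lines_in_chunks` is hypothetical.

```python
import pandas as pd


def read_json_lines_in_chunks(path, chunksize=10000):
    """Load a JSON Lines file chunk by chunk and return one DataFrame.

    Hypothetical helper; assumes pandas supports chunksize with lines=True.
    """
    # With chunksize set, read_json returns an iterator of DataFrames
    # instead of parsing the whole file in one go.
    reader = pd.read_json(path, lines=True, chunksize=chunksize)
    frames = [chunk for chunk in reader]
    # The combined frame still has to fit in memory.
    return pd.concat(frames, ignore_index=True)
```

If the full result is too large for memory, the same loop can process or write out each chunk and skip the final `pd.concat`.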

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 36.2.7
Cython: 0.24
numpy: 1.13.2
scipy: 0.19.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.48.0
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 26 (7 by maintainers)

Top GitHub Comments

8 reactions
mariskaas commented, Aug 8, 2018

Same bug with pd.to_json from a CSV file. The CSV file is only 700 MB. I can in fact convert it to JSON the long way, but that gives a slightly different format than I would like. Pandas version is 0.23.4.

4 reactions
fercook commented, Aug 7, 2018

Hit the same bug with a proper jsonlines file of 13 GB on macOS and Pandas 0.23.0. Please reopen the issue.
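To rule out malformed input before attributing the error to file size, a quick line-by-line check can confirm that a file really is valid JSON Lines. This is only a sketch using the standard library; the function name `find_bad_json_lines` and its `limit` parameter are made up for illustration.

```python
import json


def find_bad_json_lines(path, limit=10):
    """Return up to `limit` (line_number, message) pairs for lines that fail to parse.

    Hypothetical helper; blank lines are skipped, not reported.
    """
    bad = []
    with open(path, "r", encoding="utf-8") as fh:
        # Iterating the handle reads one line at a time, so no huge read() happens.
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, exc.msg))
                if len(bad) >= limit:
                    break
    return bad
```

An empty result means every non-blank line parsed on its own, which is what `lines=True` expects.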

Top Results From Across the Web

OSError: [Errno 22] when I try to .read() a json file
It appears that this is some kind of bug that occurs when the file is too large (my file was ~10GB). Once I...

pandas.read_json - pandas 1.5.2 documentation
Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible...

Handling OSError exception in Python - GeeksforGeeks
Let us see how to handle OSError Exceptions in Python. OSError is a built-in exception in Python and serves as the error class...

[SOLVED] OSError: [Errno 22] Invalid argument:... - YouTube
[SOLVED] [FIXED] #OSError: [Errno 22] Invalid argument: 'yourfilename' appears when trying to open/read/write/create a file in #Python 3....

[SOLVED] OSError: [Errno 22] Invalid argument - YouTube
This tutorial will help you fix your File reading issue. #oserror #Howto #invalid #python.
