JSONDecodingError while archiving a specific website
Describe the bug
I’m getting this JSONDecodeError on a specific website.
Steps to reproduce
- Ran ArchiveBox with the “default” config (didn’t change the docker-compose.yaml file much, apart from naming networks differently)
- Saw this output during archiving
u@h:~/docker/ArchiveBox$ echo "https://www.zdnet.com/article/new-simjacker-attack-exploited-in-the-wild-to-track-users-for-at-least-two-years/" | /usr/local/bin/docker-compose exec -T archivebox /bin/archive
Traceback (most recent call last):
  File "/bin/archive", line 136, in <module>
    main(*sys.argv)
  File "/bin/archive", line 98, in main
    update_archive_data(import_path=import_path, resume=resume)
  File "/bin/archive", line 106, in update_archive_data
    all_links, new_links = load_links_index(out_dir=OUTPUT_DIR, import_path=import_path)
  File "/home/pptruser/app/archivebox/index.py", line 61, in load_links_index
    existing_links = parse_json_links_index(out_dir)
  File "/home/pptruser/app/archivebox/index.py", line 108, in parse_json_links_index
    links = json.load(f)['links']
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 5283 column 44 (char 271619)
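The traceback shows that the failure happens while reading the existing links index, not while fetching the page. As a minimal sketch of how such an error can be located and worked around with Python's standard json module: the helper name describe_json_error and the sample data below are hypothetical and are not part of ArchiveBox.

```python
import json


def describe_json_error(text, context=20):
    """Return (line, column, snippet) for the first decode error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset reported as "(char N)" in the message.
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return err.lineno, err.colno, repr(snippet)
    return None


# Hypothetical sample: a raw newline inside a string value is a control
# character, which the default (strict) decoder rejects.
broken = '{"links": [{"title": "bad\ntitle"}]}'

location = describe_json_error(broken)
print(location)

# strict=False tells the decoder to accept control characters inside strings.
recovered = json.loads(broken, strict=False)
print(recovered["links"][0]["title"])
```

Passing strict=False only helps with this particular class of error (control characters inside strings); an index that is truncated or otherwise malformed would still fail to parse.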
Issue Analytics
- State:
- Created 4 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
Cool, and I’ve just extracted the individual archived page URLs from $subdir/index.js, removed the corrupted main index.js file and archived the URLs again - fortunately, they were still available.
BTW, there was no power loss or anything like that. I saw in the sources that you have code that changes the JSON in place, so yeah, I agree that SQLite is a better solution.
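The recovery described above (collecting URLs from the per-page index files so they can be archived again) could be sketched roughly as follows. This is only an illustration under assumptions: the function name collect_urls, the "url" key, and the index file name used in the demo are hypothetical, and the real layout of an ArchiveBox output directory may differ.

```python
import json
import tempfile
from pathlib import Path


def collect_urls(archive_dir, index_name, url_key="url"):
    """Gather URLs from every <archive_dir>/<subdir>/<index_name> that still parses."""
    urls = []
    for index_path in sorted(Path(archive_dir).glob("*/" + index_name)):
        try:
            with index_path.open(encoding="utf-8") as f:
                entry = json.load(f, strict=False)
        except (OSError, ValueError):
            # Skip unreadable or unparseable files (JSONDecodeError is a ValueError).
            continue
        url = entry.get(url_key) if isinstance(entry, dict) else None
        if url and url not in urls:
            urls.append(url)
    return urls


# Demo on a throwaway directory with two good index files and one broken one.
with tempfile.TemporaryDirectory() as tmp:
    for name, url in [("1", "https://example.com/a"), ("2", "https://example.com/b")]:
        sub = Path(tmp) / name
        sub.mkdir()
        (sub / "index.json").write_text(json.dumps({"url": url}), encoding="utf-8")
    bad = Path(tmp) / "3"
    bad.mkdir()
    (bad / "index.json").write_text("{not json", encoding="utf-8")
    found = collect_urls(tmp, "index.json")

# One URL per line, suitable for piping back into the archiver.
print("\n".join(found))
```

The printed list can then be fed to the archive command on standard input, the same way the single URL was passed in the steps to reproduce.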
Closing this in favor of #234