JSONDecodingError while archiving a specific website


Describe the bug

I’m getting a json.decoder.JSONDecodeError when archiving a specific website.

Steps to reproduce

  1. Ran ArchiveBox with the “default” config (didn’t change the docker-compose.yaml file much, apart from naming networks differently)
  2. Saw this output during archiving
u@h:~/docker/ArchiveBox$ echo "https://www.zdnet.com/article/new-simjacker-attack-exploited-in-the-wild-to-track-users-for-at-least-two-years/" | /usr/local/bin/docker-compose exec -T archivebox /bin/archive

Traceback (most recent call last):
  File "/bin/archive", line 136, in <module>
    main(*sys.argv)
  File "/bin/archive", line 98, in main
    update_archive_data(import_path=import_path, resume=resume)
  File "/bin/archive", line 106, in update_archive_data
    all_links, new_links = load_links_index(out_dir=OUTPUT_DIR, import_path=import_path)
  File "/home/pptruser/app/archivebox/index.py", line 61, in load_links_index
    existing_links = parse_json_links_index(out_dir)
  File "/home/pptruser/app/archivebox/index.py", line 108, in parse_json_links_index
    links = json.load(f)['links']
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 5283 column 44 (char 271619)
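The traceback ends in Python's standard `json` module, which by default rejects raw control characters (such as an unescaped newline) inside string values. The following is a minimal sketch of that failure mode, not ArchiveBox code: the `broken` fragment and the `describe_json_error` helper are made up for illustration. It also shows that the standard `strict=False` option lets the decoder accept such characters, which may help when reading a damaged index, assuming the control character is the only problem in the file.

```python
import json

# Hypothetical index fragment (not taken from the real index file): the raw
# newline inside the string value is an unescaped control character.
broken = '{"links": [{"title": "Simjacker\nattack"}]}'


def describe_json_error(text):
    """Return None if text parses as strict JSON, else details of the failure."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        # err.pos is the character offset printed as "(char N)" in the message;
        # err.lineno and err.colno are the "line" and "column" values.
        return {"msg": err.msg, "line": err.lineno, "col": err.colno, "pos": err.pos}


info = describe_json_error(broken)

# strict=False tells the decoder to allow control characters inside strings.
recovered = json.loads(broken, strict=False)
title = recovered["links"][0]["title"]
```

Applied to the real file, the `pos` value (271619 in the traceback above) points at the region of the index to inspect.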

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

gjedeer commented, Oct 18, 2019 (1 reaction)

Cool, and I’ve just extracted the individual archived page URLs from $subdir/index.js, removed the corrupted main index.js file, and archived the URLs again. Fortunately, they were still available.

BTW, there was no power loss or anything like that. I saw in the sources that you had code that modifies the JSON in place, so yes, I agree that SQLite is a better solution.
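The recovery described in this comment (collect the URLs from the per-page index files, then re-archive them) could be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual procedure: the function name `collect_snapshot_urls` is hypothetical, each per-page index is assumed to be a JSON object with a `url` key, and the file name is a parameter defaulting to `index.json` (the comment calls the files `index.js`).

```python
import json
from pathlib import Path


def collect_snapshot_urls(archive_root, index_name="index.json"):
    """Collect URLs from per-page index files one directory below archive_root.

    Assumptions (not confirmed by the thread): each index file is a JSON
    object with a "url" key, and unreadable files can simply be skipped.
    """
    urls = []
    for index_path in sorted(Path(archive_root).glob(f"*/{index_name}")):
        try:
            with index_path.open(encoding="utf-8") as f:
                # strict=False tolerates raw control characters in strings
                entry = json.load(f, strict=False)
        except (OSError, ValueError):
            # JSONDecodeError and UnicodeDecodeError are both ValueErrors
            continue
        url = entry.get("url") if isinstance(entry, dict) else None
        if url and url not in urls:
            urls.append(url)
    return urls
```

The returned list can be written out one URL per line and piped back into the archiver in the same way as the `echo` command in the reproduction steps.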

pirate commented, May 9, 2020 (0 reactions)

Closing this in favor of #234


