KeyError: 'link' in capture

When I ran my test script, I got this output:

$ python3.8 test_save.py
{'Server': 'nginx/1.15.8', 'Date': 'Wed, 15 Jul 2020 11:59:50 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache'}
capture: 42.963274240493774 sec.
{'Server': 'nginx/1.15.8', 'Date': 'Wed, 15 Jul 2020 12:01:37 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Cache-Control': 'no-cache'}
capture_or_cache: 97.4388906955719 sec.
Traceback (most recent call last):
  File "test_save.py", line 28, in <module>
    main()
  File "test_save.py", line 24, in main
    measure(fun, url)
  File "test_save.py", line 8, in measure
    print(f(*arg))
  File "/home/eggplants/.pyenv/versions/3.8.0/lib/python3.8/site-packages/savepagenow/api.py", line 55, in capture
    header_links = parse_header_links(response.headers['Link'])
  File "/home/eggplants/.pyenv/versions/3.8.0/lib/python3.8/site-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'link'
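The traceback points at line 55 of `savepagenow/api.py`, where `capture` reads `response.headers['Link']` directly. The header dictionaries printed in the output contain no `Link` entry, and `requests` keeps response headers in a `CaseInsensitiveDict` whose `__getitem__` lowercases the key, which is why the message reads `KeyError: 'link'`. Below is a minimal sketch of a defensive lookup. `memento_links` is a hypothetical helper, not part of savepagenow, and it only avoids the crash; it does not recover a snapshot URL when the header is missing.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import parse_header_links


def memento_links(headers):
    """Parse the Link header if present; return [] instead of raising KeyError.

    Hypothetical helper for illustration; savepagenow does not provide it.
    """
    raw = headers.get("Link")  # .get() returns None when the header is absent
    if raw is None:
        return []
    return parse_header_links(raw)


# Headers shaped like the ones printed in the output: no Link entry.
headers = CaseInsensitiveDict({
    "Server": "nginx/1.15.8",
    "Content-Type": "text/html",
    "Cache-Control": "no-cache",
})

try:
    headers["Link"]  # the direct lookup that capture() performs
except KeyError as exc:
    print("direct lookup fails:", repr(exc))  # KeyError('link')

print(memento_links(headers))
```

Callers then have to decide what an empty list means for them, for example treating it as a failed capture instead of crashing on the lookup.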

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

5 reactions
tlcaputi commented, Jul 31, 2020

I needed to figure out a quick fix for this same problem, and I ended up writing this. It’s not the most exact or beautifully written piece of code in the world, but it works for my purposes. Maybe it’ll work for yours.


# MIT License

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import datetime
from time import sleep


def archive_url(
    url,
    timeout=100,
    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
):
    """Submit a URL to the Wayback Machine's Save Page Now feature (working as of 2020-07-31 on Python 3.6.10).

    Keyword arguments:
    url -- the URL you want to archive
    timeout -- maximum number of seconds you're willing to wait
    user_agent -- a custom user agent string, if you want one

    Returns a dict with the keys original_url, archive_url and from_cache.
    """

    # POST Request
    headers = {
        'authority': 'web.archive.org',
        'cache-control': 'max-age=0',
        'upgrade-insecure-requests': '1',
        'origin': 'https://web.archive.org',
        'content-type': 'application/x-www-form-urlencoded',
        'user-agent': user_agent,
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-user': '?1',
        'sec-fetch-dest': 'document',
        'referer': 'https://web.archive.org/save',
        'accept-language': 'en-US,en;q=0.9,de;q=0.8',
    }

    data = {
        'url': url,
        'capture_all': 'on'
    }

    r = requests.post(f'https://web.archive.org/save/{url}', headers=headers, data=data, timeout=timeout)
    r.raise_for_status()  # stop early if the save request itself failed

    # Parse the returned HTML and take the job id from the first
    # double-quoted string in the script that calls watchJob(...)
    soup = BeautifulSoup(r.content, 'html.parser')
    scripts = soup.find_all("script")

    job_id = None
    for script in scripts:
        string = script.string
        if string and "watchJob" in string:
            args_string_list = string.strip().split('"')
            job_id = args_string_list[1]
            break

    # Raise instead of assert: asserts are skipped when Python runs with -O
    if job_id is None:
        raise RuntimeError("Couldn't find job_id in html")

    # Poll the job status until it succeeds or the timeout is reached
    out_url = None
    original_url = None
    was_pending = False
    wait_time = 0
    while wait_time < timeout:
        # _t appears to be a cache-busting timestamp so each poll gets a fresh answer
        r = requests.get(
            f"https://web.archive.org/save/status/{job_id}?_t={datetime.datetime.now().timestamp()}",
            headers=headers,
            timeout=timeout,
        )
        rj = r.json()
        status = rj.get('status', 'none')

        if status == "pending":
            was_pending = True

        if status == "success":
            original_url = rj['original_url']
            ext_url = f"/web/{rj['timestamp']}/{original_url}"
            out_url = urljoin('https://web.archive.org', ext_url)
            break

        # Wait as long as the server asks (Retry-After), defaulting to 5 seconds
        seconds_to_wait = int(r.headers.get("Retry-After", 5))
        print(f"[{wait_time} seconds elapsed] Waiting for archive to complete...")
        wait_time += seconds_to_wait
        sleep(seconds_to_wait)

    if out_url is None:
        raise TimeoutError(f"Process did not complete after {timeout} seconds")

    out = {
        "original_url": original_url,
        "archive_url": out_url,
        "from_cache": not was_pending
    }

    return out

if __name__ == "__main__":
    url = "https://ultimateframedata.com/"
    print(archive_url(url))
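The script above works in two steps: a POST to `https://web.archive.org/save/{url}` whose HTML response contains a job id, then repeated GETs to `/save/status/{job_id}` until the JSON `status` field becomes `"success"`, sleeping for the `Retry-After` value (default 5 seconds) between polls. A minimal sketch of just the polling step follows, with the network call passed in as a function so the loop can be exercised offline. `poll_job`, `fetch_status` and its `(status_json, retry_after)` return shape are hypothetical names for illustration, and the canned responses in the demo are made up, not real API output.

```python
import time


def poll_job(fetch_status, timeout=100, sleep=time.sleep):
    """Poll a Save Page Now job until its status JSON reports "success".

    fetch_status is a hypothetical zero-argument callable returning
    (status_json, retry_after_seconds), e.g. a thin wrapper around
    requests.get(f"https://web.archive.org/save/status/{job_id}").
    Returns (archive_url, original_url, was_pending).
    """
    was_pending = False
    waited = 0
    while waited < timeout:
        status, retry_after = fetch_status()
        state = status.get("status")
        if state == "pending":
            was_pending = True
        elif state == "success":
            archive = (
                "https://web.archive.org/web/"
                f"{status['timestamp']}/{status['original_url']}"
            )
            return archive, status["original_url"], was_pending
        # Like the original script, only time spent sleeping counts toward the budget
        sleep(retry_after)
        waited += retry_after
    raise TimeoutError(f"job did not finish within {timeout} seconds")


# Offline demo with hypothetical canned responses.
_fake = iter([
    ({"status": "pending"}, 5),
    ({"status": "success", "timestamp": "20200731000000",
      "original_url": "https://example.com/"}, 5),
])
print(poll_job(lambda: next(_fake), sleep=lambda s: None))
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access. In the original script, `from_cache` is reported as the opposite of `was_pending`, i.e. a job that never showed up as pending is assumed to have been served from an existing capture.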

1 reaction
palewire commented, Sep 8, 2020

I’ve pushed the change proposed here, and it is live in version 1.1.0. @dannguyen and @eggplants, tell me if it fixes things for you.

https://pypi.org/project/savepagenow/1.1.0/
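To check whether an environment already has that release, one option is to compare the installed version against 1.1.0 using the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch; `has_link_fix` is a hypothetical helper and only handles plain numeric versions such as `1.1.0`, not pre-release tags.

```python
from importlib.metadata import PackageNotFoundError, version


def has_link_fix(ver: str) -> bool:
    """Return True if a plain numeric version string is 1.1.0 or later."""
    parts = [int(p) for p in ver.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # treat "1.1" as "1.1.0"
    return tuple(parts) >= (1, 1, 0)


try:
    installed = version("savepagenow")
    print(installed, "is 1.1.0 or later:", has_link_fix(installed))
except PackageNotFoundError:
    print("savepagenow is not installed")
```

If it reports an older version, `pip install --upgrade "savepagenow>=1.1.0"` pulls in the release mentioned above.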
