`get_short_id` incorrect for pre-March 2007 arXiv identifiers: missing archive

Error:

/usr/local/lib/python3.7/dist-packages/arxiv/arxiv.py in results(self, search)
    552             ))
    553             page_url = self._format_url(search, offset, page_size)
--> 554             feed = self._parse_feed(page_url, first_page)
    555             if first_page:
    556                 # NOTE: this is an ugly fix for a known bug. The totalresults

/usr/local/lib/python3.7/dist-packages/arxiv/arxiv.py in _parse_feed(self, url, first_page)
    635         # Feed was never returned in self.num_retries tries. Raise the last
    636         # exception encountered.
--> 637         raise err
    638 
    639 

HTTPError: arxiv.HTTPError(Page request resulted in HTTP 400)

Code for parsing the ID from an arxiv result object:

id = urlparse(result.entry_id).path.split('/')[-1].split('v')[0]

Code to reproduce:

import arxiv

ids = ['1911.10854', '1905.00256', '0112019', '1202.2184', '1708.03109', '0205137', '1610.08147', '2003.05245', '0406182', '0708.3630', '0503148', '1111.6170', '1612.04479', '0307110', '0306127', '1307.2727', '0402059', '1012.4706', '1906.01999', '0101032']

papers = arxiv.Search(id_list=ids).get()

The invalid IDs are '0112019', '0205137', etc.

The respective PDF URLs are still accessible, for example: https://arxiv.org/pdf/quant-ph/0112019.pdf

The same error is referenced in another open issue (#15), but in the context of very large ID lists.

Apologies if I simply lack sufficient knowledge about identifier naming conventions, but it should download from all research fields, right?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
lukasschwab commented, Jul 13, 2021

@sidphbot the patch is included in 1.4.0.

1 reaction
lukasschwab commented, Jul 13, 2021

Aha! It looks like old-form IDs can be requested; they just need to be fully-qualified with the archive and (where applicable) subject class.

Explanation

The old-form arXiv ID is a combination of a subject component, a date component, and a counter component.

[Figure: diagram breaking down the old-form arXiv ID into its components]

0112019 is paper number 019 submitted in the 12th month of 2001… but, because the counts are archive-specific, the numeric component isn’t unique. There is a 0112019 in quantum physics, but there may also be a 0112019 in astrophysics and a 0112019 in math.

This old format only uniquely identifies a paper if we specify which archive’s count it refers to. In this case, we want quant-ph.
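
As a rough illustration of the structure described above, here is a minimal sketch that splits an old-form identifier into its parts. The regular expression and the `parse_old_form` helper are assumptions made for this example, not part of the arxiv client library, and the pattern only approximates the official identifier scheme.

```python
import re

# Approximate pattern for old-form IDs (an assumption for illustration):
# optional "archive[.SC]/" prefix, then YYMM, a 3-digit counter, optional version.
OLD_FORM = re.compile(
    r"^(?:(?P<archive>[a-z-]+)(?:\.(?P<subject_class>[A-Z]{2}))?/)?"
    r"(?P<yy>\d{2})(?P<mm>\d{2})(?P<counter>\d{3})(?:v(?P<version>\d+))?$"
)


def parse_old_form(identifier):
    """Return the components of an old-form ID, or None if it does not match."""
    match = OLD_FORM.match(identifier)
    return match.groupdict() if match else None


# Fully qualified: the archive is present, so the ID is unambiguous.
print(parse_old_form("quant-ph/0112019v1"))
# Bare numeric part: parses, but 'archive' is None, so it is ambiguous.
print(parse_old_form("0112019"))
# New-form ID: does not match the old-form pattern at all.
print(parse_old_form("1911.10854"))
```

A result whose `archive` field is `None` is exactly the case that cannot be resolved without extra information.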

The fully-qualified ID for 0112019 is quant-ph/0112019. Accordingly, the following code works:

>>> import arxiv
>>> list(arxiv.Search(id_list=['quant-ph/0112019v1']).results())
[arxiv.Result(entry_id='http://arxiv.org/abs/quant-ph/0112019v1', updated=datetime.datetime(2001, 12, 4, 2, 54, 2, tzinfo=datetime.timezone.utc), published=datetime.datetime(2001, 12, 4, 2, 54, 2, tzinfo=datetime.timezone.utc), title='Classical entanglement', authors=[arxiv.Result.Author('Douglas G. Danforth')], summary='Classical systems can be entangled. Entanglement is defined by coincidence\ncorrelations. Quantum entanglement experiments can be mimicked by a mechanical\nsystem with a single conserved variable and 77.8% conditional efficiency.\nExperiments are replicated for four particle entanglement swapping and GHZ\nentanglement.', comment=None, journal_ref=None, doi=None, primary_category='quant-ph', categories=['quant-ph'], links=[arxiv.Result.Link('http://arxiv.org/abs/quant-ph/0112019v1', title=None, rel='alternate', content_type=None), arxiv.Result.Link('http://arxiv.org/pdf/quant-ph/0112019v1', title='pdf', rel='related', content_type=None)])]

But the short ID reported by this client library is incorrect:

>>> r = next(arxiv.Search(id_list=['quant-ph/0112019v1']).results())
>>> r.entry_id
'http://arxiv.org/abs/quant-ph/0112019v1'

Instead of just taking the last path element here, I should be taking the full contents of the path following http://arxiv.org/abs/:

https://github.com/lukasschwab/arxiv.py/blob/ea93efa9f369da995f657856447f4ad998f9076f/arxiv/arxiv.py#L169-L176
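
A minimal sketch of that idea, using only the standard library: it contrasts taking the last path element with taking everything after `/abs/`. The helper name `short_id_from_entry_id` is hypothetical and this is not the actual patch, just an illustration of the approach described above.

```python
from urllib.parse import urlparse


def short_id_from_entry_id(entry_id):
    """Keep everything after '/abs/' so an old-form archive prefix survives."""
    path = urlparse(entry_id).path  # e.g. '/abs/quant-ph/0112019v1'
    prefix = "/abs/"
    if not path.startswith(prefix):
        raise ValueError(f"unexpected entry_id: {entry_id!r}")
    return path[len(prefix):]


old_form = "http://arxiv.org/abs/quant-ph/0112019v1"
new_form = "http://arxiv.org/abs/1911.10854v1"

# Taking only the last path element drops the archive for old-form IDs.
print(old_form.split("/")[-1])           # 0112019v1
# Taking the full path after '/abs/' keeps it.
print(short_id_from_entry_id(old_form))  # quant-ph/0112019v1
# New-form IDs have no archive prefix, so both approaches agree.
print(short_id_from_entry_id(new_form))  # 1911.10854v1
```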

@sidphbot if you’re working from hardcoded IDs, adding the archives should solve this issue for you.
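
For hardcoded lists, one possible way to do this is sketched below. The `qualify_ids` helper and the `archives` mapping are hypothetical: the archive for each bare ID has to come from your own records, since it cannot be derived from the number alone. Only the quant-ph entry is taken from this thread.

```python
import re

# A bare old-form ID is seven digits with an optional version suffix.
BARE_OLD_FORM = re.compile(r"^\d{7}(v\d+)?$")


def qualify_ids(ids, archives):
    """Prefix bare old-form IDs with their archive; collect unresolved ones."""
    qualified, unresolved = [], []
    for identifier in ids:
        if not BARE_OLD_FORM.match(identifier):
            # New-form or already-qualified IDs pass through unchanged.
            qualified.append(identifier)
        elif identifier in archives:
            qualified.append(f"{archives[identifier]}/{identifier}")
        else:
            unresolved.append(identifier)
    return qualified, unresolved


ids = ["1911.10854", "0112019", "0205137", "0708.3630"]
archives = {"0112019": "quant-ph"}  # known from the PDF URL in this issue

qualified, unresolved = qualify_ids(ids, archives)
print(qualified)   # ['1911.10854', 'quant-ph/0112019', '0708.3630']
print(unresolved)  # ['0205137']
```

Anything left in `unresolved` still needs its archive looked up before it can be passed in `id_list`.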

If you’re re-querying incorrect IDs returned by this client library, I’ll have a patch out shortly.
