Unexpected query output

When trying to retrieve information via simple queries, I consistently got output that I did not expect. Specifically, the publications that the keywords refer to are not returned in the query result. I do, however, get the right publication data via a manual HTTP GET request.

Example code:

from crossref.restful import Works 

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'

works = Works()
result = works.query(keyword)
for entry in result:
    print(entry)
    break
>> {'indexed': {'date-parts': [[2019, 11, 19]], 'date-time': '2019-11-19T19:11:52Z', 'timestamp': 1574190712445}, 'reference-count': 0, 'publisher': 'Maney Publishing', 'issue': '1', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'short-container-title': ['Journal of the American Institute for Conservation'], 'published-print': {'date-parts': [[1980]]}, 'DOI': '10.2307/3179679', 'type': 'journal-article', 'created': {'date-parts': [[2006, 4, 18]], 'date-time': '2006-04-18T05:15:34Z', 'timestamp': 1145337334000}, 'page': '21', 'source': 'Crossref', 'is-referenced-by-count': 0, 'title': ['A Semi-Rigid Transparent Support for Paintings Which Have Both Inscriptions on Their Fabric Reverse and Acute Planar Distortions'], 'prefix': '10.1179', 'volume': '20', 'author': [{'given': 'Albert', 'family': 'Albano', 'sequence': 'first', 'affiliation': []}], 'member': '138', 'container-title': ['Journal of the American Institute for Conservation'], 'deposited': {'date-parts': [[2015, 6, 26]], 'date-time': '2015-06-26T01:05:23Z', 'timestamp': 1435280723000}, 'score': 4.5581737, 'issued': {'date-parts': [[1980]]}, 'references-count': 0, 'journal-issue': {'published-print': {'date-parts': [[1980]]}, 'issue': '1'}, 'URL': 'http://dx.doi.org/10.2307/3179679', 'ISSN': ['0197-1360'], 'issn-type': [{'value': '0197-1360', 'type': 'print'}]}

I get similarly unrelated output with other keywords, too. I have tried changing the order of the result with result.order('desc'), but that does not seem to change anything.

When I then do the same request via HTTP GET and the normal API URL, I get the expected output as the first result:

import requests

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'

keyword = '+'.join(keyword.split())
url = 'https://api.crossref.org/works?query=' + keyword
result = requests.get(url=url)
# Take first result
result = result.json()['message']['items'][0]
print(result)

>> {'indexed': {'date-parts': [[2020, 5, 25]], 'date-time': '2020-05-25T14:23:45Z', 'timestamp': 1590416625775}, 'publisher-location': 'Wiesbaden', 'reference-count': 0, 'publisher': 'Vieweg+Teubner Verlag', 'isbn-type': [{'value': '9783663193722', 'type': 'print'}, {'value': '9783663195108', 'type': 'electronic'}], 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1923]]}, 'DOI': '10.1007/978-3-663-19510-8_3', 'type': 'book-chapter', 'created': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:43Z', 'timestamp': 1386295723000}, 'page': '26-50', 'source': 'Crossref', 'is-referenced-by-count': 5, 'title': ['Zur Elektrodynamik bewegter Körper'], 'prefix': '10.1007', 'author': [{'given': 'A.', 'family': 'Einstein', 'sequence': 'first', 'affiliation': []}], 'member': '297', 'container-title': ['Das Relativitätsprinzip'], 'link': [{'URL': 'http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3', 'content-type': 'unspecified', 'content-version': 'vor', 'intended-application': 'similarity-checking'}], 'deposited': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:45Z', 'timestamp': 1386295725000}, 'score': 53.638336, 'issued': {'date-parts': [[1923]]}, 'ISBN': ['9783663193722', '9783663195108'], 'references-count': 0, 'URL': 'http://dx.doi.org/10.1007/978-3-663-19510-8_3'}

The output that I retrieved with the tool in this repository has nothing to do with my query keyword. Do you have any idea how I can fix this? I would be very grateful for any kind of help.

Top GitHub Comments

fabiobatalha commented, May 25, 2021

The difference between your approach and the API (this library) is that the API adds some other parameters to the query so that users can download all the documents related to the given query.

Both approaches match a total of 290890 documents. You can verify this by requesting both URLs and looking at the total-results attribute.

API: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+Körper&cursor=*&rows=100
Your approach: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+Körper

As you can see, the URLs differ only in the parameters rows=100 and cursor=*, where:

rows=100 does not change the order of the result
cursor=* changes the order of the result (I don't know why the Crossref API behaves this way; from a user's point of view, even if I want access to all the results, I would still expect them to be sorted by relevance)
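
To check this comparison yourself, the following is a minimal sketch that builds both URLs and reads total-results plus the title of the first item from each response. It uses only the Python standard library. The helper names build_url, summarize and compare are hypothetical and are not part of the library discussed in this thread. The response layout (message, items, total-results, title) follows the outputs shown in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.crossref.org/works"


def build_url(query, use_cursor):
    """Build a /works URL, optionally adding cursor=* and rows=100."""
    params = {"query": query}
    if use_cursor:
        params["cursor"] = "*"
        params["rows"] = 100
    # safe="*" keeps the cursor value readable instead of encoding it as %2A
    return BASE_URL + "?" + urlencode(params, safe="*")


def summarize(url):
    """Fetch a URL and return (total-results, title of the first item)."""
    with urlopen(url, timeout=30) as response:
        message = json.load(response)["message"]
    items = message["items"]
    # Some records have no title, so fall back to None
    titles = (items[0].get("title") or [None]) if items else [None]
    return message["total-results"], titles[0]


def compare(query):
    """Return the summary for the plain URL and for the cursor URL."""
    return {
        "plain": summarize(build_url(query, use_cursor=False)),
        "cursor": summarize(build_url(query, use_cursor=True)),
    }
```

Calling compare('Albert Einstein Elektrodynamik bewegter Körper') needs network access. If the behavior described above still holds, both entries should report the same total-results while the first titles differ. The Crossref API may have changed since this thread was written, so treat that as something to verify rather than a guaranteed result.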

OBrink commented, May 27, 2021

@Ankush-Chander Thank you very much! That helps me get exactly what I need.