Unexpected query output
When I retrieve information with simple queries, I consistently get output that I did not expect. Specifically, the publications that the keywords refer to are not returned in the query results. However, a manual HTTP GET request does return the right publication data.
Example code:
from crossref.restful import Works

keyword = 'Albert Einstein Elektrodynamik bewegter Körper'
works = Works()
result = works.query(keyword)
# Print only the first returned record
for entry in result:
    print(entry)
    break
>> {'indexed': {'date-parts': [[2019, 11, 19]], 'date-time': '2019-11-19T19:11:52Z', 'timestamp': 1574190712445}, 'reference-count': 0, 'publisher': 'Maney Publishing', 'issue': '1', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'short-container-title': ['Journal of the American Institute for Conservation'], 'published-print': {'date-parts': [[1980]]}, 'DOI': '10.2307/3179679', 'type': 'journal-article', 'created': {'date-parts': [[2006, 4, 18]], 'date-time': '2006-04-18T05:15:34Z', 'timestamp': 1145337334000}, 'page': '21', 'source': 'Crossref', 'is-referenced-by-count': 0, 'title': ['A Semi-Rigid Transparent Support for Paintings Which Have Both Inscriptions on Their Fabric Reverse and Acute Planar Distortions'], 'prefix': '10.1179', 'volume': '20', 'author': [{'given': 'Albert', 'family': 'Albano', 'sequence': 'first', 'affiliation': []}], 'member': '138', 'container-title': ['Journal of the American Institute for Conservation'], 'deposited': {'date-parts': [[2015, 6, 26]], 'date-time': '2015-06-26T01:05:23Z', 'timestamp': 1435280723000}, 'score': 4.5581737, 'issued': {'date-parts': [[1980]]}, 'references-count': 0, 'journal-issue': {'published-print': {'date-parts': [[1980]]}, 'issue': '1'}, 'URL': 'http://dx.doi.org/10.2307/3179679', 'ISSN': ['0197-1360'], 'issn-type': [{'value': '0197-1360', 'type': 'print'}]}
I get this kind of output, unrelated to the input keyword, with other keywords too. I have tried changing the order of the results (result.order('desc')), but that does not seem to change anything.
When I send the same request via HTTP GET to the regular API URL, I get the expected publication as the first result:
import requests
keyword = 'Albert Einstein Elektrodynamik bewegter Körper'
keyword = '+'.join(keyword.split())
url = 'https://api.crossref.org/works?query=' + keyword
result = requests.get(url)
# Take first result
result = result.json()['message']['items'][0]
print(result)
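The manual '+'.join leaves the ö of Körper unescaped in the URL string; requests typically percent-encodes it when sending. As a minimal sketch, the query string can instead be built with the standard library's urllib.parse.urlencode, which turns spaces into + and percent-encodes non-ASCII characters as UTF-8. build_works_url is a hypothetical helper, not part of requests or crossref.restful.

```python
from urllib.parse import urlencode

API_URL = "https://api.crossref.org/works"


def build_works_url(keyword: str, **extra_params) -> str:
    """Hypothetical helper: build a Crossref /works URL with a properly encoded query.

    urlencode turns spaces into '+' and encodes non-ASCII characters as UTF-8
    percent escapes; safe="*" keeps a literal '*' (e.g. cursor=*) readable.
    """
    params = {"query": keyword, **extra_params}
    return API_URL + "?" + urlencode(params, safe="*")


plain_url = build_works_url("Albert Einstein Elektrodynamik bewegter Körper")
paged_url = build_works_url("Albert Einstein Elektrodynamik bewegter Körper", cursor="*", rows=100)
print(plain_url)
print(paged_url)
```

The resulting URL can be passed straight to requests.get, and extra parameters such as rows or cursor are appended in the order they are given.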
>> {'indexed': {'date-parts': [[2020, 5, 25]], 'date-time': '2020-05-25T14:23:45Z', 'timestamp': 1590416625775}, 'publisher-location': 'Wiesbaden', 'reference-count': 0, 'publisher': 'Vieweg+Teubner Verlag', 'isbn-type': [{'value': '9783663193722', 'type': 'print'}, {'value': '9783663195108', 'type': 'electronic'}], 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1923]]}, 'DOI': '10.1007/978-3-663-19510-8_3', 'type': 'book-chapter', 'created': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:43Z', 'timestamp': 1386295723000}, 'page': '26-50', 'source': 'Crossref', 'is-referenced-by-count': 5, 'title': ['Zur Elektrodynamik bewegter Körper'], 'prefix': '10.1007', 'author': [{'given': 'A.', 'family': 'Einstein', 'sequence': 'first', 'affiliation': []}], 'member': '297', 'container-title': ['Das Relativitätsprinzip'], 'link': [{'URL': 'http://link.springer.com/content/pdf/10.1007/978-3-663-19510-8_3', 'content-type': 'unspecified', 'content-version': 'vor', 'intended-application': 'similarity-checking'}], 'deposited': {'date-parts': [[2013, 12, 6]], 'date-time': '2013-12-06T02:08:45Z', 'timestamp': 1386295725000}, 'score': 53.638336, 'issued': {'date-parts': [[1923]]}, 'ISBN': ['9783663193722', '9783663195108'], 'references-count': 0, 'URL': 'http://dx.doi.org/10.1007/978-3-663-19510-8_3'}
The output I retrieved with the tool in this repository has nothing to do with my query keyword. Do you have an idea how I can fix this? I would be very grateful for any help.

The difference between your approach and the library's API is that the library adds some extra parameters to the query so that users can download all the documents related to the given query.
Both approaches match a total of 290890 documents. You can check this by opening both URLs and looking at the total-results attribute.
API: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+Körper&cursor=*&rows=100
Your approach: https://api.crossref.org/works?query=Albert+Einstein+Elektrodynamik+bewegter+Körper
As you can see, the URLs differ only in the parameters rows=100 and cursor=*, where:
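As a rough illustration of how cursor=* and rows=100 are used for this kind of paging, here is a minimal sketch using only Python's standard library. It is not the crossref.restful implementation: the helper names fetch_json and iter_works and the max_pages cap are hypothetical, and it assumes the Crossref behaviour that a cursor request returns a next-cursor value in the message, which is sent as the cursor of the following request.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.crossref.org/works"


def fetch_json(params: dict) -> dict:
    """Hypothetical default fetcher: GET /works and decode the JSON body.

    urlencode percent-encodes cursor values, which may contain '+', '/' or '='.
    """
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_works(query: str, rows: int = 100, max_pages: int = 3,
               fetch: Callable[[dict], dict] = fetch_json) -> Iterator[dict]:
    """Yield work records page by page using cursor-based deep paging."""
    cursor = "*"  # '*' asks the API to start a new cursor
    for _ in range(max_pages):
        message = fetch({"query": query, "rows": rows, "cursor": cursor})["message"]
        items = message.get("items", [])
        yield from items
        next_cursor = message.get("next-cursor")
        # Stop when a page comes back empty or no further cursor is returned.
        if not items or not next_cursor:
            break
        cursor = next_cursor
```

For example, list(iter_works('Albert Einstein Elektrodynamik bewegter Körper', max_pages=1)) would fetch a single page of up to 100 records. Paging only changes how the matched documents are retrieved, not which documents match, which is consistent with both URLs reporting the same total-results.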
@Ankush-Chander Thank you very much! That helps me get exactly what I need.