getpapers has many fewer hits than EUPMC interface
From a correspondent:
I installed ContentMine on my Mac laptop and tried to content-mine my research topic, “Postdoc career outcome”. I was able to get 78 open-access full-text papers. See the “getpapers” log output below:
$ getpapers -q "postdoc career outcome" -o PDcareer -x
info: Searching using eupmc API
info: Found 78 open access results
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
info: Got XML URLs for 78 out of 78 results
info: Downloading fulltext XML files
Downloading files [=======================] 100% (78/78) [4.2s elapsed, eta 0.0]
info: All downloads succeeded!
I did the same search through the “Europe PMC” web interface and got 297 results in total, of which 296 are full-text articles and 172 are open-access articles.
My questions are:
- Why did “getpapers” retrieve far fewer papers than “EUPMC” provides, 78 vs. 172 (or 296)? Is it caused by limited coverage of the journal scrapers?
- Not all the retrieved papers are relevant to my research topic, so manual filtering may be needed. Is it possible to give “getpapers” a list of PMC IDs to retrieve?
- For my research topic, I really need to get researchers' names, affiliations, contributions, and bibliometrics (citation count, H-index, journal impact factor) from journal papers. This cannot be done with standard ContentMine, which extracts information about sequences, genes, species, and word counts. How do I develop my own “ami2” plugins to extract the facts that I'm interested in?
Thank you so much for developing this great open-source software! I’m looking forward to hearing from you soon.
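On the second question, one possible approach is to turn a hand-filtered list of PMC IDs into a single search string. The following is a minimal Python sketch, not a getpapers feature: it assumes that Europe PMC's search syntax accepts a `PMCID:` field and that getpapers passes the `-q` string to the API unchanged. The helper name `pmcid_query` and the IDs in the usage note are hypothetical.

```python
def pmcid_query(pmcids):
    """Join PMC IDs into one OR query (assumes Europe PMC's PMCID: search field)."""
    terms = []
    for raw in pmcids:
        ident = raw.strip().upper()
        if not ident:
            continue  # skip blank entries, e.g. empty lines in an ID file
        if not ident.startswith("PMC"):
            ident = "PMC" + ident  # accept bare numeric IDs as well
        terms.append(f"PMCID:{ident}")
    return " OR ".join(terms)
```

For example, `pmcid_query(["PMC0000001", "0000002"])` produces a string that could then be tried as the value of `getpapers -q`; whether very long ID lists are accepted is not something this sketch verifies.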
Issue Analytics
- Created 7 years ago
- Reactions: 1
- Comments: 7 (1 by maintainers)
Top GitHub Comments
We get our results directly from the EUPMC API, so this sounds like an API bug. @tarrow can you follow up with EUPMC?
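To check whether the discrepancy comes from the API itself, the hit counts can be read straight from the Europe PMC REST search endpoint. This is a rough Python sketch under stated assumptions: the endpoint URL, the `OPEN_ACCESS:y` filter, and the `hitCount` response field reflect the public Europe PMC REST service as I understand it, and the function names are illustrative rather than anything getpapers itself uses.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Europe PMC REST search endpoint.
EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, open_access_only=False, page_size=1):
    """Build a search URL; optionally restrict the query to open-access records."""
    if open_access_only:
        query = f"({query}) AND OPEN_ACCESS:y"
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,  # only the total is needed, so keep pages small
    }
    return EUPMC_SEARCH + "?" + urllib.parse.urlencode(params)


def hit_count(url):
    """Fetch the URL and return the total number of hits (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["hitCount"]


def compare_counts(query):
    """Return (all hits, open-access hits) for the same query string."""
    return (
        hit_count(build_search_url(query)),
        hit_count(build_search_url(query, open_access_only=True)),
    )
```

Calling `compare_counts("postdoc career outcome")` and comparing the two numbers with the web interface's totals would show whether the API and the website disagree, or whether the gap comes from how the query is phrased.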
It seems that Europe PMC (EUPMC) has listened to complaints about sudden API changes and has modified its procedures.
I have just stumbled upon the EUPMC SOAP Web Service Reference Guide. There, in the Introduction (p. 6 of the document, p. 7 of the PDF file), it says:
You can thus