Unexpected metadata for non-existent ebooks
The metadata database contains records for non-existent ebooks.
Exists: https://www.gutenberg.org/ebooks/1
Doesn’t exist: https://www.gutenberg.org/ebooks/182
Example code:
from __future__ import print_function
from gutenberg.query import get_metadata


def get_all_metadata(etextno):
    # Print every metadata feature for one ebook, one feature per line.
    for feature_name in ['author', 'formaturi', 'language',
                         'rights', 'subject', 'title']:
        print("{}\t{}\t{}".format(
            etextno,
            feature_name,
            get_metadata(feature_name, etextno)))
    print()


get_all_metadata(1)  # US Declaration of Independence
get_all_metadata(182)  # no such ebook
Actual output:
1 author frozenset([u'United States President (1801-1809)'])
1 formaturi frozenset([u'http://www.gutenberg.org/ebooks/1.txt.utf-8', u'http://www.gutenberg.org/ebooks/1.epub.noimages', u'http://www.gutenberg.org/6/5/2/6527/6527-t/6527-t.tex', u'http://www.gutenberg.org/ebooks/1.html.noimages', u'http://www.gutenberg.org/files/1/1.zip', u'http://www.gutenberg.org/ebooks/1.epub.images', u'http://www.gutenberg.org/ebooks/1.rdf', u'http://www.gutenberg.org/ebooks/1.kindle.noimages', u'http://www.gutenberg.org/files/1/1.txt', u'http://www.gutenberg.org/ebooks/1.html.images', u'http://www.gutenberg.org/6/5/2/6527/6527-t.zip', u'http://www.gutenberg.org/ebooks/1.kindle.images'])
1 language frozenset([u'en'])
1 rights frozenset([u'Public domain in the USA.'])
1 subject frozenset([u'E201', u'United States. Declaration of Independence', u'United States -- History -- Revolution, 1775-1783 -- Sources', u'JK'])
1 title frozenset([u'The Declaration of Independence of the United States of America'])
182 author frozenset([])
182 formaturi frozenset([])
182 language frozenset([u'en'])
182 rights frozenset([u'None'])
182 subject frozenset([])
182 title frozenset([])
Expected output:
I’d expect get_metadata("language", 182) and get_metadata("rights", 182) to both return frozenset([]) instead of frozenset([u'en']) and frozenset([u'None']).
Or better, as there’s no such ebook, perhaps it should return None or raise an exception: maybe IndexError or something custom like NoEbookIndex. Alternatively, just don’t add it to the database in the first place, and let the lookup raise whatever it would raise when an index is not found.
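To make the "raise an exception" option concrete, here is a minimal sketch of a wrapper that could sit in front of a metadata lookup. It is not part of the gutenberg package: NoEbookIndex, get_metadata_strict, and fake_lookup are hypothetical names, and the rule "no title and no format URIs means the ebook does not exist" is an assumption based only on the output shown above. The lookup is passed in as a callable so the sketch runs without a populated metadata cache.

```python
class NoEbookIndex(IndexError):
    """Hypothetical exception for a missing ebook (not in the gutenberg package)."""


def get_metadata_strict(feature_name, etextno, lookup):
    """Return metadata for an ebook, or raise if it looks like a phantom.

    `lookup` is any callable shaped like gutenberg.query.get_metadata:
    lookup(feature_name, etextno) -> frozenset.
    Assumption: an ebook with no title and no format URIs does not exist.
    """
    if not lookup('title', etextno) and not lookup('formaturi', etextno):
        raise NoEbookIndex('no ebook with number {}'.format(etextno))
    return lookup(feature_name, etextno)


# Stand-in data mirroring the output above; a real caller would pass
# gutenberg.query.get_metadata instead of fake_lookup.
FAKE_DB = {
    (1, 'title'): frozenset(['The Declaration of Independence of the United States of America']),
    (1, 'formaturi'): frozenset(['http://www.gutenberg.org/files/1/1.txt']),
    (1, 'language'): frozenset(['en']),
    (182, 'language'): frozenset(['en']),
    (182, 'rights'): frozenset(['None']),
}


def fake_lookup(feature_name, etextno):
    """Return the stored values, or an empty frozenset when nothing is stored."""
    return FAKE_DB.get((etextno, feature_name), frozenset())
```

With this in place, get_metadata_strict('language', 1, fake_lookup) returns the stored language set, while the same call for 182 raises NoEbookIndex. Subclassing IndexError keeps existing "except IndexError" handlers working, which is one reason to prefer it over a brand-new base class.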
Issue Analytics
- State:
- Created 7 years ago
- Comments:11 (8 by maintainers)
Top GitHub Comments
The code in the repository is slightly ahead of the PyPI version. A number of the recent improvements (including more metadata extractors) are being held up by the need to decide if we’re going to keep support for Python 3.2 or not.
To run the latest version of the code, you can follow the instructions in the readme for installing from source. If you run into issues with that, you’ll probably want to create an issue (or post a StackOverflow question and then create an issue pointing to it, or any other way to ask for help, of course.)
I’ve done a bit of work towards resolving this issue on the filter-phantom-books branch. We now have a unit test that reproduces the problem and I’ve implemented the approach discussed above: identify the phantoms at metadata cache creation time and remove them from the cache.
However, for some reason, deleting items from the graph doesn’t seem to work. Does anyone have an idea what could be going wrong here? @hugovk @MasterOdin @ikarth
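For readers following along, the approach described in this comment (find the phantoms when the cache is built, then drop them) can be sketched in plain Python. This is not the code on the filter-phantom-books branch and it is not a diagnosis of the deletion problem. A set of (etextno, feature_name, value) tuples stands in for the RDF graph, remove_phantoms is a hypothetical name, and "a phantom is an ebook with no formaturi entry" is an assumption drawn from the output in the issue.

```python
def remove_phantoms(triples):
    """Return a new set of triples with phantom ebooks filtered out.

    Each triple is (etextno, feature_name, value).
    Assumption: an ebook is a phantom if it has no 'formaturi' triple.
    """
    # First pass: collect the ebook numbers that have at least one file.
    real = {etextno for etextno, feature, _ in triples if feature == 'formaturi'}
    # Second pass: keep only triples that belong to those ebooks.
    return {triple for triple in triples if triple[0] in real}


# Stand-in cache contents mirroring the output in the issue.
cache = {
    (1, 'title', 'The Declaration of Independence of the United States of America'),
    (1, 'formaturi', 'http://www.gutenberg.org/files/1/1.txt'),
    (1, 'language', 'en'),
    (182, 'language', 'en'),
    (182, 'rights', 'None'),
}

cleaned = remove_phantoms(cache)
```

After this runs, cleaned holds only the three triples for ebook 1. The sketch builds a new collection instead of deleting entries in place, so nothing is mutated while it is being iterated over. Whether that has anything to do with the graph deletion issue reported above is unknown.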