Unexpected metadata for non-existent ebooks

The metadata database contains records for non-existent ebooks.

Exists: Doesn’t exist:

Example code:

from __future__ import print_function
from gutenberg.query import get_metadata

def get_all_metadata(etextno):
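    for feature_name in ['author', 'formaturi', 'language',
                         'rights', 'subject', 'title']:
        # Print one tab-separated row per metadata feature.
        print("{}\t{}\t{}".format(
            etextno, feature_name,
            get_metadata(feature_name, etextno)))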

get_all_metadata(1)  # US Declaration of Independence
get_all_metadata(182)  # no such ebook

Actual output:

1       author  frozenset([u'United States President (1801-1809)'])
1       formaturi       frozenset([u'', u'
pub.noimages', u'', u'
es', u'', u'', u'http://www.gutenberg
.org/ebooks/1.rdf', u'', u'', u'h
ttp://', u'', u'http://www.gutenberg
1       language        frozenset([u'en'])
1       rights  frozenset([u'Public domain in the USA.'])
1       subject frozenset([u'E201', u'United States. Declaration of Independence', u'United States -- History -- Revolution, 1775-1783 -- Sources', u'JK'])
1       title   frozenset([u'The Declaration of Independence of the United States of America'])

182     author  frozenset([])
182     formaturi       frozenset([])
182     language        frozenset([u'en'])
182     rights  frozenset([u'None'])
182     subject frozenset([])
182     title   frozenset([])

Expected output:

I’d expect get_metadata("language", 182) and get_metadata("rights", 182) to both return frozenset([]) instead of frozenset([u'en']) and frozenset([u'None']).

Or better, as there’s no such ebook, perhaps it should return None or raise an exception: maybe IndexError or something custom like NoEbookIndex. Alternatively, just don’t add it to the database in the first place and let that raise whatever it would raise when an index is not found.
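
The exception-raising alternative could look roughly like the wrapper below. This is only an illustration, not the library's behaviour: `NoEbookIndex` is the hypothetical name floated above, `checked_metadata` and its `lookup` parameter are invented here, and the rule that an ebook with neither a title nor a format URI counts as non-existent is an assumption. A stub holding a few values from the output above stands in for `gutenberg.query.get_metadata` so the sketch runs on its own.

```python
class NoEbookIndex(LookupError):
    """Hypothetical exception for an etext number with no real ebook."""


# Stub data copied from the output above; stands in for the real metadata cache.
_STUB = {
    (1, 'title'): frozenset(
        ['The Declaration of Independence of the United States of America']),
    (1, 'language'): frozenset(['en']),
    (182, 'language'): frozenset(['en']),
    (182, 'rights'): frozenset(['None']),
}


def stub_get_metadata(feature_name, etextno):
    """Stand-in with the same call shape as gutenberg.query.get_metadata."""
    return _STUB.get((etextno, feature_name), frozenset())


def checked_metadata(feature_name, etextno, lookup=stub_get_metadata):
    """Return metadata, or raise NoEbookIndex if the ebook looks like a phantom.

    Assumption: a real ebook always has a title or at least one format URI.
    """
    if not lookup('title', etextno) and not lookup('formaturi', etextno):
        raise NoEbookIndex(etextno)
    return lookup(feature_name, etextno)
```

With this stub, `checked_metadata('language', 1)` returns the language set, while `checked_metadata('language', 182)` raises `NoEbookIndex` instead of returning `frozenset(['en'])`.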

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

ikarth commented, Nov 13, 2016

The code in the repository is slightly ahead of the PyPI version. A number of the recent improvements (including more metadata extractors) are being held up by the need to decide if we’re going to keep support for Python 3.2 or not.

To run the latest version of the code, you can follow the instructions in the readme for installing from source. If you run into issues with that, you’ll probably want to create an issue (or post a StackOverflow question and then create an issue pointing to it, or any other way to ask for help, of course.)

c-w commented, Aug 3, 2020

I’ve done a bit of work towards resolving this issue on the filter-phantom-books branch. We now have a unit test that reproduces the problem and I’ve implemented the approach discussed above: identify the phantoms at metadata cache creation time and remove them from the cache.

However, for some reason, deleting items from the graph doesn’t seem to work (code). Does anyone have an idea what could be going wrong here? @hugovk @MasterOdin @ikarth
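
The idea of dropping phantoms while the cache is built can be illustrated without the real graph. In the sketch below a plain set of (subject, predicate, object) tuples stands in for the metadata graph. The predicate names, the "no title means phantom" rule, and the `remove_phantoms` helper are all hypothetical and are not taken from the filter-phantom-books branch. It does not attempt to diagnose why deletion fails there.

```python
from collections import defaultdict


def remove_phantoms(triples):
    """Return a new set of triples without the phantom ebooks.

    Assumption: an ebook with no 'title' triple is a phantom.
    """
    predicates_by_book = defaultdict(set)
    for subject, predicate, _ in triples:
        predicates_by_book[subject].add(predicate)
    phantoms = {book for book, preds in predicates_by_book.items()
                if 'title' not in preds}
    # Build a fresh set rather than deleting from the one being iterated.
    return {t for t in triples if t[0] not in phantoms}


# Toy data modelled on the output in the issue description.
triples = {
    ('ebooks/1', 'title',
     'The Declaration of Independence of the United States of America'),
    ('ebooks/1', 'language', 'en'),
    ('ebooks/182', 'language', 'en'),
    ('ebooks/182', 'rights', 'None'),
}
cleaned = remove_phantoms(triples)
```

Here `cleaned` keeps only the two triples for `ebooks/1`. Collecting the phantom identifiers first and filtering afterwards keeps the detection pass separate from the removal pass.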
