lyrics: Tolerate more tags when gathering lyrics texts in Google backend

Problem

I finally got beets working again under Linux Mint 20.2 with the recently released beets 1.5.0 (thank you to everyone on the beets team who helped make that long-awaited update happen!). I use the Lyrics plug-in, and took the opportunity to obtain a Google API key to help improve the lyrics scraping.

I’ve imported a handful of albums so far, and while most of the functionality, including lyrics scraping, is working well, portions of the lyrics are sometimes missing for certain songs. I am still investigating, but the testing I have done so far with verbose mode (-vv) seems to indicate that this primarily happens with AZLyrics (www.azlyrics.com).

One album that consistently reproduces the error is Molly Hatchet’s first album (self-titled “Molly Hatchet”). The verbose command output shows that Google used AZLyrics for most songs, with SongLyrics (www.songlyrics.com) being used in a few cases. Among the lyrics provided by AZLyrics, a few songs did not have any issues at all:

  • Gator Country
  • Dreams I’ll Never See

However, quite a few had only a portion of the complete lyrics, as shown in Puddletag. These include:

  • Bounty Hunter
  • The Creeper
  • The Price You Pay
  • I’ll Be Running
  • Cheatin’ Woman
  • Trust Your Old Friend

I haven’t tried searching through the source code on GitHub, but one thing I noticed anecdotally is that the songs with this problem all seem to have bracketed identifiers (“[” and “]”) within the lyrics that mark a section, such as “[Chorus:]” or “[LEAD BREAK]”. See The Price You Pay as an example. For that song, beets returned only the lyrics starting after the “[LEAD BREAK]” identifier:

I shot a man in Macon over a poker game,
I killed another in Atlanta just to build my fame,
...

The 2 songs on the album not encountering the lyrics problem did not have any such bracketed-identifiers in their lyrics. If I can provide any documentation in addition to the configuration file (below), please let me know what might help.

Setup

  • OS: Linux Mint 20.2
  • Python version: 3.8.10
  • beets version: 1.5.0
  • Turning off plugins made problem go away (yes/no): (N/A, since lyrics are provided by a plug-in)

My configuration (output of beet config) is:

directory: ~/Music
library: ~/data/musiclibrary.blb

id3v23: yes
plugins: lyrics fetchart zero embedart scrub

import:
    move: yes
    write: yes
    log: /home/dan/Music/import-log.txt
    timid: yes

zero:
    fields: comments day
    
lyrics:
    force: yes
    google_API_key: <My key here - not included to avoid potential hacking>
    sources: musixmatch google

embedart:
    maxwidth: 300
    remove_art_file: yes

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 16 (6 by maintainers)

Top GitHub Comments

1 reaction
sampsyo commented, Sep 12, 2021

I should probably add some meta-commentary here: the Google backend, unlike the other lyrics backends in this plugin, is necessarily very heuristic. That is, there will never be a consistent set of rules that works on all lyrics pages on the Web; all we can do is our best: iteratively improve the parser when we run across examples that need help and are fixable. But there are an infinite number of such fixes that are possible and perfection, in the end, is unattainable.

1 reaction
dannn-o commented, Sep 10, 2021

So, I imported a song from the album that doesn’t encounter the problem, and, as expected from the analysis above, the entire lyrics text was contained within a single string. That’s why the “stripped_strings” step was successful: there was only one large text blob in competition (actually, there was a second string identifying the album, song, and AZLyrics.com site, but it was quite small in comparison).

I analyzed the lyrics for the song I’ve been using to reproduce the problem, and they definitely are broken up into 6 different blobs (not counting the album/song/AZLyrics string) before “stripped_strings” is invoked. Comparing against the lyrics on the azlyrics.com web site, the points at which the lyrics separate into the 6 blobs coincide with the bracketed text I mentioned in my original post (for example, where [Chorus:] or [LEAD BREAK] appears within the lyrics text).
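The fracturing described here can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: it uses only Python's standard `html.parser`, assumes that every tag ends the current text blob and that the longest blob is then kept, and runs on made-up markup. `BlobCollector`, `collect_blobs`, and `SAMPLE` are hypothetical names.

```python
from html.parser import HTMLParser


class BlobCollector(HTMLParser):
    """Collect runs of text, treating every HTML tag as a blob boundary."""

    def __init__(self):
        super().__init__()
        self.blobs = []
        self._buffer = []

    def _flush(self):
        # Close the current blob, skipping whitespace-only runs.
        text = "".join(self._buffer).strip()
        if text:
            self.blobs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        self._flush()

    def handle_endtag(self, tag):
        self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def collect_blobs(html):
    parser = BlobCollector()
    parser.feed(html)
    parser.close()
    return parser.blobs


# Made-up markup in the shape described above; not copied from a real page.
SAMPLE = (
    "<div>first verse line one\nfirst verse line two\n"
    "<i>[Chorus:]</i>\nchorus line one\nchorus line two\n"
    "<i>[LEAD BREAK]</i>\nlast verse line one\nlast verse line two\n"
    "last verse line three</div>"
)

blobs = collect_blobs(SAMPLE)
longest = max(blobs, key=len)  # assumed selection rule: keep the longest blob
for blob in blobs:
    print(len(blob), repr(blob))
```

Under these assumptions the two `<i>` markers cut the text into five blobs, and keeping only the longest one drops everything before “[LEAD BREAK]”, which matches the symptom in the original report.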

I think there could be 2 approaches: either the preceding code could be tweaked to prevent the lyrics from splitting up, or, if that proves difficult, logic could be added to combine the resulting blobs into one string. I think the former would probably be the preferred option. In this particular case, the bracketed text is surrounded by italics elements (<i> and </i>), so those tags could be the cause rather than the brackets.
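Both approaches can be compared on a toy example. This is a hedged sketch using only the standard `re` module; the helper names, the set of tolerated tags, and the sample markup are all assumptions for illustration, and a regex split stands in for whatever the real parser does.

```python
import re

# Hypothetical set of inline formatting tags to tolerate.
INLINE_TAGS = ("i", "b", "em", "strong")
_INLINE_RE = re.compile(
    r"</?(?:%s)\b[^>]*>" % "|".join(INLINE_TAGS), re.IGNORECASE
)
_ANY_TAG_RE = re.compile(r"<[^>]+>")


def split_blobs(html):
    """Model of the fracturing: every tag ends the current text blob."""
    return [part.strip() for part in _ANY_TAG_RE.split(html) if part.strip()]


def unwrap_inline_tags(html):
    """Approach 1: drop inline tags but keep their text, before splitting."""
    return _INLINE_RE.sub("", html)


def merge_blobs(blobs):
    """Approach 2: rejoin all pieces after splitting."""
    return "\n".join(blobs)


# Made-up markup: lyrics with italic section markers, plus a small footer.
SAMPLE = (
    "<div>verse one\n<i>[Chorus:]</i>\nchorus line\n"
    "<i>[LEAD BREAK]</i>\nverse two</div>"
    "<div>site footer text</div>"
)

# Approach 1: the lyrics survive as one blob, so "longest blob" still works.
full = max(split_blobs(unwrap_inline_tags(SAMPLE)), key=len)

# Approach 2: everything is rejoined, including the unrelated footer.
merged = merge_blobs(split_blobs(SAMPLE))

print(full)
print("---")
print(merged)
```

In this toy case the first approach keeps the lyrics intact while still letting the small footer string lose the length comparison, whereas a naive merge pulls the footer into the result. That is one reason the first option may be preferable, though a real merge step could of course filter blobs before joining.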

I’ve hit my limit for tonight, but plan to next see if I can find where and how the lyrics’ fracturing happens.
