Duplicate meta entries --> fail

I’m having trouble parsing attributes for this page:

https://cosmonaut.blog/2019/02/20/no-bernie/

This may well be down to my non-existent JS/CSS skills, so feel free to close, and sorry for the disturbance. The problem I have is with the lead_image_url selectors. The “default” (for most extractors) for this one would be [['meta[property="og:image"]', 'content']] or [['meta[name="twitter:image"]','value']], but both of those, when executed, return two near-identical entries, which makes the whole extraction fail (because, if I read the tutorial correctly, a selector needs to return exactly one item).

The other idea would be to query the image directly from the page, using [['img.wp-post-image', 'src']], but this is an image with srcset, so the result ends up being a concatenation of multiple URLs (each of which would be acceptable to me) that I cannot process further in the simple selectors: [...] setting.

Am I missing something here?

  • Platform: Linux my-desktop 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Mercury Parser Version: master (2a3ade706dc445ecb09cce552b087c850d2cb817)

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
black-puppydog commented, Mar 18, 2019

sorry, forgot to close this. thanks again!

1 reaction
toufic-m commented, Mar 18, 2019

Indeed, a selector must return only one match, and there are a couple of ways to handle this:

  • your idea of querying the image directly from the page is perfectly correct, and the srcset issue that you mentioned has been addressed in #312, which has been merged into master and should be included in the next package release;

  • alternatively, and in other situations where a unique selector doesn’t exist, you can use a selector that accounts for the two matches by returning the second one, and add a fallback selector that matches the first element in case the website’s HTML changes and no longer has duplicate tags. It could be something like:

  lead_image_url: {
    selectors: [
      // Select the `meta[name="og:image"]` that is a subsequent sibling of
      // another `meta[name="og:image"]`, i.e. the second of the duplicates.
      ['meta[name="og:image"] ~ meta[name="og:image"]', 'value'],
      // If the first selector no longer matches, the tag is no longer
      // duplicated and we can safely select the only one.
      ['meta[name="og:image"]', 'value'],
    ],
  },
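
To make the ordering of the two selectors easier to reason about, here is a minimal sketch of the "exactly one match, otherwise try the next selector" rule described above. It is a simplified model, not Mercury Parser's actual code: `extractFirstUnique`, `queryFrom`, the two page tables and the example.com URLs are all hypothetical names and data invented for illustration, and the `query` callback stands in for whatever DOM lookup the parser really performs.

```javascript
// Simplified model of the "a selector must return only one match" rule.
// NOT Mercury Parser's implementation; all names here are hypothetical.
// `query` is an assumed callback: (cssSelector) => array of matched elements,
// where each element is a plain object mapping attribute names to values.
function extractFirstUnique(selectors, query) {
  for (const [cssSelector, attribute] of selectors) {
    const matches = query(cssSelector);
    // Skip selectors that match nothing or match more than one element.
    if (matches.length !== 1) continue;
    const value = matches[0][attribute];
    if (typeof value === 'string' && value.trim() !== '') {
      return value.trim();
    }
  }
  return null; // no selector produced exactly one usable match
}

const selectors = [
  ['meta[name="og:image"] ~ meta[name="og:image"]', 'value'],
  ['meta[name="og:image"]', 'value'],
];

// Stand-in for a DOM: maps each selector to the elements it would match.
const queryFrom = (table) => (cssSelector) => table[cssSelector] || [];

// Page with the tag duplicated: the sibling selector matches only the second tag.
const duplicatedPage = {
  'meta[name="og:image"] ~ meta[name="og:image"]': [
    { value: 'https://example.com/b.jpg' },
  ],
  'meta[name="og:image"]': [
    { value: 'https://example.com/a.jpg' },
    { value: 'https://example.com/b.jpg' },
  ],
};

// Page after the duplicate is removed: only the plain selector matches.
const fixedPage = {
  'meta[name="og:image"]': [{ value: 'https://example.com/a.jpg' }],
};

console.log(extractFirstUnique(selectors, queryFrom(duplicatedPage)));
console.log(extractFirstUnique(selectors, queryFrom(fixedPage)));
```

Note that the sibling selector only yields a single match when the tag appears exactly twice: with three sibling copies, `A ~ A` would match the second and the third, so neither selector in this sketch would qualify.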