Text embedding: `Failed to embed text` on wikipedia pages

Follow up to https://github.com/brave/brave-browser/issues/23424

Steps to Reproduce

  1. Run Brave with the following flags:
     --enable-logging=stderr --vmodule=text_embedding_processor=9,embedding_processing=9,text_embedding_html_events=9 --enable-features=TextEmbedding
  2. Enable rewards and ads
  3. Restart Brave
  4. Open https://en.wikipedia.org/wiki/Svante_P%C3%A4%C3%A4bo
  5. Check logs

Actual result:

[6876:6876:1009/142934.888115:VERBOSE1:text_embedding_processor.cc(61)] Failed to embed text

Note: the processing works on other pages, e.g. interia.pl

Expected result:

The text embedding is processed successfully.

Reproduces how often:

Easily reproduced

Brave version (brave://version info)

Brave 1.45.90 Chromium: 106.0.5249.103 (Official Build) beta (64-bit)
Revision 182570408a1f25ab2731ef5f283b918df9b9f956-refs/branch-heads/5249_91@{#6}
OS Ubuntu 18.04 LTS

cc @jsecretan @tmancey @ptjames

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

ptjames commented, Oct 11, 2022 (4 reactions)

@tmancey I have no issue with adjusting the messaging. I can ask @LorenzoMinto to include it in the PR he currently has open, which uses the embeddings to guide ad serving.

ptjames commented, Oct 11, 2022 (1 reaction)

@btlechowski @tmancey `Failed to embed text` is the expected result here. Embeddings are created from the words in the meta tag with the property `og:title`. In the example shared above, that would be:

<meta property="og:title" content="Svante Pääbo - Wikipedia">

The words svante, pääbo, and wikipedia are not in the vocabulary of the current word-embedding mapping. The following log output shows this:

[73708:259:1010/225649.460944:VERBOSE9:embedding_processing.cc(88)] svante - text embedding token not found in resource vocabulary
[73708:259:1010/225649.460995:VERBOSE9:embedding_processing.cc(88)] pääbo - text embedding token not found in resource vocabulary
[73708:259:1010/225649.461020:VERBOSE9:embedding_processing.cc(88)] wikipedia - text embedding token not found in resource vocabulary

Since no words are available to embed, we fail to embed text in this situation. For working examples that do embed text, see the following pages:

https://en.wikipedia.org/wiki/Mathematics
https://en.wikipedia.org/wiki/Law
https://en.wikipedia.org/wiki/History_of_science
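
To make the behaviour described above easier to follow, here is a minimal Python sketch of the same idea. It is not Brave's implementation (that lives in C++ files such as embedding_processing.cc). The names `OgTitleParser`, `tokenize`, and `embed_og_title`, the toy vocabulary, and the choice to average the word vectors are all assumptions made for illustration. The sketch only shows why a title whose words are all out of vocabulary produces no embedding.

```python
import re
from html.parser import HTMLParser


class OgTitleParser(HTMLParser):
    """Collects the content of the first <meta property="og:title"> tag."""

    def __init__(self):
        super().__init__()
        self.og_title = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.og_title is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:title":
            self.og_title = attr_map.get("content")


def tokenize(text):
    # Lowercase, then split on runs of non-word characters (Unicode-aware).
    return [token for token in re.split(r"[^\w]+", text.lower()) if token]


def embed_og_title(html, vocabulary):
    """Return an averaged vector for the og:title words, or None on failure.

    `vocabulary` is a hypothetical dict mapping a word to a list of floats.
    Averaging is an assumption; the issue does not say how vectors are combined.
    """
    parser = OgTitleParser()
    parser.feed(html)
    if not parser.og_title:
        return None
    # Keep only the words that exist in the vocabulary.
    vectors = [vocabulary[t] for t in tokenize(parser.og_title) if t in vocabulary]
    if not vectors:
        return None  # corresponds to "Failed to embed text"
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

With a toy vocabulary that contains only `mathematics` and `law`, a page whose `og:title` is `Svante Pääbo - Wikipedia` yields `None`, because none of its three tokens can be looked up. A hypothetical title such as `Mathematics - Wikipedia` yields the vector stored for `mathematics`.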
