Large strings not shown properly

When I read the src attribute of an image that contains a base64 string, I expect to get back a very long string (up to 100 KB), but instead cheerio returns a shortened version, which looks something like this:

data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==

I think that's intentional, which I can understand (to prevent damage), but I'm aware that I'll get back a very large string. How can I get the unshortened version?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

3 reactions
5saviahv commented, Dec 10, 2020

Call it boredom, but I tried to implement what I suggested earlier.

const cheerio = require('cheerio');
const fs = require('fs').promises;
// Only needed for the (commented-out) download step; got v12+ is ESM-only, so require() needs got v11 or older
//const got = require('got');

const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36';

// false: replace the placeholders in the <img> tags; true: only print the image data
const EXTRACT_ONLY = false;

(async () => {
    try {
        //const url = 'https://www.google.com/search?tbm=shop&hl=de-de&tbs=vw:l&q=xbox';
        const url = 'https://www.google.com/search?q=xbox&hl=de-de&tbm=isch&sa=X&biw=2560&bih=1311';
        // Download the page once and cache it in dump.html:
        //const body = (await got(url, { headers: { 'User-Agent': userAgent } })).body;
        //await fs.writeFile('dump.html', body, 'utf8').then(() => console.log('The file was saved!'));

        const body = await fs.readFile('dump.html', 'utf8');

        const $ = cheerio.load(body);

        if (EXTRACT_ONLY) {
            // just get images from script tags
            const img = [];
            $('script:contains("setImgSrc\\(")').each((ix, val) => {
                // eval() below calls this local function for each _setImgSrc(id, data) in the script
                const _setImgSrc = (a, c) => (img[a] = c);
                eval($(val).html());
            });
            img.forEach((c) => console.log(c));
        } else {
            // replace placeholders
            const d = {};
            // search and keep deferred images, keyed by their data-iid
            $('img[data-iid]').each((ix, val) => (d[$(val).attr('data-iid')] = val));

            // evaluate each script; _setImgSrc copies the data URI into the matching <img>
            $('script:contains("setImgSrc\\(")').each((ix, val) => {
                const _setImgSrc = (a, c) => {
                    if (d[a]) $(d[a]).removeAttr('data-iid').attr('src', c);
                };
                // Caution: eval() executes the page's own script text, so only use it on pages you trust
                eval($(val).remove().html());
            });
        }

        // turn it all into HTML again
        const html = $.html();
        await fs.writeFile('dump (images added).html', html, 'utf8');
        console.log('The file was saved!');
        //console.log(html);
    } catch (error) {
        console.error(error);
    }
})();

1 reaction
5saviahv commented, Dec 9, 2020

I loaded some of the page source from Google (dump.html) and discovered that this is probably not cheerio's fault.

Google uses a little trick there: when the page is loaded from their site, the image tags are filled with those placeholders (seen above). Later, when the browser has finished rendering the page, it uses JavaScript to replace those placeholders with the real images. This actually helps pages render faster.

It may be confusing: when you look at the source in the browser you see the big pictures, but that is because the browser has already replaced the images. Sadly, cheerio can only read the values that are in the image tags when the HTML is loaded.
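To see why the value looks "shortened", it can help to decode the placeholder from the question. The following is a minimal Node.js sketch that uses only the built-in `Buffer`; `describeDataUri` is a hypothetical helper, and it only understands base64 data URIs plus the GIF header layout (a 6-byte signature, then width and height as little-endian 16-bit values at byte offsets 6 and 8).

```javascript
// Hypothetical helper: parse a base64 data URI and, if it is a GIF, read its size.
function describeDataUri(uri) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(uri);
  if (!match) throw new Error('not a base64 data URI');
  const [, mimeType, payload] = match;
  const bytes = Buffer.from(payload, 'base64');
  const info = { mimeType, byteLength: bytes.length };
  // A GIF starts with "GIF87a" or "GIF89a", followed by the logical screen
  // width and height as little-endian 16-bit integers (byte offsets 6 and 8).
  const signature = bytes.toString('ascii', 0, 6);
  if (signature === 'GIF87a' || signature === 'GIF89a') {
    info.signature = signature;
    info.width = bytes.readUInt16LE(6);
    info.height = bytes.readUInt16LE(8);
  }
  return info;
}

// The placeholder string quoted in the question
const placeholder = 'data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==';
console.log(describeDataUri(placeholder));
```

For that string this should report a GIF89a image that is 1 pixel wide and 1 pixel high, i.e. a complete (tiny) stand-in image rather than a truncated copy of the real one.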

Interestingly, the image data is loaded with the page, but it is stored in script tags, like so:

<script nonce="6yvz7b/n1cGMHtmd5Ot8FA">_setImgSrc('0','data:image\/jpeg;base64,\/9j\/4AAQSkZJRgABA
...
AAAAAAAAAAoAAAAAAAAAAAAAAAP\/\/Z');</script>

You could actually find those script tags and extract the data from them.
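As an alternative to running the scripts through `eval()`, the calls can also be pulled out of the raw HTML with a regular expression. This is a rough sketch, not tested against a live Google page: it assumes each call has exactly the form `_setImgSrc('<id>','<data URI>')` with single-quoted arguments, and it only undoes the `\/` escaping visible in the snippet above (any other JavaScript escapes are left untouched). `extractSetImgSrc` is a hypothetical helper name, and the sample input is a shortened stand-in for a real script tag.

```javascript
// Hypothetical helper: collect the arguments of every _setImgSrc('<id>','<data URI>')
// call in an HTML string without executing any script.
function extractSetImgSrc(html) {
  const images = new Map();
  const callPattern = /_setImgSrc\('([^']*)','([^']*)'\)/g;
  for (const [, id, escapedUri] of html.matchAll(callPattern)) {
    // The data URI sits inside a JavaScript string literal where "/" is written as "\/"; undo that.
    images.set(id, escapedUri.replace(/\\\//g, '/'));
  }
  return images;
}

// Shortened sample in the same shape as the script tag quoted above
const sample = "<script nonce=\"x\">_setImgSrc('0','data:image\\/jpeg;base64,\\/9j\\/4AAQ');</script>";
const images = extractSetImgSrc(sample);
console.log(images.get('0')); // data:image/jpeg;base64,/9j/4AAQ
```

In the Dec 10 script, the first argument of `_setImgSrc` is matched against each deferred image's `data-iid` attribute, so the ids in this `Map` could be used the same way to write the values back with cheerio's `.attr('src', ...)`.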
