
[Bug] Unexpected Detection Behavior


Summary

I’ve come across some weird behavior with certain pages, specifically, product pages on N*w*gg.com that have the “Auto Notify” button. I’m sharing here because I’m hoping for some insight from somebody who knows more than I do, and hopefully this will help the maintainers of the plugin avoid detection in the future.

Actual Behavior

Whether or not my bot is detected and prompted with CAPTCHA varies wildly depending on inputs that I wouldn’t necessarily expect to change things. For example, my bot evaded detection 100% of the time when the userDataDir property was supplied for launch; however, if I also set defaultViewport: null and include the launch argument --start-fullscreen, then my bot evades detection only 96% of the time.

Testing

The inputs that I tested are as follows:

Running in Headless (I used Xvfb to run Headful since my Linux server is remote)
Setting defaultViewport: null
Providing a directory for userDataDir
Setting arg --start-fullscreen
Setting args ['--proxy-server="direct://"', '--proxy-bypass-list=*']

To test these inputs, I loaded asus-rog-crosshair-viii-dark-hero/p/N82E16813119362 100 times for each combination of inputs (some rows have 200 trials because I removed a redundant input after testing). I included the average load time for the pages just for extra information.

Here are the results from my experiment in their entirety. Green means 100% pass rate, yellow is at least 90%, orange is at least 50%, and red is less than 50%.

[Figure: success_rate]
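The colour bands described above can be written as a small function. This is only a sketch: `rateToColor` is an illustrative name, not something from the original test script, and it assumes the pass rate is expressed as a fraction between 0 and 1.

```javascript
// Hypothetical helper: maps a pass rate (0..1) to the colour bands
// described in the post (green = 100%, yellow >= 90%, orange >= 50%, red < 50%).
function rateToColor(passRate) {
  if (passRate >= 1) return 'green';
  if (passRate >= 0.9) return 'yellow';
  if (passRate >= 0.5) return 'orange';
  return 'red';
}

console.log(rateToColor(0.96)); // yellow
```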

The same results sorted by input from 0…0 to 1…1:

[Figure: false_to_true]

Results

The entire expression resulting from the truth table above looks like this:

(HEADLESS ∧ DATA_DIR) ∨ ( ¬NULL_VIEWPORT ∧ DATA_DIR) ∨ (DATA_DIR ∧ ¬FULLSCREEN) ∨ (DATA_DIR ∧ PROXIES)

Factoring out DATA_DIR, which appears in every term, we get:

DATA_DIR ∧ (HEADLESS ∨ ¬NULL_VIEWPORT ∨ ¬FULLSCREEN ∨ PROXIES)

In other words, to pass bot detection 100% of the time, we can supply a data directory to Puppeteer and do at least one of the following:

Run in Headless
Don’t set default viewport to null
Don’t set fullscreen
Set the given proxy options
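
The factoring step can be checked mechanically. The sketch below enumerates all 32 combinations of the five inputs and compares the original expression with the factored one. It is only a sanity check of the Boolean algebra, not a re-run of the experiment, and the counter names are illustrative.

```javascript
// Enumerate every combination of the five inputs via a 5-bit mask and
// compare the original sum-of-products with the factored form.
let mismatches = 0;
let passing = 0;

for (let mask = 0; mask < 32; mask += 1) {
  const [HEADLESS, NULL_VIEWPORT, DATA_DIR, FULLSCREEN, PROXIES] =
    [0, 1, 2, 3, 4].map((bit) => Boolean(mask & (1 << bit)));

  const original =
    (HEADLESS && DATA_DIR) ||
    (!NULL_VIEWPORT && DATA_DIR) ||
    (DATA_DIR && !FULLSCREEN) ||
    (DATA_DIR && PROXIES);

  const factored =
    DATA_DIR && (HEADLESS || !NULL_VIEWPORT || !FULLSCREEN || PROXIES);

  if (original !== factored) mismatches += 1;
  if (factored) passing += 1;
}

console.log({ mismatches, passing }); // { mismatches: 0, passing: 15 }
```

Both forms agree on every row, and 15 of the 32 combinations satisfy the expression: the 16 with DATA_DIR set, minus the single case that is headful, null viewport, fullscreen, and without the proxy options.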

Unanswered Questions

How can we explain N*w*gg’s variability with regard to detection? Shouldn’t a given combination of inputs yield constant results?
Why does supplying a data directory have such an impact on whether the bot is detected? Why does enabling Null Viewport and Fullscreen in addition to Data Directory drop success rate to 96%?
Why is extra-plugin-stealth not sufficient to pass bot detection 100% of the time? How come seemingly unrelated parameters change the success rate? To be fair, some parameters may not change the results by a statistically significant amount; I didn’t do those calculations.
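
One way to approach the calculation mentioned in the last question is a confidence interval on each pass rate. The sketch below uses the Wilson score interval with z = 1.96 (roughly 95% confidence) and compares 96 passes out of 100 with 100 passes out of 100. The function and variable names are illustrative, and the choice of the Wilson interval is an assumption, not something the original experiment used.

```javascript
// Wilson score interval for a binomial proportion.
// z = 1.96 corresponds to roughly 95% confidence.
function wilsonInterval(successes, trials, z = 1.96) {
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  const center = (p + z2 / (2 * trials)) / denom;
  const half =
    (z * Math.sqrt((p * (1 - p)) / trials + z2 / (4 * trials * trials))) / denom;
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

const withExtras = wilsonInterval(96, 100);   // data dir + null viewport + fullscreen
const dataDirOnly = wilsonInterval(100, 100); // data dir only

// The intervals overlap if the upper bound of one reaches the lower bound of the other.
const intervalsOverlap = withExtras[1] >= dataDirOnly[0];
console.log(withExtras, dataDirOnly, intervalsOverlap);
```

With these inputs the two intervals come out at roughly 0.90 to 0.98 and 0.96 to 1.00, so they overlap. That suggests 100 trials per row may not be enough to tell a 96% rate from a 100% rate with confidence, although overlapping intervals are only a rough heuristic.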

Code Snippet


const Xvfb = require('xvfb');
const puppeteer = require('puppeteer-extra');
const cheerio = require('cheerio');
const asyncLoop = require('async-for-loop');
const fs = require('fs');

const URL = 'https://www.n[*]w[*]gg.com/asus-rog-crosshair-viii-dark-hero/p/N82E16813119362';

const isCaptchaPage = (html) => {
  const $ = cheerio.load(html);
  const productBuyBox = $('div.product-buy-box');
  return productBuyBox.length === 0;
};

const runTest = async (USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS) => {
  /* headful mode needs a display, which only Xvfb provides on this remote server */
  if (!HEADLESS && !USE_XVFB) {
    console.log('Can\'t run headful without Xvfb');
    return [null, null];
  }
  /* start xvfb session */
  let xvfb;
  if (USE_XVFB) {
    xvfb = new Xvfb({
      xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
    });
    try {
      xvfb.startSync();
      console.log('Xvfb session started');
    } catch (err) {
      console.log('Failed to start Xvfb session');
      throw err;
    }
  }

  /* launch browser */
  const puppeteerOptions = {
    ...(USE_DATA_DIR && { userDataDir: '/home/<USER>/userDataDir' }),
    headless: HEADLESS,
    ...(USE_NULL_VIEWPORT && { defaultViewport: null }),
    args: [
      '--no-sandbox',
      ...(START_FULL ? ['--start-fullscreen'] : []),
      ...(USE_XVFB ? [`--display=${xvfb._display}`] : []),
      ...(USE_PROXIES ? ['--proxy-server="direct://"'] : []),
      ...(USE_PROXIES ? ['--proxy-bypass-list=*'] : []),
    ],
  };
  const browser = await puppeteer.launch(puppeteerOptions);
  console.log('Browser launched');

  /* go to page */
  const page = await browser.newPage();
  await page.goto(URL, { waitUntil: 'networkidle2' });
  const { TaskDuration: time } = await page.metrics();
  console.log(`...loaded page in ${time} seconds`);
  await page.screenshot({ path: 'sc/result.png' });
  console.log('...SC taken');

  /* test HTML */
  const html = await page.evaluate(() => {
    return document.querySelector('*').outerHTML;
  });
  const isCaptcha = isCaptchaPage(html);
  console.log(`...HTML processed. isCaptcha: ${isCaptcha}`);

  /* close browser and stop Xvfb session */
  await browser.close();
  console.log('Browser closed');
  if (USE_XVFB) {
    xvfb.stopSync();
    console.log('Xvfb session stopped');
  }
  return [time, isCaptcha];
};

/* open file for output */
const stream = fs.createWriteStream('testOutput', { flags: 'w+' });

stream.write('USE_XVFB,USE_NULL_VIEWPORT,USE_DATA_DIR,START_FULL,USE_PROXIES,HEADLESS,numTrue,numFalse,numNull,aveLoadTime\n');

/* fill value array */
let valueArr = [];
for (const v1 of [false, true]) {
  for (const v2 of [false, true]) {
    for (const v3 of [false, true]) {
      for (const v4 of [false, true]) {
        for (const v5 of [false, true]) {
          for (const v6 of [false, true]) {
            valueArr.push([v1, v2, v3, v4, v5, v6]);
          }
        }
      }
    }
  }
}

const n = 100;
asyncLoop(valueArr.length, (j, outerNext) => {
  /* get values for function call */
  const [
    USE_XVFB,
    USE_NULL_VIEWPORT,
    USE_DATA_DIR,
    START_FULL,
    USE_PROXIES,
    HEADLESS,
  ] = valueArr[j];

  console.log(`-----${USE_XVFB}-${USE_NULL_VIEWPORT}-${USE_DATA_DIR}-${START_FULL}-${USE_PROXIES}-${HEADLESS}-----`);

  let numTrue = 0;
  let numFalse = 0;
  let numNull = 0;

  let totalLoadTime = 0;

  /* run the test n times */
  asyncLoop(n, (i, innerNext) => {
    console.log(i);
    (async () => {
      const [loadTime, isCaptcha] = await runTest(USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS);
      if (isCaptcha === null) { numNull += 1; } else if (isCaptcha) { numTrue += 1; } else { numFalse += 1; }
      if (loadTime !== null) totalLoadTime += loadTime;
      innerNext();
    })();
  }, () => {
    const aveLoadTime = totalLoadTime / n;
    const str = `${USE_XVFB},${USE_NULL_VIEWPORT},${USE_DATA_DIR},${START_FULL},${USE_PROXIES},${HEADLESS},${numTrue},${numFalse},${numNull},${aveLoadTime}`;
    console.log(str, '\n-----------------------------------');
    stream.write(`${str}\n`);
    outerNext();
  });
}, () => {
  console.log('Done!');
});
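
For reference, a row of the testOutput file written by the script above could be turned into a pass rate along these lines. This is a sketch: `parseRow` and `passRate` are illustrative names, and the example row is made up rather than taken from the experiment. Note that numTrue counts CAPTCHA pages, so the pass rate is based on numFalse.

```javascript
// Column order taken from the header line the script writes to testOutput.
const COLUMNS = [
  'USE_XVFB', 'USE_NULL_VIEWPORT', 'USE_DATA_DIR', 'START_FULL',
  'USE_PROXIES', 'HEADLESS', 'numTrue', 'numFalse', 'numNull', 'aveLoadTime',
];

// Hypothetical helper: parses one CSV row and adds a passRate field.
function parseRow(line) {
  const cells = line.trim().split(',');
  const row = {};
  COLUMNS.forEach((name, i) => {
    // The first six columns are booleans, the rest are numbers.
    row[name] = i < 6 ? cells[i] === 'true' : Number(cells[i]);
  });
  // numTrue = CAPTCHA shown, numFalse = product page loaded, numNull = skipped.
  const completed = row.numTrue + row.numFalse;
  row.passRate = completed > 0 ? row.numFalse / completed : null;
  return row;
}

// Made-up example row, not a result from the experiment.
const exampleRow = parseRow('true,false,true,false,false,false,7,93,0,1.5');
console.log(exampleRow.passRate);
```

Rows where every run was skipped (all counted in numNull) get a passRate of null rather than 0, so they are not mistaken for a 0% pass rate.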

Versions

System:
- OS: Linux 5.4 Ubuntu 20.04.1 LTS (Focal Fossa)
- CPU: (1) x64 Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Memory: 116.73 MB / 981.34 MB
- Container: Yes
- Shell: 5.0.17 - /bin/bash
Binaries:
- Node: 12.19.0 - ~/.nvm/versions/node/v12.19.0/bin/node
- Yarn: 1.22.5 - /usr/bin/yarn
- npm: 6.14.8 - ~/.nvm/versions/node/v12.19.0/bin/npm
npmPackages:
- puppeteer: ^5.3.1 => 5.5.0
- puppeteer-extra: ^3.1.15 => 3.1.15
- puppeteer-extra-plugin-stealth: ^2.6.5 => 2.6.5

Issue Analytics

Created 3 years ago
Comments: 12 (3 by maintainers)

Top GitHub Comments

2 reactions

prescience-data commented, Nov 30, 2020

I’ll add to this as I get more time but:

How can we explain N*w*gg’s variability with regard to detection? Shouldn’t a given combination of inputs yield constant results?

First find out which anti-detect solution they are using. You can do that with the discord bot in chat.

Why does supplying a data directory have such an impact on whether the bot is detected? Why does enabling Null Viewport and Fullscreen in addition to Data Directory drop success rate to 96%?

If you don’t spawn a fresh userDataDir each session, you are effectively reusing the same directory (i.e., cookies, localStorage, etc.) over and over. I’m sure you can see why that would be a problem.
Window size analysis is a fairly rudimentary detection.
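
A fresh userDataDir per session could be set up roughly as follows. This is a minimal sketch, not the commenter's code: the helper names and the directory prefix are assumptions, and `fs.rmSync` requires Node 14.14 or later, which is newer than the Node 12.19 listed in the post.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: creates a new, empty profile directory under the
// OS temp dir. mkdtempSync appends random characters to the prefix.
function makeFreshUserDataDir(prefix = 'puppeteer-profile-') {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}

// Hypothetical helper: deletes the profile directory once the browser has closed.
function removeUserDataDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
}

const userDataDir = makeFreshUserDataDir();
// Merge this into whatever options are passed to puppeteer.launch().
const freshProfileOptions = { userDataDir };
console.log(freshProfileOptions);
```

Calling `removeUserDataDir(userDataDir)` after `browser.close()` keeps temporary profiles from piling up between runs.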

Why is extra-plugin-stealth not sufficient to pass bot detection 100% of the time? How come seemingly unrelated parameters change the success rate? To be fair, maybe not all parameters change the results to a sufficient degree, I didn’t do those calculations.

If you are certain you have configured Stealth correctly (and by the rigour in your question I assume you have) and you are hitting detections sporadically, it’s going to be one of these in most cases:

Poorly configured args in launchOptions (if you are using these).
Fishy IP / TCP stack.
Some sort of device fingerprint not aligning with the information presented (ie backend comparison).
Input actions not appearing human-like.

The detail in which these are analysed is going to depend on the sophistication of the anti-bot vendor’s solution.

1 reaction

prescience-data commented, Nov 30, 2020

The userDataDir and viewport issues are unrelated.

The IP stuff is more about TCP stack analysis detecting a mismatched OS to the one presented in your user-agent and other navigator properties.

Re the args, I had a look and this won’t solve your problem but add these:

    // Puppeteer expects numeric viewport dimensions, not strings.
    const viewport = {
      width: 1280,
      height: 720, // Your settings...
    }
    const launchOptions = {
      ignoreHTTPSErrors: true,
      defaultViewport: viewport,
      args: [
        `--no-sandbox`,
        `--disable-setuid-sandbox`,
        `--no-first-run`,
        `--disable-sync`,
        `--ignore-certificate-errors`,
        `--lang=en-US,en;q=0.9`,
        `--window-size=${viewport.width},${viewport.height}`,  
      ],
    }