[Bug] Unexpected Detection Behavior

Summary

I’ve come across some weird behavior with certain pages, specifically, product pages on N*w*gg.com that have the “Auto Notify” button. I’m sharing here because I’m hoping for some insight from somebody who knows more than I do, and hopefully this will help the maintainers of the plugin avoid detection in the future.

Actual Behavior

Whether or not my bot is detected and prompted with CAPTCHA varies wildly depending on inputs that I wouldn’t necessarily expect to change things. For example, my bot evaded detection 100% of the time when the userDataDir property was supplied for launch; however, if I also set defaultViewport: null and include the launch argument --start-fullscreen, then my bot evades detection only 96% of the time.

Testing

The inputs that I tested are as follows:

  • Running in Headless (I used Xvfb to run Headful since my Linux server is remote)
  • Setting defaultViewport: null
  • Providing a directory for userDataDir
  • Setting arg --start-fullscreen
  • Setting args ['--proxy-server="direct://"', '--proxy-bypass-list=*']

To test these inputs, I loaded asus-rog-crosshair-viii-dark-hero/p/N82E16813119362 100 times for each combination of inputs (some rows have 200 trials because I removed a redundant input after testing). I included the average load time for the pages just for extra information.

Here are the results from my experiment in their entirety. Green means 100% pass rate, yellow is at least 90%, orange is at least 50%, and red is less than 50%.

[Image: success_rate]

More results, sorted by input from 0…0 to 1…1:

[Image: false_to_true]
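For reference, the color bands described above can be derived from the CSV columns that the test script writes. The sketch below is an illustration only: `classifyRow` is a hypothetical helper, not part of the original script. It assumes `numTrue` counts CAPTCHA pages and `numFalse` counts normal product pages, as in the code snippet, and it ignores skipped (`numNull`) runs.

```javascript
// Hypothetical helper: classify one row of the script's CSV output.
// numTrue  = runs that hit the CAPTCHA page (bot detected)
// numFalse = runs that loaded the product page (bot not detected)
const classifyRow = ({ numTrue, numFalse }) => {
  const completed = numTrue + numFalse;
  // Every run was skipped (e.g. headful without Xvfb): no rate to report.
  if (completed === 0) return { successRate: null, color: null };

  const successRate = numFalse / completed;
  let color;
  if (successRate === 1) color = 'green';
  else if (successRate >= 0.9) color = 'yellow';
  else if (successRate >= 0.5) color = 'orange';
  else color = 'red';
  return { successRate, color };
};

console.log(classifyRow({ numTrue: 4, numFalse: 96 }));
```

A row with 4 CAPTCHA pages out of 100 runs lands in the yellow band.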

Results

The entire expression resulting from the truth table above looks like this:

(HEADLESS ∧ DATA_DIR) ∨ ( ¬NULL_VIEWPORT ∧ DATA_DIR) ∨ (DATA_DIR ∧ ¬FULLSCREEN) ∨ (DATA_DIR ∧ PROXIES)

Rewriting with respect to DATA_DIR, we get:

DATA_DIR ∧ (HEADLESS ∨ ¬NULL_VIEWPORT ∨ ¬FULLSCREEN ∨ PROXIES)

In other words, to pass bot detection 100% of the time, we can supply a data directory to Puppeteer and do at least one of the following:

  • Run in Headless
  • Don’t set default viewport to null
  • Don’t set fullscreen
  • Set the given proxy options
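The factoring above can be checked by brute force over all 32 combinations of the five inputs. This is a small sketch for verification only; the function names `original` and `factored` are made up here, and the argument order (HEADLESS, NULL_VIEWPORT, DATA_DIR, FULLSCREEN, PROXIES) is an arbitrary choice.

```javascript
// The expression read off the truth table, term by term.
const original = (H, V, D, F, P) => (H && D) || (!V && D) || (D && !F) || (D && P);
// The same expression with DATA_DIR factored out.
const factored = (H, V, D, F, P) => D && (H || !V || !F || P);

let mismatches = 0;
let passing = 0;
for (let i = 0; i < 32; i += 1) {
  // Decode the loop counter into five booleans, one per input.
  const bits = [16, 8, 4, 2, 1].map((mask) => Boolean(i & mask));
  if (original(...bits) !== factored(...bits)) mismatches += 1;
  if (factored(...bits)) passing += 1;
}

console.log(`mismatches: ${mismatches}, passing combinations: ${passing} of 32`);
// prints "mismatches: 0, passing combinations: 15 of 32"
```

Of the 16 combinations that include a data directory, the only one the expression rejects is headful, null viewport, fullscreen, and no proxy arguments.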

Unanswered Questions

  1. How can we explain N*w*gg’s variability with regard to detection? Shouldn’t a given combination of inputs yield constant results?
  2. Why does supplying a data directory have such an impact on whether the bot is detected? Why does enabling Null Viewport and Fullscreen in addition to Data Directory drop success rate to 96%?
  3. Why is extra-plugin-stealth not sufficient to pass bot detection 100% of the time? How come seemingly unrelated parameters change the success rate? To be fair, maybe not all parameters change the results to a sufficient degree, I didn’t do those calculations.
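On the "calculations" mentioned in question 3, one way to judge whether 96% differs meaningfully from 100% is a one-sided Fisher exact test. The sketch below is an illustration under a stated assumption: both configurations are taken to have exactly 100 trials each, although the write-up says some rows had 200. The helper names are hypothetical.

```javascript
// log of the binomial coefficient C(n, k), computed as a running sum
const logChoose = (n, k) => {
  let sum = 0;
  for (let i = 1; i <= k; i += 1) sum += Math.log(n - k + i) - Math.log(i);
  return sum;
};

// Hypergeometric probability that exactly x of the K total failures
// land in group A (size nA) rather than group B (size nB).
const hypergeomPmf = (x, nA, nB, K) =>
  Math.exp(logChoose(nA, x) + logChoose(nB, K - x) - logChoose(nA + nB, K));

// One-sided Fisher exact test: probability of seeing at least failA
// failures in group A if both groups share the same true failure rate.
const fisherOneSided = (failA, nA, failB, nB) => {
  const K = failA + failB;
  let p = 0;
  for (let x = failA; x <= Math.min(K, nA); x += 1) {
    if (K - x <= nB) p += hypergeomPmf(x, nA, nB, K);
  }
  return p;
};

// 4 CAPTCHAs in 100 runs versus 0 CAPTCHAs in 100 runs (assumed counts)
console.log(fisherOneSided(4, 100, 0, 100).toFixed(3)); // prints "0.061"
```

Under that assumption the p-value is about 0.06, so a 96% versus 100% split over 100 runs each is weak evidence on its own that the extra flags matter; more trials would be needed to separate the two.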

Code Snippet

const Xvfb = require('xvfb');
const puppeteer = require('puppeteer-extra');
const cheerio = require('cheerio');
const asyncLoop = require('async-for-loop');
const fs = require('fs');

const URL = 'https://www.n[*]w[*]gg.com/asus-rog-crosshair-viii-dark-hero/p/N82E16813119362';

const isCaptchaPage = (html) => {
  const $ = cheerio.load(html);
  const productBuyBox = $('div.product-buy-box');
  return productBuyBox.length === 0;
};

const runTest = async (USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS) => {
  /* headful mode needs a display, so skip combinations without Xvfb */
  if (!HEADLESS && !USE_XVFB) {
    console.log('Can\'t run headful without Xvfb');
    return [null, null];
  }
  /* start xvfb session */
  let xvfb;
  if (USE_XVFB) {
    xvfb = new Xvfb({
      xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
    });
    try {
      xvfb.startSync();
      console.log('Xvfb session started');
    } catch (err) {
      console.log('Failed to start Xvfb session');
      throw err;
    }
  }

  /* launch browser */
  const puppeteerOptions = {
    ...(USE_DATA_DIR && { userDataDir: '/home/<USER>/userDataDir' }),
    headless: HEADLESS,
    ...(USE_NULL_VIEWPORT && { defaultViewport: null }),
    args: [
      '--no-sandbox',
      ...(START_FULL ? ['--start-fullscreen'] : []),
      ...(USE_XVFB ? [`--display=${xvfb._display}`] : []),
      ...(USE_PROXIES ? ['--proxy-server="direct://"'] : []),
      ...(USE_PROXIES ? ['--proxy-bypass-list=*'] : []),
    ],
  };
  const browser = await puppeteer.launch(puppeteerOptions);
  console.log('Browser launched');

  /* go to page */
  const page = await browser.newPage();
  await page.goto(URL, { waitUntil: 'networkidle2' });
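  /* TaskDuration is the combined duration of all browser tasks, used here as a rough proxy for load time */
  const { TaskDuration: time } = await page.metrics();
  console.log(`...loaded page in ${time} seconds`);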
  await page.screenshot({ path: 'sc/result.png' });
  console.log('...SC taken');

  /* test HTML */
  const html = await page.evaluate(() => {
    return document.querySelector('*').outerHTML;
  });
  const isCaptcha = isCaptchaPage(html);
  console.log(`...HTML processed. isCaptcha: ${isCaptcha}`);

  /* close browser and stop Xvfb session */
  await browser.close();
  console.log('Browser closed');
  if (USE_XVFB) {
    xvfb.stopSync();
    console.log('Xvfb session stopped');
  }
  return [time, isCaptcha];
};

/* open file for output */
const stream = fs.createWriteStream('testOutput', { flags: 'w+' });

stream.write('USE_XVFB,USE_NULL_VIEWPORT,USE_DATA_DIR,START_FULL,USE_PROXIES,HEADLESS,numTrue,numFalse,numNull,aveLoadTime\n');

/* fill value array */
let valueArr = [];
for (const v1 of [false, true]) {
  for (const v2 of [false, true]) {
    for (const v3 of [false, true]) {
      for (const v4 of [false, true]) {
        for (const v5 of [false, true]) {
          for (const v6 of [false, true]) {
            valueArr.push([v1, v2, v3, v4, v5, v6]);
          }
        }
      }
    }
  }
}

const n = 100;
asyncLoop(valueArr.length, (j, outerNext) => {
  /* get values for function call */
  const [
    USE_XVFB,
    USE_NULL_VIEWPORT,
    USE_DATA_DIR,
    START_FULL,
    USE_PROXIES,
    HEADLESS,
  ] = valueArr[j];

  console.log(`-----${USE_XVFB}-${USE_NULL_VIEWPORT}-${USE_DATA_DIR}-${START_FULL}-${USE_PROXIES}-${HEADLESS}-----`);

  let numTrue = 0;
  let numFalse = 0;
  let numNull = 0;

  let totalLoadTime = 0;

  /* run the test n times */
  asyncLoop(n, (i, innerNext) => {
    console.log(i);
    (async () => {
      const [loadTime, isCaptcha] = await runTest(USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS);
      if (isCaptcha === null) { numNull += 1; } else if (isCaptcha) { numTrue += 1; } else { numFalse += 1; }
      if (loadTime !== null) totalLoadTime += loadTime;
      innerNext();
    })();
  }, () => {
    const aveLoadTime = totalLoadTime / n;
    const str = `${USE_XVFB},${USE_NULL_VIEWPORT},${USE_DATA_DIR},${START_FULL},${USE_PROXIES},${HEADLESS},${numTrue},${numFalse},${numNull},${aveLoadTime}`;
    console.log(str, '\n-----------------------------------');
    stream.write(`${str}\n`);
    outerNext();
  });
}, () => {
  console.log('Done!');
});
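The driver loop above can also be written with plain `async`/`await` instead of nested `asyncLoop` callbacks. The following is a sketch, not the author's code: `allCombinations`, `tally`, and `runTrials` are hypothetical names, and `runTest` is passed in as a parameter so the block stands alone. One deliberate difference from the script: the average load time here is taken over runs that actually produced a time, not over all `n` runs.

```javascript
// Enumerate every true/false combination of k flags, in the same order as
// the nested for-loops in the script (the first flag varies slowest).
const allCombinations = (k) => Array.from({ length: 2 ** k }, (_, i) =>
  Array.from({ length: k }, (_, b) => Boolean((i >> (k - 1 - b)) & 1)));

// Summarise [loadTime, isCaptcha] pairs in the shape runTest returns them.
const tally = (results) => {
  const out = { numTrue: 0, numFalse: 0, numNull: 0, aveLoadTime: null };
  let total = 0;
  let timed = 0;
  for (const [loadTime, isCaptcha] of results) {
    if (isCaptcha === null) out.numNull += 1;
    else if (isCaptcha) out.numTrue += 1;
    else out.numFalse += 1;
    if (loadTime !== null) {
      total += loadTime;
      timed += 1;
    }
  }
  if (timed > 0) out.aveLoadTime = total / timed;
  return out;
};

// Run n trials of one combination, one after another. A rejected promise
// from runTest propagates to the caller instead of stalling the loop.
const runTrials = async (runTest, combo, n) => {
  const results = [];
  for (let i = 0; i < n; i += 1) {
    results.push(await runTest(...combo));
  }
  return tally(results);
};
```

With these helpers, the outer loop reduces to `for (const combo of allCombinations(6)) { const row = await runTrials(runTest, combo, 100); ... }` inside an `async` function.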

Versions

  • System:
    • OS: Linux 5.4 Ubuntu 20.04.1 LTS (Focal Fossa)
    • CPU: (1) x64 Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
    • Memory: 116.73 MB / 981.34 MB
    • Container: Yes
    • Shell: 5.0.17 - /bin/bash
  • Binaries:
    • Node: 12.19.0 - ~/.nvm/versions/node/v12.19.0/bin/node
    • Yarn: 1.22.5 - /usr/bin/yarn
    • npm: 6.14.8 - ~/.nvm/versions/node/v12.19.0/bin/npm
  • npmPackages:
    • puppeteer: ^5.3.1 => 5.5.0
    • puppeteer-extra: ^3.1.15 => 3.1.15
    • puppeteer-extra-plugin-stealth: ^2.6.5 => 2.6.5

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

2 reactions
prescience-data commented, Nov 30, 2020

I’ll add to this as I get more time but:

How can we explain N*w*gg’s variability with regard to detection? Shouldn’t a given combination of inputs yield constant results?

First find out which anti-detect solution they are using. You can do that with the discord bot in chat.

Why does supplying a data directory have such an impact on whether the bot is detected? Why does enabling Null Viewport and Fullscreen in addition to Data Directory drop success rate to 96%?

  • If you don’t spawn a fresh userDataDir each session, you are effectively reusing the same directory over and over. (ie cookies, localStorage, etc) - I’m sure you can see why that would be a problem.
  • Window size analysis is a fairly rudimentary detection.
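One way to get a fresh `userDataDir` per session, as the first bullet suggests, is to create a throwaway directory before each launch and delete it afterwards. This is a minimal sketch using only Node's standard library. The helper names and the `puppeteer-profile-` prefix are assumptions, and `fs.rmSync` needs Node 14.14 or newer (the issue was filed against Node 12).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a new, empty profile directory under the OS temp dir.
const makeFreshProfileDir = () =>
  fs.mkdtempSync(path.join(os.tmpdir(), 'puppeteer-profile-'));

// Delete the profile directory and everything the browser wrote into it.
const removeProfileDir = (dir) =>
  fs.rmSync(dir, { recursive: true, force: true });

// Intended use around a launch (left as comments so the sketch runs
// without Puppeteer installed):
//   const userDataDir = makeFreshProfileDir();
//   const browser = await puppeteer.launch({ userDataDir });
//   ...
//   await browser.close();
//   removeProfileDir(userDataDir);

const demoDir = makeFreshProfileDir();
console.log(fs.existsSync(demoDir)); // prints true
removeProfileDir(demoDir);
```

Each call returns a different path, so cookies and localStorage from one session are not carried into the next.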

Why is extra-plugin-stealth not sufficient to pass bot detection 100% of the time? How come seemingly unrelated parameters change the success rate? To be fair, maybe not all parameters change the results to a sufficient degree, I didn’t do those calculations.

If you are certain you have configured Stealth correctly (and by the rigour in your question I assume you have) and you are hitting detections sporadically, it’s going to be one of these in most cases:

  • Poorly configured args in launchOptions (if you are using these).
  • Fishy IP / TCP stack.
  • Some sort of device fingerprint not aligning with the information presented (ie backend comparison).
  • Input actions not appearing human-like.

The detail in which these are analysed is going to depend on the sophistication of the anti-bot vendor’s solution.

1 reaction
prescience-data commented, Nov 30, 2020

The userDataDir and viewport issues are unrelated.

The IP stuff is more about TCP stack analysis detecting a mismatched OS to the one presented in your user-agent and other navigator properties.

Re the args, I had a look and this won’t solve your problem but add these:

    const viewport = {
      width: 1280, // Puppeteer expects numbers here, not strings
      height: 720, // Your settings...
    }
    const launchOptions = {
      ignoreHTTPSErrors: true,
      defaultViewport: viewport,
      args: [
        `--no-sandbox`,
        `--disable-setuid-sandbox`,
        `--no-first-run`,
        `--disable-sync`,
        `--ignore-certificate-errors`,
        `--lang=en-US,en;q=0.9`,
        `--window-size=${viewport.width},${viewport.height}`,  
      ],
    }