[Bug] Unexpected Detection Behavior
Summary
I’ve come across some weird behavior on certain pages, specifically product pages on N*w*gg.com that have the “Auto Notify” button. I’m sharing this here in the hope of getting some insight from somebody who knows more than I do, and in the hope that it will help the plugin’s maintainers avoid detection in the future.
Actual Behavior
Whether or not my bot is detected and shown a CAPTCHA varies widely depending on inputs that I wouldn’t expect to matter. For example, my bot evaded detection 100% of the time when the userDataDir property was supplied at launch; however, when I also set defaultViewport: null and added the launch argument --start-fullscreen, my bot evaded detection only 96% of the time.
Testing
The inputs that I tested are as follows:
- Running in headless mode (I used Xvfb to run headful, since my Linux server is remote)
- Setting defaultViewport: null
- Providing a directory for userDataDir
- Setting the arg --start-fullscreen
- Setting the args ['--proxy-server="direct://"', '--proxy-bypass-list=*']
To test these inputs, I loaded asus-rog-crosshair-viii-dark-hero/p/N82E16813119362 100 times for each combination of inputs (some rows have 200 trials because I removed a redundant input after testing). I also recorded the average page load time, just for extra information.
Here are the results from my experiment in their entirety. Green means a 100% pass rate, yellow at least 90%, orange at least 50%, and red less than 50%.
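The color bands can be reproduced from the testOutput CSV that the script in the Code Snippet section writes. This is a minimal sketch, assuming the column order of that script's header line and treating numFalse (no CAPTCHA detected) as a pass; colorFor and classifyRow are hypothetical helpers, and the sample row is a made-up row mirroring the 96% case from the summary, with the load time left empty.

```javascript
// Column order assumed to match the header line written by the test script
const CSV_COLUMNS = ['USE_XVFB', 'USE_NULL_VIEWPORT', 'USE_DATA_DIR', 'START_FULL',
  'USE_PROXIES', 'HEADLESS', 'numTrue', 'numFalse', 'numNull', 'aveLoadTime'];

// Hypothetical helper: maps a pass rate in [0, 1] to the color bands
function colorFor(passRate) {
  if (passRate === 1) return 'green';
  if (passRate >= 0.9) return 'yellow';
  if (passRate >= 0.5) return 'orange';
  return 'red';
}

// Hypothetical helper: parses one CSV row and attaches its pass rate and color
function classifyRow(line) {
  const values = line.trim().split(',');
  const row = Object.fromEntries(CSV_COLUMNS.map((name, i) => [name, values[i]]));
  const detected = Number(row.numTrue); // CAPTCHA page shown
  const passed = Number(row.numFalse); // product page shown
  const trials = detected + passed; // skipped runs (numNull) are excluded
  if (trials === 0) return { ...row, passRate: null, color: 'n/a' };
  const passRate = passed / trials;
  return { ...row, passRate, color: colorFor(passRate) };
}

// Made-up row: headful via Xvfb, null viewport, data dir, fullscreen, no proxy args
console.log(classifyRow('true,true,true,true,false,false,4,96,0,').color); // prints "yellow"
```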
Results
The Boolean expression for a 100% pass rate, read off the truth table above, looks like this:
(HEADLESS ∧ DATA_DIR) ∨ (¬NULL_VIEWPORT ∧ DATA_DIR) ∨ (DATA_DIR ∧ ¬FULLSCREEN) ∨ (DATA_DIR ∧ PROXIES)
Factoring out DATA_DIR, we get:
DATA_DIR ∧ (HEADLESS ∨ ¬NULL_VIEWPORT ∨ ¬FULLSCREEN ∨ PROXIES)
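Because the expression only involves five variables, the factoring step can be checked by brute force. A minimal sketch: it evaluates the sum-of-products form and the factored form for all 2^5 = 32 assignments and collects any assignment where they disagree. The names original, factored and allAssignments are illustrative, not from the original script.

```javascript
// Sum-of-products form, as read off the truth table
const original = ({ HEADLESS, DATA_DIR, NULL_VIEWPORT, FULLSCREEN, PROXIES }) =>
  (HEADLESS && DATA_DIR) || (!NULL_VIEWPORT && DATA_DIR) ||
  (DATA_DIR && !FULLSCREEN) || (DATA_DIR && PROXIES);

// Form with DATA_DIR factored out
const factored = ({ HEADLESS, DATA_DIR, NULL_VIEWPORT, FULLSCREEN, PROXIES }) =>
  DATA_DIR && (HEADLESS || !NULL_VIEWPORT || !FULLSCREEN || PROXIES);

const VARIABLES = ['HEADLESS', 'DATA_DIR', 'NULL_VIEWPORT', 'FULLSCREEN', 'PROXIES'];

// Enumerate every assignment by reading the bits of 0..31
function allAssignments() {
  const result = [];
  for (let mask = 0; mask < 1 << VARIABLES.length; mask++) {
    result.push(Object.fromEntries(
      VARIABLES.map((name, i) => [name, Boolean(mask & (1 << i))]),
    ));
  }
  return result;
}

const mismatches = allAssignments().filter((a) => original(a) !== factored(a));
console.log(mismatches.length); // prints 0
```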
In other words, to pass bot detection 100% of the time, we can supply a data directory to Puppeteer and do at least one of the following:
- Run headless
- Don’t set the default viewport to null
- Don’t set fullscreen
- Set the given proxy options
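Read the other way around, the factored expression says which data-directory configurations fall short of 100%. A minimal sketch that enumerates the 16 configurations with DATA_DIR set and keeps the ones the expression does not cover. Exactly one remains: headful, null viewport, fullscreen, and no proxy args. That is consistent with the 96% case described in the summary. The names alwaysPasses and notAlwaysPassing are illustrative.

```javascript
const OTHER_FLAGS = ['HEADLESS', 'NULL_VIEWPORT', 'FULLSCREEN', 'PROXIES'];

// The factored expression from above, as a predicate on one configuration
const alwaysPasses = (c) =>
  c.DATA_DIR && (c.HEADLESS || !c.NULL_VIEWPORT || !c.FULLSCREEN || c.PROXIES);

// Collect every DATA_DIR configuration the expression does not cover
const notAlwaysPassing = [];
for (let mask = 0; mask < 1 << OTHER_FLAGS.length; mask++) {
  const config = { DATA_DIR: true };
  OTHER_FLAGS.forEach((flag, i) => { config[flag] = Boolean(mask & (1 << i)); });
  if (!alwaysPasses(config)) notAlwaysPassing.push(config);
}

// Exactly one entry: HEADLESS false, NULL_VIEWPORT true, FULLSCREEN true, PROXIES false
console.log(notAlwaysPassing);
```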
Unanswered Questions
- How can we explain N*w*gg’s variability with regard to detection? Shouldn’t a given combination of inputs yield consistent results?
- Why does supplying a data directory have such an impact on whether the bot is detected? Why does enabling Null Viewport and Fullscreen in addition to Data Directory drop the success rate to 96%?
- Why is extra-plugin-stealth not sufficient to pass bot detection 100% of the time? How come seemingly unrelated parameters change the success rate? To be fair, not every parameter may change the results to a significant degree; I didn’t do those calculations.
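One standard way to do those calculations is Fisher's exact test, which asks whether two pass rates could plausibly come from the same underlying rate. The sketch below compares 100/100 passes with 96/100 passes. It assumes both configurations were run for 100 independent trials, although the report notes that some rows had 200, and fisherExact is a hypothetical helper rather than part of the original script. Under that assumption the two-sided p-value comes out at about 0.12. By itself, that would not rule out chance as the cause of a 100% vs 96% gap.

```javascript
// Natural log of k!, summed directly (fine for a few hundred trials)
function logFactorial(k) {
  let sum = 0;
  for (let i = 2; i <= k; i++) sum += Math.log(i);
  return sum;
}

function logChoose(n, k) {
  return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
}

// Two-sided Fisher's exact test for passA/trialsA vs passB/trialsB
function fisherExact(passA, trialsA, passB, trialsB) {
  const total = trialsA + trialsB;
  const failA = trialsA - passA;
  const failTotal = failA + (trialsB - passB);
  // Hypergeometric P(X = k): probability that group A receives k of the failures
  const pmf = (k) => Math.exp(
    logChoose(failTotal, k) + logChoose(total - failTotal, trialsA - k) - logChoose(total, trialsA),
  );
  const observed = pmf(failA);
  const lo = Math.max(0, trialsA - (total - failTotal));
  const hi = Math.min(trialsA, failTotal);
  let p = 0;
  for (let k = lo; k <= hi; k++) {
    const pk = pmf(k);
    // Sum every table that is no more likely than the observed one
    if (pk <= observed * (1 + 1e-7)) p += pk;
  }
  return Math.min(p, 1);
}

// 100/100 vs 96/100 passes, assuming 100 trials per configuration
console.log(fisherExact(100, 100, 96, 100).toFixed(3)); // prints 0.121
```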
Code Snippet
const Xvfb = require('xvfb');
const puppeteer = require('puppeteer-extra');
const cheerio = require('cheerio');
const asyncLoop = require('async-for-loop');
const fs = require('fs');

// Stealth evasions only apply once the plugin is registered with puppeteer-extra, e.g.:
// puppeteer.use(require('puppeteer-extra-plugin-stealth')());

const URL = 'https://www.n[*]w[*]gg.com/asus-rog-crosshair-viii-dark-hero/p/N82E16813119362';

// A page without the product buy box is treated as the CAPTCHA page
const isCaptchaPage = (html) => {
  const $ = cheerio.load(html);
  const productBuyBox = $('div.product-buy-box');
  return productBuyBox.length === 0;
};

// Runs one trial and returns [time, isCaptcha],
// or [null, null] for an impossible combination (headful without a display)
const runTest = async (USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS) => {
  // Headful Chrome needs a display, which Xvfb provides on the remote server
  if (!HEADLESS && !USE_XVFB) {
    console.log('Can\'t run headful without Xvfb');
    return [null, null];
  }

  /* start xvfb session */
  let xvfb;
  if (USE_XVFB) {
    xvfb = new Xvfb({
      xvfb_args: ['-screen', '0', '1280x720x24', '-ac'],
    });
    try {
      xvfb.startSync();
      console.log('Xvfb session started');
    } catch (err) {
      console.log('Failed to start Xvfb session');
      throw err;
    }
  }

  let browser;
  try {
    /* launch browser */
    const puppeteerOptions = {
      ...(USE_DATA_DIR && { userDataDir: '/home/<USER>/userDataDir' }),
      headless: HEADLESS,
      ...(USE_NULL_VIEWPORT && { defaultViewport: null }),
      args: [
        '--no-sandbox',
        ...(START_FULL ? ['--start-fullscreen'] : []),
        ...(USE_XVFB ? [`--display=${xvfb._display}`] : []),
        // No shell parses these args, so the double quotes around direct:// reach Chrome literally
        ...(USE_PROXIES ? ['--proxy-server="direct://"', '--proxy-bypass-list=*'] : []),
      ],
    };
    browser = await puppeteer.launch(puppeteerOptions);
    console.log('Browser launched');

    /* go to page */
    const page = await browser.newPage();
    await page.goto(URL, { waitUntil: 'networkidle2' });
    // TaskDuration is the combined duration of all tasks performed by the browser,
    // so it only approximates the page load time
    const { TaskDuration: time } = await page.metrics();
    console.log(`...loaded page in ${time} seconds`);
    await page.screenshot({ path: 'sc/result.png' });
    console.log('...SC taken');

    /* test HTML */
    const html = await page.content();
    const isCaptcha = isCaptchaPage(html);
    console.log(`...HTML processed. isCaptcha: ${isCaptcha}`);
    return [time, isCaptcha];
  } finally {
    /* close browser and stop Xvfb session, even if a step above threw */
    if (browser) {
      await browser.close();
      console.log('Browser closed');
    }
    if (USE_XVFB) {
      xvfb.stopSync();
      console.log('Xvfb session stopped');
    }
  }
};

/* open file for output */
const stream = fs.createWriteStream('testOutput', { flags: 'w+' });
stream.write('USE_XVFB,USE_NULL_VIEWPORT,USE_DATA_DIR,START_FULL,USE_PROXIES,HEADLESS,numTrue,numFalse,numNull,aveLoadTime\n');

/* fill value array with all 2^6 = 64 combinations of the six flags */
const valueArr = [];
for (const v1 of [false, true]) {
  for (const v2 of [false, true]) {
    for (const v3 of [false, true]) {
      for (const v4 of [false, true]) {
        for (const v5 of [false, true]) {
          for (const v6 of [false, true]) {
            valueArr.push([v1, v2, v3, v4, v5, v6]);
          }
        }
      }
    }
  }
}

const n = 100;
asyncLoop(valueArr.length, (j, outerNext) => {
  /* get values for function call */
  const [
    USE_XVFB,
    USE_NULL_VIEWPORT,
    USE_DATA_DIR,
    START_FULL,
    USE_PROXIES,
    HEADLESS,
  ] = valueArr[j];
  console.log(`-----${USE_XVFB}-${USE_NULL_VIEWPORT}-${USE_DATA_DIR}-${START_FULL}-${USE_PROXIES}-${HEADLESS}-----`);
  let numTrue = 0; // CAPTCHA shown (detected)
  let numFalse = 0; // product page shown (passed)
  let numNull = 0; // combination skipped
  let totalLoadTime = 0;

  /* run the test n times, one after another */
  asyncLoop(n, (i, innerNext) => {
    console.log(i);
    (async () => {
      try {
        const [loadTime, isCaptcha] = await runTest(USE_XVFB, USE_NULL_VIEWPORT, USE_DATA_DIR, START_FULL, USE_PROXIES, HEADLESS);
        if (isCaptcha === null) { numNull += 1; } else if (isCaptcha) { numTrue += 1; } else { numFalse += 1; }
        if (loadTime !== null) totalLoadTime += loadTime;
      } catch (err) {
        // A failed trial is logged and left out of all counts instead of stalling the loop
        console.error(`Trial ${i} failed:`, err);
      }
      innerNext();
    })();
  }, () => {
    // Average only over trials that actually loaded the page
    const numLoaded = numTrue + numFalse;
    const aveLoadTime = numLoaded > 0 ? totalLoadTime / numLoaded : '';
    const str = `${USE_XVFB},${USE_NULL_VIEWPORT},${USE_DATA_DIR},${START_FULL},${USE_PROXIES},${HEADLESS},${numTrue},${numFalse},${numNull},${aveLoadTime}`;
    console.log(str, '\n-----------------------------------');
    stream.write(`${str}\n`);
    outerNext();
  });
}, () => {
  console.log('Done!');
});
Versions
- System:
  - OS: Linux 5.4 Ubuntu 20.04.1 LTS (Focal Fossa)
  - CPU: (1) x64 Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  - Memory: 116.73 MB / 981.34 MB
  - Container: Yes
  - Shell: 5.0.17 - /bin/bash
- Binaries:
  - Node: 12.19.0 - ~/.nvm/versions/node/v12.19.0/bin/node
  - Yarn: 1.22.5 - /usr/bin/yarn
  - npm: 6.14.8 - ~/.nvm/versions/node/v12.19.0/bin/npm
- npmPackages:
  - puppeteer: ^5.3.1 => 5.5.0
  - puppeteer-extra: ^3.1.15 => 3.1.15
  - puppeteer-extra-plugin-stealth: ^2.6.5 => 2.6.5
Issue Analytics
- Created 3 years ago
- Comments: 12 (3 by maintainers)
Top GitHub Comments
I’ll add to this as I get more time but:
First, find out which anti-detect solution they are using. You can do that with the Discord bot in chat.
If you pass the same userDataDir each session, you are effectively reusing the same directory over and over (i.e. cookies, localStorage, etc.), and I’m sure you can see why that would be a problem.
If you are certain you have configured Stealth correctly (and judging by the rigour of your question, I assume you have) and you are hitting detections sporadically, it’s going to be one of these in most cases:
- args in launchOptions (if you are using these).
How closely these are analysed will depend on the sophistication of the anti-bot vendor’s solution.
The userDataDir and viewport issues are unrelated. The IP stuff is more about TCP stack analysis detecting an OS that doesn’t match the one presented in your user agent and other navigator properties.
Re the args, I had a look, and while this won’t solve your problem, add these: