How to ignore GET requests to the /robots.txt file?

I have a Node.js project. It uses local authentication with Passport.js, and the express-session and connect-mongo packages to store sessions in the MongoDB sessions collection.

Each time I reload a page or navigate to another page, six new documents are inserted into the sessions collection. As a result, the sessions collection grows very large over time.

Here is the code where I set up express-session and connect-mongo:

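```javascript
var express = require('express')
var session = require('express-session')
// Older connect-mongo releases (before v4) export a function that takes the session module
var MongoStore = require('connect-mongo')(session)

var app = express()

// `secret` is the project's own config object (not shown in the issue); it
// provides the session secret and the MongoDB connection string.
app.use(session({
	resave: true,             // save the session back to the store on every request
	saveUninitialized: true,  // also store new sessions that were never modified
	secret: secret.secretKey,
	store: new MongoStore({
		url: secret.database,
		autoReconnect: true
	})
}))
```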

Morgan output in the console (this happens on every page load and reload):

```
C:\Users\User\Desktop\Project>node server.js
Node.js listening on port 3000
GET / 200 37.946 ms - 2670
GET /robots.txt 400 23.490 ms - 2328
GET /robots.txt 400 185.459 ms - 2328
GET /robots.txt 400 4.794 ms - 2328
GET /robots.txt 400 6.684 ms - 2328
GET /robots.txt 400 3.966 ms - 2328
GET /favicon.ico 400 7.314 ms - 2328
GET /robots.txt 400 3.713 ms - 2328
```

So I think this is caused by the many requests to the non-existent robots.txt URL.

How can I make express-session ignore requests for this file?

I also found this in http://codingteam.net/project/tapage/browse/node_modules/express/node_modules/connect/lib/middleware/session.js (lines 84-91):

```javascript
 * Ignore Paths:
 *
 *  By default `/favicon.ico` is the only ignored path, all others
 *  will utilize sessions, to manipulate the paths ignored, use
 * `connect.session.ignore.push('/my/path')`. This works for _full_
 *  pathnames only, not segments nor substrings.
 *
 *     connect.session.ignore.push('/robots.txt');
```

But that does not work in newer versions of Express and express-session.
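The "full pathnames only" rule in the connect comment can be illustrated with a plain list and an exact-match check. This is a hypothetical re-creation for illustration, not connect's actual source; the names `ignore` and `isIgnored` are made up here, and stripping the query string before comparing is an assumption.

```javascript
// Hypothetical re-creation of the old connect.session.ignore list.
// Not connect's real source: names here are illustrative only.
const ignore = ['/favicon.ico']; // the old default ignored path
ignore.push('/robots.txt');       // the push suggested in the quoted comment

// True only when the request path equals an entry exactly.
// Assumption: the query string is dropped before comparing.
function isIgnored(reqUrl) {
  const pathname = reqUrl.split('?')[0];
  return ignore.includes(pathname);
}

console.log(isIgnored('/robots.txt'));        // true
console.log(isIgnored('/robots.txt?x=1'));    // true
console.log(isIgnored('/static/robots.txt')); // false: not a full pathname match
console.log(isIgnored('/robots'));            // false: substrings do not match
```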

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
dougwilson commented, Oct 1, 2017

Looking at how that link implements ignore, it’s identical to https://www.npmjs.com/package/ignore-paths, so if that’s what you need, wrap this middleware with that module to ignore the paths you need.
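The wrapping approach means express-session keeps no ignore list of its own; another function decides whether to call it at all. A minimal hand-rolled sketch follows. `skipForPaths` is a hypothetical name and is not the ignore-paths package's API (its exact signature is not shown in this thread); exact pathname matching with the query string removed is an assumption modelled on the connect comment quoted in the question.

```javascript
// Hypothetical helper (not the ignore-paths package itself): wraps any
// Connect/Express-style middleware so it is skipped for the given pathnames.
function skipForPaths(paths, middleware) {
  const skipped = new Set(Array.isArray(paths) ? paths : [paths]);
  return function (req, res, next) {
    // Compare only the pathname, so "/robots.txt?v=2" is still skipped.
    const pathname = req.url.split('?')[0];
    if (skipped.has(pathname)) {
      return next(); // continue down the stack without running the wrapped middleware
    }
    return middleware(req, res, next);
  };
}

// Usage with the setup from the question (requires express-session):
// app.use(skipForPaths(['/robots.txt', '/favicon.ico'], session({ ... })))
```

With this wrapper, requests for the listed paths never reach express-session, so no session is created or stored for them; any handler for those paths will simply see no `req.session`. Separately, the express-session documentation notes that setting `saveUninitialized: false` reduces storage usage, because new sessions that are never modified are not saved.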

0 reactions
dougwilson commented, Oct 1, 2017

Idk, that was never in this project.