Promote a new CrawlSpider that allows overriding `parse`

I see a lot of StackOverflow questions about problems with CrawlSpider and overridden parse methods (e.g. https://stackoverflow.com/questions/23511230).
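To make the failure mode concrete, here is a minimal sketch using toy stand-in classes rather than Scrapy itself. `ToyCrawlSpider`, `_follow_links`, and the dict-shaped response are hypothetical names invented for illustration; the only assumption carried over from the discussion is that the base class relies on its own `parse` to apply the rules.

```python
# Toy stand-ins, NOT Scrapy's real classes: all names here are hypothetical.
class ToyCrawlSpider:
    """Mimics a spider whose link-following logic lives in parse()."""

    def parse(self, response):
        # The base class uses parse() itself to apply its rules.
        return self._follow_links(response)

    def _follow_links(self, response):
        # Pretend every link in the response matches a rule.
        return [f"follow:{link}" for link in response["links"]]


class BrokenSpider(ToyCrawlSpider):
    def parse(self, response):
        # Overriding parse() silently replaces the link-following logic.
        return [f"item:{response['url']}"]


class SuperCallingSpider(ToyCrawlSpider):
    def parse(self, response):
        # Calling super() keeps the rules working, but is easy to forget.
        followed = super().parse(response)
        return followed + [f"item:{response['url']}"]


response = {"url": "http://example.com", "links": ["/a", "/b"]}
broken_output = BrokenSpider().parse(response)
fixed_output = SuperCallingSpider().parse(response)
```

In this sketch `broken_output` contains only the item and no follow-up links, which is the symptom users typically report: the crawl stops after the first page without any error.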

I’d like to see a new implementation, called CrawlSpider2 or CrawlingSpider or something, that uses an internal method with a name other than parse, so that users can define their own parse method.

Then the question is whether users will expect this parse method to be used by default for each downloaded page (in addition to the page being parsed for links with Rules), or whether the reference to parse should be explicit.

Thoughts?

Issue Analytics

State:
Created 9 years ago
Comments:9 (9 by maintainers)

Top GitHub Comments

nyov commented, Sep 18, 2016

Closing #712 in favor of #712 …nice sleight of hand there. 😃

nyov commented, May 24, 2014

When you haven’t read the source and don’t understand what’s going on, it’s easy to forget that you should, or even need to, call super(). Yes, the warning in the docs is also misleading, but someone reading it will know how to fix it (by not using parse).

This issue is so common that it’s becoming a “usability bug”: not a bug in itself, but unintuitive and a source of unnecessary headaches.

I propose having another internal _parse method called by the Scraper instead of parse, using that in spiders that want internal “pre-processing”, and exposing parse as the public, documented, no-baggage method to implement or override in a spider.
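The proposal could look roughly like the sketch below. It again uses toy classes, not Scrapy's actual code: `ToyBaseSpider`, `toy_engine_handle`, and the dict-shaped response are hypothetical, and the sketch assumes the "implicit" variant in which the user's `parse` runs for every downloaded page in addition to rule-based link following.

```python
# Toy illustration of the proposed split; NOT Scrapy's real implementation.
class ToyBaseSpider:
    def _parse(self, response):
        # Internal entry point: a plain spider just delegates to parse().
        return self.parse(response)

    def parse(self, response):
        # Public, documented hook with no hidden behaviour attached.
        return []


class ToyCrawlSpider(ToyBaseSpider):
    def _parse(self, response):
        # Internal "pre-processing": follow links, then call the user's parse().
        followed = [f"follow:{link}" for link in response["links"]]
        return followed + list(self.parse(response))


class MySpider(ToyCrawlSpider):
    def parse(self, response):
        # Safe to override: link following lives in _parse(), not here.
        return [f"item:{response['url']}"]


def toy_engine_handle(spider, response):
    # Hypothetical stand-in for the component that dispatches responses;
    # it calls _parse() rather than parse().
    return spider._parse(response)


response = {"url": "http://example.com", "links": ["/a", "/b"]}
result = toy_engine_handle(MySpider(), response)
```

With this layout the user-defined `parse` no longer needs a `super()` call, because the behaviour that must not be lost sits behind a name users have no reason to override.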