
Make it possible to update settings in `__init__` or `from_crawler`

See original GitHub issue

This issue might be related to https://github.com/scrapy/scrapy/issues/1305

I noticed that settings are frozen in https://github.com/scrapy/scrapy/blob/master/scrapy/crawler.py#L57. However, in one of my projects I needed to change some settings based on spider arguments. An alternative would be to write the spider as a base class and extend it from specific spiders that set the proper settings. However, I think it would make sense to freeze settings only after the spider and other components have been initialized, or to provide some other entry point for configuring settings based on arguments. Another option is to use -s arguments, but in my case I was changing the FEED_EXPORT_FIELDS setting (https://docs.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS).

Any thoughts here?

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

3 reactions
GeorgeA92 commented, Mar 10, 2019

Usage of the -s argument with the list-based FEED_EXPORT_FIELDS setting: scrapy crawl quotes -s FEED_EXPORT_FIELDS=author,quote -o data_without_tags.csv

Setting a list-based setting like FEED_EXPORT_FIELDS from the command line works for all settings that are read through the BaseSettings.getlist method: https://github.com/scrapy/scrapy/blob/b8594353d03be5574f51766c35566b713584302b/scrapy/settings/__init__.py#L161-L178
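To make the comma-splitting concrete without pulling in Scrapy, here is a simplified plain-Python emulation of what getlist does with a -s value (getlist_like is a hypothetical helper name, not Scrapy's actual implementation):

```python
def getlist_like(value, default=None):
    # Simplified emulation of BaseSettings.getlist: a missing value
    # falls back to the default, a string is split on commas, and any
    # other iterable is copied into a list.
    if value is None:
        return default if default is not None else []
    if isinstance(value, str):
        return value.split(",")
    return list(value)

print(getlist_like("author,quote"))       # value as passed via -s
print(getlist_like(["author", "quote"]))  # value already set as a list
```

Both calls yield ['author', 'quote'], which is why passing FEED_EXPORT_FIELDS as a comma-separated -s string works.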

I noticed that the BaseSettings.freeze method does only one thing: https://github.com/scrapy/scrapy/blob/b8594353d03be5574f51766c35566b713584302b/scrapy/settings/__init__.py#L352-L360

The frozen attribute is used in the _assert_mutability method, which is what actually prevents any changes to the settings: https://github.com/scrapy/scrapy/blob/b8594353d03be5574f51766c35566b713584302b/scrapy/settings/__init__.py#L336-L338

But if we set the frozen attribute back to False, the settings become mutable again, and the application is able to make changes to them through any method where _assert_mutability is called: set, setmodule, update, delete, __delitem__.
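The freeze/unfreeze mechanism described above can be sketched in plain Python, without a Scrapy dependency (MiniSettings is a hypothetical stand-in, not Scrapy's BaseSettings):

```python
class MiniSettings:
    """Tiny stand-in mimicking BaseSettings' mutability guard."""

    def __init__(self):
        self._data = {}
        self.frozen = False

    def _assert_mutability(self):
        # Mirrors _assert_mutability: refuse writes once frozen.
        if self.frozen:
            raise TypeError("Trying to modify an immutable Settings object")

    def set(self, name, value):
        self._assert_mutability()
        self._data[name] = value

    def freeze(self):
        self.frozen = True


settings = MiniSettings()
settings.freeze()
try:
    settings.set("FEED_EXPORT_FIELDS", ["author", "quote"])
except TypeError:
    pass  # writes are blocked while frozen

# Flipping the frozen flag, as described above, makes it writable again.
settings.frozen = False
settings.set("FEED_EXPORT_FIELDS", ["author", "quote"])
settings.freeze()
print(settings._data)  # {'FEED_EXPORT_FIELDS': ['author', 'quote']}
```

This is only a sketch of the pattern; in real Scrapy the same flag flip works because frozen is an ordinary attribute.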

Spider code that updates settings inside from_crawler would look like this:

import scrapy

class SomeSpider(scrapy.Spider):
    # ...

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        if crawler.settings.frozen:
            crawler.settings.frozen = False
            crawler.settings.set("SETTING", "NEW_VALUE")
            crawler.settings.freeze()
        spider = cls(*args, **kwargs)
        spider._set_crawler(crawler)
        return spider
0 reactions
Dhruv97Sharma commented, Feb 22, 2022

One possible solution could also be to create a few class variables, use them in the custom_settings passed to the spider, and then update the values of those class variables in the spider's __init__ function, so that when the custom settings are applied, they use the updated values passed in through __init__.

Example:

class NameOfYourSpider(scrapy.Spider):
    name = "spider_name"
    var1 = None

    custom_settings = {
        'ROBOTSTXT_OBEY': var1 if var1 is not None else False,
    }

    # ... other class variables

    def __init__(self, obey_robotstxt, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Updates the variable backing the custom setting one wanted to change
        self.var1 = obey_robotstxt
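One timing detail worth checking with this approach: a class-level dict like custom_settings is evaluated once, when the class body executes, while assignments in __init__ run later and create instance attributes. A plain-Python sketch of that timing (Demo and captured are hypothetical names, not Scrapy API):

```python
class Demo:
    var1 = None
    # Evaluated at class-definition time, while var1 is still None.
    captured = {"ROBOTSTXT_OBEY": var1 if var1 is not None else False}

    def __init__(self, value):
        # Creates an instance attribute; the dict above is not re-evaluated.
        self.var1 = value


d = Demo(True)
print(d.captured)  # {'ROBOTSTXT_OBEY': False}
print(d.var1)      # True
```

So for this pattern to take effect, whatever reads the dict would have to do so after __init__ has run and re-read the updated variable.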


Top Results From Across the Web

Settings — Scrapy 2.7.1 documentation
When you use Scrapy, you have to tell it which settings you're using. You can do this by using an environment variable, SCRAPY_SETTINGS_MODULE...

How to access scrapy settings from item Pipeline
To get your settings from (settings.py): from ... The crawler engine then calls the pipeline's init function with my_setting, like so:

Better API to manage pipelines/middlewares priority #5206
It's quite common to update pipelines/middlewares and, usually, we want them to be in a position related to some other already registered ...
