
TypeError: not all arguments converted during string formatting

See original GitHub issue

Description

I’m encountering a very strange issue that I don’t understand. There has been no change in the code; I only updated to Scrapy 2.5.1, but after reverting, the error presents itself anyway. I’m stumped on how to proceed.

Steps to Reproduce

  1. Run the script for my two crawlers:
# imports used by this script
import argparse
from pathlib import Path, PurePath
from urllib.parse import urlparse

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import defer, reactor


def run(args):
    settings = get_project_settings()
    settings.update(
        {
            "LOG_LEVEL": args.log,
        }
    )
    configure_logging(settings)
    playlist_settings = settings.copy()
    playlist_settings.update(
        {
            "FEEDS": {
                Path("%(source_to_file)s").with_suffix(".csv"): {
                    "format": "csv",
                    "uri_params": "scrapy_my.utils.playlist_params",
                }
            },
        }
    )
    video_settings = settings.copy()
    video_settings.update(
        {
            "ITEM_PIPELINES": {
                "scrapy_my.pipelines.ExclusionPipeline": 100,
            },
            "FEEDS": {
                Path(args.collection): {
                    "format": "sqlite",
                },
            },
            "COLLECTION": args.collection,
        }
    )

    playlist_runner = CrawlerRunner(settings=playlist_settings)
    video_runner = CrawlerRunner(settings=video_settings)

    @defer.inlineCallbacks
    def crawl():
        for s in args.source:
            if "playlists" in args.spiders:
                yield playlist_runner.crawl("playlists", source=s)

            if "videos" in args.spiders:
                # f = PurePath(urlparse(s).path).with_suffix('.csv').name
                f = PurePath(urlparse(s).path).stem
                yield video_runner.crawl("videos", urls_file=f"{f}.csv")

        reactor.stop()

    crawl()
    reactor.run()  # the script will block here until the last crawl call is finished


def main():
    parser = argparse.ArgumentParser(description="Run Scrapy spiders")
    parser.add_argument(
        "spiders",
        type=str,
        nargs="+",
        choices=["videos", "playlists", "studio_images"],
        help="which spiders to run",
    )
    parser.add_argument(
        "--source",
        type=str,
        action="append",
        required=True,
        help="arguments for spiders to run",
    )
    parser.add_argument(
        "--out",
        type=str,
        required=True,
        dest="storage_dir",
        help="directory to store output files and images",
    )
    parser.add_argument(
        "--collection",
        type=str,
        default="collection.db",
        help="directory to store collection db",
    )
    parser.add_argument(
        "--log",
        type=str,
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        default="INFO",
        help="log level",
    )
    args = parser.parse_args()

    collection = Path(args.collection)

    if not is_sqlite(collection):
        print("Collection file is not a valid sqlite3 db")
        exit(1)

    run(args)


def is_sqlite(db):
    """
    using sqlite3.connect(f"file:{Path(args.collection)}?mode=rw", uri=True)
    gives false positives as it returns an object even for text or csv files
    """
    if db.is_file():
        stat = db.stat()
        # file is empty; give it the benefit of the doubt that it's sqlite
        # New sqlite3 files created in recent libraries are empty!
        if stat.st_size == 0:
            return True
        # SQLite database file header is 100 bytes
        if stat.st_size < 100:
            return False
        # Validate file header
        with open(db, "rb") as fd:
            header = fd.read(100)
        return header[:16] == b"SQLite format 3\x00"


if __name__ == "__main__":
    main()
  2. The spider is as follows:
import csv
from collections import defaultdict

import scrapy


class VideoSpider(scrapy.Spider):
    name = "videos"
    start_urls = []
    custom_settings = {}

    def __init__(self, *args, **kwargs):
        super(VideoSpider, self).__init__(*args, **kwargs)
        self.urls_file = kwargs.pop("urls_file")

        columns = defaultdict(list)  # each value in each column is appended to a list
        with open(self.urls_file) as f:
            # read rows into a dictionary format
            reader = csv.DictReader(f)
            # read a row as {column1: value1, column2: value2,...}
            for row in reader:
                # go over each column name and value
                for (k, v) in row.items():
                    # append the value into the appropriate list
                    # based on column name k
                    columns[k].append(v)

        self.start_urls = columns["link"]

    def parse(self, response):
        pass

Expected behavior: Normal running of the script.

Actual behavior: The following error appears:

ERROR: Error caught on signal handler: <bound method CoreStats.spider_closed of <scrapy.extensions.corestats.CoreStats object at 0x107426af0>>
Traceback (most recent call last):
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/crawler.py", line 89, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
TypeError: not all arguments converted during string formatting

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/utils/defer.py", line 157, in maybeDeferred_coro
    result = f(*args, **kw)
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/extensions/corestats.py", line 31, in spider_closed
    elapsed_time = finish_time - self.start_time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'

Reproduces how often: 100%.
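
For context, the final TypeError in the log is a secondary failure: CoreStats.spider_closed computes finish_time - self.start_time, and start_time is still None because the spider never finished opening (the primary failure is the one raised from engine.open_spider). A minimal illustration of that secondary error:

from datetime import datetime

# CoreStats records start_time in spider_opened; because the spider never
# finished opening, start_time is still None when spider_closed runs:
start_time = None
finish_time = datetime.utcnow()
try:
    elapsed_time = finish_time - start_time
except TypeError as e:
    print(e)  # unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'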

Versions

Scrapy       : 2.5.1
lxml         : 4.6.3.0
libxml2      : 2.9.10
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 21.7.0
Python       : 3.9.6 (default, Jun 29 2021, 05:25:02) - [Clang 12.0.5 (clang-1205.0.22.9)]
pyOpenSSL    : 21.0.0 (OpenSSL 1.1.1l  24 Aug 2021)
cryptography : 35.0.0
Platform     : macOS-11.4-x86_64-i386-64bit

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

1 reaction
elacuesta commented, Oct 21, 2021

Please provide a minimal, reproducible example. I see things which do not seem relevant to the issue at hand (argument parsing, database queries, multiple crawlers), while others are missing (the code for playlist_params for instance, which might be relevant given that we’re talking about string formatting). More often than not, the very process of removing non-essential bits of code leads to finding the solution to the problem.
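
For reference, since the actual playlist_params was not posted, a uri_params callable wired into FEEDS generally looks like the sketch below; the spider attribute (spider.source) and the way the source_to_file key is derived are assumptions based on the settings shown above:

# scrapy_my/utils.py: hypothetical sketch; the real playlist_params was not posted
from pathlib import PurePath
from urllib.parse import urlparse


def playlist_params(params, spider):
    # Fill the %(source_to_file)s placeholder used in the FEEDS key,
    # derived here from the spider's source argument (attribute name assumed).
    params["source_to_file"] = PurePath(urlparse(spider.source).path).stem
    # Depending on the Scrapy version, uri_params callables either mutate
    # params in place (2.5.x) or return the dict to use (2.6+); returning
    # the updated dict covers both conventions.
    return params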

0 reactions
devster31 commented, Oct 23, 2021

I actually ran the spider above in a brand new project initiated with scrapy startproject project. Removing settings.py results in a ModuleError, and running project/spiders/videos2.py with no changes to the default template results in at least these settings being overridden:

2021-10-23 19:04:39 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'project',
 'NEWSPIDER_MODULE': 'project.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_LOADER_WARN_ONLY': True,
 'SPIDER_MODULES': ['project.spiders']}

Digging deeper, I found this is happening because of another change I made to scrapy/scrapy/extensions/feedexport.py, based on https://github.com/scrapy/scrapy/pull/4966. In short, the change I made incorrectly caused this function https://github.com/scrapy/scrapy/blob/cfff79cee6a97528185b7d24e2b660b99c07945f/scrapy/extensions/feedexport.py#L527-L537 to return None when no uri_params function was defined. It wasn’t a settings issue, but a wrong fix to another issue.
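
To see why that change produces this exact message: when opening a feed, Scrapy fills the feed URI with printf-style substitution (roughly uri_template % params), so a None coming back from the uri-params helper hits the % operator directly. A minimal, self-contained illustration:

# Simplified sketch of how the feed URI is filled in feedexport.py:
params = None  # what the incorrect local patch returned instead of the params dict

try:
    "collection.db" % params
except TypeError as e:
    print(e)  # not all arguments converted during string formatting
    # None is not a mapping, so % treats it as one extra, unconverted argument.

# With a proper dict the substitution succeeds, which is why restoring the
# helper's original behaviour fixes the crash:
print("%(source_to_file)s.csv" % {"source_to_file": "my-playlist"})  # my-playlist.csv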

