TypeError: not all arguments converted during string formatting
Description
I’m encountering a very strange issue which I don’t understand. There has been no change in the code; I only updated to Scrapy 2.5.1, but after reverting the update the error presents itself anyway. I’m stumped on how to proceed as well.
Steps to Reproduce
- Run the script for my two crawlers (a hypothetical sketch of the playlist_params callback it references appears after the spider below):
import argparse
from pathlib import Path, PurePath
from urllib.parse import urlparse

from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
from twisted.internet import defer, reactor


def run(args):
    settings = get_project_settings()
    settings.update(
        {
            "LOG_LEVEL": args.log,
        }
    )
    configure_logging(settings)

    playlist_settings = settings.copy()
    playlist_settings.update(
        {
            "FEEDS": {
                Path("%(source_to_file)s").with_suffix(".csv"): {
                    "format": "csv",
                    "uri_params": "scrapy_my.utils.playlist_params",
                }
            },
        }
    )

    video_settings = settings.copy()
    video_settings.update(
        {
            "ITEM_PIPELINES": {
                "scrapy_my.pipelines.ExclusionPipeline": 100,
            },
            "FEEDS": {
                Path(args.collection): {
                    "format": "sqlite",
                },
            },
            "COLLECTION": args.collection,
        }
    )

    playlist_runner = CrawlerRunner(settings=playlist_settings)
    video_runner = CrawlerRunner(settings=video_settings)

    @defer.inlineCallbacks
    def crawl():
        for s in args.source:
            if "playlists" in args.spiders:
                yield playlist_runner.crawl("playlists", source=s)
            if "videos" in args.spiders:
                # f = PurePath(urlparse(s).path).with_suffix('.csv').name
                f = PurePath(urlparse(s).path).stem
                yield video_runner.crawl("videos", urls_file=f"{f}.csv")
        reactor.stop()

    crawl()
    reactor.run()  # the script will block here until the last crawl call is finished


def main():
    parser = argparse.ArgumentParser(description="Run Scrapy spiders")
    parser.add_argument(
        "spiders",
        type=str,
        nargs="+",
        choices=["videos", "playlists", "studio_images"],
        help="which spiders to run",
    )
    parser.add_argument(
        "--source",
        type=str,
        action="append",
        required=True,
        help="arguments for spiders to run",
    )
    parser.add_argument(
        "--out",
        type=str,
        required=True,
        dest="storage_dir",
        help="directory to store output files and images",
    )
    parser.add_argument(
        "--collection",
        type=str,
        default="collection.db",
        help="path to the collection db file",
    )
    parser.add_argument(
        "--log",
        type=str,
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        default="INFO",
        help="log level",
    )
    args = parser.parse_args()

    collection = Path(args.collection)
    if not is_sqlite(collection):
        print("Collection file is not a valid sqlite3 db")
        exit(1)
    run(args)


def is_sqlite(db):
    """
    Using sqlite3.connect(f"file:{Path(args.collection)}?mode=rw", uri=True)
    gives false positives, as it returns a connection object even for text
    or csv files.
    """
    if db.is_file():
        stat = db.stat()
        # file is empty: give it the benefit of the doubt that it's sqlite,
        # since new sqlite3 files created by recent libraries are empty
        if stat.st_size == 0:
            return True
        # the SQLite database file header is 100 bytes
        if stat.st_size < 100:
            return False
        # validate the file header
        with open(db, "rb") as fd:
            header = fd.read(100)
            return header[:16] == b"SQLite format 3\x00"
    return False


if __name__ == "__main__":
    main()
- The spider is as follows:

import csv
from collections import defaultdict

import scrapy


class VideoSpider(scrapy.Spider):
    name = "videos"
    start_urls = []
    custom_settings = {}

    def __init__(self, *args, **kwargs):
        super(VideoSpider, self).__init__(*args, **kwargs)
        self.urls_file = kwargs.pop("urls_file")
        columns = defaultdict(list)  # each value in each column is appended to a list
        with open(self.urls_file) as f:
            # read rows into a dictionary format
            reader = csv.DictReader(f)
            # read a row as {column1: value1, column2: value2, ...}
            for row in reader:
                # go over each column name and value
                for k, v in row.items():
                    # append the value to the list for column name k
                    columns[k].append(v)
        self.start_urls = columns["link"]

    def parse(self, response):
        pass
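
To isolate the spider from the runner script (the kind of reduced example the maintainers ask for below), it can be driven on its own; a minimal sketch, where links.csv is a hypothetical input file with a "link" column:

# Minimal stand-alone driver for VideoSpider, independent of the runner
# script above; "links.csv" is a hypothetical input file with a "link" column.
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
process.crawl(VideoSpider, urls_file="links.csv")
process.start()  # blocks until the crawl finishes

The playlist_params callback referenced by the FEEDS setting in the script is not shown in the issue (the maintainers ask about it below). For orientation only, a hypothetical sketch of what such a Scrapy 2.5 uri_params callback could look like, assuming spider.source is set from the source=s argument passed to crawl():

# Hypothetical sketch of scrapy_my.utils.playlist_params (the real one is
# not shown in the issue). In Scrapy 2.5, a uri_params callback receives the
# default params dict and the spider, and extends the dict in place.
from pathlib import PurePath
from urllib.parse import urlparse


def playlist_params(params, spider):
    # Fill the %(source_to_file)s placeholder used in the FEEDS URI with a
    # file-safe name derived from spider.source (assumed to be the source=s
    # keyword passed to crawl() in the script above).
    params["source_to_file"] = PurePath(urlparse(spider.source).path).stem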
Expected behavior: Normal running of the script.
Actual behavior: The following error appears:
ERROR: Error caught on signal handler: <bound method CoreStats.spider_closed of <scrapy.extensions.corestats.CoreStats object at 0x107426af0>>
Traceback (most recent call last):
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
result = current_context.run(gen.send, result)
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/crawler.py", line 89, in crawl
yield self.engine.open_spider(self.spider, start_requests)
TypeError: not all arguments converted during string formatting
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/utils/defer.py", line 157, in maybeDeferred_coro
result = f(*args, **kw)
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/users/user/scrapy/.venv/lib/python3.9/site-packages/scrapy/extensions/corestats.py", line 31, in spider_closed
elapsed_time = finish_time - self.start_time
TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'
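Note that the second traceback is only a downstream symptom: CoreStats.start_time is set in its spider_opened handler, which never ran because engine.open_spider had already failed. The first TypeError is Python's generic %-formatting error; a minimal, Scrapy-free illustration of how a None argument triggers it:

# Minimal illustration of the first TypeError: the % operator treats a
# non-mapping right-hand side as an argument to convert, so a string with
# no format specifiers plus a stray argument (here None) fails.
template = "collection.db"
print(template % {"batch_id": 1})  # a mapping is fine even with no specifiers
print(template % None)             # TypeError: not all arguments converted
                                   # during string formatting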
Reproduces how often: 100%.
Versions
Scrapy : 2.5.1
lxml : 4.6.3.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.7.0
Python : 3.9.6 (default, Jun 29 2021, 05:25:02) [Clang 12.0.5 (clang-1205.0.22.9)]
pyOpenSSL : 21.0.0 (OpenSSL 1.1.1l 24 Aug 2021)
cryptography : 35.0.0
Platform : macOS-11.4-x86_64-i386-64bit
Top GitHub Comments

Please provide a minimal, reproducible example. I see things which do not seem relevant to the issue at hand (argument parsing, database queries, multiple crawlers), while others are missing (the code for playlist_params, for instance, which might be relevant given that we're talking about string formatting). More often than not, the very process of removing non-essential bits of code leads to finding the solution to the problem.

I actually ran the spider above in a brand new project initiated with scrapy startproject project. Removing settings.py results in a ModuleError, and running project/spiders/videos2.py with no changes to the default template results in at least these settings being overridden:

Digging deeper, I found this is happening because of another change I made to scrapy/scrapy/extensions/feedexport.py, based on https://github.com/scrapy/scrapy/pull/4966. In short, the change I made incorrectly made this function https://github.com/scrapy/scrapy/blob/cfff79cee6a97528185b7d24e2b660b99c07945f/scrapy/extensions/feedexport.py#L527-L537 return None when no uri_params function was defined. It wasn't a settings issue, but a wrong fix to another issue.
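
For readers hitting the same message: the feed URI is ultimately %-formatted against the params dict this helper returns, so a None return reproduces the error exactly. A rough sketch of the assumed failure mode (not Scrapy's actual code):

# Rough sketch of the broken local change described above (an assumed shape,
# not Scrapy's real code): with no uri_params callback configured, the helper
# falls through and implicitly returns None instead of the params dict.
def get_uri_params(spider, uri_params_callback):
    params = {"time": "2021-10-06T00-00-00"}  # plus spider attributes, etc.
    if uri_params_callback is not None:
        uri_params_callback(params, spider)  # callback extends params in place
        return params
    # BUG: implicit None return here; the fix is to return params as well


print("collection.db" % get_uri_params(spider=None, uri_params_callback=None))
# TypeError: not all arguments converted during string formatting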