S3 CsvItemExporter read of closed file error
Description
Unable to use batch_item_count with S3 to export feeds using the CsvItemExporter. Could this be related to #4830? I've successfully exported the same feed to my local file system, and I've also succeeded in exporting to S3 after changing the format from csv to json.
Steps to Reproduce
- Using the feed config of:
    "FEEDS": {
        "s3://bucket/%(name)s/%(batch_time)s.tsv": {
            "format": "csv",
            "batch_item_count": 10,
            "item_export_kwargs": {"delimiter": "\t"},
        }
    }
- Scrapy successfully parses the data but has trouble exporting it to S3. However, I'm not sure whether this is an issue with botocore or Scrapy. I'm on botocore==1.20.29, but downgrading a few releases made no difference.
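With batch_item_count set, Scrapy writes each group of at most that many items to its own output file, and the FEEDS key above uses Python %-style placeholders (%(name)s, %(batch_time)s) to give each file a distinct name. So 25 scraped items with batch_item_count of 10 should end up as three objects on S3. The sketch below simulates only that splitting and naming in plain Python; it does not call Scrapy or S3. The helper names, the spider name "myspider", the one-second spacing between batches, and the exact batch_time format (ISO 8601 with colons replaced by hyphens) are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def split_into_batches(items: Iterable[dict], batch_item_count: int) -> List[List[dict]]:
    """Group items into lists of at most batch_item_count; the last may be shorter."""
    batches: List[List[dict]] = []
    current: List[dict] = []
    for item in items:
        current.append(item)
        if len(current) == batch_item_count:
            batches.append(current)
            current = []
    if current:  # leftover items form a final, smaller batch
        batches.append(current)
    return batches


def render_uri(template: str, spider_name: str, batch_time: datetime) -> str:
    """Fill the %-style placeholders used in the FEEDS key (timestamp format assumed)."""
    params = {
        "name": spider_name,
        # Assumption: ISO 8601 timestamp with ':' replaced so it is path-safe.
        "batch_time": batch_time.isoformat().replace(":", "-"),
    }
    return template % params


def plan_feed_files(
    items: Iterable[dict],
    template: str,
    spider_name: str,
    batch_item_count: int,
    start: datetime,
) -> List[Tuple[str, int]]:
    """Return (uri, item_count) for every file the batches would produce."""
    plan = []
    for index, batch in enumerate(split_into_batches(items, batch_item_count)):
        # Hypothetical clock: each batch starts one second after the previous one.
        uri = render_uri(template, spider_name, start + timedelta(seconds=index))
        plan.append((uri, len(batch)))
    return plan


if __name__ == "__main__":
    items = [{"id": n} for n in range(25)]
    plan = plan_feed_files(
        items,
        "s3://bucket/%(name)s/%(batch_time)s.tsv",
        "myspider",
        batch_item_count=10,
        start=datetime(2021, 3, 16, 15, 19, 17),
    )
    for uri, count in plan:
        print(uri, count)
```

Run as a script, it prints three URIs ending in 2021-03-16T15-19-17.tsv, 2021-03-16T15-19-18.tsv and 2021-03-16T15-19-19.tsv with 10, 10 and 5 items respectively. In the reported run, only the object for the last batch reached S3.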
Expected behavior: Every batch is exported to S3.
Actual behavior: The error below is produced, and only the last chunk that was processed is exported successfully.
Reproduces how often: 100%
Versions
Scrapy : 2.4.1
lxml : 4.6.2.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 21.2.0
Python : 3.8.5 (default, Aug 11 2020, 11:08:40) - [Clang 11.0.3 (clang-1103.0.32.62)]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1i 8 Dec 2020)
cryptography : 3.3.1
Platform : macOS-10.15.7-x86_64-i386-64bit
Additional context
2021-03-16 15:19:17 [scrapy.extensions.feedexport] ERROR: Error storing csv feed (10 items) in: s3://bucket/bucket-file.tsv
Traceback (most recent call last):
File ".venv/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
urllib_response = conn.urlopen(
File ".venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File ".venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File ".venv/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1255, in request
self._send_request(method, url, body, headers, encode_chunked)
File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 92, in _send_request
rval = super(AWSConnection, self)._send_request(
File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1301, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1250, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 127, in _send_output
self._handle_expect_response(message_body)
File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 170, in _handle_expect_response
self._send_message_body(message_body)
File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 197, in _send_message_body
self.send(message_body)
File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 204, in send
return super(AWSConnection, self).send(str)
File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 963, in send
datablock = data.read(self.blocksize)
File ".pyenv/versions/3.8.5/lib/python3.8/tempfile.py", line 474, in func_wrapper
return func(*args, **kwargs)
ValueError: read of closed file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".venv/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
File ".venv/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
File ".venv/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File ".venv/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext
return func(*args, **kw)
File ".venv/lib/python3.8/site-packages/scrapy/extensions/feedexport.py", line 155, in _store_in_thread
self.s3_client.put_object(
File ".venv/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
return self._make_api_call(operation_name, kwargs)
File ".venv/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
http, parsed_response = self._make_request(
File ".venv/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
while self._needs_retry(attempts, operation_model, request_dict,
File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
responses = self._event_emitter.emit(
File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
if self._checker(attempts, response, caught_exception):
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
should_retry = self._should_retry(attempt_number, response,
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 269, in _should_retry
return self._checker(attempt_number, response, caught_exception)
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
checker_response = checker(attempt_number, response,
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
return self._check_caught_exception(
File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
raise caught_exception
File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
http_response = self._send(request)
File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
return self.http_session.send(request)
File ".venv/lib/python3.8/site-packages/botocore/httpsession.py", line 359, in send
raise HTTPClientError(error=e)
botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: read of closed file
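The innermost frames of the traceback above show http.client calling data.read(self.blocksize) on the tempfile-backed request body and getting ValueError: read of closed file. The standard-library sketch below reproduces that message. It rests on an assumption that this thread never confirmed: that the file handed to botocore had already been closed by a text wrapper layered on top of it. Closing an io.TextIOWrapper (or letting it be garbage-collected, since io objects close themselves on finalization) also closes the binary file underneath, whereas detach() hands the binary file back still open. The helper names write_batch_then_close_wrapper, write_batch_then_detach and read_body are hypothetical; this is not Scrapy's or botocore's code, and it is one plausible mechanism rather than a diagnosis.

```python
import csv
import io
import tempfile
from typing import List


def write_batch_then_close_wrapper(rows: List[List[str]]):
    """Write TSV rows through a text wrapper, then close the wrapper."""
    raw = tempfile.NamedTemporaryFile()  # binary temp file, like a local feed buffer
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    # delimiter="\t" mirrors the item_export_kwargs in the reported config.
    csv.writer(wrapper, delimiter="\t").writerows(rows)
    wrapper.flush()
    # Closing (or garbage-collecting) the text wrapper also closes `raw`.
    wrapper.close()
    return raw


def write_batch_then_detach(rows: List[List[str]]):
    """Same as above, but detach the wrapper so `raw` stays open."""
    raw = tempfile.NamedTemporaryFile()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    csv.writer(wrapper, delimiter="\t").writerows(rows)
    wrapper.detach()  # flushes, then returns the binary file without closing it
    raw.seek(0)  # rewind so the whole batch can be read back
    return raw


def read_body(body, blocksize: int = 8192) -> bytes:
    """Read one block, the way http.client pulls a file-like request body."""
    return body.read(blocksize)


if __name__ == "__main__":
    rows = [["id", "name"], ["1", "foo"]]

    closed_body = write_batch_then_close_wrapper(rows)
    try:
        read_body(closed_body)
    except ValueError as exc:
        print("ValueError:", exc)  # ValueError: read of closed file

    open_body = write_batch_then_detach(rows)
    print(read_body(open_body))  # b'id\tname\r\n1\tfoo\r\n'
    open_body.close()
```

The contrast between the two helpers is the point: the same bytes are written either way, but only the detached variant leaves a file that a later reader (such as an upload running in another thread) can still consume.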
Issue Analytics
- Created 3 years ago
- Reactions: 1
- Comments: 20 (11 by maintainers)
Top GitHub Comments
Let me see if I can get it to work on my end. I’ll take a look tomorrow or the next.
@jackblk I've more or less moved away from this issue since my priorities changed. @marlenachatzigrigoriou did a great job of debugging most of it, but we never found the underlying issue. Not sure if she wants to pick it up again?