
S3 CsvItemExporter read of closed file error


Description

I am unable to use batch_item_count to export feeds to S3 with CsvItemExporter. Could this be related to #4830? Exporting the same feed to my local filesystem works, and so does exporting it to S3 after switching the format from csv to json.
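One way to read these observations is the following hypothesis; the thread never confirmed a root cause. CsvItemExporter writes through an io.TextIOWrapper layered over the file it receives, while the JSON exporter writes encoded bytes to that file directly. The stdlib-only sketch below shows why that difference could matter. It uses a hypothetical function name and a NamedTemporaryFile standing in for the per-batch buffer. When a TextIOWrapper is finalized, it closes the binary file underneath it, while plain byte writes leave the file open.

```python
import io
import tempfile


def file_closed_after_exporter_dropped(use_text_wrapper: bool) -> bool:
    """Write one record, drop the writer, and report whether the temp file got closed."""
    # Stand-in for the temporary file a storage backend buffers a batch in (an assumption).
    tmp = tempfile.NamedTemporaryFile(prefix="feed-")
    if use_text_wrapper:
        # CSV-style: a text layer wraps the binary temp file.
        stream = io.TextIOWrapper(tmp, encoding="utf-8", newline="", write_through=True)
        stream.write("name\tprice\n")
        # Dropping the last reference finalizes the wrapper (immediately in CPython),
        # and finalizing a TextIOWrapper closes the file it wraps.
        del stream
    else:
        # JSON-style: encoded bytes go straight into the temp file.
        tmp.write(b'[{"name": "x"}]')
    closed = tmp.closed
    tmp.close()  # clean up; a second close is a no-op
    return closed


print(file_closed_after_exporter_dropped(use_text_wrapper=True))   # True
print(file_closed_after_exporter_dropped(use_text_wrapper=False))  # False
```

Exporting to the local filesystem would not be affected in the same way if each batch is written to its final destination before any wrapper is collected. The S3 path, by contrast, uploads the temporary file afterwards.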

Steps to Reproduce

  1. Use the following feed configuration:
        "FEEDS": {
            "s3://bucket/%(name)s/%(batch_time)s.tsv": {
                "format": "csv",
                "batch_item_count": 10,
                "item_export_kwargs": {"delimiter": "\t"},
            }
        }
  2. Run the spider. Scrapy parses the data successfully but fails when exporting it to S3. I'm not sure whether the problem lies in botocore or in Scrapy. I'm using botocore==1.20.29, and downgrading a few releases did not change the behavior.
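With batch_item_count set, the feed exporter starts a new file every N items and renders the URI template once per batch. The sketch below is a simplified model of that behavior, not Scrapy's code. The helpers `split_into_batches` and `render_batch_uris` and the spider name `myspider` are hypothetical. It assumes printf-style named placeholders like those in the config above, and it uses Scrapy's %(batch_id)d placeholder instead of %(batch_time)s so the output is deterministic. With 25 items and a batch size of 10, three uploads are expected. The report says only the last one lands.

```python
from typing import Dict, List, Sequence


def split_into_batches(items: Sequence[dict], batch_item_count: int) -> List[List[dict]]:
    """Chunk items N per file, with any remainder in the final batch."""
    if batch_item_count <= 0:
        raise ValueError("batch_item_count must be positive")
    return [list(items[i:i + batch_item_count]) for i in range(0, len(items), batch_item_count)]


def render_batch_uris(template: str, spider_name: str, batch_count: int) -> List[str]:
    """Fill printf-style named placeholders once per batch (batch_id is 1-based)."""
    uris = []
    for batch_id in range(1, batch_count + 1):
        params: Dict[str, object] = {"name": spider_name, "batch_id": batch_id}
        uris.append(template % params)
    return uris


items = [{"n": i} for i in range(25)]
batches = split_into_batches(items, batch_item_count=10)
uris = render_batch_uris("s3://bucket/%(name)s/%(batch_id)d.tsv", "myspider", len(batches))
for uri, batch in zip(uris, batches):
    print(uri, len(batch))
# s3://bucket/myspider/1.tsv 10
# s3://bucket/myspider/2.tsv 10
# s3://bucket/myspider/3.tsv 5
```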

Expected behavior: Every batch is exported to S3.

Actual behavior: The error below is raised, and only the last chunk that was processed is exported successfully.

Reproduces how often: 100%

Versions

Scrapy       : 2.4.1
lxml         : 4.6.2.0
libxml2      : 2.9.10
cssselect    : 1.1.0
parsel       : 1.6.0
w3lib        : 1.22.0
Twisted      : 21.2.0
Python       : 3.8.5 (default, Aug 11 2020, 11:08:40) - [Clang 11.0.3 (clang-1103.0.32.62)]
pyOpenSSL    : 20.0.1 (OpenSSL 1.1.1i  8 Dec 2020)
cryptography : 3.3.1
Platform     : macOS-10.15.7-x86_64-i386-64bit

Additional context

2021-03-16 15:19:17 [scrapy.extensions.feedexport] ERROR: Error storing csv feed (10 items) in: s3://bucket/bucket-file.tsv
Traceback (most recent call last):
  File ".venv/lib/python3.8/site-packages/botocore/httpsession.py", line 314, in send
    urllib_response = conn.urlopen(
  File ".venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File ".venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File ".venv/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 92, in _send_request
    rval = super(AWSConnection, self)._send_request(
  File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 127, in _send_output
    self._handle_expect_response(message_body)
  File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 170, in _handle_expect_response
    self._send_message_body(message_body)
  File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 197, in _send_message_body
    self.send(message_body)
  File ".venv/lib/python3.8/site-packages/botocore/awsrequest.py", line 204, in send
    return super(AWSConnection, self).send(str)
  File ".pyenv/versions/3.8.5/lib/python3.8/http/client.py", line 963, in send
    datablock = data.read(self.blocksize)
  File ".pyenv/versions/3.8.5/lib/python3.8/tempfile.py", line 474, in func_wrapper
    return func(*args, **kwargs)
ValueError: read of closed file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".venv/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File ".venv/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File ".venv/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File ".venv/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext
    return func(*args, **kw)
  File ".venv/lib/python3.8/site-packages/scrapy/extensions/feedexport.py", line 155, in _store_in_thread
    self.s3_client.put_object(
  File ".venv/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File ".venv/lib/python3.8/site-packages/botocore/client.py", line 662, in _make_api_call
    http, parsed_response = self._make_request(
  File ".venv/lib/python3.8/site-packages/botocore/client.py", line 682, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 136, in _send_request
    while self._needs_retry(attempts, operation_model, request_dict,
  File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 253, in _needs_retry
    responses = self._event_emitter.emit(
  File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File ".venv/lib/python3.8/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 183, in __call__
    if self._checker(attempts, response, caught_exception):
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 250, in __call__
    should_retry = self._should_retry(attempt_number, response,
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 269, in _should_retry
    return self._checker(attempt_number, response, caught_exception)
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 316, in __call__
    checker_response = checker(attempt_number, response,
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 222, in __call__
    return self._check_caught_exception(
  File ".venv/lib/python3.8/site-packages/botocore/retryhandler.py", line 359, in _check_caught_exception
    raise caught_exception
  File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File ".venv/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File ".venv/lib/python3.8/site-packages/botocore/httpsession.py", line 359, in send
    raise HTTPClientError(error=e)
botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: read of closed file
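Both tracebacks bottom out in the same place. http.client calls data.read(blocksize) on the temporary file holding the batch (the frame in tempfile.py), and that file is already closed. The stdlib-only sketch below reproduces that ValueError under one assumption that this thread did not confirm: the batch's TextIOWrapper is finalized, closing the temp file, before the threaded upload reads it. The function name `upload_after_finishing` is hypothetical. `detach()` is the standard io.TextIOWrapper method that separates the wrapper from its buffer without closing it.

```python
import io
import tempfile


def upload_after_finishing(detach: bool) -> bytes:
    """Write a CSV-style batch through a text wrapper, then read it back like an uploader would."""
    tmp = tempfile.NamedTemporaryFile(prefix="feed-")
    # write_through=True pushes each write straight to the binary temp file.
    stream = io.TextIOWrapper(tmp, encoding="utf-8", newline="", write_through=True)
    stream.write("name\tprice\nwidget\t3\n")
    tmp.seek(0)  # rewind before "uploading"
    if detach:
        # Separate the text layer from the binary file, so collecting the
        # wrapper can no longer close the file.
        stream.detach()
    # Simulates this batch's exporter being replaced and garbage-collected
    # while the upload of its file is still pending.
    del stream
    try:
        # http.client streams file bodies with data.read(blocksize); read() stands in for that.
        return tmp.read()
    finally:
        tmp.close()


print(upload_after_finishing(detach=True))  # b'name\tprice\nwidget\t3\n'
try:
    upload_after_finishing(detach=False)
except ValueError as exc:
    print("upload failed:", exc)
```

Translated to Scrapy, one untested workaround would be a hypothetical CsvItemExporter subclass whose finish_exporting() calls detach() on the wrapper it writes through, registered for the csv format via the FEED_EXPORTERS setting. The wrapper's attribute name (`self.stream` is my assumption) should be checked against the CsvItemExporter source of the installed Scrapy version before relying on it.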

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 20 (11 by maintainers)

Top GitHub Comments

mmaybeno commented, Jun 30, 2021 (2 reactions)

Let me see if I can get it to work on my end. I’ll take a look tomorrow or the next.

mmaybeno commented, Feb 28, 2022 (1 reaction)

@jackblk I more or less moved away from the issue as my priorities changed. @marlenachatzigrigoriou did a great job debugging most of it, but we never found the underlying issue. Not sure if she wants to pick it up again?

