Feed is not overwritten when custom extension is used


Description

I’m trying to export scrapy crawl results in JSON Lines format to a file with the .jsonl extension (a requirement of the external system in our case) and to overwrite that file on each execution. As I understand it, only the .jl and .jsonlines extensions are supported now; .jsonl was discussed in #4848 but is not supported yet. So in this case I tried to use the -O argument together with --output-format for the scrapy crawl command.

Steps to Reproduce

  1. scrapy crawl -O <filename>.jsonl --output-format jl <spider_name> OR scrapy crawl -O <filename>.jsonl --output-format jsonlines <spider_name>

Expected behavior: The file is overwritten with the parsed content.

Actual behavior: The parsed content is appended to the end of the existing file.

Reproduces how often: 100%

Versions

Scrapy : 2.6.1
lxml : 4.9.0.0
libxml2 : 2.9.14
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 22.4.0
Python : 3.10.4 (main, Jun 4 2022, 14:29:37) [GCC 9.4.0]
pyOpenSSL : 22.0.0 (OpenSSL 3.0.3 3 May 2022)
cryptography : 37.0.2
Platform : Linux-5.13.0-44-generic-x86_64-with-glibc2.31

Additional context

If I use scrapy crawl -O <filename>.jl <spider_name> or scrapy crawl -O <filename>.jsonlines <spider_name>, the file is overwritten successfully, so I would expect the case above to behave the same way.
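One way to sidestep command-line format inference altogether is to declare the feed in the project settings. The fragment below is a minimal sketch, assuming a Scrapy version that supports the `overwrite` feed option in the `FEEDS` setting (as far as I know, 2.4 and later, which would include the 2.6.1 reported here). The file name `items.jsonl` is a hypothetical placeholder.

```python
# settings.py (or a spider's custom_settings dict).
# "items.jsonl" is a hypothetical output path used only for illustration.
FEEDS = {
    "items.jsonl": {
        # Name the exporter explicitly, so the .jsonl extension
        # is never used to guess the format.
        "format": "jsonlines",
        # Replace the file on every run instead of appending to it.
        "overwrite": True,
    },
}
```

With a feed declared this way, the crawl can be started without -o, -O, or --output-format.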

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
labdmitriy commented, Jun 7, 2022

Thanks a lot for your feedback. Sorry, I really didn’t notice this warning among all the log messages.

Your approach is working, thank you!

"See the documentation of the -o and -O options for more information."

I tried to find where this colon syntax is mentioned in the documentation but could not find any information about it.
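For readers unfamiliar with the colon syntax mentioned above: the format can be appended to the output path as FILE:FORMAT, for example scrapy crawl -O <filename>.jsonl:jsonlines <spider_name>. The Python sketch below only illustrates how such a value might be split into a path and a format, with a fallback to the file extension. It is not Scrapy's actual code; `split_output` and `KNOWN_FORMATS` are hypothetical names introduced here.

```python
import os

# Hypothetical subset of exporter names, for illustration only.
KNOWN_FORMATS = {"json", "jsonlines", "jl", "csv", "xml"}


def split_output(value):
    """Split an output value of the form FILE or FILE:FORMAT.

    Returns a (path, format) tuple. If no known format follows the
    last colon, the format falls back to the file extension.
    """
    path, sep, fmt = value.rpartition(":")
    if sep and fmt in KNOWN_FORMATS:
        return path, fmt
    # No explicit format: use the extension without its leading dot.
    extension = os.path.splitext(value)[1].lstrip(".")
    return value, extension


print(split_output("items.jsonl:jsonlines"))
print(split_output("items.jl"))
print(split_output("items.jsonl"))
```

In this sketch, the first call yields the explicit `jsonlines` format, while the last one falls back to `jsonl`, the extension that the issue reports as unsupported.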

0 reactions
MagnusOffermanns commented, Aug 29, 2022

I have addressed this issue in PR #5605
