Bug: If a TemporarilyBanned error happens right away, the output .json file gets deleted

Bug Report Summary

If you run the script generating a .json output once, it works great until it gets TemporarilyBanned; then it quits as expected. If you run it again immediately, the output json file gets deleted/overwritten with a 2-byte string.

Repro

1. Run the script, e.g.: python3 -m facebook_scraper --filename output.json --pages 1000 --cookies cookiefile.txt -fmt json --use-youtube-dl --source --resume-file resume_file.txt --dump ./html_dump_folder --group 123123123
2. Wait until it gets a TemporarilyBanned error.
3. See that output.json contains a valid, large JSON document.
4. Immediately re-run the script as in Step 1.
5. See that it gets the same TemporarilyBanned error almost immediately (before loading a single page successfully).
6. See that output.json is now basically empty, containing only 2 bytes of invalid JSON content.

What I expect to happen

output.json should not be overwritten with garbage if a TemporarilyBanned error happens right away. Instead, the script should simply exit without changing the output.json file at all.
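
One way to get this behavior is to buffer posts in memory and touch the output file only when at least one post was collected, writing through a temporary file and an atomic rename. The sketch below is a minimal illustration under stated assumptions, not how facebook_scraper is implemented: TemporarilyBanned here is a local stand-in exception, and save_posts and banned_immediately are hypothetical names. It also replaces the file with the new batch rather than appending to it.

```python
import json
import os
import tempfile


class TemporarilyBanned(Exception):
    """Local stand-in for the scraper's ban error (assumption for this sketch)."""


def save_posts(posts, path):
    """Write posts to `path` only if at least one post was collected.

    `posts` is any iterable of JSON-serializable dicts; it may raise
    TemporarilyBanned partway through. Returns the number of posts written.
    """
    collected = []
    try:
        for post in posts:
            collected.append(post)
    except TemporarilyBanned:
        pass  # keep whatever was collected before the ban

    if not collected:
        return 0  # nothing new: leave any existing file untouched

    # Write to a temp file in the same directory, then atomically swap it in,
    # so a failure mid-write cannot leave a truncated output file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(collected, tmp)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return len(collected)


def banned_immediately():
    """Hypothetical generator that fails before yielding a single post."""
    raise TemporarilyBanned("banned before the first page loaded")
    yield  # unreachable; its presence makes this function a generator
```

Calling save_posts(banned_immediately(), "output.json") returns 0 and leaves an existing output.json as it was, which is the behavior requested above.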

Notes

The script works as expected (adds onto the output.json file) when one or more pages are successfully loaded in subsequent runs (i.e., after repro Step 4).

Top GitHub Comments

1 reaction

neon-ninja commented, Aug 17, 2022

The resume file just contains a URL to start scraping from. There’s no dependency between the resume file and the output file - a user could input their desired URL to start from, and start collecting posts, without having any prior output file.

0 reactions

DeflateAwning commented, Aug 16, 2022

Would be nice if it handled that concatenation itself, considering it accepts a --resume-file argument (which appears to be wholly misleading).
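
Until the tool handles this itself, the concatenation can be done by hand. The sketch below merges the output of two runs. It assumes each file holds a JSON array of post objects and that each post carries a post_id key usable for de-duplication; both are assumptions about the output format. load_posts and merge_outputs are hypothetical helpers, not part of facebook_scraper.

```python
import json


def load_posts(path):
    """Return the list of posts in `path`, or [] if missing or not a JSON array."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []


def merge_outputs(old_path, new_path, merged_path):
    """Concatenate two runs, keeping the first copy of each post_id."""
    merged, seen = [], set()
    for post in load_posts(old_path) + load_posts(new_path):
        key = post.get("post_id")
        if key is not None and key in seen:
            continue  # duplicate of a post already kept
        seen.add(key)
        merged.append(post)
    with open(merged_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return len(merged)
```

Because load_posts treats a missing or unparseable file as an empty list, a run that left behind the 2-byte file described in the report would simply contribute no posts to the merge.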