[BUG] After scraping around 800 hashtags Instamancer reloads the browser
Describe the bug
When scraping hashtags, it recently seems to fail after scraping around 800 posts (this is fairly consistent). On reaching around 800, Instamancer restarts the browser and tries again from scratch.
It seems to be related to this line of code: https://github.com/ScriptSmith/instamancer/blob/07e664ea6b144f6d304c4c2cc2f7e957f53fa4f7/src/api/instagram.ts#L419
Specifically, the this.start() method, which causes the browser to reload.
Looking at the network logs in Chrome, I can see that one of the GraphQL requests returns an error around the 800-post mark. Every other request after this one seems to work OK.
To Reproduce
Search for any hashtag with the limit set higher than 800.
Setup (please complete the following information):
- OS: [e.g. MacOS Catalina]
- Instamancer version [e.g. v3.0.1]
I will add more info here as I debug the issue further.
Issue Analytics
- State:
- Created 4 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Sorry @ScriptSmith, I have not had a chance to try it. I will close this issue for now and reopen it if I can get any further info.
In my initial attempts to reproduce this, I am able to gather 1000 posts from a hashtag.
The restarting process you describe is what I call grafting, which allows instamancer to perform long scraping jobs by restarting the browser in order to limit resource usage. You can read about it on the website and in the FAQ.
This bug could be caused by something going wrong when instamancer attempts to perform a graft, that is, when it swaps request parameters on the fly after being restarted.
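To make the suspected failure mode concrete, here is a minimal, self-contained sketch of the general idea. It is not instamancer's code. Every name in it (openSession, scrape, the resume flag) is hypothetical, and the 800-post session limit is chosen only to mirror the report. It shows how a restart that carries the pagination cursor over can finish the job, while a restart that loses the cursor keeps starting from scratch and never gets past the same point.

```typescript
// Hypothetical sketch: none of these names come from instamancer's source.
type Page = { posts: string[]; nextCursor: number | null };

// Stand-in for one browser session. The returned function serves pages of
// fake posts and throws once `maxPages` pages have been served, simulating
// the point at which the browser has to be restarted.
function openSession(totalPosts: number, pageSize: number, maxPages: number): (cursor: number) => Page {
    let served = 0;
    return (cursor: number): Page => {
        if (served >= maxPages) {
            throw new Error("session exhausted");
        }
        served++;
        const end = Math.min(cursor + pageSize, totalPosts);
        const posts: string[] = [];
        for (let i = cursor; i < end; i++) {
            posts.push(`post-${i}`);
        }
        return { posts, nextCursor: end < totalPosts ? end : null };
    };
}

// Collect up to `limit` posts, opening a new session whenever one fails.
// With `resume` true, the saved cursor is reused after the restart.
// With `resume` false, progress is discarded and the scrape starts over.
function scrape(limit: number, resume: boolean, maxRestarts: number = 5): { posts: string[]; restarts: number } {
    let posts: string[] = [];
    let cursor: number | null = 0;
    let restarts = 0;
    let fetchPage = openSession(limit, 50, 16); // 16 pages of 50 = 800 posts per session
    while (cursor !== null && posts.length < limit) {
        try {
            const page = fetchPage(cursor);
            posts.push(...page.posts);
            cursor = page.nextCursor;
        } catch (err) {
            if (restarts >= maxRestarts) {
                break; // give up instead of looping forever
            }
            restarts++;
            fetchPage = openSession(limit, 50, 16);
            if (!resume) {
                posts = [];
                cursor = 0;
            }
        }
    }
    return { posts, restarts };
}

const resumed = scrape(1000, true);
const fromScratch = scrape(1000, false);
console.log(resumed.posts.length, resumed.restarts);
console.log(fromScratch.posts.length, fromScratch.restarts);
```

In this toy model the resumed run collects all 1000 posts with a single restart, while the run that loses its cursor is stuck at 800 until it hits the restart cap. Whether anything like this happens inside instamancer is exactly what the questions below are meant to find out.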
So, a few questions:
-g=false?