CheerioCrawler overwrites request cookies when persistCookiesPerSession is false
Describe the bug
In Apify 0.2.*, it was possible to use a SessionPool with CheerioCrawler while managing request cookies manually, by setting persistCookiesPerSession to false.
A change to _requestFunction() in Apify 2.* (and I think Apify 1.*, but we skipped that version) has made this impossible. The issue is caused by this line: https://github.com/apify/apify-js/blob/9418dde95cbb1e7bc125e7ad533f55535f8359c5/src/crawlers/cheerio_crawler.js#L649 (only this.useSessionPool is checked, whereas previously this.persistCookiesPerSession was checked).
This means that if CheerioCrawler is configured to use a SessionPool (e.g. for use with proxies) and persistCookiesPerSession is false, any cookies set via a preNavigationHook (or prepareRequestFunction() in earlier Apify versions) are overwritten.
To Reproduce
Configure CheerioCrawler to use a SessionPool and not persist cookies per session. Try setting cookies on the request via prepareRequestFunction() (or a pre-navigation hook) and notice that they get overwritten.
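The steps above could be set up roughly as follows. This is a minimal sketch that assumes the Apify SDK 2.x option names (useSessionPool, persistCookiesPerSession, preNavigationHooks, handlePageFunction); the cookie value is a placeholder, and the target URL is left to the reader (an endpoint that echoes request headers makes the overwrite easy to see).

```javascript
// Hypothetical reproduction options; MANUAL_COOKIE is a placeholder value.
const MANUAL_COOKIE = 'token=manually-set';

const crawlerOptions = {
    // Sessions are still wanted, e.g. for rotating proxies.
    useSessionPool: true,
    // Cookies are meant to be managed by hand, not by the session.
    persistCookiesPerSession: false,
    preNavigationHooks: [
        async ({ request }) => {
            // Set the Cookie header manually before the request is sent.
            request.headers = { ...request.headers, Cookie: MANUAL_COOKIE };
        },
    ],
    handlePageFunction: async ({ request, body }) => {
        // Against a header-echoing endpoint, the body shows which Cookie
        // header actually reached the server.
        console.log(request.url, String(body));
    },
};

// Usage (assumes the `apify` package and a prepared request list):
// const crawler = new Apify.CheerioCrawler({ ...crawlerOptions, requestList });
// await crawler.run();
```

With the behaviour described in this issue, the Cookie header set in the hook would be expected to be replaced before the request goes out.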
Expected behavior
Same behaviour as in Apify 0.2.* (i.e. manual cookie setting is possible in conjunction with a session pool). Or at least a migration path to preserve this ability with later versions of Apify SDK.
System information:
- OS: Ubuntu 20
- Node.js version: 16
- Apify SDK version: 2.0.7
Issue Analytics
- State:
- Created 2 years ago
- Comments:18 (17 by maintainers)
@corford thanks for the API improvement suggestions. I can’t say that it will be soon because we’re working on other things now, but we want to get back to the drawing board with SessionPool and improve the API significantly based on our experience so far. All suggestions are very welcome.

I see, then it probably makes sense to do what @szmarczak suggested and change the line to something like:
That will help with your particular issue and should not break anything else. We can iterate on this later if we see a reproduction that would require merging of cookies, but I’d rather not implement something that is almost impossible to reproduce (without hacks).
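To make the discussion concrete, here is a simplified stand-alone model of a guard that also checks persistCookiesPerSession. It is not the SDK source and not necessarily the exact change proposed in the thread; the function name applySessionCookies and the shapes of the crawler, request, and session objects are assumptions made for illustration.

```javascript
// Simplified model, not the SDK's code. `applySessionCookies` is a
// hypothetical helper; `session` only mimics a getCookieString(url) method.
function applySessionCookies(crawler, request, session) {
    // Let the session write the Cookie header only when the crawler is
    // configured to persist cookies per session.
    if (crawler.useSessionPool && crawler.persistCookiesPerSession) {
        request.headers.Cookie = session.getCookieString(request.url);
    }
    return request;
}

const session = { getCookieString: () => 'sid=from-session' };

// Manual cookie management: the header set in a hook is left alone.
const manual = applySessionCookies(
    { useSessionPool: true, persistCookiesPerSession: false },
    { url: 'https://example.com', headers: { Cookie: 'token=manually-set' } },
    session,
);
console.log(manual.headers.Cookie); // token=manually-set

// Session-managed cookies: the session's cookie string is applied.
const persisted = applySessionCookies(
    { useSessionPool: true, persistCookiesPerSession: true },
    { url: 'https://example.com', headers: {} },
    session,
);
console.log(persisted.headers.Cookie); // sid=from-session
```

This model does no merging of manual and session cookies; it simply skips the session write when cookies are managed by hand.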