Downloader reports success, but the result appears to be incomplete
I’m using haveibeenpwned-downloader for the first time and it does not appear to be behaving correctly.
When running the downloader with only the destination file and the overwrite flag (see log below), it first starts as expected. After fluctuating a bit, the estimated time remaining settles in the 90-120 minute range. For several minutes, everything seems normal — the percentage ticks up, the time remaining ticks down, and while there are occasional failed attempts, I’ve never seen one with a number other than 1.
However, the process must run into a problem at some point, because coming back to it after some time, it reports that it has “finished downloading all hash ranges” and the estimated time remaining is zero, but the percentage complete is in the 10-15% range, the time taken is 20-25 minutes, and the output file is much smaller than expected (~5GB rather than 35-40GB) and appears to be truncated.
I have been through this process a few times and the result is always similar. The terminal output from my latest attempt is below.
| | Version | Note |
|---|---|---|
| Windows | 10.0.19045.2546 | "Windows 10 22H2" |
| dotnet | 6.0.405 | |
| haveibeenpwned-downloader | 0.2.7 | installed by running `dotnet tool install --global haveibeenpwned-downloader` |
Terminal output of haveibeenpwned-downloader session
PS G:\> haveibeenpwned-downloader .\hibp-pwned-passwords-sha1 --overwrite
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/01496. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/0461D. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/073D2. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/119FF. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/13AB1. Response contained HTTP Status code ServiceUnavailable.
Hash ranges downloaded ---------------------------------------- 13% 00:00:00
Finished downloading all hash ranges in 1,358,036ms (103.12 hashes per second).
We made 140,168 Cloudflare requests (avg response time: 100.99ms). Of those, Cloudflare had already cached 139,539 requests,
and made 629 requests to the Have I Been Pwned origin server.
PS G:\> Get-ChildItem *.txt
Directory: G:\
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 1/23/2023 9:40 AM 5012458890 hibp-pwned-passwords-sha1.txt
-a---- 12/2/2021 1:04 AM 37342268646 pwned-passwords-sha1-ordered-by-hash-v8.txt
PS G:\> Get-Content .\hibp-pwned-passwords-sha1.txt -Tail 10
22306FBAF0C21F1D6137F13286E895A0116351E4:1
22306FC0DAD8C4EE00D26E26551ABB8336D6E76F:6
22306FC3ED89262785AAAB6126496AD390234D18:1
22306FCCB66A3CDD8DCF6BD81294FAF4427684EC:3
22306FCDB739B36ECA4FAE9F55134123EB4C84B4:10
22306FEF43B51A26138BD7A23D96289E35C3754D:4
22306FF4A933CE31819AAD29AA32E443C33C3CF5:1
22306FF689DB0ED32BB0F43E3E9CB37F9E3F78E2:3
22306FFA61DE34B22C1AE78DDBC94AFF238D6EEB:2
22306FFCE5F392AB8778DAD45DAF07311525D28D:2
PS G:\> cmd /c ver
Microsoft Windows [Version 10.0.19045.2546]
PS G:\> dotnet --version
6.0.405
PS G:\> dotnet tool list --global
Package Id Version Commands
---------------------------------------------------------------------
haveibeenpwned-downloader 0.2.7 haveibeenpwned-downloader
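
As a rough cross-check of the numbers in this session: the range API is keyed by 5-hex-digit prefixes (as in the URLs in the log), so a full download covers 16^5 = 1,048,576 ranges. Assuming the output file is written in ascending prefix order, which the sorted tail suggests but the log does not confirm, the prefix of the last line estimates how far the download got. A minimal Python sketch (`progress_from_last_line` is a hypothetical helper, not part of the tool):

```python
TOTAL_RANGES = 16 ** 5  # one range per 5-hex-digit prefix: 1,048,576


def progress_from_last_line(last_line: str) -> float:
    """Estimate the fraction of ranges present from the file's last line.

    Assumes lines look like '<40 hex chars>:<count>' and that the file
    is sorted by hash (an assumption, not something the tool guarantees).
    """
    prefix = last_line.strip()[:5]
    # The prefix index is zero-based, so add 1 to count completed ranges.
    return (int(prefix, 16) + 1) / TOTAL_RANGES


if __name__ == "__main__":
    tail = "22306FFCE5F392AB8778DAD45DAF07311525D28D:2"
    print(f"{progress_from_last_line(tail):.1%}")
```

For the last line of the tail in this session, this gives roughly 0.134, in line with the 13% on the progress bar and the 140,168 requests reported. A complete file would end with a hash starting `FFFFF`.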
That would allow for incremental updates, but with possible corporate proxies, inconvenient power outages, and who-knows-what nonsense between the client and the server, how can one even confirm that the whole chunk was downloaded? The individual pieces are small, but all together it is a bit of data and when dealing with non-trivial data sets, expecting deterministic behavior from complex systems is optimistic (not that a couple of gigabytes is huge). There’s good reason for using ECC memory and for ZFS’ checksums on everything. Providing a header with a known checksum algorithm (be that a custom one, an ETag, or what-not) or at the very least a length would make a download client a whole lot more robust, particularly when restarting an interrupted transfer. To put it another way, if there is only a one-in-a-billion chance of something wonky happening during a chunk transfer, then that means a one-in-a-thousand chance when transferring a million chunks.
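
The closing estimate can be checked numerically, and the kind of length or checksum check being asked for can be sketched in a few lines. This is a minimal Python illustration only: `chunk_is_complete`, `declared_length` and `declared_sha256` are hypothetical names, and nothing here claims the API actually provides such headers.

```python
import hashlib
import math
from typing import Optional


def chance_of_any_failure(per_chunk: float, chunks: int) -> float:
    """Probability that at least one of `chunks` independent transfers fails.

    Computes 1 - (1 - p)^n via log1p/expm1 so it stays accurate for tiny p.
    """
    return -math.expm1(chunks * math.log1p(-per_chunk))


def chunk_is_complete(body: bytes, declared_length: int,
                      declared_sha256: Optional[str] = None) -> bool:
    """Check a downloaded chunk against a declared length and optional digest.

    Both declared values are assumed to come from the server in some form;
    that is an assumption for illustration, not a documented feature.
    """
    if len(body) != declared_length:
        return False
    if declared_sha256 is not None:
        return hashlib.sha256(body).hexdigest() == declared_sha256.lower()
    return True


if __name__ == "__main__":
    # One-in-a-billion per chunk, a million chunks.
    print(chance_of_any_failure(1e-9, 1_000_000))
```

With a per-chunk failure chance of one in a billion and a million chunks, the result is just under 0.001, which matches the one-in-a-thousand figure.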
BTW, I think I ran into the interrupted download problem when mucking around with the client code, and I think it was caused by the retries only applying to the start of the transfer, not when the `.ResponseMessage.Content` is read. The `HttpCompletionOption.ResponseHeadersRead` passed to `HttpClient.SendAsync()` lets that bit complete before the content is read. (So, why not just use `HttpClient.GetAsync()` instead?) IIRC, when I saw it, the problem popped up when the server closed the connection between when the `.SendAsync()` completed and when reading the contents began.

I ran into the same issue: the download only ran up to 32% (first run) and 73% (second run). I tried the suggestion a bit higher up about removing the `HttpCompletionOption.ResponseHeadersRead` option and replacing the call with `HttpClient.GetAsync()` in a local copy of the source. That seemed to work and it fully downloaded all the hashes. It appears that one of those content errors was caught along the way:

I’ll create a PR with the change.
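
The failure mode described in the comments above, where retries cover only the start of the transfer and not the body read, can be shown with a small language-neutral sketch. The tool itself is .NET; this Python version is an analogy, not the project's code, and `open_response` and `fetch_range` are hypothetical names. The point is that the body read sits inside the same retry loop as the request.

```python
from typing import Any, Callable, Optional


def fetch_range(open_response: Callable[[], Any], attempts: int = 3) -> bytes:
    """Fetch one range, retrying failures in the request OR the body read.

    `open_response` is a hypothetical callable that sends the request and
    returns an object with a `read()` method once the headers have arrived.
    """
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            response = open_response()  # headers received here
            # Reading the body inside the try block means a connection
            # closed after the headers still triggers a retry.
            return response.read()
        except OSError as error:  # includes ConnectionResetError
            last_error = error
    raise RuntimeError(f"giving up after {attempts} attempts") from last_error
```

If `response.read()` were called outside the `try` block, a connection dropped between the headers and the body would escape the retry logic entirely, which is the behaviour the comments describe.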