
Downloader reports success, but the result appears to be incomplete


I’m using haveibeenpwned-downloader for the first time and it does not appear to be behaving correctly.

When running the downloader with only the destination file and the overwrite flag (see log below), it first starts as expected. After fluctuating a bit, the estimated time remaining settles in the 90-120 minute range. For several minutes, everything seems normal — the percentage ticks up, the time remaining ticks down, and while there are occasional failed attempts, I’ve never seen one with a number other than 1.

However, the process must run into a problem at some point, because coming back to it after some time, it reports that it has “finished downloading all hash ranges” and the estimated time remaining is zero, but the percentage complete is in the 10-15% range, the time taken is 20-25 minutes, and the output file is much smaller than expected (~5GB rather than 35-40GB) and appears to be truncated.

I have been through this process a few times and the result is always similar. The terminal output from my latest attempt is below.

Component                     Version           Note
---------                     -------           ----
Windows                       10.0.19045.2546   "Windows 10 22H2"
dotnet                        6.0.405
haveibeenpwned-downloader     0.2.7             installed by running dotnet tool install --global haveibeenpwned-downloader
Terminal output of haveibeenpwned-downloader session
PS G:\> haveibeenpwned-downloader .\hibp-pwned-passwords-sha1 --overwrite
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/01496. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/0461D. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/073D2. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/119FF. Response contained HTTP Status code ServiceUnavailable.
Failed attempt #1 fetching https://api.pwnedpasswords.com/range/13AB1. Response contained HTTP Status code ServiceUnavailable.

Hash ranges downloaded ----------------------------------------  13% 00:00:00

Finished downloading all hash ranges in 1,358,036ms (103.12 hashes per second).
We made 140,168 Cloudflare requests (avg response time: 100.99ms). Of those, Cloudflare had already cached 139,539 requests,
and made 629 requests to the Have I Been Pwned origin server.
PS G:\> Get-ChildItem *.txt


    Directory: G:\


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         1/23/2023   9:40 AM     5012458890 hibp-pwned-passwords-sha1.txt
-a----         12/2/2021   1:04 AM    37342268646 pwned-passwords-sha1-ordered-by-hash-v8.txt


PS G:\> Get-Content .\hibp-pwned-passwords-sha1.txt -Tail 10
22306FBAF0C21F1D6137F13286E895A0116351E4:1
22306FC0DAD8C4EE00D26E26551ABB8336D6E76F:6
22306FC3ED89262785AAAB6126496AD390234D18:1
22306FCCB66A3CDD8DCF6BD81294FAF4427684EC:3
22306FCDB739B36ECA4FAE9F55134123EB4C84B4:10
22306FEF43B51A26138BD7A23D96289E35C3754D:4
22306FF4A933CE31819AAD29AA32E443C33C3CF5:1
22306FF689DB0ED32BB0F43E3E9CB37F9E3F78E2:3
22306FFA61DE34B22C1AE78DDBC94AFF238D6EEB:2
22306FFCE5F392AB8778DAD45DAF07311525D28D:2
PS G:\> cmd /c ver

Microsoft Windows [Version 10.0.19045.2546]
PS G:\> dotnet --version
6.0.405
PS G:\> dotnet tool list --global
Package Id                     Version      Commands
---------------------------------------------------------------------
haveibeenpwned-downloader      0.2.7        haveibeenpwned-downloader

Issue Analytics

  • State: closed
  • Created: 8 months ago
  • Reactions: 1
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

3 reactions
henricj commented, Feb 10, 2023

a per-chunk x-hibp-digest: sha256=XXXX=

With a little bookkeeping, a new header isn’t even necessary. Cloudflare appears to fully support the Last-Modified response header and the If-Modified-Since request header, meaning that if you track when each chunk was last modified (or even if you can simply trust the filesystem’s modification time for individual files), you can get Cloudflare to respond with 304s rather than the full contents.
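The conditional-request bookkeeping described here can be sketched as follows. This is Python rather than the downloader’s actual C#, and it assumes only standard HTTP semantics: a stored Last-Modified value is echoed back as If-Modified-Since, and a 304 response means the cached chunk is still current.

```python
# Sketch: build a conditional GET from a chunk's saved Last-Modified value.
from email.utils import formatdate, parsedate_to_datetime

def if_modified_since_header(last_modified: str) -> dict:
    """Echo a stored Last-Modified value back as an If-Modified-Since header."""
    # Re-serialize through a datetime so the HTTP date round-trips exactly.
    dt = parsedate_to_datetime(last_modified)
    return {"If-Modified-Since": formatdate(dt.timestamp(), usegmt=True)}

def needs_refresh(status_code: int) -> bool:
    """304 Not Modified means the cached chunk can be kept as-is."""
    return status_code != 304
```

A client would send the header with each range request and skip rewriting any chunk that comes back 304, turning a full re-download into an incremental update.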

That would allow for incremental updates, but with possible corporate proxies, inconvenient power outages, and who-knows-what nonsense between the client and the server, how can one even confirm that the whole chunk was downloaded? The individual pieces are small, but together they add up, and when dealing with non-trivial data sets, expecting deterministic behavior from complex systems is optimistic (not that a couple of gigabytes is huge). There’s a good reason for using ECC memory and for ZFS checksumming everything. Providing a header with a known checksum algorithm (be that a custom one, an ETag, or what-not), or at the very least a length, would make a download client a whole lot more robust, particularly when restarting an interrupted transfer. To put it another way, if there is only a one-in-a-billion chance of something wonky happening during a single chunk transfer, that compounds to roughly a one-in-a-thousand chance when transferring a million chunks.
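The per-chunk integrity check argued for above could look like the following sketch. Note that the `x-hibp-digest: sha256=<base64>` header is the commenter’s proposal, not an actual header served by api.pwnedpasswords.com; the verification logic is the point.

```python
# Sketch: verify a downloaded chunk against a hypothetical digest header
# of the form "sha256=<base64 of the SHA-256 of the body>".
import base64
import hashlib

def verify_chunk(body: bytes, digest_header: str) -> bool:
    """Return True if the body matches the digest claimed in the header."""
    algo, _, expected = digest_header.partition("=")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    actual = base64.b64encode(hashlib.sha256(body).digest()).decode()
    return actual == expected
```

A client could drop and re-fetch any chunk that fails the check, which also makes resuming an interrupted transfer safe: a truncated chunk simply fails verification instead of silently ending up in the output file.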

BTW, I think I ran into the interrupted download problem when mucking around with the client code and I think it was caused by the retries only applying to the start of the transfer, not when the .ResponseMessage.Content is read. The HttpCompletionOption.ResponseHeadersRead passed to HttpClient.SendAsync() lets that bit complete before the content is read. (So, why not just use HttpClient.GetAsync() instead?) IIRC, when I saw it, the problem popped up when the server closed the connection between when the .SendAsync() completed and when reading the contents began.
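The failure mode described here — retries covering the request/headers phase but not the body read — can be shown in a language-neutral sketch (Python, not the actual C# client; `fetch` is a stand-in for the real HTTP call). Putting both phases inside the retry loop is effectively what buffering the whole response (as `HttpClient.GetAsync()` does without `ResponseHeadersRead`) achieves.

```python
# Sketch: retry must wrap BOTH the request and the body read, because the
# server can close the connection after headers arrive but before the
# content is fully streamed.
def download_chunk(fetch, max_attempts=3):
    """Retry until headers AND body are read successfully."""
    last_error = None
    for _ in range(max_attempts):
        try:
            response = fetch()       # may fail: connect / headers phase
            return response.read()   # may ALSO fail: body stream cut off
        except OSError as e:
            last_error = e           # either failure point triggers a retry
    raise last_error
```

If the retry loop wrapped only `fetch()`, a connection reset during `read()` would surface as a hard failure (or, worse, a short read) even though a simple retry would have succeeded.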

2 reactions
DM-Francis commented, Mar 6, 2023

I ran into the same issue; the download only ran up to 32% (first run) and 73% (second run). I tried the suggestion a bit higher up about removing the HttpCompletionOption.ResponseHeadersRead option and replacing the call with HttpClient.GetAsync() in a local copy of the source. That seemed to work, and it fully downloaded all the hashes. It appears that one of those content errors was caught and retried along the way: [screenshot in original issue]

I’ll create a PR with the change.
