download can silently fail?
As reported by @yuhaia on macOS Catalina 10.15.7 with Python 3.7 for macavaney:anserini-trec-robust04’s anserini index download. https://github.com/allenai/ir_datasets/issues/52#issuecomment-814020499
The best place to start would be to simulate a failed download on the same OS, Python version, and dataset and see if I can reproduce it. If not, I’m not sure what to try next.
@yuhaia, if there are any more details on this that could help me get to the bottom of this problem, please let me know!
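One way to simulate a failed download locally is to write a deliberately truncated copy of a known payload and confirm that a size and hash check catches it. This is only a rough sketch using the standard library; it is not how ir_datasets verifies downloads, and `simulate_truncated_download` and `verify_download` are hypothetical helper names made up for illustration.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_size, expected_sha256):
    """Hypothetical check: raise instead of silently accepting a bad file."""
    actual_size = os.path.getsize(path)
    if actual_size != expected_size:
        raise IOError(f"size mismatch: expected {expected_size}, got {actual_size}")
    actual_sha256 = file_sha256(path)
    if actual_sha256 != expected_sha256:
        raise IOError(f"hash mismatch: expected {expected_sha256}, got {actual_sha256}")


def simulate_truncated_download(payload, keep_fraction=0.5):
    """Hypothetical helper: write only part of the payload, as if the
    connection dropped midway, and return the path of the partial file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload[: int(len(payload) * keep_fraction)])
    return path
```

If the check raises on the truncated file but the real download path reports success for the same file, that would point at the verification step rather than the network layer.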
Issue Analytics
- State:
- Created 2 years ago
- Comments:13 (7 by maintainers)
Top GitHub Comments
The idea is to not create the docstore unless it’s necessary to reduce storage overhead. For a simple, in-sequence iteration over a corpus (a really common operation, e.g., for indexing), a docstore usually isn’t needed*. But in most datasets, you cannot efficiently jump ahead, so a docstore (containing document index offset info) is built when the user slices the iterator. In the case you show, it isn’t strictly necessary since the slice doesn’t jump ahead and has no stride. But right now, it doesn’t distinguish different slicing behaviours to conditionally trigger the creation of a docstore.
* There are exceptions to this rule, particularly when iteration cannot be done efficiently. This is a decision that I’ve been making on a case-by-case basis. I’m totally open to changing the behaviour for robust04, given that it’s a bit expensive to parse the corpus and the docstore doesn’t add very much storage overhead (~1.2GB).
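To illustrate what distinguishing slicing behaviours could look like, here is a minimal sketch, not the actual ir_datasets code. It assumes a hypothetical `SliceableDocs` wrapper that takes a function producing a fresh sequential iterator and a function that builds a docstore (modelled here as anything that supports slicing). A slice that starts at the beginning and has no stride is served with `itertools.islice`; anything else falls back to the docstore.

```python
import itertools


def needs_docstore(s):
    """Return True when the slice would require jumping ahead in the corpus.

    A slice such as [:10] can be answered by plain sequential iteration.
    A non-zero start, a stride, or a negative stop cannot.
    """
    start_ok = s.start is None or s.start == 0
    step_ok = s.step is None or s.step == 1
    stop_ok = s.stop is None or s.stop >= 0
    return not (start_ok and step_ok and stop_ok)


class SliceableDocs:
    """Hypothetical wrapper; the names are illustrative only."""

    def __init__(self, make_iter, build_docstore):
        self._make_iter = make_iter            # returns a fresh sequential iterator
        self._build_docstore = build_docstore  # expensive; returns a sliceable store
        self._docstore = None

    def __iter__(self):
        return self._make_iter()

    def __getitem__(self, s):
        if not isinstance(s, slice):
            raise TypeError("this sketch only supports slices")
        if not needs_docstore(s):
            # Cheap path: read from the start and stop early.
            return itertools.islice(self._make_iter(), s.stop)
        if self._docstore is None:
            # Expensive path: build the docstore once, on first real need.
            self._docstore = self._build_docstore()
        return iter(self._docstore[s])
```

With this arrangement, `docs[:3]` never triggers the docstore build, while `docs[5:7]` or `docs[::2]` does.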
There are probably other optimisations I could make to speed up the parsing of robust04 too, though, that would avoid the docstore overhead for simple iterations. The gzip-encoded version iterates about twice as fast as the .z-encoded files. But it’s still not super efficient, mostly because it’s using bs4 to handle xml-like tags and such. I’ve been working on some improvements for HTML parsing (#64), and that should be applicable here as well.
The newest fix does indeed work!