`RarArchiveLoader` fails for nested rar archives
🐛 Describe the bug
For this example, I’m using this archive.
For a single archive, RarArchiveLoader works fine:
import pathlib
from torchdata.datapipes.iter import RarArchiveLoader, IterableWrapper, FileOpener
dp = IterableWrapper(["hmdb51_org.rar"])
dp = FileOpener(dp, mode="rb")
dp = RarArchiveLoader(dp)
print([pathlib.Path(data[0]).name for data in dp])
['shoot_gun.rar', 'sit.rar', 'situp.rar', 'smile.rar', 'smoke.rar', 'somersault.rar', 'stand.rar', 'swing_baseball.rar', 'sword.rar', 'sword_exercise.rar', 'talk.rar', 'throw.rar', 'turn.rar', 'walk.rar', 'wave.rar', 'brush_hair.rar', 'cartwheel.rar', 'catch.rar', 'chew.rar', 'clap.rar', 'climb.rar', 'climb_stairs.rar', 'dive.rar', 'draw_sword.rar', 'dribble.rar', 'drink.rar', 'eat.rar', 'fall_floor.rar', 'fencing.rar', 'flic_flac.rar', 'golf.rar', 'handstand.rar', 'hit.rar', 'hug.rar', 'jump.rar', 'kick.rar', 'kick_ball.rar', 'kiss.rar', 'laugh.rar', 'pick.rar', 'pour.rar', 'pullup.rar', 'punch.rar', 'push.rar', 'pushup.rar', 'ride_bike.rar', 'ride_horse.rar', 'run.rar', 'shake_hands.rar', 'shoot_ball.rar', 'shoot_bow.rar']
As you can see from the output, the archive is actually a rar of rars. Thus, we need to use RarArchiveLoader again. Trying to iterate now leads to
dp = RarArchiveLoader(dp)
next(iter(dp))
ValueError: whence value 7 unsupported
pointing to
@ejguan could you have a look?
Versions
Current main.
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (5 by maintainers)
Even better, thanks!
@pmeier I have tried to resolve it by adding another wrapper to parse. But it turns out that rarfile does not seem able to read nested rar files. I tried the following case to read files in sequence, which should not need our patch, but it simply doesn’t work at all
This would raise Error.
This issue may not be feasible to fix. How about extracting the outer rar and reading from the inner ones?
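
A minimal sketch of that workaround, for illustration only: unpack the outer rar to a temporary directory first, then point RarArchiveLoader at the extracted inner archives. The helper names (`extract_outer_archive`, `find_inner_archives`, `load_inner_archives`, `main`) are hypothetical and not part of torchdata. The sketch assumes the `rarfile` package and the external extraction tool it relies on are installed, and it has not been checked against the archive from this issue.

```python
import pathlib
import tempfile


def extract_outer_archive(outer_path, dest_dir):
    """Unpack the outer rar into dest_dir using the rarfile package."""
    # Imported lazily so the path helper below works without rarfile installed.
    import rarfile

    with rarfile.RarFile(outer_path) as archive:
        archive.extractall(dest_dir)


def find_inner_archives(dest_dir):
    """Return the sorted paths of all .rar files found under dest_dir."""
    return sorted(str(p) for p in pathlib.Path(dest_dir).rglob("*.rar"))


def load_inner_archives(inner_paths):
    """Build a datapipe over the members of the already extracted inner rars."""
    from torchdata.datapipes.iter import FileOpener, IterableWrapper, RarArchiveLoader

    dp = IterableWrapper(inner_paths)
    dp = FileOpener(dp, mode="rb")
    return RarArchiveLoader(dp)


def main(outer_path="hmdb51_org.rar"):
    """Hypothetical end-to-end usage; the file name is taken from the report above."""
    with tempfile.TemporaryDirectory() as tmp:
        extract_outer_archive(outer_path, tmp)
        dp = load_inner_archives(find_inner_archives(tmp))
        # Iterate while the temporary directory still exists.
        for path, _stream in dp:
            print(pathlib.Path(path).name)
```

Only a single level of RarArchiveLoader is used here, which is the case reported as working at the top of this issue. The cost is that the outer archive is written to disk once before streaming starts.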