`RarArchiveLoader` fails for nested rar archives

🐛 Describe the bug

For this example, I’m using this archive.


For a single archive, RarArchiveLoader works fine:

import pathlib

from torchdata.datapipes.iter import RarArchiveLoader, IterableWrapper, FileOpener

dp = IterableWrapper(["hmdb51_org.rar"])
dp = FileOpener(dp, mode="rb")
dp = RarArchiveLoader(dp)
print([pathlib.Path(data[0]).name for data in dp])
['shoot_gun.rar', 'sit.rar', 'situp.rar', 'smile.rar', 'smoke.rar', 'somersault.rar', 'stand.rar', 'swing_baseball.rar', 'sword.rar', 'sword_exercise.rar', 'talk.rar', 'throw.rar', 'turn.rar', 'walk.rar', 'wave.rar', 'brush_hair.rar', 'cartwheel.rar', 'catch.rar', 'chew.rar', 'clap.rar', 'climb.rar', 'climb_stairs.rar', 'dive.rar', 'draw_sword.rar', 'dribble.rar', 'drink.rar', 'eat.rar', 'fall_floor.rar', 'fencing.rar', 'flic_flac.rar', 'golf.rar', 'handstand.rar', 'hit.rar', 'hug.rar', 'jump.rar', 'kick.rar', 'kick_ball.rar', 'kiss.rar', 'laugh.rar', 'pick.rar', 'pour.rar', 'pullup.rar', 'punch.rar', 'push.rar', 'pushup.rar', 'ride_bike.rar', 'ride_horse.rar', 'run.rar', 'shake_hands.rar', 'shoot_ball.rar', 'shoot_bow.rar']

As you can see from the output, the archive is actually a rar of rars. Thus, we need to use RarArchiveLoader again. Trying to iterate now leads to

dp = RarArchiveLoader(dp)
next(iter(dp))
ValueError: whence value 7 unsupported

pointing to

https://github.com/pytorch/data/blob/fb29c810e6e8881e5a3558743d9573cb82134cd6/torchdata/datapipes/iter/util/rar_archive_loader.py#L22
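Because wrapping a second RarArchiveLoader fails, it can help to detect the nesting from the member names of the first pass before trying to descend. The following is a minimal sketch, not part of torchdata: the helper name `nested_rar_members` is hypothetical, and it judges by file suffix only, without inspecting file contents.

```python
import pathlib


def nested_rar_members(member_paths):
    """Return the member paths that look like RAR archives (suffix check only)."""
    return [p for p in member_paths if pathlib.Path(p).suffix.lower() == ".rar"]


# Illustrative input: the first entry mirrors the output above,
# the second is a made-up non-archive member.
first_pass = ["hmdb51_org.rar/sit.rar", "hmdb51_org.rar/readme.txt"]
print(nested_rar_members(first_pass))
```

If the returned list is non-empty, the archive contains inner archives that a second loader pass would have to open.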

@ejguan could you have a look?

Versions

Current main.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

pmeier commented, Feb 4, 2022 (1 reaction)

Even better, thanks!

ejguan commented, Feb 2, 2022 (1 reaction)

@pmeier I tried to resolve it by adding another wrapper to parse it. However, it turns out that rarfile does not seem to be able to read nested RAR files.

I tried the following case, which reads the files in sequence and should not need our patch, but it simply doesn’t work at all:

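import rarfile

rar = rarfile.RarFile("/data/home/erjia/data/hmdb51_org.rar")
for info in rar.infolist():
    f = rar.open(info)  # file-like object for one inner archive
    inn_rar = rarfile.RarFile(f)  # opening the nested archive is what fails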

This raises an error.

This issue may not be feasible to fix. How about extracting the outer RAR and reading from the inner ones?
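The suggested workaround could look roughly like the sketch below: unpack the outer archive to disk, then run the single-level pipeline from the issue over each inner archive. This is a sketch under assumptions, not the maintainers' implementation. The helper names (`find_inner_archives`, `extract_outer`, `inner_archive_datapipe`) are hypothetical, and it assumes rarfile with a working unrar backend plus enough disk space for the extracted files.

```python
import pathlib


def find_inner_archives(extract_dir):
    """Return every .rar file below extract_dir, sorted for a stable order."""
    root = pathlib.Path(extract_dir)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".rar"
    )


def extract_outer(outer_path, extract_dir):
    """Unpack the outer archive to disk and return the inner archive paths."""
    # Third-party dependency, imported lazily so the path helper above
    # stays usable without it.
    import rarfile

    with rarfile.RarFile(outer_path) as outer:
        outer.extractall(path=extract_dir)
    return find_inner_archives(extract_dir)


def inner_archive_datapipe(inner_paths):
    """Build the single-level pipeline from the issue over the inner archives."""
    from torchdata.datapipes.iter import FileOpener, IterableWrapper, RarArchiveLoader

    dp = IterableWrapper([str(p) for p in inner_paths])
    dp = FileOpener(dp, mode="rb")
    # Each inner archive is an ordinary file on disk now, so only one
    # level of RarArchiveLoader is needed.
    return RarArchiveLoader(dp)
```

With these helpers, `inner_archive_datapipe(extract_outer("hmdb51_org.rar", "hmdb51_extracted"))` would yield the members of the inner archives; the target directory name `hmdb51_extracted` is only an example.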

Top Results From Across the Web

How to unrar nested RAR files? - Unix & Linux Stack Exchange
Use the shell (or find) to find the rar archives. Unpack the first set in a new directory, then unpack any that are...

RarArchiveLoader — TorchData main documentation - PyTorch
The nested RAR archive is not supported by this DataPipe due to the limitation of the archive type. Please extract outer RAR archive...

Unrar .rar files in .rar archives - Page 3 - NZBGet Forum
I found that the number of downloads that failed because of nested .rar files without the script was actually much lower than the...

unrar nested folder in ubuntu strange behaviour
I have some folders and files, I created a rar file from them and copy them to my linux machine using ftp then...

BazarBackdoor sneaks in through nested RAR and ZIP archives
Security researchers caught a new phishing campaign that tried to deliver the BazarBackdoor malware by using the multi-compression technique ...
