Creating index with many recursive TARs inside an xz compressed TAR is 100x slower than bz2!
The lzmaffi module provides seeking support in multi-block xz files as created with pixz, see also #42. In small unit-like tests, it really does provide true seeking capabilities. However, I noticed that the test for tests/2k-recursive-tars.tar.xz is roughly 100x slower than the one for tests/2k-recursive-tars.tar.bz2! This is the only test where the difference is so glaring because it contains recursive TARs, and after each recursive TAR a backward seek is needed in order to resume reading the outer TAR. For some reason, lzmaffi.seek seems to have performance problems. Even if it implements true seeking to a block under the hood, there might be some constant overhead cost. The file also has two problems:
- It is highly compressed, with 20MiB compressed to 16kiB for a compression ratio of roughly 1000!
- The xz file (as compressed with pixz) only has 3 blocks while the bz2 file has 24 blocks. That might slow down “true” seeking to an arbitrary point by a factor of 8 compared to bz2, but that still leaves a factor of 12 of the observed slowdown unexplained! Also, plain decoding was found to be twice as fast as bz2, which means that effectively a factor of 24 can’t be explained.
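The factor estimate in the second point can be written out as a few lines of arithmetic. This is only a restatement of the rough numbers quoted above (24 vs. 3 blocks, ~100x slowdown, ~2x faster decoding); the variable names are made up for illustration.

```python
# Rough numbers taken from the text above; all names are illustrative.
bz2_blocks = 24
xz_blocks = 3             # pixz-compressed file
observed_slowdown = 100   # approximate xz vs. bz2 index creation ratio
decode_speedup = 2        # xz decodes roughly twice as fast as bz2

# Fewer blocks means more data to decode per seek on average.
block_factor = bz2_blocks / xz_blocks

# Part of the slowdown that the block count alone does not explain.
unexplained = observed_slowdown / block_factor

# Faster decoding should have helped xz, so the gap is effectively larger.
unexplained_with_decode = unexplained * decode_speedup

print(block_factor, unexplained, unexplained_with_decode)
```

With these rounded inputs the unexplained factors come out as 12.5 and 25, which matches the "factor 12" and "factor 24" above within rounding.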
As an alternative to finding the problem in lzmaffi, I could try to reduce seeks for recursive indexing in ratarmount, e.g., by:
- Jumping to the next TAR block after analyzing the recursive TAR, effectively resulting in zero backward seeks. tarfile might not have an API that allows this, but I could use StenciledFile again to force it to support this.
- First analyze the outer TAR and only then mount the recursive TARs in order. This would effectively reduce the backward seeks to the maximum recursion level. A nice side-effect would be that this solution could avoid recursion in ratarmount itself.
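The second option could look roughly like the sketch below. It is not ratarmount's code: the helper names `build_nested_tar` and `index_two_pass` are hypothetical, it uses only the standard `tarfile` module, and it copies each inner TAR into memory instead of using something like StenciledFile. It only illustrates the access pattern, one forward scan of the outer TAR followed by visits to the inner TARs in increasing offset order.

```python
import io
import tarfile


def build_nested_tar(count):
    """Build an in-memory TAR holding `count` inner TARs, each with one file 'foo'."""
    outer_buffer = io.BytesIO()
    with tarfile.open(fileobj=outer_buffer, mode="w") as outer:
        for i in range(1, count + 1):
            inner_buffer = io.BytesIO()
            with tarfile.open(fileobj=inner_buffer, mode="w") as inner:
                payload = str(i).encode()
                info = tarfile.TarInfo("foo")
                info.size = len(payload)
                inner.addfile(info, io.BytesIO(payload))
            data = inner_buffer.getvalue()
            info = tarfile.TarInfo("mimi/%05d.tar" % i)
            info.size = len(data)
            outer.addfile(info, io.BytesIO(data))
    outer_buffer.seek(0)
    return outer_buffer


def index_two_pass(fileobj):
    """Index the outer TAR completely, then visit the inner TARs in offset order."""
    # Pass 1: a single forward scan over the outer TAR.
    with tarfile.open(fileobj=fileobj, mode="r:") as outer:
        inner_tars = [(member.offset_data, member.size, member.name)
                      for member in outer
                      if member.isfile() and member.name.endswith(".tar")]

    # Pass 2: one jump back, then strictly increasing offsets.
    index = {}
    for offset, size, name in sorted(inner_tars):
        fileobj.seek(offset)
        data = fileobj.read(size)
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as inner:
            index[name] = [member.name for member in inner]
    return index


index = index_two_pass(build_nested_tar(5))
print(sorted(index))
```

For deeper nesting, the same two passes would have to be repeated per recursion level, which is where the "maximum recursion level" bound on backward seeks comes from.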
Here are some notes and benchmarks I made to try and find the problem:
Seemingly affected tests:
- tests/gnu-sparse-files.tar
- tests/2k-recursive-tars.tar.bz2
Reproduce problem:
bzip2 -kd tests/2k-recursive-tars.tar.bz2
xz -fk tests/2k-recursive-tars.tar
pixz -k tests/2k-recursive-tars.{tar,tpxz}
indexed_bzip2/tools/blockfinder tests/2k-recursive-tars.tar.bz2
Block offsets :
4 B 0 b -> magic bytes: 0x314159265359
590 B 0 b -> magic bytes: 0x314159265359
1205 B 0 b -> magic bytes: 0x314159265359
1796 B 0 b -> magic bytes: 0x314159265359
2360 B 0 b -> magic bytes: 0x314159265359
2897 B 0 b -> magic bytes: 0x314159265359
3441 B 0 b -> magic bytes: 0x314159265359
3997 B 0 b -> magic bytes: 0x314159265359
4545 B 0 b -> magic bytes: 0x314159265359
5169 B 0 b -> magic bytes: 0x314159265359
5757 B 0 b -> magic bytes: 0x314159265359
6313 B 0 b -> magic bytes: 0x314159265359
6863 B 0 b -> magic bytes: 0x314159265359
7441 B 0 b -> magic bytes: 0x314159265359
8034 B 0 b -> magic bytes: 0x314159265359
8584 B 0 b -> magic bytes: 0x314159265359
9127 B 0 b -> magic bytes: 0x314159265359
9688 B 0 b -> magic bytes: 0x314159265359
10299 B 0 b -> magic bytes: 0x314159265359
10834 B 0 b -> magic bytes: 0x314159265359
11395 B 0 b -> magic bytes: 0x314159265359
11963 B 0 b -> magic bytes: 0x314159265359
12624 B 0 b -> magic bytes: 0x314159265359
13174 B 0 b -> magic bytes: 0x314159265359
Found 24 blocks
xz -l tests/2k-recursive-tars.*xz
Strms Blocks Compressed Uncompressed Ratio Check Filename
1 1 15.8 KiB 20.5 MiB 0.001 CRC64 tests/2k-recursive-tars.tar.xz
1 3 20.2 KiB 20.6 MiB 0.001 CRC32 tests/2k-recursive-tars.tpxz
./ratarmount.py -cr tests/2k-recursive-tars.tar.bz2 bibi
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tar.bz2 ...
Creating new SQLite index database at ratarmount/tests/2k-recursive-tars.tar.bz2.index.sqlite
Creating offset dictionary for mimi/00001.tar ...
Creating offset dictionary for mimi/00001.tar took 0.00s
[...]
Creating offset dictionary for mimi/02000.tar ...
Creating offset dictionary for mimi/02000.tar took 0.00s
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tar.bz2 took 0.53s
Writing out TAR index to ratarmount/tests/2k-recursive-tars.tar.bz2.index.sqlite took 0s and is sized 589824 B
./ratarmount.py -cr tests/2k-recursive-tars.tar.xz mimi
[Warning] The specified file 'ratarmount/tests/2k-recursive-tars.tar.xz'
[Warning] is compressed using xz but only contains one xz block. This makes it
[Warning] impossible to use true seeking! Please (re)compress your TAR using pixz
[Warning] (see https://github.com/vasi/pixz) in order for ratarmount to do be able
[Warning] to do fast seeking to requested files.
[Warning] As it is, each file access will decompress the whole TAR from the beginning!
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tar.xz ...
Creating new SQLite index database at ratarmount/tests/2k-recursive-tars.tar.xz.index.sqlite
Creating offset dictionary for mimi/00001.tar ...
Creating offset dictionary for mimi/00001.tar took 0.00s
Creating offset dictionary for mimi/00002.tar ...
Creating offset dictionary for mimi/00002.tar took 0.00s
[...]
Creating offset dictionary for mimi/01999.tar ...
Creating offset dictionary for mimi/01999.tar took 0.00s
Creating offset dictionary for mimi/02000.tar ...
Creating offset dictionary for mimi/02000.tar took 0.00s
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tar.xz took 104.80s
Writing out TAR index to ratarmount/tests/2k-recursive-tars.tar.xz.index.sqlite took 0s and is sized 589824 B
./ratarmount.py -cr tests/2k-recursive-tars.tpxz pipi
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tpxz ...
Creating new SQLite index database at ratarmount/tests/2k-recursive-tars.tpxz.index.sqlite
Creating offset dictionary for mimi/00001.tar ...
Creating offset dictionary for mimi/00001.tar took 0.00s
Creating offset dictionary for mimi/00002.tar ...
Creating offset dictionary for mimi/00002.tar took 0.00s
[...]
Creating offset dictionary for mimi/02000.tar ...
Creating offset dictionary for mimi/02000.tar took 0.00s
Creating offset dictionary for ratarmount/tests/2k-recursive-tars.tpxz took 58.66s
Writing out TAR index to ratarmount/tests/2k-recursive-tars.tpxz.index.sqlite took 0s and is sized 589824 B
time python3 -c 'import lzmaffi, sys; print( len( lzmaffi.open( sys.argv[1] ).read() ) );' tests/2k-recursive-tars.tar.xz
21514240
real 0m0.129s
user 0m0.087s
sys 0m0.038s
time python3 -c 'import lzmaffi, sys; print( len( lzmaffi.open( sys.argv[1] ).read() ) );' tests/2k-recursive-tars.tpxz
21560288
real 0m0.109s
user 0m0.086s
sys 0m0.020s
time python3 -c 'import indexed_bzip2, sys; print( len( indexed_bzip2.IndexedBzip2File( sys.argv[1] ).read() ) );' tests/2k-recursive-tars.tar.bz2
21514240
real 0m0.119s
user 0m0.090s
sys 0m0.028s
python3 -m timeit -s 'import lzmaffi' 'lzmaffi.open( "tests/2k-recursive-tars.tar.xz" ).read()'
5 loops, best of 5: 41.5 msec per loop
python3 -m timeit -s 'import lzmaffi' 'lzmaffi.open( "tests/2k-recursive-tars.tpxz" ).read()'
10 loops, best of 5: 32.4 msec per loop
python3 -m timeit -s 'import indexed_bzip2' 'indexed_bzip2.IndexedBzip2File( "tests/2k-recursive-tars.tar.bz2" ).read()'
5 loops, best of 5: 98 msec per loop
-> The xz decoder is actually 2-3x faster than the bz2 decoder!
time cat bibi/mimi/01333.tar/foo
1333
real 0m0.003s
user 0m0.002s
sys 0m0.000s
time cat mimi/mimi/01333.tar/foo
1333
real 0m0.042s
user 0m0.002s
sys 0m0.000s
time cat pipi/mimi/01333.tar/foo
1333
real 0m0.029s
user 0m0.001s
sys 0m0.000s
time cat pipi/mimi/01500.tar/foo
1500
real 0m0.012s
user 0m0.001s
sys 0m0.000s
python3 -m timeit -s 'import io, lzmaffi; f = lzmaffi.open( "tests/2k-recursive-tars.tar.xz" );' 'f.seek( -1, io.SEEK_END ); f.seek( 10*1024*1024 ); f.read( 1 )'
10 loops, best of 5: 34.1 msec per loop
python3 -m timeit -s 'import io, lzmaffi; f = lzmaffi.open( "tests/2k-recursive-tars.tpxz" );' 'f.seek( -1, io.SEEK_END ); f.seek( 10*1024*1024 ); f.read( 1 )'
20 loops, best of 5: 13.7 msec per loop
python3 -m timeit -s 'import indexed_bzip2, io; f = indexed_bzip2.IndexedBzip2File( "tests/2k-recursive-tars.tar.bz2" )' 'f.seek( -1, io.SEEK_END ); f.seek( 10*1024*1024 ); f.read( 1 )'
20 loops, best of 5: 12.7 msec per loop
- You can actually see the seeking and block boundaries by accessing the files and timing the access.
- Also, reading a file located later in the TAR than the last accessed one is many times faster (2ms -> ~10-20x) than reading that same file a second time because the second read has to seek backward a bit!
- Index creation: BZ2 (24 blocks): 0.52s, XZ (1 block): 105s, XZ (3 blocks): 58.7s
- There seems to be a multitude of factors making the backend ~100x slower for mounting:
  - The recursive mounting requires one backward seek per recursive TAR
  - The xz files have 8x and 24x fewer blocks, making seeking less efficient
  - Decoding is actually roughly twice as fast as bz2!
  - The pixz file is generally ~25% faster for some reason, maybe because of a different default compression.
=> Decoding isn’t the problem. Seeking by itself also does not seem to be the problem. At this point, I’m not sure why it’s not working as fast as bz2.
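Timing individual accesses, as mentioned in the first note, could be done with a small helper like the one below. The function name `time_seek_read` is hypothetical, and the demo uses an `io.BytesIO` object only so that the snippet runs on its own; in practice one would pass a file object such as the one returned by `lzmaffi.open( "tests/2k-recursive-tars.tpxz" )`.

```python
import io
import time


def time_seek_read(fileobj, offsets, size=1):
    """Return (offset, seconds) pairs for seeking to each offset and reading `size` bytes."""
    timings = []
    for offset in offsets:
        start = time.perf_counter()
        fileobj.seek(offset)
        fileobj.read(size)
        timings.append((offset, time.perf_counter() - start))
    return timings


# Stand-in file object; replace with a compressed file object to see block effects.
demo_file = io.BytesIO(bytes(4 * 1024 * 1024))
step = 1024 * 1024

# Forward offsets first, then the same offsets in reverse to force backward seeks.
forward = time_seek_read(demo_file, range(0, 4 * step, step))
backward = time_seek_read(demo_file, range(3 * step, -1, -step))

for offset, seconds in forward + backward:
    print(f"{offset:>10} B: {seconds * 1e3:.3f} ms")
```

On a block-compressed file, jumps in the timings at certain offsets would hint at block boundaries, and the backward pass would show the extra cost of re-decoding from a block start.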
Try to find the critical code location with cProfile
diff --git a/ratarmount.py b/ratarmount.py
index b71005d..7a6b5bd 100755
--- a/ratarmount.py
+++ b/ratarmount.py
@@ -1346,6 +1346,9 @@ class SQLiteIndexedTar:
assert False, ( "Could not load or store block offsets for {} probably because adding support was forgotten!"
.format( self.compression ) )
+import cProfile
+import pstats
+
class TarMount( fuse.Operations ):
"""
This class implements the fusepy interface in order to create a mounted file system view
@@ -1384,6 +1387,15 @@ class TarMount( fuse.Operations ):
except:
pass
+ tarFile = pathToMount[0]
+ pfname = 'ratarmount-profile'
+ cProfile.runctx( 'SQLiteIndexedTar( tarFile, writeIndex = True, encoding = self.encoding, **sqliteIndexedTarOptions )',
+ globals(), locals(), pfname )
+ p = pstats.Stats( pfname )
+ p.sort_stats( pstats.SortKey.CUMULATIVE )
+ p.print_stats()
+ sys.exit( 0 )
+
self.mountSources: List[Any] = [
SQLiteIndexedTar( tarFile,
writeIndex = True,
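The same profiling approach can be sketched without patching ratarmount. The helper `profile_call` below is a hypothetical name; it only uses `cProfile.Profile.runcall` and `pstats.Stats` from the standard library and returns the report as a string instead of exiting the process.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the report in a string buffer instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()


# Stand-in workload; the diff above profiles the SQLiteIndexedTar constructor instead.
result, report = profile_call(sorted, range(100000, 0, -1))
print(report)
```

`pstats.SortKey` requires Python 3.7 or newer, which matches the Python 3.8 paths in the output below.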
./ratarmount.py -cr tests/2k-recursive-tars.tar.bz2 bibi
Sun Dec 13 14:18:28 2020 ratarmount-profile
686148 function calls (684134 primitive calls) in 0.671 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.671 0.671 {built-in method builtins.exec}
1 0.001 0.001 0.671 0.671 <string>:1(<module>)
1 0.000 0.000 0.670 0.670 ./ratarmount.py:287(__init__)
2001/1 0.038 0.000 0.651 0.651 ./ratarmount.py:600(createIndex)
8005 0.013 0.000 0.358 0.000 /usr/lib/python3.8/tarfile.py:2292(next)
6004 0.007 0.000 0.234 0.000 /usr/lib/python3.8/tarfile.py:1097(fromtarfile)
6003 0.004 0.000 0.203 0.000 /usr/lib/python3.8/tarfile.py:2407(__iter__)
14005 0.007 0.000 0.195 0.000 /usr/lib/python3.8/tarfile.py:516(read)
14005 0.005 0.000 0.187 0.000 /usr/lib/python3.8/tarfile.py:523(_read)
14005 0.017 0.000 0.182 0.000 /usr/lib/python3.8/tarfile.py:550(__read)
2002 0.004 0.000 0.172 0.000 /usr/lib/python3.8/tarfile.py:1552(open)
2002 0.006 0.000 0.166 0.000 /usr/lib/python3.8/tarfile.py:1441(__init__)
4105 0.164 0.000 0.164 0.000 {method 'read' of '_io.BufferedReader' objects}
6004 0.022 0.000 0.116 0.000 /usr/lib/python3.8/tarfile.py:1034(frombuf)
6003 0.106 0.000 0.106 0.000 {method 'seek' of '_io.BufferedReader' objects}
4001 0.004 0.000 0.101 0.000 /usr/lib/python3.8/tarfile.py:503(seek)
2000 0.004 0.000 0.077 0.000 ./ratarmount.py:214(read)
4002 0.006 0.000 0.059 0.000 ./ratarmount.py:957(_setFileInfo)
32024 0.017 0.000 0.039 0.000 /usr/lib/python3.8/tarfile.py:172(nti)
4003 0.010 0.000 0.036 0.000 /usr/lib/python3.8/tarfile.py:221(calc_chksums)
6009 0.032 0.000 0.032 0.000 {method 'execute' of 'sqlite3.Connection' objects}
52039 0.019 0.000 0.032 0.000 /usr/lib/python3.8/tarfile.py:164(nts)
4002 0.008 0.000 0.025 0.000 ./ratarmount.py:931(_tryAddParentFolders)
4004 0.018 0.000 0.018 0.000 {built-in method builtins.print}
8006 0.015 0.000 0.015 0.000 {built-in method builtins.sum}
4003 0.002 0.000 0.014 0.000 /usr/lib/python3.8/tarfile.py:1118(_proc_member)
4003 0.005 0.000 0.012 0.000 /usr/lib/python3.8/tarfile.py:1131(_proc_builtin)
8006 0.011 0.000 0.011 0.000 {built-in method _struct.unpack_from}
1 0.000 0.000 0.011 0.011 ./ratarmount.py:1199(_openCompressedFile)
4004 0.007 0.000 0.011 0.000 /usr/lib/python3.8/posixpath.py:334(normpath)
2000 0.003 0.000 0.009 0.000 ./ratarmount.py:148(__init__)
7 0.009 0.001 0.009 0.001 {method 'executescript' of 'sqlite3.Connection' objects}
2004 0.002 0.000 0.008 0.000 ./ratarmount.py:1018(indexIsLoaded)
4002 0.004 0.000 0.008 0.000 ./ratarmount.py:937(<listcomp>)
4002 0.004 0.000 0.008 0.000 ./ratarmount.py:584(_updateProgressBar)
52039 0.007 0.000 0.007 0.000 {method 'find' of 'bytes' objects}
2251 0.007 0.000 0.007 0.000 {method 'executemany' of 'sqlite3.Connection' objects}
1 0.000 0.000 0.006 0.006 ./ratarmount.py:529(_pathIsWritable)
52041 0.006 0.000 0.006 0.000 {method 'decode' of 'bytes' objects}
1 0.006 0.006 0.006 0.006 {method 'write' of '_io.BufferedWriter' objects}
1 0.000 0.000 0.005 0.005 ./ratarmount.py:1183(_detectTar)
1 0.000 0.000 0.005 0.005 ./ratarmount.py:1153(_detectCompression)
1 0.000 0.000 0.005 0.005 /usr/lib/python3.8/tarfile.py:1643(taropen)
2000 0.003 0.000 0.005 0.000 ./ratarmount.py:242(seek)
[...]
./ratarmount.py -cr tests/2k-recursive-tars.tpxz bibi
Sun Dec 13 14:20:01 2020 ratarmount-profile
4455897 function calls (4453893 primitive calls) in 52.952 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 52.952 52.952 {built-in method builtins.exec}
1 0.000 0.000 52.952 52.952 <string>:1(<module>)
1 0.000 0.000 52.952 52.952 ./ratarmount.py:287(__init__)
2001/1 0.126 0.000 52.913 52.913 ./ratarmount.py:600(createIndex)
!!! -> 6001 0.025 0.000 51.823 0.009 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:482(seek)
10104 1.746 0.000 49.294 0.005 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:399(_read_block)
9091 0.024 0.000 47.518 0.005 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:453(_fill_buffer)
4991 0.027 0.000 47.430 0.010 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:711(decompress)
4991 10.804 0.002 47.403 0.009 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:727(_decompress)
309720 0.121 0.000 31.368 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:346(catch_lzma_error)
297999 31.215 0.000 31.215 0.000 {built-in method _compiled_module.lzma_code}
293008 4.770 0.000 4.770 0.000 {built-in method _compiled_module.realloc}
3905 0.010 0.000 2.593 0.001 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:356(_move_to_block)
3905 2.294 0.001 2.490 0.001 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:343(_init_decompressor)
595998 0.263 0.000 0.461 0.000 ~/.local/lib/python3.8/site-packages/cffi/api.py:293(cast)
8005 0.032 0.000 0.452 0.000 /usr/lib/python3.8/tarfile.py:2292(next)
6004 0.013 0.000 0.338 0.000 /usr/lib/python3.8/tarfile.py:1097(fromtarfile)
2002 0.014 0.000 0.257 0.000 /usr/lib/python3.8/tarfile.py:1552(open)
6003 0.008 0.000 0.246 0.000 /usr/lib/python3.8/tarfile.py:2407(__iter__)
2002 0.017 0.000 0.233 0.000 /usr/lib/python3.8/tarfile.py:1441(__init__)
4002 0.023 0.000 0.222 0.000 ./ratarmount.py:957(_setFileInfo)
6004 0.041 0.000 0.191 0.000 /usr/lib/python3.8/tarfile.py:1034(frombuf)
6006 0.167 0.000 0.167 0.000 {method 'execute' of 'sqlite3.Connection' objects}
14005 0.010 0.000 0.148 0.000 /usr/lib/python3.8/tarfile.py:516(read)
3905 0.050 0.000 0.140 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:656(__init__)
14005 0.009 0.000 0.137 0.000 /usr/lib/python3.8/tarfile.py:523(_read)
14005 0.026 0.000 0.127 0.000 /usr/lib/python3.8/tarfile.py:550(__read)
4103 0.007 0.000 0.110 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:367(read)
620531 0.096 0.000 0.096 0.000 ~/.local/lib/python3.8/site-packages/cffi/api.py:180(_typeof)
12814 0.093 0.000 0.093 0.000 {method 'read' of '_io.BufferedReader' objects}
3905 0.023 0.000 0.093 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:549(find)
595998 0.082 0.000 0.082 0.000 {built-in method _cffi_backend.cast}
4001 0.008 0.000 0.068 0.000 /usr/lib/python3.8/tarfile.py:503(seek)
4002 0.025 0.000 0.063 0.000 ./ratarmount.py:931(_tryAddParentFolders)
32024 0.028 0.000 0.061 0.000 /usr/lib/python3.8/tarfile.py:172(nti)
24533 0.021 0.000 0.060 0.000 ~/.local/lib/python3.8/site-packages/cffi/api.py:242(new)
4007 0.060 0.000 0.060 0.000 {built-in method builtins.print}
4003 0.016 0.000 0.052 0.000 /usr/lib/python3.8/tarfile.py:221(calc_chksums)
2004 0.005 0.000 0.049 0.000 ./ratarmount.py:1018(indexIsLoaded)
2000 0.014 0.000 0.046 0.000 ./ratarmount.py:148(__init__)
639627 0.045 0.000 0.045 0.000 {built-in method builtins.isinstance}
4002 0.017 0.000 0.044 0.000 ./ratarmount.py:584(_updateProgressBar)
52039 0.022 0.000 0.043 0.000 /usr/lib/python3.8/tarfile.py:164(nts)
2000 0.007 0.000 0.041 0.000 ./ratarmount.py:214(read)
7816 0.018 0.000 0.040 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:575(__init__)
4003 0.006 0.000 0.039 0.000 /usr/lib/python3.8/tarfile.py:1118(_proc_member)
3915 0.007 0.000 0.038 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:287(_peek)
4003 0.010 0.000 0.033 0.000 /usr/lib/python3.8/tarfile.py:1131(_proc_builtin)
1 0.000 0.000 0.030 0.030 ./ratarmount.py:1199(_openCompressedFile)
2000 0.008 0.000 0.028 0.000 ./ratarmount.py:242(seek)
3905 0.008 0.000 0.025 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/_lzmamodule2.py:296(_new_lzma_stream)
24533 0.024 0.000 0.024 0.000 {built-in method _cffi_backend.newp}
9091 0.008 0.000 0.023 0.000 ~/.local/lib/python3.8/site-packages/lzmaffi/__init__.py:41(memoryview_tobytes)
4004 0.014 0.000 0.021 0.000 /usr/lib/python3.8/posixpath.py:334(normpath)
8006 0.020 0.000 0.020 0.000 {built-in method _struct.unpack_from}
3905 0.020 0.000 0.020 0.000 {built-in method _compiled_module.lzma_block_decoder}
4002 0.011 0.000 0.017 0.000 ./ratarmount.py:937(<listcomp>)
8006 0.017 0.000 0.017 0.000 {built-in method builtins.sum}
4003 0.013 0.000 0.017 0.000 /usr/lib/python3.8/tarfile.py:1335(_apply_pax_info)
2250 0.016 0.000 0.016 0.000 {method 'executemany' of 'sqlite3.Connection' objects}
7838 0.016 0.000 0.016 0.000 {method 'seek' of '_io.BufferedReader' objects}
1 0.000 0.000 0.015 0.015 ./ratarmount.py:1153(_detectCompression)
1 0.000 0.000 0.015 0.015 ./ratarmount.py:1183(_detectTar)
4003 0.014 0.000 0.014 0.000 /usr/lib/python3.8/tarfile.py:747(__init__)
14009 0.014 0.000 0.014 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.014 0.014 /usr/lib/python3.8/tarfile.py:1643(taropen)
[...]
=> Looks like the lzmaffi seek function is indeed problematic!
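To put numbers on that conclusion, the relevant lines of the tpxz profile can be reduced to a few ratios. The values are copied from the output above; the variable names are illustrative, and the per-seek count of lzma_code calls is only an average that assumes nearly all decoder calls happen under seek.

```python
# Values copied from the cProfile output for tests/2k-recursive-tars.tpxz above.
total_seconds = 52.952
seek_cumtime = 51.823
seek_calls = 6001
lzma_code_calls = 297999

seek_share = seek_cumtime / total_seconds          # fraction of runtime spent under seek
seconds_per_seek = seek_cumtime / seek_calls       # average cost of one seek call
lzma_code_per_seek = lzma_code_calls / seek_calls  # average decoder invocations per seek

print(f"{seek_share:.1%} of the runtime, "
      f"{seconds_per_seek * 1000:.1f} ms per seek, "
      f"{lzma_code_per_seek:.0f} lzma_code calls per seek")
```

An average of several milliseconds per seek is a sizable fraction of the ~32 ms measured earlier for decoding the whole file, which would be consistent with each backward seek re-decoding a large part of one of the three blocks. That reading is an interpretation of the numbers, not something the profile proves by itself.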
Top GitHub Comments
I’ll close this for now because of two reasons:
I am not 100% sure, but I think the issue comes from the following:
r| reads are buffered, and so may read more than strictly necessary
I really don’t know. One thing I can think of is that python-xz only modifies the wanted position when you seek, and only performs the actual seek (and potentially intensive operations) when you start reading. This way, if you seek multiple times before reading, there is no overhead. I’m not sure if that is the cause though.
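The deferred-seek idea described here can be illustrated with a small wrapper. This is a hypothetical sketch, not python-xz's actual implementation: the class name `LazySeekReader` and the `real_seeks` counter are invented for the example, and `SEEK_END` is simply delegated because it needs the file size.

```python
import io


class LazySeekReader:
    """Hypothetical wrapper: seek() only records the target; read() does the real seek."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._position = fileobj.tell()
        self.real_seeks = 0  # counts seeks forwarded to the wrapped file

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._position = offset
        elif whence == io.SEEK_CUR:
            self._position += offset
        else:
            # SEEK_END needs the file size, so this sketch delegates it immediately.
            self._position = self._fileobj.seek(offset, whence)
            self.real_seeks += 1
        return self._position

    def tell(self):
        return self._position

    def read(self, size=-1):
        # Only now move the underlying file, and only if it is not already there.
        if self._fileobj.tell() != self._position:
            self._fileobj.seek(self._position)
            self.real_seeks += 1
        data = self._fileobj.read(size)
        self._position += len(data)
        return data


reader = LazySeekReader(io.BytesIO(b"0123456789"))
reader.seek(8)
reader.seek(2)
reader.seek(5)
print(reader.real_seeks, reader.read(2), reader.real_seeks)
```

With such a wrapper, several consecutive seeks before a read cost a single real seek at most, which is the effect described above for seeking multiple times before reading.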
As far as tests go, I have 100% unittest coverage, plus integration coverage testing with xz files in as many different configurations as I could think of (number of streams, stream padding, number of blocks, size of blocks, etc.). Tests are run against all officially supported Python versions (plus PyPy). I’m not saying that there are no bugs of course, but at least this gives reasonable confidence that it should work as expected.
For the API stability, I’m mirroring the lzma module with xz.open and xz.XZFile, so it should not change at all. In the worst case, if anything breaks backward compatibility, it will be thoroughly documented in the changelog.
The main change I’m planning to make to the library is to add write support, which is completely missing as of now. This should not impact ratarmount’s use case in any way.
With all of that being said, it is a very young library, and it is not battle-tested yet.
In any case, if you find any issues please report them! As you saw I’m not afraid to dive in to get a better understanding of problems and ultimately find possible solutions.