Error reading xlsx file (MIN_INFLATE_RATIO exceeded)
Hello, I read some xlsx files in an S3 bucket with Spark 2.4.4 and the crealytics spark-excel 2.11:0.13.1 artifact. One of them raises an error even though its content is correct (tested with LibreOffice).
It seems to hit a compression-ratio limit, and the error message (raised by Apache POI, which the artifact shades as shadeio.poi) suggests changing a zip security parameter.
Is there a way to change ZipSecureFile.setMinInflateRatio() through the artifact?
adv2 = spark.read.format("com.crealytics.spark.excel").option(
"header", "true").option("inferSchema", "true").load("test.xlsx")
Error message:
Traceback (most recent call last):
File "/tmp/aws-adv-daily.py", line 304, in main
"header", "true").option("inferSchema", "true").load("test.xlsx")
File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
return self._df(self._jreader.load(path))
File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 106496, Raw/compressed size: 859, ratio: 0.008066
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/styles.xml
at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.checkThreshold(ZipArchiveThresholdInputStream.java:131)
at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.read(ZipArchiveThresholdInputStream.java:81)
at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:152)
at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:121)
at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:108)
at shadeio.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
at shadeio.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
at shadeio.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
at shadeio.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:301)
at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:134)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at shadeio.poi.ss.usermodel.WorkbookFactory.createWorkbook(WorkbookFactory.java:339)
at shadeio.poi.ss.usermodel.WorkbookFactory.createXSSFWorkbook(WorkbookFactory.java:314)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:232)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:198)
at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
at scala.Option.fold(Option.scala:158)
at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:50)
at com.crealytics.spark.excel.WorkbookReader$class.withWorkbook(WorkbookReader.scala:14)
at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:46)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:30)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:30)
at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:168)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:167)
at scala.Option.getOrElse(Option.scala:121)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:167)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:34)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:40)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Thanks for your help
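The numbers in the error can be checked by hand. POI divides the compressed size of a zip entry by its uncompressed size: 859 / 106496 ≈ 0.008066, which is below the default MIN_INFLATE_RATIO of 0.01, so xl/styles.xml is rejected. Because an .xlsx file is an ordinary zip archive, a rough pre-check is possible with Python's standard zipfile module. This is a sketch that assumes the ratio is computed per entry as shown in the error message. POI performs its check while streaming and may apply additional rules, so the sketch only approximates its behaviour. The function name low_ratio_entries is hypothetical.

```python
import io
import zipfile

# Default threshold, taken from the "Limits: MIN_INFLATE_RATIO: 0.010000" line of the error
DEFAULT_MIN_INFLATE_RATIO = 0.01


def low_ratio_entries(xlsx_bytes, min_ratio=DEFAULT_MIN_INFLATE_RATIO):
    """Return (entry name, compressed/uncompressed ratio) for entries below min_ratio.

    Hypothetical helper that approximates POI's zip-bomb check; it is not POI itself.
    """
    flagged = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        for info in zf.infolist():
            if info.file_size == 0:
                continue  # empty entries cannot be "inflated"
            ratio = info.compress_size / info.file_size
            if ratio < min_ratio:
                flagged.append((info.filename, ratio))
    return flagged
```

For example, `with open("test.xlsx", "rb") as f: print(low_ratio_entries(f.read()))` would list the entries likely to trip the limit, and the smallest reported ratio indicates how low the threshold would have to go for this file.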
Top Results From Across the Web
Using Apache POI - Zip Bomb detected - Stack Overflow
The workaround is to add this line before you open the workbook: ZipSecureFile.setMinInflateRatio(0);.
Table Data Import failing with inflateRatio error - oracle-tech
I installed sql developer 19.1.0.094 Trying to import data into table using import data wizard. Data imported from Excel file.
Zip bomb detected error when reading Excel file - Mule 4
When processing a larger Excel file in Mule 4, you may experience an error saying 'Zip bomb detected'. This KB goes over how...
58499 – ZipSecureFile throws zip bomb detected
The file would exceed certain limits which usually indicate that the ... MIN_INFLATE_RATIO: 0.01, so this template file cannot be read by ...
Solved: Easy Import - Failure "zip bomb detected" - ServiceNow
When I try to upload this I got this error message: "error in loading headers from the xlsx data source: Zip bomb detected!...
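From PySpark, one possible way to apply the ZipSecureFile.setMinInflateRatio workaround is to call the static setter through the py4j gateway before reading. This is a sketch under stated assumptions. The class name shadeio.poi.openxml4j.util.ZipSecureFile is inferred from the shaded package names in the stack trace and is not confirmed in this thread. The setting is a JVM-wide static, so it only affects the driver JVM. That is where this failure occurred (during schema inference), but executors in a cluster run in their own JVMs and keep the default limit. The helper name set_min_inflate_ratio is hypothetical.

```python
def set_min_inflate_ratio(spark, ratio):
    """Lower POI's zip-bomb threshold in the driver JVM (hypothetical helper).

    Assumes the spark-excel artifact shades Apache POI under the 'shadeio.poi'
    prefix, as the package names in the stack trace suggest.
    """
    if ratio < 0:
        # Reject negative values; 0 already disables the check entirely
        raise ValueError("ratio must be non-negative")
    jvm = spark.sparkContext._jvm
    # Assumed shaded location of org.apache.poi.openxml4j.util.ZipSecureFile
    zip_secure_file = jvm.shadeio.poi.openxml4j.util.ZipSecureFile
    # Static setter: affects every workbook opened afterwards in this JVM only
    zip_secure_file.setMinInflateRatio(float(ratio))
```

Calling set_min_inflate_ratio(spark, 0.005) before the spark.read...load("test.xlsx") call would let a file with a ratio of 0.008066 through while keeping some protection. Passing 0, as in the Stack Overflow answer, switches the check off completely.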
Top GitHub Comments
Sorry for not answering sooner, I didn’t see your question before … 😦
Also, you can try the streaming version of the reader by adding the “maxRowsInMemory” option.
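A minimal sketch of that suggestion, reusing the read call from the issue. The option name maxRowsInMemory comes from the comment. The value 20, used in the usage note, and the helper name read_excel are assumptions for illustration. This thread does not confirm whether the streaming reader also avoids the inflate-ratio check.

```python
def read_excel(spark, path, max_rows_in_memory=None):
    """Build the spark-excel read from the issue, optionally in streaming mode.

    Hypothetical helper; option names other than maxRowsInMemory are the ones
    used in the original report.
    """
    reader = (spark.read.format("com.crealytics.spark.excel")
              .option("header", "true")
              .option("inferSchema", "true"))
    if max_rows_in_memory is not None:
        # Per the comment above, setting this option switches to the streaming reader
        reader = reader.option("maxRowsInMemory", str(max_rows_in_memory))
    return reader.load(path)
```

For example, read_excel(spark, "test.xlsx", max_rows_in_memory=20) issues the same read as in the report, with the streaming option added.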