
Error reading xlsx file (MIN_INFLATE_RATIO exceeded)


Hello, I am reading some xlsx files from an S3 bucket with Spark 2.4.4 and the crealytics 2.11:0.13.1 artefact. One of them raises an error even though its content is correct (tested with LibreOffice).

It seems to hit a compression limit, and the error message suggests changing a zip library parameter.

Is there a way to change ZipSecureFile.setMinInflateRatio() through the artefact?

adv2 = spark.read.format("com.crealytics.spark.excel").option(
                    "header", "true").option("inferSchema", "true").load("test.xlsx")

Error message:

Traceback (most recent call last):
  File "/tmp/aws-adv-daily.py", line 304, in main
    "header", "true").option("inferSchema", "true").load("test.xlsx")
  File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
    return self._df(self._jreader.load(path))
  File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 106496, Raw/compressed size: 859, ratio: 0.008066
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/styles.xml
	at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.checkThreshold(ZipArchiveThresholdInputStream.java:131)
	at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.read(ZipArchiveThresholdInputStream.java:81)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:152)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:121)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:108)
	at shadeio.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
	at shadeio.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
	at shadeio.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
	at shadeio.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:301)
	at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:134)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at shadeio.poi.ss.usermodel.WorkbookFactory.createWorkbook(WorkbookFactory.java:339)
	at shadeio.poi.ss.usermodel.WorkbookFactory.createXSSFWorkbook(WorkbookFactory.java:314)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:232)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:198)
	at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
	at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
	at scala.Option.fold(Option.scala:158)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:50)
	at com.crealytics.spark.excel.WorkbookReader$class.withWorkbook(WorkbookReader.scala:14)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:46)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:30)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:30)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:168)
	at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:167)
	at scala.Option.getOrElse(Option.scala:121)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:167)
	at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:34)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:40)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
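
The figures in the error message can be checked by hand. The sketch below is a simplified model of the check, not POI's actual code: it assumes an entry is rejected when the ratio of compressed size to expanded size falls below MIN_INFLATE_RATIO. The function names are hypothetical and exist only for illustration.

```python
def inflate_ratio(compressed_size: int, uncompressed_size: int) -> float:
    """Ratio of compressed size to expanded size, as printed in the error."""
    return compressed_size / uncompressed_size


def trips_zip_bomb_check(compressed_size: int,
                         uncompressed_size: int,
                         min_inflate_ratio: float = 0.01) -> bool:
    """Simplified model (an assumption): reject when the ratio is below the limit."""
    return inflate_ratio(compressed_size, uncompressed_size) < min_inflate_ratio


# Values taken from the error message for the entry xl/styles.xml
ratio = inflate_ratio(859, 106496)
print(f"{ratio:.6f}")                     # 0.008066
print(trips_zip_bomb_check(859, 106496))  # True: 0.008066 < 0.01
```

In other words, xl/styles.xml compresses to under 1% of its expanded size, which is just below the default limit of 0.01 even though the file itself is harmless.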

Thanks for your help

Issue Analytics

State:
Created 3 years ago
Comments:25

Top GitHub Comments


Zulpitch commented, Nov 25, 2021

Hi @rohitthapliyal180 @Ftagn92 and @Zulpitch

I know it’s been a long time already. If possible, could you share an Excel file with this “MIN_INFLATE_RATIO exceeded” issue, or the steps to create one? I would like to add it to the test suite.

Appreciate your help

Sorry for not answering sooner, I didn’t see your question before… 😦


EnverOsmanov commented, Jul 7, 2021

You can also try the streaming version of the reader by adding the “maxRowsInMemory” option.
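
A minimal sketch of that suggestion, reusing the read call from the question. The helper name read_excel_streaming and the default of 20 rows are illustrative assumptions, and nothing in this thread confirms that the streaming reader avoids the inflate-ratio check for every file, so treat it as something to try rather than a guaranteed fix.

```python
def read_excel_streaming(spark, path, max_rows_in_memory=20):
    """Read an xlsx file with spark-excel, asking for its streaming reader.

    `spark` is an existing SparkSession. The helper name and the default
    of 20 rows are illustrative choices, not values from the issue.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")
        .option("inferSchema", "true")
        # Setting maxRowsInMemory switches spark-excel to its streaming reader
        .option("maxRowsInMemory", str(max_rows_in_memory))
        .load(path)
    )
```

It would be called as adv2 = read_excel_streaming(spark, "test.xlsx").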

Top Results From Across the Web

Using Apache POI - Zip Bomb detected - Stack Overflow

The workaround is to add this line before you open the workbook: ZipSecureFile.setMinInflateRatio(0);.
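
From PySpark, the same static method may be reachable through the py4j gateway. The sketch below rests on several assumptions: that the shaded class lives at shadeio.poi.openxml4j.util.ZipSecureFile (inferred from the package names in the stack trace above), that it is on the driver's classpath, and that the workbook is opened in the driver JVM, as the stack trace suggests for schema inference. The helper name and the 0.001 default are hypothetical. A ratio of 0 disables the check entirely, so a small non-zero value is used here to keep some protection.

```python
def set_min_inflate_ratio(jvm, ratio=0.001):
    """Lower POI's zip-bomb threshold in the JVM behind `jvm`.

    `jvm` is a py4j JVMView such as `spark._jvm` (a private attribute).
    The shaded package path is an assumption taken from the stack trace.
    """
    zip_secure_file = jvm.shadeio.poi.openxml4j.util.ZipSecureFile
    zip_secure_file.setMinInflateRatio(float(ratio))
    # Read the value back so the caller can confirm it was applied
    return zip_secure_file.getMinInflateRatio()
```

It would be called as set_min_inflate_ratio(spark._jvm) before the load call. This only changes the setting in the JVM the gateway is attached to, so it would not help if the file were opened in a different JVM.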

Table Data Import failing with inflateRatio error — oracle-tech

I installed sql developer 19.1.0.094 Trying to import data into table using import data wizard. Data imported from Excel file.

Zip bomb detected error when reading Excel file - Mule 4

When processing a larger Excel file in Mule 4, you may experience an error saying 'Zip bomb detected'. This KB goes over how...

58499 – ZipSecureFile throws zip bomb detected

The file would exceed certain limits which usually indicate that the ... MIN_INFLATE_RATIO: 0.01, so this template file cannot be read by ...

Solved: Easy Import - Failure "zip bomb detected" - ServiceNow

When I try to upload this I got this error message: "error in loading headers from the xlsx data source: Zip bomb detected!...

