Error reading xlsx file (MIN_INFLATE_RATIO exceeded)

See original GitHub issue

Hello, I am reading some xlsx files from an S3 bucket with Spark 2.4.4 and the crealytics spark-excel artifact (2.11:0.13.1). One of them raises an error even though its content is correct (tested with LibreOffice).

The file seems to hit a compression-ratio limit, and the error message suggests adjusting a zip-handling parameter.

Is there a way to change this setting (ZipSecureFile.setMinInflateRatio()) through the artifact?

adv2 = spark.read.format("com.crealytics.spark.excel").option(
                    "header", "true").option("inferSchema", "true").load("test.xlsx")

Error message:

Traceback (most recent call last):
  File "/tmp/aws-adv-daily.py", line 304, in main
    "header", "true").option("inferSchema", "true").load("test.xlsx")
  File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 166, in load
    return self._df(self._jreader.load(path))
  File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/spark-2.4.4/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/spark-2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.io.IOException: Zip bomb detected! The file would exceed the max. ratio of compressed file size to the size of the expanded data.
This may indicate that the file is used to inflate memory usage and thus could pose a security risk.
You can adjust this limit via ZipSecureFile.setMinInflateRatio() if you need to work with files which exceed this limit.
Uncompressed size: 106496, Raw/compressed size: 859, ratio: 0.008066
Limits: MIN_INFLATE_RATIO: 0.010000, Entry: xl/styles.xml
	at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.checkThreshold(ZipArchiveThresholdInputStream.java:131)
	at shadeio.poi.openxml4j.util.ZipArchiveThresholdInputStream.read(ZipArchiveThresholdInputStream.java:81)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:152)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:121)
	at shadeio.poi.util.IOUtils.toByteArray(IOUtils.java:108)
	at shadeio.poi.openxml4j.util.ZipArchiveFakeEntry.<init>(ZipArchiveFakeEntry.java:47)
	at shadeio.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:53)
	at shadeio.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:106)
	at shadeio.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:301)
	at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:134)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at shadeio.poi.ss.usermodel.WorkbookFactory.createWorkbook(WorkbookFactory.java:339)
	at shadeio.poi.ss.usermodel.WorkbookFactory.createXSSFWorkbook(WorkbookFactory.java:314)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:232)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:198)
	at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
	at com.crealytics.spark.excel.DefaultWorkbookReader$$anonfun$openWorkbook$1.apply(WorkbookReader.scala:50)
	at scala.Option.fold(Option.scala:158)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:50)
	at com.crealytics.spark.excel.WorkbookReader$class.withWorkbook(WorkbookReader.scala:14)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:46)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:30)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:30)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:168)
	at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:167)
	at scala.Option.getOrElse(Option.scala:121)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:167)
	at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:34)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:40)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Thanks for your help
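
The numbers in the error message can be checked by hand. The following is a minimal sketch of the arithmetic only, not of POI's actual implementation; the function name `inflate_ratio` is made up for illustration, and the three values are copied from the message above.

```python
# Values copied from the error message above.
uncompressed_size = 106496  # "Uncompressed size" of entry xl/styles.xml
compressed_size = 859       # "Raw/compressed size"
min_inflate_ratio = 0.01    # "MIN_INFLATE_RATIO" limit from the message


def inflate_ratio(compressed: int, uncompressed: int) -> float:
    """Return compressed size divided by uncompressed size (hypothetical helper)."""
    if uncompressed <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed / uncompressed


ratio = inflate_ratio(compressed_size, uncompressed_size)
# Simplified view of the check: an entry is rejected when its ratio
# falls below the configured minimum.
rejected = ratio < min_inflate_ratio

print(f"ratio: {ratio:.6f}")    # ratio: 0.008066
print(f"rejected: {rejected}")  # rejected: True
```

The styles entry compresses to less than 1% of its expanded size, which is why a legitimate workbook can still trip the limit.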

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 25

Top GitHub Comments

1 reaction
Zulpitch commented, Nov 25, 2021

Hi @rohitthapliyal180 @Ftagn92 and @Zulpitch

I know it has been a long time already. If possible, could you share an Excel file that triggers this “MIN_INFLATE_RATIO exceeded” issue, or the steps to create one? I would like to add it to the test suite.

Appreciate your help

Sorry for not answering sooner, I didn’t see your question before … 😦

1 reaction
EnverOsmanov commented, Jul 7, 2021

You can also try the streaming version of the reader by adding the “maxRowsInMemory” option.
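
As a rough sketch of that suggestion, the read call from the question could be wrapped as below. The helper name `read_excel_streaming` and the default of 20 rows are illustrative assumptions, and whether the streaming reader avoids the inflate-ratio check for a given file and spark-excel version is not confirmed in this thread.

```python
def read_excel_streaming(spark, path, max_rows_in_memory=20):
    """Read an Excel file with spark-excel, requesting the streaming reader.

    `spark` is an existing SparkSession. The helper name and the default
    row count are illustrative, not part of spark-excel.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")
        .option("inferSchema", "true")
        # Supplying maxRowsInMemory asks spark-excel for its streaming reader.
        .option("maxRowsInMemory", max_rows_in_memory)
        .load(path)
    )
```

With a running session this would be called as `adv2 = read_excel_streaming(spark, "test.xlsx")`.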


Top Results From Across the Web

Using Apache POI - Zip Bomb detected - Stack Overflow
The workaround is to add this line before you open the workbook: ZipSecureFile.setMinInflateRatio(0);.

Table Data Import failing with inflateRatio error - oracle-tech
I installed sql developer 19.1.0.094 Trying to import data into table using import data wizard. Data imported from Excel file.

Zip bomb detected error when reading Excel file - Mule 4
When processing a larger Excel file in Mule 4, you may experience an error saying 'Zip bomb detected'. This KB goes over how...

58499 – ZipSecureFile throws zip bomb detected
The file would exceed certain limits which usually indicate that the ... MIN_INFLATE_RATIO: 0.01, so this template file cannot be read by ...

Solved: Easy Import - Failure "zip bomb detected" - ServiceNow
When I try to upload this I got this error message: "error in loading headers from the xlsx data source: Zip bomb detected!...
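
From PySpark, the ZipSecureFile.setMinInflateRatio() workaround might be reached through the py4j gateway before calling load(). This is only a sketch under several assumptions: the shaded class path `shadeio.poi.openxml4j.util.ZipSecureFile` is inferred from the stack trace in the question, `spark._jvm` is a private PySpark attribute, the helper name `set_min_inflate_ratio` is made up, and the call only affects the JVM it runs in (the driver here).

```python
def set_min_inflate_ratio(spark, ratio):
    """Lower POI's minimum inflate ratio in the driver JVM (hypothetical helper).

    `spark` is an existing SparkSession with spark-excel on the classpath.
    """
    if not 0 <= ratio <= 1:
        raise ValueError("ratio must be between 0 and 1")
    # ASSUMPTION: the shaded package name is taken from the stack trace
    # (shadeio.poi.openxml4j.util.*); it may differ in other builds.
    zip_secure_file = spark._jvm.shadeio.poi.openxml4j.util.ZipSecureFile
    zip_secure_file.setMinInflateRatio(float(ratio))
    return ratio
```

For the file in the question (ratio 0.008066), a call such as `set_min_inflate_ratio(spark, 0.005)` would be enough in principle, and it keeps some protection in place compared with setting the ratio to 0.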
