TensorflowWrapper.scala fails to load ClassifierDLApproach
Description
A java.util.NoSuchElementException is thrown by TensorflowWrapper$.readZippedSavedModel when it is invoked as part of ClassifierDLApproach.loadSavedModel.
Stack trace:
at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike.head(IterableLike.scala:109)
at scala.collection.IterableLike.head$(IterableLike.scala:108)
at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:49)
at scala.collection.IndexedSeqOptimized.head(IndexedSeqOptimized.scala:129)
at scala.collection.IndexedSeqOptimized.head$(IndexedSeqOptimized.scala:129)
at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:49)
at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readZippedSavedModel(TensorflowWrapper.scala:506)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.loadSavedModel(ClassifierDLApproach.scala:410)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.train(ClassifierDLApproach.scala:346)
at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.train(ClassifierDLApproach.scala:98)
at com.johnsnowlabs.nlp.AnnotatorApproach._fit(AnnotatorApproach.scala:69)
at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:75)
at org.apache.spark.ml.Pipeline.$anonfun$fit$5(Pipeline.scala:151)
at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
at org.apache.spark.ml.Pipeline.$anonfun$fit$4(Pipeline.scala:151)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at org.apache.spark.ml.Pipeline.$anonfun$fit$2(Pipeline.scala:147)
at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
at org.apache.spark.ml.Pipeline.$anonfun$fit$1(Pipeline.scala:133)
at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:133)
The exception seems to originate at https://github.com/JohnSnowLabs/spark-nlp/blob/340fe8068fae9a83130871f31633109f5fda8e70/src/main/scala/com/johnsnowlabs/ml/tensorflow/TensorflowWrapper.scala#L510, which is called from https://github.com/JohnSnowLabs/spark-nlp/blob/340fe8068fae9a83130871f31633109f5fda8e70/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/ClassifierDLApproach.scala#L426, passing /classifier-dl as the root directory to readZippedSavedModel.
The directory /classifier-dl does not exist on the target machine, and the user running the JVM is not root.
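The failure mode can be illustrated in isolation. The snippet below is a hypothetical stand-in, not the actual spark-nlp code: listResources mimics the resource listing coming back empty when the root folder (here, /classifier-dl) is missing, and calling next() on the empty collection's iterator reproduces the same exception type that Scala's ArrayBuffer.head delegates to in the stack trace above, along with a defensive variant that reports a clear error instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class HeadOnEmptyDemo {
    // Hypothetical stand-in for the saved-model resource listing:
    // when the root folder is absent, the listing comes back empty.
    static List<String> listResources(boolean rootExists) {
        List<String> found = new ArrayList<>();
        if (rootExists) {
            found.add("saved_model.zip");
        }
        return found;
    }

    public static void main(String[] args) {
        try {
            // Equivalent of Scala's ArrayBuffer.head on an empty buffer:
            // iterator().next() throws NoSuchElementException.
            String first = listResources(false).iterator().next();
            System.out.println(first);
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException");
        }

        // A defensive variant: check for emptiness and surface a readable
        // error message rather than an unexplained exception.
        List<String> resources = listResources(false);
        String msg = resources.isEmpty()
                ? "no saved model found under the given root"
                : resources.get(0);
        System.out.println(msg);
    }
}
```

A guard like this (or Scala's headOption) would turn the opaque NoSuchElementException into an actionable message naming the missing path.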
Expected Behavior
The saved model should be loaded successfully.
Current Behavior
An exception is thrown instead.
Possible Solution
Steps to Reproduce
- Create a “blank” Google Compute cloud instance with Ubuntu 20.04 focal distro
- apt-get install -y --no-install-recommends git openjdk-8-jdk maven
- git clone …
- Build/deploy a jar that somewhere calls model = pipeline.fit(dataset), with pipeline = new Pipeline().setStages(new PipelineStage[] { getDocumentAssembler(), getTokenizer(), getEncoder(), getEmbedder(), getClassifier() }) and getClassifier() returning a new ClassifierDLApproach()
- java -jar /home/some_user/some_target/some.jar
Your Environment
VM settings: Max. Heap Size (Estimated): 2.88G; VM: OpenJDK 64-Bit Server VM
spark.jars.packages: com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.2.1
Linux test 5.11.0-1020-gcp #22~20.04.1-Ubuntu SMP Tue Sep 21 10:54:26 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
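Since the repro steps build with Maven, the spark.jars.packages coordinate above would correspond to roughly the following pom.xml entry (a config sketch inferred from the coordinate, not taken from the reporter's actual build file):

```xml
<!-- Equivalent Maven dependency for com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.2.1 -->
<dependency>
  <groupId>com.johnsnowlabs.nlp</groupId>
  <artifactId>spark-nlp-gpu_2.12</artifactId>
  <version>3.2.1</version>
</dependency>
```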
Issue Analytics
- State:
- Created 2 years ago
- Comments: 9 (5 by maintainers)
Hi @kgoderis,
I created a small Spring Boot app that trains a ClassifierDL model to replicate the error. I tested it on Ubuntu 20 and Debian 11, and it works. I also containerized the app with Docker and tested it under Ubuntu 20, Debian 11, and GCP General-Purpose and Compute-Optimised machines (Debian buster), and it works there too, as you can see in the screenshot below.
I’m not sure how to configure the underlying image to “stable”. In the GCP Control Panel, I only found options for buster, bullseye, and stretch. Could you please elaborate on how to configure it as stable?
@maziyarpanahi I have tossed away the test instance, but it was plain-vanilla Ubuntu 20.04, with nothing fancy or non-standard in its configuration. The Docker-based test was done the same way via the GCP Control Panel, except for adding the container image; there I changed the underlying image to the “stable” one from the list shown on the Control Panel, i.e. I avoided the “dev” image release.