
TensorflowWrapper.scala fails to load ClassifierDLApproach

See original GitHub issue

Description

A java.util.NoSuchElementException is thrown from TensorflowWrapper$.readZippedSavedModel as part of ClassifierDLApproach.loadSavedModel.

Stack trace:

	at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
	at scala.collection.IterableLike.head(IterableLike.scala:109)
	at scala.collection.IterableLike.head$(IterableLike.scala:108)
	at scala.collection.mutable.ArrayBuffer.scala$collection$IndexedSeqOptimized$$super$head(ArrayBuffer.scala:49)
	at scala.collection.IndexedSeqOptimized.head(IndexedSeqOptimized.scala:129)
	at scala.collection.IndexedSeqOptimized.head$(IndexedSeqOptimized.scala:129)
	at scala.collection.mutable.ArrayBuffer.head(ArrayBuffer.scala:49)
	at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readZippedSavedModel(TensorflowWrapper.scala:506)
	at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.loadSavedModel(ClassifierDLApproach.scala:410)
	at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.train(ClassifierDLApproach.scala:346)
	at com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLApproach.train(ClassifierDLApproach.scala:98)
	at com.johnsnowlabs.nlp.AnnotatorApproach._fit(AnnotatorApproach.scala:69)
	at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:75)
	at org.apache.spark.ml.Pipeline.$anonfun$fit$5(Pipeline.scala:151)
	at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
	at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
	at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.Pipeline.$anonfun$fit$4(Pipeline.scala:151)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.ml.Pipeline.$anonfun$fit$2(Pipeline.scala:147)
	at org.apache.spark.ml.MLEvents.withFitEvent(events.scala:130)
	at org.apache.spark.ml.MLEvents.withFitEvent$(events.scala:123)
	at org.apache.spark.ml.util.Instrumentation.withFitEvent(Instrumentation.scala:42)
	at org.apache.spark.ml.Pipeline.$anonfun$fit$1(Pipeline.scala:133)
	at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:133)

This seems to happen at https://github.com/JohnSnowLabs/spark-nlp/blob/340fe8068fae9a83130871f31633109f5fda8e70/src/main/scala/com/johnsnowlabs/ml/tensorflow/TensorflowWrapper.scala#L510, which is called from https://github.com/JohnSnowLabs/spark-nlp/blob/340fe8068fae9a83130871f31633109f5fda8e70/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/ClassifierDLApproach.scala#L426 with /classifier-dl as the root directory in the call to readZippedSavedModel.

/classifier-dl does not exist on the target machine, and the user running the JVM is not root, so the directory cannot be created.
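The failure mode can be illustrated with a minimal sketch. Here listResources is a hypothetical stand-in (not Spark NLP API) for the resource lookup done by readZippedSavedModel, assumed to return an empty collection when the root path does not exist; taking the head of that empty collection raises NoSuchElementException, just like ArrayBuffer.head in the trace above.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

public class EmptyHeadDemo {
    // Hypothetical stand-in for the resource listing inside readZippedSavedModel:
    // when the root directory (e.g. /classifier-dl) does not exist, nothing is found.
    static List<String> listResources(String root) {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> resources = listResources("/classifier-dl");
        try {
            // Equivalent of ArrayBuffer.head in the Scala stack trace above
            String first = resources.iterator().next();
            System.out.println("Found: " + first);
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException: no model found under root");
        }
    }
}
```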

Expected Behavior

The model should be loaded.

Current Behavior

An exception is thrown instead.

Possible Solution
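One possible direction, sketched here under the assumption that readZippedSavedModel only needs a writable staging directory: check whether the preferred root exists and is writable, and fall back to a user-writable temp directory otherwise. The helper name resolveExtractionRoot is hypothetical, not Spark NLP API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ExtractionRoot {
    // Hypothetical helper (not Spark NLP API): prefer the configured root,
    // but fall back to a user-writable temp directory when the root is
    // missing or not writable (e.g. /classifier-dl absent, JVM user not root).
    static File resolveExtractionRoot(String preferred) throws IOException {
        File root = new File(preferred);
        if (root.isDirectory() && root.canWrite()) {
            return root;
        }
        return Files.createTempDirectory("classifier-dl").toFile();
    }

    public static void main(String[] args) throws IOException {
        File root = resolveExtractionRoot("/classifier-dl");
        System.out.println("Extracting model under: " + root.getAbsolutePath());
    }
}
```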

Steps to Reproduce

  1. Create a “blank” Google Compute cloud instance with Ubuntu 20.04 focal distro
  2. apt-get install -y --no-install-recommends git openjdk-8-jdk maven
  3. git clone … then build and deploy a jar that somewhere calls model = pipeline.fit(dataset), with pipeline = new Pipeline().setStages(new PipelineStage[] { getDocumentAssembler(), getTokenizer(), getEncoder(), getEmbedder(), getClassifier() }) and getClassifier() returning a new ClassifierDLApproach()
  4. java -jar /home/some_user/some_target/some.jar

Your Environment

VM settings: Max. Heap Size (Estimated): 2.88G
Using VM: OpenJDK 64-Bit Server VM

spark.jars.packages : com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.2.1

Linux test 5.11.0-1020-gcp #22~20.04.1-Ubuntu SMP Tue Sep 21 10:54:26 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Comments:9 (5 by maintainers)

Top GitHub Comments

1 reaction
danilojsl commented, Oct 27, 2021

Hi @kgoderis,

I created a small Spring Boot app that trains a ClassifierDL model to replicate the error. I tested it on Ubuntu 20 and Debian 11, and it works. I also containerized the app with Docker and tested it under Ubuntu 20, Debian 11, and GCP General-Purpose and Compute-Optimised machines (Debian buster), and it works, as you can see in the screenshot below.

[screenshot: GCP]

I’m not sure how to configure the underlying image to “stable”. In the GCP Control Panel, I only found options for buster, bullseye, and stretch. Could you please elaborate on how to configure it as stable? [screenshot: boot disk]

1 reaction
kgoderis commented, Oct 17, 2021

@maziyarpanahi I have tossed away the test instance, but it was plain vanilla Ubuntu 20.04, with nothing fancy and no non-standard configuration. The Docker-based test was set up the same way via the GCP Control Panel, except for adding the container image; there I changed the underlying image to the “stable” one from the list shown in the Control Panel, i.e. I avoided the “dev” image release.


