
Unable to load models on Windows 10


Apologies as I cannot say if this is 100% a bug, but it is nonetheless unexpected behavior after following the documentation.

I have a simple example I am trying to run in Spark NLP using Scala.

As background, I am on a Windows machine and have installed Spark 3.1.1 with prebuilt Hadoop 2.7 (following these instructions: https://phoenixnap.com/kb/install-spark-on-windows-10). Basic Spark programs work as expected, which leads me to think the problem is not with Spark and Hadoop alone: SPARK_HOME and HADOOP_HOME are set, the correct winutils.exe is in the hadoop/bin folder, and so on.
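As a quick sanity check of that setup, the environment the Hadoop native loader depends on can be printed from a tiny Scala program. This is a hedged diagnostic sketch: the variable names are the standard ones, and nothing Spark NLP-specific is assumed.

```scala
// Hedged diagnostic sketch: print the environment variables and the JVM
// property that Hadoop's native-code loader consults on Windows.
// It only reports what the JVM sees; it does not touch Spark.
object EnvCheck extends App {
  Seq("SPARK_HOME", "HADOOP_HOME").foreach { key =>
    println(s"$key = ${sys.env.getOrElse(key, "<not set>")}")
  }
  // hadoop.dll must be discoverable via java.library.path
  // (usually %HADOOP_HOME%\bin on Windows).
  println(s"java.library.path = ${System.getProperty("java.library.path")}")
}
```

If HADOOP_HOME prints as `<not set>`, or its bin directory is absent from java.library.path, the native stat call below has no chance of resolving.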

Description

I have the following simple Spark NLP application in Scala:

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp._
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings


object SparkNLPExplore extends App{

  val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
    .setCleanupMode("shrink")

  val sentenceDetector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")
    .setLazyAnnotator(false)

  val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")
    .setContextChars(Array("(", ")", "?", "!"))
    .setSplitChars(Array("-"))
    .setExceptions(Array("New York", "e-mail"))
    .setSplitPattern("'")
    .setMaxLength(99999)
    .setCaseSensitiveExceptions(false)

  val embeddings = BertEmbeddings.pretrained("bert_base_cased", "en")
    .setInputCols("sentence", "token")
    .setOutputCol("embeddings")

  println(embeddings)

}

which yields the following output

[info] done compiling
[info] running SparkNLPExplore
bert_base_cased download started this may take some time.
Approximate size to download 389.1 MB
[ WARN] Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
Download done! Loading the resource.
[error] (run-main-1) java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
[error] java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
[error]         at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
[error]         at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:460)
[error]         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:821)
[error]         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:735)
[error]         at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:703)
[error]         at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
[error]         at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2091)
[error]         at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2071)
[error]         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:280)
[error]         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
[error]         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
[error]         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:205)
[error]         at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
[error]         at scala.Option.getOrElse(Option.scala:189)
[error]         at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
[error]         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
[error]         at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:300)
[error]         at scala.Option.getOrElse(Option.scala:189)
[error]         at org.apache.spark.rdd.RDD.partitions(RDD.scala:296)
[error]         at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1428)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[error]         at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
[error]         at org.apache.spark.rdd.RDD.take(RDD.scala:1422)
[error]         at org.apache.spark.rdd.RDD.$anonfun$first$1(RDD.scala:1463)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error]         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[error]         at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
[error]         at org.apache.spark.rdd.RDD.first(RDD.scala:1463)
[error]         at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:587)
[error]         at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:465)
[error]         at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
[error]         at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
[error]         at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:363)
[error]         at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:357)
[error]         at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:27)
[error]         at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:24)
[error]         at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:290)
[error]         at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:246)
[error]         at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:246)
[error]         at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:290)
[error]         at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:290)
[error]         at com.johnsnowlabs.nlp.HasPretrained.pretrained(HasPretrained.scala:30)
[error]         at com.johnsnowlabs.nlp.HasPretrained.pretrained$(HasPretrained.scala:30)
[error]         at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.com$johnsnowlabs$nlp$embeddings$ReadablePretrainedBertModel$$super$pretrained(BertEmbeddings.scala:290)
[error]         at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained(BertEmbeddings.scala:245)
[error]         at com.johnsnowlabs.nlp.embeddings.ReadablePretrainedBertModel.pretrained$(BertEmbeddings.scala:245)
[error]         at com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.pretrained(BertEmbeddings.scala:290)
[error]         at SparkNLPExplore$.delayedEndpoint$SparkNLPExplore$1(SparkNLPExplore.scala:31)
[error]         at SparkNLPExplore$delayedInit$body.apply(SparkNLPExplore.scala:8)
[error]         at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error]         at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error]         at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error]         at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error]         at scala.collection.immutable.List.foreach(List.scala:392)
[error]         at scala.App.main(App.scala:80)
[error]         at scala.App.main$(App.scala:78)
[error]         at SparkNLPExplore$.main(SparkNLPExplore.scala:8)
[error]         at SparkNLPExplore.main(SparkNLPExplore.scala)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]         at java.lang.reflect.Method.invoke(Method.java:498)
[error] stack trace is suppressed; run 'last Compile / bgRun' for the full output
[ERROR] uncaught error in thread spark-listener-group-appStatus, stopping SparkContext

For reference here is my sbt file:

name := "spark-nlp"

version := "0.1"

scalaVersion := "2.12.10"

// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1",
  "org.apache.spark" %% "spark-mllib" % "3.1.1",
  "com.johnsnowlabs.nlp" %% "spark-nlp" % "3.0.1"
)
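One thing worth noting when running Spark applications from sbt: the default in-process `run` shares sbt's own JVM, which can interfere with native library loading and memory settings. Below is a hedged sketch of build.sbt additions; the keys are standard sbt settings, but whether they resolve this particular error is an assumption, not a confirmed fix.

```scala
// build.sbt additions (sketch): fork a separate JVM for `run` so that
// java.library.path and heap settings apply cleanly to the Spark driver.
fork := true
Compile / run / javaOptions ++= Seq(
  "-Xmx4G",
  // Assumption: %HADOOP_HOME%\bin holds winutils.exe and hadoop.dll on Windows.
  s"-Djava.library.path=${sys.env.getOrElse("HADOOP_HOME", "")}/bin"
)
```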

Expected Behavior

I would expect the embeddings to be loaded and the object information to be printed.

Current Behavior

It seems the pre-trained embeddings download successfully, but an exception is thrown when loading the resource. The behavior is consistent across all of the pre-trained models I have tried.

Possible Solution

Perhaps this is something as silly as a version clash, though everything looks OK to me.

Things I have tried so far:

  • I have also tried adding hadoop.dll to both %HADOOP_HOME%/bin and C:/Windows/System32, with no luck
  • I also updated the permissions of the winutils file, as suggested in #1022
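To confirm which copy of hadoop.dll (if any) the JVM can actually load, a small check can be run before touching Spark. This is a hedged sketch using the two locations mentioned above; note that a version-mismatched DLL may load successfully and still fail later on specific symbols, which is exactly what the `NativeIO$POSIX.stat` UnsatisfiedLinkError suggests.

```scala
import java.io.File

// Hedged sketch: report whether hadoop.dll exists in the usual locations
// and whether the JVM can load it at all. A DLL built for a different
// Hadoop version can load here yet still miss individual native symbols.
object HadoopDllCheck extends App {
  val candidates = Seq(
    new File(sys.env.getOrElse("HADOOP_HOME", "."), "bin/hadoop.dll"),
    new File("C:/Windows/System32/hadoop.dll")
  )
  candidates.foreach { f =>
    println(s"${f.getPath}: exists = ${f.exists}")
    if (f.exists) {
      try {
        System.load(f.getAbsolutePath)
        println("  loaded OK")
      } catch {
        case e: UnsatisfiedLinkError => println(s"  load failed: ${e.getMessage}")
      }
    }
  }
}
```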

Steps to Reproduce

  1. Run the simple Scala program above on Windows 10 with Spark 3.1.1 and prebuilt Hadoop 2.7

Context

I am not able to use Spark NLP. Can someone please help? I have already spent many hours trying to resolve this issue.

Your Environment

  • Spark NLP version (sparknlp.version()): 3.0.1 (per build.sbt)
  • Apache Spark version (spark.version): 3.1.1, prebuilt for Hadoop 2.7
  • Java version (java -version): OpenJDK 64-Bit Server VM, Java 1.8.0_275
  • Setup and installation (Pypi, Conda, Maven, etc.): sbt
  • Operating System and version: Windows 10
  • Link to your project (if any):

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

wolliq commented on Apr 16, 2021 (1 reaction)

Hi @masonedmison, I'm working from your description to reproduce the issue. I'll get back to you as soon as I have inspected the environment and gotten the same outcome. Thank you for your patience.

wolliq commented on Apr 28, 2021 (0 reactions)

Hello, I'm closing the ticket as the issue is solved.
