Wrong FS when loading PerceptronModel
Description
Saving a trained POS model to S3 and loading it back throws a java.lang.IllegalArgumentException with the message “Wrong FS, … expected: hdfs…”.
Expected Behavior
A saved model (either standalone or as part of a Spark PipelineModel) should be loadable from S3.
Current Behavior
While the model can be saved, reading it back, either as a standalone model or as part of a PipelineModel, throws an exception. A sample stack trace looks like this:
java.lang.IllegalArgumentException: Wrong FS: s3://<bucket-name>/pos-models/anc-pos/fields/POS Model, expected: hdfs://<ip-address>:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:651)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1415)
at com.johnsnowlabs.nlp.serialization.StructFeature.deserializeObject(Feature.scala:111)
at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:44)
at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:13)
at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:12)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:6)
at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:218)
at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel$.load(PerceptronModel.scala:71)
Possible Solution
The problem appears to be that the default FileSystem is used instead of inferring the FileSystem from the given Path. See https://github.com/harsha2010/magellan/issues/114 for a similar issue.
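For illustration, the difference between the two patterns looks roughly like this (a minimal sketch, assuming a SparkSession named spark; the bucket path is a placeholder):

import org.apache.hadoop.fs.{FileSystem, Path}

val path = new Path("s3://bucket/pos-models/anc-pos/fields/POS Model")

// Problematic pattern: always returns the cluster's default filesystem (hdfs://...),
// so exists()/open() on an s3:// path fails the checkPath() call seen in the stack trace.
val defaultFs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Safer pattern: let the scheme of the path (s3://, hdfs://, file://) select the filesystem.
val fsForPath = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
val modelExists = fsForPath.exists(path)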
Steps to Reproduce
The following code in Scala should help reproduce the problem:
import com.johnsnowlabs.nlp.annotators.pos.perceptron.{PerceptronApproach, PerceptronModel}

// df is a DataFrame that already carries "sentence" and "normalized" annotation columns
new PerceptronApproach()
  .setInputCols(Array("sentence", "normalized"))
  .setOutputCol("pos")
  .fit(df)
  .save("s3://path")

val perceptronModel = PerceptronModel.read.load("s3://path") // throws the exception above
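The same failure is reported when the annotator is saved and loaded as a stage of a Spark ML pipeline. A minimal sketch of that variant (the stage list, df, and paths here are illustrative placeholders, not from the original report):

import org.apache.spark.ml.{Pipeline, PipelineModel}

// df already carries the "sentence" and "normalized" annotation columns
val pos = new PerceptronApproach()
  .setInputCols(Array("sentence", "normalized"))
  .setOutputCol("pos")

val pipelineModel = new Pipeline().setStages(Array(pos)).fit(df)
pipelineModel.write.overwrite().save("s3://path/pipeline")

val restored = PipelineModel.load("s3://path/pipeline") // fails at the PerceptronModel stage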
Context
Since we use S3 as our DFS, saving and loading models to/from S3 is critical to model training and serving.
Your Environment
Version used: 1.4.0

Comments
I was facing this issue while trying to load the pretrained “explain_document_dl_en_2.4.3_2.4_1584626657780” pipeline. I fixed it by setting the HDFS URI for the Spark executors. I can give more details later; I am still finalizing the tests to confirm the solution.
My environment: Linux, Spark 2.4.4, Spark NLP 2.4.5, HDFS 3.3.1, Eclipse IDE, Java.
From what I tested, the solution (just one line of code) works like a charm.
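The comment does not include the exact line, but one common way to pin the default filesystem URI through the Spark configuration looks like the sketch below (the property name is standard Hadoop configuration; the namenode URI is a placeholder, and this is an assumption about what the commenter did, not a confirmed fix):

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: make executors resolve un-prefixed paths against the cluster's HDFS
// by setting Hadoop's fs.defaultFS via the Spark configuration.
val spark = SparkSession.builder()
  .appName("spark-nlp-example")
  .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
  .getOrCreate()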
Fix released. Thank you @avenka11 https://github.com/JohnSnowLabs/spark-nlp/releases/tag/1.4.2