Wrong FS when loading PerceptronModel
Description
Saving a trained POS model to S3 and loading it back throws a java.lang.IllegalArgumentException with the message “Wrong FS, … expected: hdfs…”.
Expected Behavior
A saved model (either standalone or as part of a Spark PipelineModel) should be loadable from S3.
Current Behavior
While the model can be saved, reading it back, either as a standalone model or as part of a PipelineModel, throws an exception. A sample stack trace looks like this:
java.lang.IllegalArgumentException: Wrong FS: s3://<bucket-name>/pos-models/anc-pos/fields/POS Model, expected: hdfs://<ip-address>:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:651)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1415)
at com.johnsnowlabs.nlp.serialization.StructFeature.deserializeObject(Feature.scala:111)
at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:44)
at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:13)
at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:12)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:6)
at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:218)
at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel$.load(PerceptronModel.scala:71)
Possible Solution
The problem appears to be that the default FileSystem is used instead of inferring the FileSystem from the given Path. See https://github.com/harsha2010/magellan/issues/114 for a similar issue.
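For illustration, the difference between the two patterns looks roughly like this (a minimal sketch, assuming a SparkSession named spark; the bucket path is a placeholder):

import org.apache.hadoop.fs.{FileSystem, Path}

val path = new Path("s3://bucket/pos-models/anc-pos/fields/POS Model")

// Problematic pattern: always returns the cluster's default filesystem (hdfs://...),
// so exists()/open() on an s3:// path fails the checkPath() call seen in the stack trace.
val defaultFs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Safer pattern: let the scheme of the path (s3://, hdfs://, file://) select the filesystem.
val fsForPath = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
val modelExists = fsForPath.exists(path)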
Steps to Reproduce
The following code in Scala should help reproduce the problem:
import com.johnsnowlabs.nlp.annotators.pos.perceptron.{PerceptronApproach, PerceptronModel}

// df is a DataFrame that already carries "sentence" and "normalized" annotation columns
new PerceptronApproach()
  .setInputCols(Array("sentence", "normalized"))
  .setOutputCol("pos")
  .fit(df)
  .save("s3://path")

val perceptronModel = PerceptronModel.read.load("s3://path") // throws the exception above
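The same failure is reported when the annotator is saved and loaded as a stage of a Spark ML pipeline. A minimal sketch of that variant (the stage list, df, and paths here are illustrative placeholders, not from the original report):

import org.apache.spark.ml.{Pipeline, PipelineModel}

// df already carries the "sentence" and "normalized" annotation columns
val pos = new PerceptronApproach()
  .setInputCols(Array("sentence", "normalized"))
  .setOutputCol("pos")

val pipelineModel = new Pipeline().setStages(Array(pos)).fit(df)
pipelineModel.write.overwrite().save("s3://path/pipeline")

val restored = PipelineModel.load("s3://path/pipeline") // fails at the PerceptronModel stage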
Context
Since we use S3 as our DFS, saving and loading models to/from S3 is critical to model training and serving.
Your Environment
Version used: 1.4.0

Comments
I was facing this issue while trying to load the pretrained “explain_document_dl_en_2.4.3_2.4_1584626657780” pipeline. I fixed it by setting the HDFS URI for the Spark executors. I can give more details later; I am still finalizing the tests to confirm the solution.
My environment: Linux, Spark 2.4.4, Spark NLP 2.4.5, HDFS 3.3.1, Eclipse IDE, Java.
From what I tested, the solution (just one line of code) works like a charm.
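The comment does not include the exact line, but one common way to pin the default filesystem URI through the Spark configuration looks like the sketch below (the property name is standard Hadoop configuration; the namenode URI is a placeholder, and this is an assumption about what the commenter did, not a confirmed fix):

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: make executors resolve un-prefixed paths against the cluster's HDFS
// by setting Hadoop's fs.defaultFS via the Spark configuration.
val spark = SparkSession.builder()
  .appName("spark-nlp-example")
  .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
  .getOrCreate()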
Fix released. Thank you @avenka11 https://github.com/JohnSnowLabs/spark-nlp/releases/tag/1.4.2