Wrong FS when loading PerceptronModel

Description

Saving a trained POS model to S3 and loading it back throws a java.lang.IllegalArgumentException with the message “Wrong FS, … expected: hdfs…”

Expected Behavior

A saved model (either standalone or as part of a Spark PipelineModel) should be loadable from S3.

Current Behavior

Saving the model currently works, but reading it back, either as a standalone model or as part of a PipelineModel, throws an exception. A sample stack trace looks like this:

java.lang.IllegalArgumentException: Wrong FS: s3://<bucket-name>/pos-models/anc-pos/fields/POS Model, expected: hdfs://<ip-address>:9000
  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:651)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
  at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:105)
  at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
  at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1415)
  at com.johnsnowlabs.nlp.serialization.StructFeature.deserializeObject(Feature.scala:111)
  at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:44)
  at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:13)
  at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:12)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
  at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:6)
  at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:218)
  at com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel$.load(PerceptronModel.scala:71)
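The trace shows FileSystem.exists reaching FileSystem.checkPath on the default (HDFS) file system. That check rejects any path whose scheme or authority does not match the file system's own URI. Below is a minimal sketch of the same failure. It assumes hadoop-common is on the classpath and uses the local file system as a stand-in for hdfs://<ip-address>:9000. The bucket name and object name are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WrongFsRepro {
  /** Asks the default FileSystem about an s3:// path and returns the error message, if any. */
  def wrongFsMessage(): Option[String] = {
    val conf = new Configuration()
    // Assumption: the local file system stands in for hdfs://<ip-address>:9000.
    conf.set("fs.defaultFS", "file:///")
    val defaultFs = FileSystem.get(conf)
    // Hypothetical bucket; S3 is never contacted because checkPath rejects the scheme first.
    val s3Path = new Path("s3://example-bucket/pos-models/anc-pos")
    try {
      defaultFs.exists(s3Path)
      None
    } catch {
      // Same exception type, and the same "Wrong FS" message prefix, as in the trace above.
      case e: IllegalArgumentException => Some(e.getMessage)
    }
  }

  def main(args: Array[String]): Unit =
    println(wrongFsMessage().getOrElse("no error"))
}
```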

Possible Solution

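The problem appears to be that the default FileSystem is used instead of the one inferred from the given Path. The stack trace shows StructFeature.deserializeObject calling exists on a DistributedFileSystem with an s3:// path. See https://github.com/harsha2010/magellan/issues/114.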
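The helpers below sketch the approach described above. They contrast FileSystem.get(conf), which always returns the fs.defaultFS file system, with Path.getFileSystem(conf), which delegates to FileSystem.get(uri, conf) and so picks the file system from the path's own scheme and authority. The object and method names are hypothetical, and this is not Spark NLP's actual serialization code.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helpers contrasting the two ways of obtaining a FileSystem.
object PathAwareFs {
  /** Suspected pattern: always uses fs.defaultFS, so an s3:// path on an HDFS cluster fails checkPath. */
  def existsOnDefaultFs(location: String, conf: Configuration): Boolean =
    FileSystem.get(conf).exists(new Path(location))

  /** Suggested pattern: the path's scheme and authority select the FileSystem.
    * Paths without a scheme still fall back to fs.defaultFS. */
  def existsOnOwnFs(location: String, conf: Configuration): Boolean = {
    val path = new Path(location)
    val fs = path.getFileSystem(conf)
    fs.exists(path)
  }
}
```

For an s3:// location, the path-aware version still needs a FileSystem implementation registered for the s3 scheme on the cluster. Otherwise Hadoop reports that no FileSystem is available for that scheme, rather than failing with Wrong FS.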

Steps to Reproduce

The following Scala code should reproduce the problem:

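import com.johnsnowlabs.nlp.annotators.pos.perceptron.{PerceptronApproach, PerceptronModel}

// df: a DataFrame that already contains the "sentence" and "normalized" annotation columns
new PerceptronApproach()
  .setInputCols(Array("sentence", "normalized"))
  .setOutputCol("pos")
  .fit(df)
  .save("s3://path") // saving succeeds

// Loading from the same location throws IllegalArgumentException: Wrong FS
val perceptronModel = PerceptronModel.read.load("s3://path")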

Context

Since we use S3 as our DFS, saving and loading models to/from S3 is critical to model training and serving.

Your Environment

Version used: 1.4.0

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

sparktacus commented, May 11, 2020

I was facing this issue while trying to load the pretrained pipeline “explain_document_dl_en_2.4.3_2.4_1584626657780”.
I fixed it by setting the HDFS URI for the Spark executors.
I can give more details later; I'm still finalizing the tests to confirm the solution.
My environment is:
Linux, Spark 2.4.4, Spark NLP 2.4.5, HDFS 3.3.1, IDE: Eclipse, programming language: Java
From what I tested, the solution (just one line in the code) works like a charm.
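The comment above does not say which line was changed. One way to give the driver and executors the same HDFS URI in a Spark application is the spark.hadoop.fs.defaultFS property, because Spark copies spark.hadoop.* settings into the Hadoop Configuration it builds. This is an assumption, not necessarily the commenter's fix. The NameNode URI and app name below are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DefaultFsConfig {
  // Hypothetical NameNode address; use your own cluster's value.
  val NameNodeUri = "hdfs://namenode.example.com:9000"

  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("spark-nlp-load") // hypothetical app name
      .master("local[*]")        // local master only so the sketch runs standalone
      // Spark copies this into the Hadoop Configuration as fs.defaultFS.
      .config("spark.hadoop.fs.defaultFS", NameNodeUri)
      .getOrCreate()
}
```

This only aligns the default file system. It does not change how the loader picks a FileSystem for a path with a different scheme, such as s3://.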

saif-ellafi commented, Mar 12, 2018