
Using pretrained pipelines in PySpark throws java.util.NoSuchElementException


Trying to run a minimal example that uses a pretrained pipeline in PySpark results in an exception thrown from com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.

Description

Running the minimal example provided at https://github.com/JohnSnowLabs/spark-nlp/blob/master/python/example/quick-start.ipynb crashes at pipeline = PretrainedPipeline('pipeline_vivekn') with the exception below (the same session also shows an earlier, related failure with explain_document_ml):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'PretrainedPipeline' is not defined
>>> from sparknlp import PretrainedPipeline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'PretrainedPipeline'
>>> from sparknlp.pretrained import PretrainedPipeline
>>> pipeline = PretrainedPipeline('explain_document_ml')
19/03/21 10:45:54 WARN org.apache.spark.sql.SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.
: java.lang.IllegalArgumentException: requirement failed: Was not found appropriate resource to download for request: ResourceRequest(explain_document_ml,Some(en),public/models,2.0.0,2.4.0) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@4f5dba48
	at scala.Predef$.require(Predef.scala:224)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadResource(ResourceDownloader.scala:102)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:133)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:128)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:197)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/pretrained.py", line 30, in __init__
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/pretrained.py", line 18, in downloadPipeline
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/internal.py", line 65, in __init__
  File "/usr/lib/spark/python/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Was not found appropriate resource to download for request: ResourceRequest(explain_document_ml,Some(en),public/models,2.0.0,2.4.0) with downloader: com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader@4f5dba48'
>>> pipeline = PretrainedPipeline('pipeline_vivekn')
[Stage 0:>                                                                                                                            Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/pretrained.py", line 30, in __init__
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/pretrained.py", line 18, in downloadPipeline
  File "/hadoop/spark/tmp/spark-aec8140d-aa84-4905-9d08-641b49024bcc/userFiles-488b2b9f-bdd2-4ce4-af07-a70d10fb4d51/JohnSnowLabs_spark-nlp-2.0.0.jar/sparknlp/internal.py", line 65, in __init__
  File "/usr/lib/spark/python/pyspark/ml/wrapper.py", line 67, in _new_java_obj
    return java_obj(*java_args)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.
: java.util.NoSuchElementException: Param patterns does not exist.
	at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
	at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
	at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
	at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:591)
	at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:589)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.ml.util.DefaultParamsReader$Metadata.setParams(ReadWrite.scala:589)
	at org.apache.spark.ml.util.DefaultParamsReader$Metadata.getAndSetParams(ReadWrite.scala:572)
	at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:497)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:12)
	at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
	at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstance(ReadWrite.scala:652)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:274)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:272)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:272)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
	at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:134)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:128)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:197)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Expected Behavior

All necessary resources should be downloaded, and the pipeline should load without an exception.

Current Behavior

The downloader crashes when attempting to download resources for a pretrained pipeline. This happens at least for the pretrained pipelines pipeline_vivekn and explain_document_ml.
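
One way to narrow this down is to ask the downloader which pipelines it can actually see for the current library and Spark versions. A sketch, assuming a newer spark-nlp release; showPublicPipelines ships with later versions and may not exist in 2.0.0:

from sparknlp.pretrained import ResourceDownloader

# Prints the pipelines published for the current library/Spark version
# (available in later spark-nlp releases; may be missing in 2.0.0)
ResourceDownloader.showPublicPipelines()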

Possible Solution

No idea, but I’m running Spark on a Google Dataproc cluster in standalone mode. Perhaps that introduces some complications?

Steps to Reproduce

  1. Set up PySpark for Python 3.
  2. Run Spark in standalone mode.
  3. Run pyspark --packages JohnSnowLabs:spark-nlp:2.0.0.
  4. Run the code in https://github.com/JohnSnowLabs/spark-nlp/blob/master/python/example/quick-start.ipynb (a condensed sketch of the failing session follows this list).
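
Condensed, the failing session boils down to the following (a minimal sketch, assuming the pyspark shell from step 3 and hence an existing SparkSession):

# Inside the pyspark shell started with --packages JohnSnowLabs:spark-nlp:2.0.0
from sparknlp.pretrained import PretrainedPipeline

# Both calls trigger PythonResourceDownloader.downloadPipeline and crash:
pipeline = PretrainedPipeline('explain_document_ml')  # IllegalArgumentException: requirement failed
pipeline = PretrainedPipeline('pipeline_vivekn')      # NoSuchElementException: Param patterns does not exist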

Context

Goal: run the minimal example without crashing. https://github.com/JohnSnowLabs/spark-nlp/issues/299 seems to be about the same issue, but I’m running version 2.0.0, so I figured this bug should have been fixed by now.

Your Environment

  • Version used: 2.0.0
  • Operating System and version (desktop or mobile): Debian 9.8 on a Google Dataproc cluster.


Top GitHub Comments

maziyarpanahi commented on Apr 23, 2020 (1 reaction)

Hi @jamshaidsohail5, I hope you are safe and well too. As of 2020 we no longer need _noncontrib for Windows compatibility; all the models and pipelines in the 2.4.x releases are cross-platform. The current and updated names of models and pipelines are here:

https://github.com/JohnSnowLabs/spark-nlp-models

So if you are on the latest versions (which I can see you are), just remove the _noncontrib part from the name and only use recognize_entities_dl.
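
For example, a minimal sketch of loading the renamed pipeline on a 2.4.x release (the sample sentence is just illustrative):

from sparknlp.pretrained import PretrainedPipeline

# The _noncontrib suffix is no longer needed on 2.4.x
pipeline = PretrainedPipeline('recognize_entities_dl', lang='en')
result = pipeline.annotate('Google has announced a new office in Berlin.')
print(result['ner'])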

Let me know if you have any issues.

maziyarpanahi commented on Aug 20, 2019 (1 reaction)

Hi, it works in our latest official release, 2.1.0, as you can see in this example:

https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/quick_start.ipynb
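
The relevant part of that notebook reduces to something like this (a sketch, assuming pyspark was started with the 2.1.0 package; the notebook itself may use different pipeline names):

# Inside: pyspark --packages JohnSnowLabs:spark-nlp:2.1.0
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('explain_document_ml', lang='en')
annotations = pipeline.annotate('We are very happy about Spark NLP!')
print(annotations['pos'])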
