Spark NLP won't download the analyze_sentiment model from AWS S3, keeps giving a Forbidden error
Here is the code I am trying to run in my local conda environment. It keeps giving an error. Please tell me the solution:

```python
import sparknlp
from pyspark.sql import SparkSession

spark = sparknlp.start()
print(sparknlp.version())
print(spark.version)

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('analyze_sentiment_ml', 'en')
result = pipeline.annotate('Harry Potter is a bad movie')
print(result['sentiment'])
```
Description
Expected Behavior
It should return the sentiment of the sentence, but instead I get the error below.
Current Behavior
21/03/31 09:21:58 WARN Utils: Your hostname, shayan-GP73-Leopard-8RE resolves to a loopback address: 127.0.1.1; using 10.0.0.164 instead (on interface wlo1)
21/03/31 09:21:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/shayan/.ivy2/cache
The jars for the packages stored in: /home/shayan/.ivy2/jars
com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553;1.0
confs: [default]
found com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 in central
found com.typesafe#config;1.3.0 in central
found org.rocksdb#rocksdbjni;6.5.3 in central
found com.amazonaws#aws-java-sdk-bundle;1.11.603 in central
found com.github.universal-automata#liblevenshtein;3.0.0 in central
found com.google.code.findbugs#annotations;3.0.1 in central
found net.jcip#jcip-annotations;1.0 in central
found com.google.code.findbugs#jsr305;3.0.1 in central
found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
found com.google.code.gson#gson;2.3 in central
found it.unimi.dsi#fastutil;7.0.12 in central
found org.projectlombok#lombok;1.16.8 in central
found org.slf4j#slf4j-api;1.7.21 in central
found com.navigamez#greex;1.0 in central
found dk.brics.automaton#automaton;1.11-8 in central
found org.json4s#json4s-ext_2.12;3.5.3 in central
found joda-time#joda-time;2.9.5 in central
found org.joda#joda-convert;1.8.1 in central
found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 in central
found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 379ms :: artifacts dl 7ms
:: modules in use:
com.amazonaws#aws-java-sdk-bundle;1.11.603 from central in [default]
com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
com.google.code.findbugs#annotations;3.0.1 from central in [default]
com.google.code.findbugs#jsr305;3.0.1 from central in [default]
com.google.code.gson#gson;2.3 from central in [default]
com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 from central in [default]
com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 from central in [default]
com.navigamez#greex;1.0 from central in [default]
com.typesafe#config;1.3.0 from central in [default]
dk.brics.automaton#automaton;1.11-8 from central in [default]
it.unimi.dsi#fastutil;7.0.12 from central in [default]
joda-time#joda-time;2.9.5 from central in [default]
net.jcip#jcip-annotations;1.0 from central in [default]
net.sf.trove4j#trove4j;3.0.3 from central in [default]
org.joda#joda-convert;1.8.1 from central in [default]
org.json4s#json4s-ext_2.12;3.5.3 from central in [default]
org.projectlombok#lombok;1.16.8 from central in [default]
org.rocksdb#rocksdbjni;6.5.3 from central in [default]
org.slf4j#slf4j-api;1.7.21 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 21 | 0 | 0 | 0 || 21 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553
confs: [default]
0 artifacts copied, 21 already retrieved (0kB/8ms)
21/03/31 09:21:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
3.0.0
3.1.1
analyze_sentiment_ml download started this may take some time.
Traceback (most recent call last):
File "/home/shayan/python_projects/Spark_NLP_tutorials/spark_nlp_english_tutorials.py", line 10, in <module>
pipeline = PretrainedPipeline('analyze_sentiment_ml','en')
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 91, in __init__
self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 51, in downloadPipeline
file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 192, in __init__
"com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 129, in __init__
self._java_obj = self.new_java_obj(java_obj, *args)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 139, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 66, in _new_java_obj
return java_obj(*java_args)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: K0R3GX6B0518PB6E; S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=), S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1294)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.$anonfun$getDownloadSize$1(S3ResourceDownloader.scala:164)
at scala.Option.flatMap(Option.scala:271)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:161)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:401)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:501)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
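Note that S3 returns 403 Forbidden rather than 404 Not Found when a requested key does not exist, so a Forbidden error here usually means the requested pipeline name is not published for this Spark NLP version. One way to check the available names is sketched below; it assumes `ResourceDownloader.showPublicPipelines` is available in your Spark NLP version (it exists in 3.x):

```python
from sparknlp.pretrained import ResourceDownloader

# List the pipelines published for English; 'analyze_sentiment' should
# appear in the output, while 'analyze_sentiment_ml' does not.
ResourceDownloader.showPublicPipelines(lang='en')
```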
Your Environment
- Spark NLP version (`sparknlp.version()`): 3.0.0
- Apache Spark version (`spark.version`): 3.1.1
- Java version (`java -version`): 1.8.0_281
- Setup and installation (PyPI, Conda, Maven, etc.): Conda
- Operating System and version: Ubuntu 16.04
- Link to your project (if any):
I would always suggest this directory; it's up to date and has all the required information for getting started:
https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Public
PS: the info being printed happens on every download unless the pipeline is loaded via `.load()` instead of `.pretrained()`. It used to be a log message, but with a log you have to set the level to INFO to see it (which most users want to see), and then you also get thousands of INFO logs from Spark, so we made it a simple print.
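For illustration, a minimal sketch of the two loading styles (not from the original thread; the save path is arbitrary, and `pipeline.model` is the underlying Spark ML `PipelineModel`, as the traceback above shows):

```python
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import PipelineModel

# Downloads from S3 on first use and prints the "download started" info:
pipeline = PretrainedPipeline('analyze_sentiment', lang='en')

# Save the underlying PipelineModel once, then load it silently from disk
# on later runs (no download, no print):
pipeline.model.write().overwrite().save('/tmp/analyze_sentiment_en')
loaded = PipelineModel.load('/tmp/analyze_sentiment_en')
```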
Thanks a lot for your quick response. The pipeline example that you mentioned seems to have been written in haste, I guess; someone needs to edit it. This is how it looks:

```python
from sparknlp.pretrained import PretrainedPipelinein
pipeline = PretrainedPipeline('analyze_sentiment', lang = 'en')
annotations = pipeline.fullAnnotate(""Hello from John Snow Labs ! "")[0]
annotations.keys()
```
In the first line, `PretrainedPipelinein`? Then doubled quotation marks in the `.fullAnnotate()` call. It wouldn't work out of the box, and the same goes for all the snippets on that link. The SparkSession needs to be started too. Adding a note there would help newbies like me. Anyway, the following worked:
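Presumably the working code was along these lines (a sketch, not the user's exact snippet; it assumes `sparknlp.start()` creates the SparkSession as in the original report, and uses the `analyze_sentiment` name from the quoted example):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start the SparkSession first (this step was missing from the doc snippet)
spark = sparknlp.start()

# 'analyze_sentiment' is the published pipeline name ('analyze_sentiment_ml' is not)
pipeline = PretrainedPipeline('analyze_sentiment', lang='en')
annotations = pipeline.fullAnnotate('Hello from John Snow Labs!')[0]
print(annotations.keys())
```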
Thanks a lot for your help.