
Spark NLP won't download the analyze_sentiment model from AWS S3, keeps giving a Forbidden error

See original GitHub issue

Here is the code I am trying to run in my local conda environment. It keeps giving an error; please tell me the solution:

import sparknlp
from pyspark.sql import SparkSession

spark = sparknlp.start()
print(sparknlp.version())
print(spark.version)

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('analyze_sentiment_ml', 'en')
result = pipeline.annotate('Harry Potter is a bad movie')
print(result['sentiment'])

Description

Expected Behavior

It should print the sentiment of the sentence, but instead I get the following error:

Current Behavior

21/03/31 09:21:58 WARN Utils: Your hostname, shayan-GP73-Leopard-8RE resolves to a loopback address: 127.0.1.1; using 10.0.0.164 instead (on interface wlo1)
21/03/31 09:21:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/shayan/.ivy2/cache
The jars for the packages stored in: /home/shayan/.ivy2/jars
com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553;1.0
        confs: [default]
        found com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 in central
        found com.typesafe#config;1.3.0 in central
        found org.rocksdb#rocksdbjni;6.5.3 in central
        found com.amazonaws#aws-java-sdk-bundle;1.11.603 in central
        found com.github.universal-automata#liblevenshtein;3.0.0 in central
        found com.google.code.findbugs#annotations;3.0.1 in central
        found net.jcip#jcip-annotations;1.0 in central
        found com.google.code.findbugs#jsr305;3.0.1 in central
        found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
        found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
        found com.google.code.gson#gson;2.3 in central
        found it.unimi.dsi#fastutil;7.0.12 in central
        found org.projectlombok#lombok;1.16.8 in central
        found org.slf4j#slf4j-api;1.7.21 in central
        found com.navigamez#greex;1.0 in central
        found dk.brics.automaton#automaton;1.11-8 in central
        found org.json4s#json4s-ext_2.12;3.5.3 in central
        found joda-time#joda-time;2.9.5 in central
        found org.joda#joda-convert;1.8.1 in central
        found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 in central
        found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 379ms :: artifacts dl 7ms
        :: modules in use:
        com.amazonaws#aws-java-sdk-bundle;1.11.603 from central in [default]
        com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
        com.google.code.findbugs#annotations;3.0.1 from central in [default]
        com.google.code.findbugs#jsr305;3.0.1 from central in [default]
        com.google.code.gson#gson;2.3 from central in [default]
        com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
        com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
        com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 from central in [default]
        com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 from central in [default]
        com.navigamez#greex;1.0 from central in [default]
        com.typesafe#config;1.3.0 from central in [default]
        dk.brics.automaton#automaton;1.11-8 from central in [default]
        it.unimi.dsi#fastutil;7.0.12 from central in [default]
        joda-time#joda-time;2.9.5 from central in [default]
        net.jcip#jcip-annotations;1.0 from central in [default]
        net.sf.trove4j#trove4j;3.0.3 from central in [default]
        org.joda#joda-convert;1.8.1 from central in [default]
        org.json4s#json4s-ext_2.12;3.5.3 from central in [default]
        org.projectlombok#lombok;1.16.8 from central in [default]
        org.rocksdb#rocksdbjni;6.5.3 from central in [default]
        org.slf4j#slf4j-api;1.7.21 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   21  |   0   |   0   |   0   ||   21  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553
        confs: [default]
        0 artifacts copied, 21 already retrieved (0kB/8ms)
21/03/31 09:21:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
3.0.0
3.1.1
analyze_sentiment_ml download started this may take some time.
Traceback (most recent call last):
  File "/home/shayan/python_projects/Spark_NLP_tutorials/spark_nlp_english_tutorials.py", line 10, in <module>
    pipeline = PretrainedPipeline('analyze_sentiment_ml','en')
  File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 91, in __init__
    self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
  File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 51, in downloadPipeline
    file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
  File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 192, in __init__
    "com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
  File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 129, in __init__
    self._java_obj = self.new_java_obj(java_obj, *args)
  File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 139, in new_java_obj
    return self._new_java_obj(java_class, *args)
  File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 66, in _new_java_obj
    return java_obj(*java_args)
  File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: K0R3GX6B0518PB6E; S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=), S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1294)
        at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.$anonfun$getDownloadSize$1(S3ResourceDownloader.scala:164)
        at scala.Option.flatMap(Option.scala:271)
        at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:161)
        at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:401)
        at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:501)
        at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
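One detail worth knowing when reading this traceback: S3 returns 403 Forbidden rather than 404 Not Found for a nonexistent key when the caller lacks list permission on the bucket, so a 403 here can simply mean the requested model name/version does not exist for this Spark NLP release (as turns out to be the case with 'analyze_sentiment_ml' below), not that credentials are broken. A small stdlib-only helper (purely illustrative, not part of Spark NLP) can pull the status code out of such an exception message:

```python
import re

def s3_status_code(message):
    """Pull the numeric status code out of an AmazonS3Exception message."""
    match = re.search(r"Status Code: (\d+)", message)
    return int(match.group(1)) if match else None

# Excerpt of the exception text from the traceback above:
error_text = ("com.johnsnowlabs... AmazonS3Exception: Forbidden "
              "(Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden)")

print(s3_status_code(error_text))  # 403
```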

Your Environment

  • Spark NLP version sparknlp.version(): 3.0.0
  • Apache Spark version spark.version: 3.1.1
  • Java version java -version: 1.8.0_281
  • Setup and installation (Pypi, Conda, Maven, etc.): Conda
  • Operating System and version: Ubuntu 16.04
  • Link to your project (if any):

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
maziyarpanahi commented, Apr 6, 2021

I would always suggest this directory; it's up to date and has all the required information for getting started:

https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Public

PS: the info being printed happens on every download, unless the model is loaded via .load instead of .pretrained. It used to be a log, but with a log you have to set the level to INFO to see it (and most users do want to see it), and then they get thousands of INFO logs from Spark, so we made it a simple print.
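The print-versus-log tradeoff described here can be illustrated with Python's own logging module (a minimal, stdlib-only sketch, unrelated to Spark NLP's actual internals): a download-progress message emitted at INFO level is invisible at the usual WARNING default, which is why a plain print reaches users where an INFO log would not.

```python
import io
import logging

# A buffer standing in for console output.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

logger = logging.getLogger("sparknlp_demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # the level most users effectively run at

logger.info("analyze_sentiment download started this may take some time.")  # suppressed
logger.warning("something actually went wrong")  # emitted

print(stream.getvalue())
```

Lowering the level to INFO would surface the progress message, but would also surface Spark's own flood of INFO logs, which is the dilemma described above.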

1 reaction
shayanalibhatti commented, Mar 31, 2021

Thanks a lot for your quick response. The pipeline example you mentioned seems to have been written in haste, I guess; someone needs to edit it. This is how it looks:

from sparknlp.pretrained import PretrainedPipelinein pipeline = PretrainedPipeline(‘analyze_sentiment’, lang = ‘en’) annotations = pipeline.fullAnnotate(""Hello from John Snow Labs ! “”)[0] annotations.keys()

In the first line, Pipelinein? Then doubled quotation marks in the .fullAnnotate call. It wouldn't work out of the box, and the same goes for all the snippets at that link. A SparkSession needs to be started too; adding that there would help newbies like me. Anyway, the following worked:

from sparknlp.pretrained import PretrainedPipeline
import sparknlp
spark = sparknlp.start()
pipeline = PretrainedPipeline('analyze_sentiment', lang = 'en')
annotations =  pipeline.fullAnnotate("Harry Potter is a great movie ")[0]
print(annotations['sentiment'])

Thanks a lot for your help

