Spark NLP won't download the analyze_sentiment model from AWS S3, keeps giving a Forbidden error
Here is the code I am trying to run in my local conda environment. It keeps giving an error. Please tell me the solution:

```python
import sparknlp
from pyspark.sql import SparkSession

spark = sparknlp.start()
print(sparknlp.version())
print(spark.version)

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('analyze_sentiment_ml', 'en')
result = pipeline.annotate('Harry Potter is a bad movie')
print(result['sentiment'])
```
Description
Expected Behavior
It should return the sentiment of the sentence, but instead I get the error below.
Current Behavior
21/03/31 09:21:58 WARN Utils: Your hostname, shayan-GP73-Leopard-8RE resolves to a loopback address: 127.0.1.1; using 10.0.0.164 instead (on interface wlo1)
21/03/31 09:21:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/shayan/.ivy2/cache
The jars for the packages stored in: /home/shayan/.ivy2/jars
com.johnsnowlabs.nlp#spark-nlp_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553;1.0
confs: [default]
found com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 in central
found com.typesafe#config;1.3.0 in central
found org.rocksdb#rocksdbjni;6.5.3 in central
found com.amazonaws#aws-java-sdk-bundle;1.11.603 in central
found com.github.universal-automata#liblevenshtein;3.0.0 in central
found com.google.code.findbugs#annotations;3.0.1 in central
found net.jcip#jcip-annotations;1.0 in central
found com.google.code.findbugs#jsr305;3.0.1 in central
found com.google.protobuf#protobuf-java-util;3.0.0-beta-3 in central
found com.google.protobuf#protobuf-java;3.0.0-beta-3 in central
found com.google.code.gson#gson;2.3 in central
found it.unimi.dsi#fastutil;7.0.12 in central
found org.projectlombok#lombok;1.16.8 in central
found org.slf4j#slf4j-api;1.7.21 in central
found com.navigamez#greex;1.0 in central
found dk.brics.automaton#automaton;1.11-8 in central
found org.json4s#json4s-ext_2.12;3.5.3 in central
found joda-time#joda-time;2.9.5 in central
found org.joda#joda-convert;1.8.1 in central
found com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 in central
found net.sf.trove4j#trove4j;3.0.3 in central
:: resolution report :: resolve 379ms :: artifacts dl 7ms
:: modules in use:
com.amazonaws#aws-java-sdk-bundle;1.11.603 from central in [default]
com.github.universal-automata#liblevenshtein;3.0.0 from central in [default]
com.google.code.findbugs#annotations;3.0.1 from central in [default]
com.google.code.findbugs#jsr305;3.0.1 from central in [default]
com.google.code.gson#gson;2.3 from central in [default]
com.google.protobuf#protobuf-java;3.0.0-beta-3 from central in [default]
com.google.protobuf#protobuf-java-util;3.0.0-beta-3 from central in [default]
com.johnsnowlabs.nlp#spark-nlp_2.12;3.0.0 from central in [default]
com.johnsnowlabs.nlp#tensorflow-cpu_2.12;0.2.2 from central in [default]
com.navigamez#greex;1.0 from central in [default]
com.typesafe#config;1.3.0 from central in [default]
dk.brics.automaton#automaton;1.11-8 from central in [default]
it.unimi.dsi#fastutil;7.0.12 from central in [default]
joda-time#joda-time;2.9.5 from central in [default]
net.jcip#jcip-annotations;1.0 from central in [default]
net.sf.trove4j#trove4j;3.0.3 from central in [default]
org.joda#joda-convert;1.8.1 from central in [default]
org.json4s#json4s-ext_2.12;3.5.3 from central in [default]
org.projectlombok#lombok;1.16.8 from central in [default]
org.rocksdb#rocksdbjni;6.5.3 from central in [default]
org.slf4j#slf4j-api;1.7.21 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 21 | 0 | 0 | 0 || 21 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-ecd584d7-281f-46f5-815d-a55d3c68e553
confs: [default]
0 artifacts copied, 21 already retrieved (0kB/8ms)
21/03/31 09:21:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
3.0.0
3.1.1
analyze_sentiment_ml download started this may take some time.
Traceback (most recent call last):
File "/home/shayan/python_projects/Spark_NLP_tutorials/spark_nlp_english_tutorials.py", line 10, in <module>
pipeline = PretrainedPipeline('analyze_sentiment_ml','en')
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 91, in __init__
self.model = ResourceDownloader().downloadPipeline(name, lang, remote_loc)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/pretrained.py", line 51, in downloadPipeline
file_size = _internal._GetResourceSize(name, language, remote_loc).apply()
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 192, in __init__
"com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize", name, language, remote_loc)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 129, in __init__
self._java_obj = self.new_java_obj(java_obj, *args)
File "/home/shayan/anaconda3/envs/shayan/lib/python3.6/site-packages/sparknlp/internal.py", line 139, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 66, in _new_java_obj
return java_obj(*java_args)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/home/shayan/Downloads/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: K0R3GX6B0518PB6E; S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=), S3 Extended Request ID: UOhFR5Foso3Vh7O/Bm3o0AYNxIqMPSL5CuWBGWJG3Bfj9dwcmfS3qICUUBxCpGLqigvWLEiHR2Y=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4921)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4867)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1294)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.$anonfun$getDownloadSize$1(S3ResourceDownloader.scala:164)
at scala.Option.flatMap(Option.scala:271)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:161)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:401)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:501)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
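Note that S3 returns 403 Forbidden rather than 404 Not Found when a requested key does not exist, so a Forbidden error here usually means the requested pipeline name is not published for this Spark NLP version. One way to check the available names is sketched below; it assumes `ResourceDownloader.showPublicPipelines` is available in your Spark NLP version (it exists in 3.x):

```python
from sparknlp.pretrained import ResourceDownloader

# List the pipelines published for English; 'analyze_sentiment' should
# appear in the output, while 'analyze_sentiment_ml' does not.
ResourceDownloader.showPublicPipelines(lang='en')
```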
Your Environment
- Spark NLP version (`sparknlp.version()`): 3.0.0
- Apache Spark version (`spark.version`): 3.1.1
- Java version (`java -version`): 1.8.0_281
- Setup and installation (PyPI, Conda, Maven, etc.): Conda
- Operating System and version: Ubuntu 16.04
- Link to your project (if any):
I would always suggest this directory; it's up to date and has all the required information for getting started:
https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/tutorials/Certification_Trainings/Public
PS: the info being printed happens on every download unless the pipeline is loaded via `.load()` instead of `.pretrained()`. It used to be a log message, but with a log you have to set the level to INFO to see it (which most users want to see), and then you also get thousands of INFO logs from Spark, so we made it a simple print.
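For illustration, a minimal sketch of the two loading styles (not from the original thread; the save path is arbitrary, and `pipeline.model` is the underlying Spark ML `PipelineModel`, as the traceback above shows):

```python
from sparknlp.pretrained import PretrainedPipeline
from pyspark.ml import PipelineModel

# Downloads from S3 on first use and prints the "download started" info:
pipeline = PretrainedPipeline('analyze_sentiment', lang='en')

# Save the underlying PipelineModel once, then load it silently from disk
# on later runs (no download, no print):
pipeline.model.write().overwrite().save('/tmp/analyze_sentiment_en')
loaded = PipelineModel.load('/tmp/analyze_sentiment_en')
```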
Thanks a lot for your quick response. The pipeline example that you mentioned seems to have been written in haste, I guess; someone needs to edit it. This is how it looks:

```python
from sparknlp.pretrained import PretrainedPipelinein
pipeline = PretrainedPipeline('analyze_sentiment', lang = 'en')
annotations = pipeline.fullAnnotate(""Hello from John Snow Labs ! "")[0]
annotations.keys()
```
In the first line, `PretrainedPipelinein`? Then doubled quotation marks in the `.fullAnnotate()` call. It wouldn't work out of the box, and the same goes for all the snippets on that link. The SparkSession needs to be started too. Adding a note there would help newbies like me. Anyway, the following worked:
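Presumably the working code was along these lines (a sketch, not the user's exact snippet; it assumes `sparknlp.start()` creates the SparkSession as in the original report, and uses the `analyze_sentiment` name from the quoted example):

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start the SparkSession first (this step was missing from the doc snippet)
spark = sparknlp.start()

# 'analyze_sentiment' is the published pipeline name ('analyze_sentiment_ml' is not)
pipeline = PretrainedPipeline('analyze_sentiment', lang='en')
annotations = pipeline.fullAnnotate('Hello from John Snow Labs!')[0]
print(annotations.keys())
```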
Thanks a lot for your help.