AccessDenied errors loading healthcare models
When creating annotators from pretrained models with versions 2.7.4 and 2.7.5, we get AccessDenied errors when running Spark locally (master = "local"). Version 2.7.3 works fine, and running 2.7.4/2.7.5 on an AWS EMR cluster also works.
We specify the AWS credentials in the .aws/credentials file using the [spark_nlp] profile.
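When switching Spark NLP versions, a quick sanity check is to confirm that the [spark_nlp] profile is actually present in the credentials file. The sketch below uses only the standard library and assumes the usual INI layout of the AWS credentials file. The helper name profile_keys is hypothetical. It inspects only the file, so it cannot tell which credential provider the AWS SDK bundled in a given Spark NLP jar actually consults.

```python
import configparser
import os
from typing import List


def profile_keys(path: str, profile: str) -> List[str]:
    """Return the option names under [profile] in an AWS-style credentials file.

    Hypothetical helper: only key names are returned, never secret values,
    so the result is safe to print or log.
    """
    # ConfigParser.read() silently ignores missing files, so check first.
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    # Interpolation is disabled so '%' characters in values are never parsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    if not parser.has_section(profile):
        return []
    return sorted(parser.options(profile))
```

For example, profile_keys(os.path.expanduser("~/.aws/credentials"), "spark_nlp") should list aws_access_key_id and aws_secret_access_key when the profile is configured with the usual two keys. An empty list means the section name does not match.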
Current Behavior
We get the following error when trying to load a pretrained model from the clinical/models location:
Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7JGS2EJM96C6SHGG, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: KgsQ87K74+KaiKhc7arCbcuVtc8POle+iUPrzaJ2XG5ECu37HiN5VDNzqKBs4DunvOG1yqEJlwg=
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:984)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:69)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:81)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:159)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:403)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:501)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
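When comparing the local failure with the working EMR runs, or when reporting a specific denied request, the useful fields are the status code, the error code and the two request IDs in the AmazonS3Exception message above. The following is a minimal sketch for extracting them. It assumes the message keeps the comma-separated "Key: Value" layout shown in this trace, and the helper name parse_s3_error is hypothetical.

```python
from typing import Dict


def parse_s3_error(line: str) -> Dict[str, str]:
    """Split an AmazonS3Exception message into a {field: value} dict.

    Hypothetical helper; assumes 'Key: Value' pairs separated by ', '
    and values that contain no ', ' themselves.
    """
    # Keep only the text after the exception class name.
    _, _, body = line.partition("AmazonS3Exception: ")
    fields: Dict[str, str] = {}
    for part in body.split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that are not 'Key: Value'
            fields[key.strip()] = value.strip()
    return fields
```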
Steps to Reproduce
import sparknlp
from pyspark.sql import SparkSession
from sparknlp.annotator import SentenceDetectorDLModel

# Local session: this is the setup that fails; the same code works on AWS EMR.
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[4]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.5") \
    .config("spark.kryoserializer.buffer.max", "1000M") \
    .getOrCreate()

# pretrained() is a static method, so call it on the class. The 403 is raised
# while it fetches the model metadata from the clinical/models location.
sent_detect = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentences")
Context
Your Environment
- Spark NLP version (sparknlp.version()): 2.7.4
- Apache Spark version (spark.version): 2.4.5
- Java version (java -version): openjdk version "1.8.0_282"
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
I tried the new 3.0.0 version, which was released today, and this issue seems to be fixed in it.
Versions 2.7.3 and earlier of the Spark NLP jar work with AWS credentials stored in the .aws/credentials file. The problem with specifying them in environment variables is that only one set of credentials can be in effect at a time. We can't do that, because in addition to the spark_nlp credentials we need our own credentials to access our AWS resources. As far as I know, the only way to set up several credentials side by side is to use profiles, which is what the .aws/credentials file provides. I suspect the problem with 2.7.4 and 2.7.5 comes from reverting to a much older version of the AWS SDK (1.7.4). Version 2.7.3 uses 1.11.603, and you had been using 1.11.603 since at least 2.5.0.
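The SDK version numbers are easy to misread. Compared as plain strings, 1.7.4 looks newer than 1.11.603. The small sketch below shows that a component-wise numeric comparison puts 1.7.4 well before 1.11.603. Here version_key is a hypothetical helper, and the two version strings are the ones quoted in this comment.

```python
from typing import Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as "1.11.603" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


sdk_in_2_7_3 = "1.11.603"
sdk_in_2_7_4 = "1.7.4"

# String comparison is misleading: "7" > "1" character-wise,
# so "1.7.4" sorts after "1.11.603".
print(sdk_in_2_7_4 > sdk_in_2_7_3)  # True
# Numeric comparison gives the real order: minor version 7 < 11.
print(version_key(sdk_in_2_7_4) < version_key(sdk_in_2_7_3))  # True
```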