AccessDenied errors loading healthcare models

When creating annotators from pretrained models in Spark NLP 2.7.4 and 2.7.5, we get AccessDenied errors when running Spark locally (master = "local"). Version 2.7.3 works fine, and 2.7.4/2.7.5 also work when run on an AWS EMR cluster.

We specify the AWS credentials in the .aws/credentials file under the [spark_nlp] profile.
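For readers unfamiliar with named profiles, here is a minimal sketch of what such a file looks like and how a single profile can be read from it. It is only an illustration using Python's standard library, not how Spark NLP or the AWS SDK resolves credentials internally. The key values and the helper names `read_profile` and `list_profiles` are hypothetical; only the [spark_nlp] profile name comes from this report.

```python
import configparser
from pathlib import Path

# Hypothetical file contents: the values are placeholders, not real credentials.
# The layout (INI sections with aws_access_key_id / aws_secret_access_key)
# is the standard shared-credentials format.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = EXAMPLE_DEFAULT_KEY_ID
aws_secret_access_key = EXAMPLE_DEFAULT_SECRET

[spark_nlp]
aws_access_key_id = EXAMPLE_SPARKNLP_KEY_ID
aws_secret_access_key = EXAMPLE_SPARKNLP_SECRET
"""


def _parse(text):
    """Parse credentials-file text into a ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def list_profiles(text):
    """Return the profile names defined in the file, in file order."""
    return _parse(text).sections()


def read_profile(text, profile):
    """Return (access_key_id, secret_access_key) for one named profile."""
    parser = _parse(text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


def load_credentials_text(path=Path.home() / ".aws" / "credentials"):
    """Read the real credentials file from its default location."""
    return Path(path).read_text()


if __name__ == "__main__":
    # Several credential sets coexist in one file; each is selected by name.
    print(list_profiles(SAMPLE_CREDENTIALS))
    key_id, _secret = read_profile(SAMPLE_CREDENTIALS, "spark_nlp")
    print(key_id)
```

Because each profile is a separate section, the [default] credentials used for the reporter's own AWS resources and the [spark_nlp] credentials used for the licensed models can sit side by side in the same file.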

Current Behavior

We get the following error when trying to load a pretrained model from the clinical/models location.

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 7JGS2EJM96C6SHGG, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: KgsQ87K74+KaiKhc7arCbcuVtc8POle+iUPrzaJ2XG5ECu37HiN5VDNzqKBs4DunvOG1yqEJlwg=
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:984)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:69)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:81)
	at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.getDownloadSize(S3ResourceDownloader.scala:159)
	at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.getDownloadSize(ResourceDownloader.scala:403)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:501)
	at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Steps to Reproduce

from pyspark.sql import SparkSession
from sparknlp.annotator import SentenceDetectorDLModel

# Local Spark session that pulls the Spark NLP 2.7.5 jar from Maven.
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[4]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.5") \
    .config("spark.kryoserializer.buffer.max", "1000M") \
    .getOrCreate()

# Loading a pretrained model from clinical/models triggers the AccessDenied error.
sent_detect = SentenceDetectorDLModel \
    .pretrained("sentence_detector_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["document"]) \
    .setOutputCol("sentences")

Context

Your Environment

  • Spark NLP version (sparknlp.version()): 2.7.4
  • Apache Spark version (spark.version): 2.4.5
  • Java version (java -version): openjdk version "1.8.0_282"

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

dkincaid commented, Mar 22, 2021 (1 reaction)

I tried the new 3.0.0 version that was released today and this issue seems to be fixed in that version.

dkincaid commented, Mar 10, 2021 (1 reaction)

Versions 2.7.3 and earlier of the Spark NLP jar work with AWS credentials stored in the .aws/credentials file. The problem with specifying them in environment variables is that only one set of credentials can be in effect at a time. We can't work that way, since we also need our own credentials to access our AWS resources. As far as I know, the only way to have several sets of credentials set up is to use profiles, which is what the .aws/credentials file provides. I suspect the problem with 2.7.4 and 2.7.5 comes from reverting to a really old version of the AWS SDK (1.7.4). Version 2.7.3 uses 1.11.603, and you were using 1.11.603 at least as far back as 2.5.0.
