Pretrained Model not working due to missing Amazon dependencies
Description
When running a pretrained model as shown in the documentation, an error occurs because com.amazonaws.services.s3 dependencies are missing.
I’ve been trying to use this library in a Spark Streaming job. While debugging the dependency issue, I tried running the simple example from the documentation, and I get the following error on both Spark 1.6 and Spark 2.2:
scala> import spark.implicits._
import spark.implicits._
scala> import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
scala> val data = Seq("hello, this is an example sentence").toDF("mainColumn")
data: org.apache.spark.sql.DataFrame = [mainColumn: string]
scala> BasicPipeline().annotate(data, "mainColumn").show()
java.lang.NoSuchMethodError: com.amazonaws.services.s3.S3ClientOptions.builder()Lcom/amazonaws/services/s3/S3ClientOptions$Builder;
at com.amazonaws.services.s3.AmazonS3Builder.resolveS3ClientOptions(AmazonS3Builder.java:404)
at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:64)
at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:28)
at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client$lzycompute(S3ResourceDownloader.scala:47)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client(S3ResourceDownloader.scala:35)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:73)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.download(S3ResourceDownloader.scala:85)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadResource(ResourceDownloader.scala:91)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:123)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:118)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache$lzycompute(PretrainedPipeline.scala:11)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache(PretrainedPipeline.scala:10)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.annotate(PretrainedPipeline.scala:14)
... 50 elided
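A NoSuchMethodError like the one above usually means an older aws-java-sdk (for example the 1.7.x that ships with Hadoop/CDH distributions) is shadowing the newer one spark-nlp expects, rather than the class being absent entirely. One way to check is to ask the classloader which jar a class came from. The helper below is a hypothetical diagnostic sketch (the WhichJar name is mine, not part of spark-nlp); in the failing spark-shell you would pass it com.amazonaws.services.s3.S3ClientOptions.

```scala
// Hypothetical diagnostic: report which jar (code source) supplied a class,
// to spot an older aws-java-sdk shadowing the one spark-nlp needs.
object WhichJar {
  def locate(className: String): String = {
    val cls = Class.forName(className)
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)          // e.g. file:/.../aws-java-sdk-1.7.4.jar
      .getOrElse("(bootstrap classloader)") // JDK core classes have no code source
  }

  def main(args: Array[String]): Unit = {
    // In the failing spark-shell you would run:
    //   WhichJar.locate("com.amazonaws.services.s3.S3ClientOptions")
    // Here we use JDK/Scala classes as stand-ins, since the AWS SDK
    // is not on this classpath.
    println(WhichJar.locate("java.lang.String")) // prints "(bootstrap classloader)"
    println(WhichJar.locate("scala.Option"))     // typically the scala-library location
  }
}
```

If the reported location is a Hadoop or CDH jar directory rather than the SDK version spark-nlp pulls in, the classpath ordering is the likely cause.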
Steps to Reproduce
- Run a spark-shell:
spark-shell --packages JohnSnowLabs:spark-nlp:1.5.3
- Run the documentation example:
import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
val data = Seq("hello, this is an example sentence").toDF("mainColumn")
BasicPipeline().annotate(data, "mainColumn").show()
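If an old SDK bundled with the Hadoop distribution is the culprit, one possible workaround is to put a newer aws-java-sdk ahead of it on the driver classpath when launching the shell. This is a sketch, not a verified fix: the 1.11.x version below is an assumption, and spark.driver.userClassPathFirst can introduce other conflicts on some clusters.

```
spark-shell \
  --packages JohnSnowLabs:spark-nlp:1.5.3,com.amazonaws:aws-java-sdk-s3:1.11.375 \
  --conf spark.driver.userClassPathFirst=true
```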
Your Environment
Running on a CDH Cluster (EDH 5.11, Cloudera Certified) with Spark 1.6.1 & Spark 2.2.0
Issue Analytics
- Created 5 years ago
- Comments: 15 (7 by maintainers)

Released.
@sshikov there is no way to do that right now. But once you download them once through the API, you have access to the cache folder and can read the models offline by pointing directly to that path.
By default, the cache folder is located at $HOME/cache_pretrained
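Since a downloaded pipeline is persisted as a standard Spark ML PipelineModel on disk, it should be loadable offline with Spark’s own loader. The sketch below assumes the pipeline was already downloaded once through the API; the folder name under cache_pretrained is illustrative (the actual name and version suffix vary), so check the cache folder for the real path.

```scala
// Sketch: load an already-downloaded pipeline offline from the cache folder.
// The folder name below is illustrative; check $HOME/cache_pretrained for the real one.
import org.apache.spark.ml.PipelineModel
import spark.implicits._

val cached   = sys.env("HOME") + "/cache_pretrained/pipeline_basic_en_1.5.3"
val pipeline = PipelineModel.load(cached)

val data = Seq("hello, this is an example sentence").toDF("mainColumn")
pipeline.transform(data).show()
```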
Hi @saifjsl, thanks for looking into this. OVH is a European IaaS provider (www.ovh.com), so @GreGGus having the same issue would suggest that it’s maybe not linked to Cloudera…