
Pretrained Model not working due to missing Amazon dependencies

See original GitHub issue

Description

When running a pretrained model as shown in the documentation, an error occurs because the com.amazonaws.services.s3 dependencies are missing.

I’ve been trying to use this library in a Spark Streaming job. After attempting to debug the dependency issue, I ran the simple example from the documentation and got the following error on both Spark 1.6 and Spark 2.2:

scala> import spark.implicits._
import spark.implicits._

scala> import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline

scala> val data = Seq("hello, this is an example sentence").toDF("mainColumn")
data: org.apache.spark.sql.DataFrame = [mainColumn: string]

scala> BasicPipeline().annotate(data, "mainColumn").show()
java.lang.NoSuchMethodError: com.amazonaws.services.s3.S3ClientOptions.builder()Lcom/amazonaws/services/s3/S3ClientOptions$Builder;
  at com.amazonaws.services.s3.AmazonS3Builder.resolveS3ClientOptions(AmazonS3Builder.java:404)
  at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:64)
  at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:28)
  at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
  at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client$lzycompute(S3ResourceDownloader.scala:47)
  at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client(S3ResourceDownloader.scala:35)
  at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
  at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:73)
  at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.download(S3ResourceDownloader.scala:85)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadResource(ResourceDownloader.scala:91)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:123)
  at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:118)
  at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache$lzycompute(PretrainedPipeline.scala:11)
  at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache(PretrainedPipeline.scala:10)
  at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.annotate(PretrainedPipeline.scala:14)
  ... 50 elided
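
A NoSuchMethodError on S3ClientOptions.builder() usually means an older aws-java-sdk jar on the cluster classpath is shadowing the newer SDK that Spark NLP expects (builder() only exists in more recent SDK releases). One way to check which jar actually supplies the class from inside the shell — a diagnostic sketch, not part of the library:

```scala
// Diagnostic sketch: print which jar the JVM loaded a class from.
// Run inside spark-shell; the class name is the one from the stack trace.
def jarOf(className: String): String = {
  val cs = Class.forName(className).getProtectionDomain.getCodeSource
  if (cs == null) "<bootstrap/JDK classpath>" else cs.getLocation.toString
}

// If this prints a CDH-bundled aws-java-sdk jar rather than the one pulled
// in by --packages, the cluster's older SDK is winning on the classpath.
println(jarOf("com.amazonaws.services.s3.S3ClientOptions"))
```
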

Steps to Reproduce

  1. Start a Spark shell: spark-shell --packages JohnSnowLabs:spark-nlp:1.5.3
  2. Run the documentation example:
import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
val data = Seq("hello, this is an example sentence").toDF("mainColumn")
BasicPipeline().annotate(data, "mainColumn").show()
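
If the cluster ships an older aws-java-sdk (as CDH distributions often do), one workaround for this class of conflict is to pull a newer SDK explicitly and ask Spark to prefer user-supplied jars. This is a sketch only: the SDK version shown is an assumption, and in client mode spark.driver.extraClassPath may be needed instead of spark.driver.userClassPathFirst.

```shell
# Sketch: add a newer aws-java-sdk-s3 alongside spark-nlp and prefer
# user-supplied jars over the cluster's bundled (older) AWS SDK.
# The SDK version below is illustrative, not the verified required version.
spark-shell \
  --packages JohnSnowLabs:spark-nlp:1.5.3,com.amazonaws:aws-java-sdk-s3:1.11.271 \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true
```
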

Your Environment

Running on a CDH Cluster (EDH 5.11, Cloudera Certified) with Spark 1.6.1 & Spark 2.2.0

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

1 reaction
saif-ellafi commented, Jul 20, 2018

Released.

@sshikov there is no way to do that now. But once you have downloaded them through the API, you have access to the cache folder and can read the models offline by pointing directly to that path.

The cache folder is located in $HOME/cache_pretrained by default.
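
To make that concrete: once a pipeline has been downloaded, later runs can bypass S3 entirely by loading the cached model as a plain Spark ML PipelineModel. A sketch, assuming the default cache location; the exact folder name on disk is version-dependent and hypothetical here — check what the downloader actually created:

```scala
import org.apache.spark.ml.PipelineModel

// Sketch: load a previously downloaded pipeline offline from the local cache.
// The folder name under cache_pretrained is an assumption; inspect the
// directory to find the real name the downloader used.
val cached = s"${sys.props("user.home")}/cache_pretrained/pipeline_basic_en_1.5.3"
val pipeline = PipelineModel.load(cached)

// Then transform a DataFrame exactly as the pretrained wrapper would:
// pipeline.transform(data).show()
```
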

1 reaction
VincentRoma commented, Jun 1, 2018

Hi @saifjsl, thanks for looking into this. OVH is a European IaaS provider (www.ovh.com); since @GreGGus is hitting the same issue there, that suggests it may not be linked to Cloudera.
