Pretrained Model not working due to missing Amazon dependencies
Description
When running a pretrained model as shown in the documentation, an error occurs because com.amazonaws.services.s3 dependencies are missing.
I’ve been trying to use this library in a Spark Streaming job. While debugging the dependency issue, I tried running the simple example from the documentation, and I get the following error on both Spark 1.6 and Spark 2.2:
scala> import spark.implicits._
import spark.implicits._
scala> import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
scala> val data = Seq("hello, this is an example sentence").toDF("mainColumn")
data: org.apache.spark.sql.DataFrame = [mainColumn: string]
scala> BasicPipeline().annotate(data, "mainColumn").show()
java.lang.NoSuchMethodError: com.amazonaws.services.s3.S3ClientOptions.builder()Lcom/amazonaws/services/s3/S3ClientOptions$Builder;
at com.amazonaws.services.s3.AmazonS3Builder.resolveS3ClientOptions(AmazonS3Builder.java:404)
at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:64)
at com.amazonaws.services.s3.AmazonS3ClientBuilder.build(AmazonS3ClientBuilder.java:28)
at com.amazonaws.client.builder.AwsSyncClientBuilder.build(AwsSyncClientBuilder.java:46)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client$lzycompute(S3ResourceDownloader.scala:47)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.client(S3ResourceDownloader.scala:35)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.downloadMetadataIfNeed(S3ResourceDownloader.scala:62)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.resolveLink(S3ResourceDownloader.scala:73)
at com.johnsnowlabs.nlp.pretrained.S3ResourceDownloader.download(S3ResourceDownloader.scala:85)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadResource(ResourceDownloader.scala:91)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:123)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:118)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache$lzycompute(PretrainedPipeline.scala:11)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.modelCache(PretrainedPipeline.scala:10)
at com.johnsnowlabs.nlp.pretrained.pipelines.PretrainedPipeline.annotate(PretrainedPipeline.scala:14)
... 50 elided
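A NoSuchMethodError like the one above usually means an older aws-java-sdk (for example the 1.7.x that ships with Hadoop/CDH distributions) is shadowing the newer one spark-nlp expects, rather than the class being absent entirely. One way to check is to ask the classloader which jar a class came from. The helper below is a hypothetical diagnostic sketch (the WhichJar name is mine, not part of spark-nlp); in the failing spark-shell you would pass it com.amazonaws.services.s3.S3ClientOptions.

```scala
// Hypothetical diagnostic: report which jar (code source) supplied a class,
// to spot an older aws-java-sdk shadowing the one spark-nlp needs.
object WhichJar {
  def locate(className: String): String = {
    val cls = Class.forName(className)
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.toString)          // e.g. file:/.../aws-java-sdk-1.7.4.jar
      .getOrElse("(bootstrap classloader)") // JDK core classes have no code source
  }

  def main(args: Array[String]): Unit = {
    // In the failing spark-shell you would run:
    //   WhichJar.locate("com.amazonaws.services.s3.S3ClientOptions")
    // Here we use JDK/Scala classes as stand-ins, since the AWS SDK
    // is not on this classpath.
    println(WhichJar.locate("java.lang.String")) // prints "(bootstrap classloader)"
    println(WhichJar.locate("scala.Option"))     // typically the scala-library location
  }
}
```

If the reported location is a Hadoop or CDH jar directory rather than the SDK version spark-nlp pulls in, the classpath ordering is the likely cause.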
Steps to Reproduce
- Run a spark-shell:
spark-shell --packages JohnSnowLabs:spark-nlp:1.5.3
- Run the documentation example:
import spark.implicits._
import com.johnsnowlabs.nlp.pretrained.pipelines.en.BasicPipeline
val data = Seq("hello, this is an example sentence").toDF("mainColumn")
BasicPipeline().annotate(data, "mainColumn").show()
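If an old SDK bundled with the Hadoop distribution is the culprit, one possible workaround is to put a newer aws-java-sdk ahead of it on the driver classpath when launching the shell. This is a sketch, not a verified fix: the 1.11.x version below is an assumption, and spark.driver.userClassPathFirst can introduce other conflicts on some clusters.

```
spark-shell \
  --packages JohnSnowLabs:spark-nlp:1.5.3,com.amazonaws:aws-java-sdk-s3:1.11.375 \
  --conf spark.driver.userClassPathFirst=true
```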
Your Environment
Running on a CDH Cluster (EDH 5.11, Cloudera Certified) with Spark 1.6.1 & Spark 2.2.0
Issue Analytics
- Created 5 years ago
- Comments: 15 (7 by maintainers)

Released.
@sshikov there is no way to do that right now. But once you download them once through the API, you have access to the cache folder and can read the models offline by pointing directly to that path.
By default, the cache folder is located at $HOME/cache_pretrained
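Since a downloaded pipeline is persisted as a standard Spark ML PipelineModel on disk, it should be loadable offline with Spark’s own loader. The sketch below assumes the pipeline was already downloaded once through the API; the folder name under cache_pretrained is illustrative (the actual name and version suffix vary), so check the cache folder for the real path.

```scala
// Sketch: load an already-downloaded pipeline offline from the cache folder.
// The folder name below is illustrative; check $HOME/cache_pretrained for the real one.
import org.apache.spark.ml.PipelineModel
import spark.implicits._

val cached   = sys.env("HOME") + "/cache_pretrained/pipeline_basic_en_1.5.3"
val pipeline = PipelineModel.load(cached)

val data = Seq("hello, this is an example sentence").toDF("mainColumn")
pipeline.transform(data).show()
```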
Hi @saifjsl, thanks for looking into this. OVH is a European IaaS provider (www.ovh.com), so @GreGGus having the same issue would suggest that it’s maybe not linked to Cloudera…