
java.lang.IllegalArgumentException: requirement failed: Storage not found under given ref: EMBEDDINGS_glove_100d

See original GitHub issue

I am trying to run Spark NLP on a cluster. I didn't have this problem with BertEmbeddings, but I do with WordEmbeddingsModel. I suspect the problem lies in the Spark cluster environment, but I don't know how to fix it.

Description

The error:

    java.lang.IllegalArgumentException: requirement failed: Storage not found under given ref: EMBEDDINGS_glove_100d
    This usually means:
      1. You have not loaded any storage under such ref or one of your Storage based annotators has includeStorage set to false and must be loaded manually
      2. You are trying to use cluster mode without a proper shared filesystem.
      3. source was not provided to Storage creation
      4. If you are trying to utilize Storage defined elsewhere, make sure it has the appropriate ref.
    at scala.Predef$.require(Predef.scala:224)
    at com.johnsnowlabs.storage.RocksDBConnection.findLocalIndex(RocksDBConnection.scala:36)
    at com.johnsnowlabs.storage.RocksDBConnection.connectReadOnly(RocksDBConnection.scala:63)
    at com.johnsnowlabs.storage.HasConnection$class.$init$(HasConnection.scala:10)

The code:

    def start(gpu=False):
        builder = SparkSession.builder \
            .appName("Spark NLP")
        if gpu:
            builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.7.2")
        else:
            builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2")
        return builder.getOrCreate()
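For readers hitting cause 1 or 4 above, here is a minimal sketch of loading the embeddings storage manually so the ref matches what the pipeline expects. This is an editor's illustration, not from the original issue: the file path and output column are assumptions, and note that Spark NLP prefixes the ref with EMBEDDINGS_ internally, which is why the error mentions EMBEDDINGS_glove_100d.

    from sparknlp.annotator import WordEmbeddings

    glove = (
        WordEmbeddings()
        # storage path must be readable when the index is built (illustrative path)
        .setStoragePath("/shared/glove.6B.100d.txt", "TEXT")
        .setDimension(100)
        # "glove_100d" becomes EMBEDDINGS_glove_100d internally
        .setStorageRef("glove_100d")
        .setInputCols(["sentence", "token"])
        .setOutputCol("glove")
    )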

    spark = start(gpu=False)

    glove = WordEmbeddingsModel().pretrained() \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("glove") \
        .setCaseSensitive(False)

    test_data = CoNLL().readDataset(spark, 'hdfs://master:9000/root/sparknlp/eng.testa')
    test_data = glove.transform(test_data)
    test_data.show()
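As a quick sanity check (an editor's suggestion, not part of the original report), you can print the storage ref the pretrained model expects and compare it against the one in the error message:

    # should print "glove_100d", i.e. EMBEDDINGS_glove_100d after Spark NLP's prefixing
    print(glove.getStorageRef())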

spark-defaults.conf:

    spark.master                     spark://10.168.2.219:7077
    spark.driver.host                10.168.2.219
    spark.serializer                 org.apache.spark.serializer.KryoSerializer
    spark.driver.memory              50g
    spark.executor.memory            30g
    spark.executor.cores             6
    spark.cores.max                  32
    spark.local.dir                  /nfs/cache/tmp
    spark.kryoserializer.buffer.max  1000M
    spark.jars.packages              com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2
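Cause 2 in the error points at shared storage. A quick way to test whether every worker actually sees the NFS path from the config above (a diagnostic sketch by the editor, not from the issue):

    import os

    # run a tiny job across several partitions and check the mount on each worker
    seen = spark.sparkContext.parallelize(range(4), 4) \
        .map(lambda _: os.path.exists("/nfs/cache/tmp")).collect()
    print(seen)  # all True only if the shared dir is mounted on every worker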


Your Environment

  • Spark NLP version: 2.7.2
  • Apache NLP version: NA
  • Java version (java -version): Java 8
  • Setup and installation (Pypi, Conda, Maven, etc.): pip
  • Operating System and version: Ubuntu 20.04
  • Link to your project (if any):
  • Using Jupyter Notebook to start the Spark application

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
leeivan commented, Mar 9, 2021

@tawheedmanzoor, yes, I resolved this problem. I set HADOOP_CONF_DIR in spark-env.sh; HADOOP_CONF_DIR points to the Hadoop configuration directory.
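For anyone applying the same fix, a minimal sketch of that change; the exact path is an assumption that depends on your Hadoop installation, and spark-env.sh must be updated on every node:

    # $SPARK_HOME/conf/spark-env.sh -- the path below is illustrative
    export HADOOP_CONF_DIR=/etc/hadoop/conf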

0 reactions
tawheedmanzoor commented, Mar 6, 2021

@leeivan, were you able to solve this issue? I am trying to use standalone Spark as well, and I am getting the exact same error.

Read more comments on GitHub >

