java.lang.IllegalArgumentException: requirement failed: Storage not found under given ref: EMBEDDINGS_glove_100d
I am trying to run Spark NLP on a cluster. I did not have this problem with BertEmbeddings, but I do with WordEmbeddingsModel. I suspect the cause lies in the Spark cluster environment, but I do not know how to fix it.
Description
the error:

This usually means:
- You have not loaded any storage under such ref, or one of your Storage based annotators has includeStorage set to false and must be loaded manually.
- You are trying to use cluster mode without a proper shared filesystem.
- source was not provided to Storage creation.
- If you are trying to utilize Storage defined elsewhere, make sure it has the appropriate ref.
at scala.Predef$.require(Predef.scala:224)
at com.johnsnowlabs.storage.RocksDBConnection.findLocalIndex(RocksDBConnection.scala:36)
at com.johnsnowlabs.storage.RocksDBConnection.connectReadOnly(RocksDBConnection.scala:63)
at com.johnsnowlabs.storage.HasConnection$class.$init$(HasConnection.scala:10)
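On a standalone cluster, the second cause above is the usual culprit: WordEmbeddingsModel unpacks a RocksDB index on disk, and every executor must be able to reach it (BertEmbeddings carries its weights inside the model itself, which would explain why it was unaffected). A minimal sketch of pointing Spark NLP's storage and pretrained cache at a shared filesystem; the spark.jsl.settings.* keys are Spark NLP configuration options, and the HDFS paths are placeholders for this cluster:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("Spark NLP")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2")
    # Where WordEmbeddings RocksDB indexes get unpacked; must be reachable
    # by every executor (the path below is an assumption):
    .config("spark.jsl.settings.storage.cluster_tmp_dir", "hdfs://master:9000/tmp/sparknlp_storage")
    # Where pretrained() downloads are cached (path is also an assumption):
    .config("spark.jsl.settings.pretrained.cache_folder", "hdfs://master:9000/tmp/sparknlp_cache")
    .getOrCreate())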
the code:
def start(gpu=False):
    builder = SparkSession.builder \
        .appName("Spark NLP")
    if gpu:
        builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.7.2")
    else:
        builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2")
    return builder.getOrCreate()

spark = start(gpu=False)
glove = WordEmbeddingsModel.pretrained() \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("glove") \
    .setCaseSensitive(False)

test_data = CoNLL().readDataset(spark, "hdfs://master:9000/root/sparknlp/eng.testa")
test_data = glove.transform(test_data)
test_data.show()
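A quick sanity check before calling transform is to compare the model's storage ref with the one in the exception; getStorageRef() is assumed here to be available on storage-backed annotators in the Python API:

# "EMBEDDINGS_glove_100d" in the exception is the storage database name
# built from this ref, so a mismatch points at a wrong setStorageRef()
# or an index that was never loaded on the cluster.
print(glove.getStorageRef())  # expected: glove_100d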
spark-defaults.conf:

spark.master                     spark://10.168.2.219:7077
spark.driver.host                10.168.2.219
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              50g
spark.executor.memory            30g
spark.executor.cores             6
spark.cores.max                  32
spark.local.dir                  /nfs/cache/tmp
spark.kryoserializer.buffer.max  1000M
spark.jars.packages              com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2
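If the session is created by spark-submit rather than in code, the two Spark NLP settings sketched above could equally be set in spark-defaults.conf (paths again placeholders):

spark.jsl.settings.storage.cluster_tmp_dir   hdfs://master:9000/tmp/sparknlp_storage
spark.jsl.settings.pretrained.cache_folder   hdfs://master:9000/tmp/sparknlp_cache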
Your Environment
- Spark NLP version: 2.7.2
- Apache NLP version: NA
- Java version (java -version): Java 8
- Setup and installation (Pypi, Conda, Maven, etc.): pip
- Operating System and version: Ubuntu 20.04
- Link to your project (if any):
- Using Jupyter Notebook to start the Spark application
@tawheedmanzoor, yes, I resolved this problem: I set HADOOP_CONF_DIR in spark-env.sh, pointing it at the Hadoop configuration directory.
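For anyone landing here later, that fix amounts to a single line in $SPARK_HOME/conf/spark-env.sh; the directory below is an assumption, so point it at wherever core-site.xml and hdfs-site.xml live on your nodes:

# spark-env.sh: lets Spark (and thus Spark NLP's storage layer) resolve hdfs:// paths
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop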
@leeivan, were you able to solve this issue? I am trying to use standalone Spark as well, and I am getting the exact same error.