java.lang.IllegalArgumentException: requirement failed: Storage not found under given ref: EMBEDDINGS_glove_100d
I am trying to run Spark NLP on a cluster. I did not have this problem with BertEmbeddings, but I do with WordEmbeddingsModel. I suspect the cause lies in the Spark cluster environment, but I do not know how to fix it.
Description
the error:

This usually means:
- You have not loaded any storage under such ref, or one of your Storage based annotators has includeStorage set to false and must be loaded manually.
- You are trying to use cluster mode without a proper shared filesystem.
- source was not provided to Storage creation.
- If you are trying to utilize Storage defined elsewhere, make sure it has the appropriate ref.
at scala.Predef$.require(Predef.scala:224)
at com.johnsnowlabs.storage.RocksDBConnection.findLocalIndex(RocksDBConnection.scala:36)
at com.johnsnowlabs.storage.RocksDBConnection.connectReadOnly(RocksDBConnection.scala:63)
at com.johnsnowlabs.storage.HasConnection$class.$init$(HasConnection.scala:10)
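On a standalone cluster, the second cause above is the usual culprit: WordEmbeddingsModel unpacks a RocksDB index on disk, and every executor must be able to reach it (BertEmbeddings carries its weights inside the model itself, which would explain why it was unaffected). A minimal sketch of pointing Spark NLP's storage and pretrained cache at a shared filesystem; the spark.jsl.settings.* keys are Spark NLP configuration options, and the HDFS paths are placeholders for this cluster:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("Spark NLP")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2")
    # Where WordEmbeddings RocksDB indexes get unpacked; must be reachable
    # by every executor (the path below is an assumption):
    .config("spark.jsl.settings.storage.cluster_tmp_dir", "hdfs://master:9000/tmp/sparknlp_storage")
    # Where pretrained() downloads are cached (path is also an assumption):
    .config("spark.jsl.settings.pretrained.cache_folder", "hdfs://master:9000/tmp/sparknlp_cache")
    .getOrCreate())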
the code:
def start(gpu=False):
    builder = SparkSession.builder \
        .appName("Spark NLP")
    if gpu:
        builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp-gpu_2.11:2.7.2")
    else:
        builder.config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2")
    return builder.getOrCreate()

spark = start(gpu=False)
glove = WordEmbeddingsModel.pretrained() \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("glove") \
    .setCaseSensitive(False)

test_data = CoNLL().readDataset(spark, "hdfs://master:9000/root/sparknlp/eng.testa")
test_data = glove.transform(test_data)
test_data.show()
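A quick sanity check before calling transform is to compare the model's storage ref with the one in the exception; getStorageRef() is assumed here to be available on storage-backed annotators in the Python API:

# "EMBEDDINGS_glove_100d" in the exception is the storage database name
# built from this ref, so a mismatch points at a wrong setStorageRef()
# or an index that was never loaded on the cluster.
print(glove.getStorageRef())  # expected: glove_100d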
spark-defaults.conf:

spark.master                     spark://10.168.2.219:7077
spark.driver.host                10.168.2.219
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              50g
spark.executor.memory            30g
spark.executor.cores             6
spark.cores.max                  32
spark.local.dir                  /nfs/cache/tmp
spark.kryoserializer.buffer.max  1000M
spark.jars.packages              com.johnsnowlabs.nlp:spark-nlp_2.11:2.7.2
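If the session is created by spark-submit rather than in code, the two Spark NLP settings sketched above could equally be set in spark-defaults.conf (paths again placeholders):

spark.jsl.settings.storage.cluster_tmp_dir   hdfs://master:9000/tmp/sparknlp_storage
spark.jsl.settings.pretrained.cache_folder   hdfs://master:9000/tmp/sparknlp_cache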
Your Environment
- Spark NLP version: 2.7.2
- Apache NLP version: NA
- Java version (java -version): Java 8
- Setup and installation (Pypi, Conda, Maven, etc.): pip
- Operating System and version: Ubuntu 20.04
- Link to your project (if any):
- Using Jupyter Notebook to start the Spark application
@tawheedmanzoor, yes, I resolved this problem: I set HADOOP_CONF_DIR in spark-env.sh, pointing it at the Hadoop configuration directory.
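For anyone landing here later, that fix amounts to a single line in $SPARK_HOME/conf/spark-env.sh; the directory below is an assumption, so point it at wherever core-site.xml and hdfs-site.xml live on your nodes:

# spark-env.sh: lets Spark (and thus Spark NLP's storage layer) resolve hdfs:// paths
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop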
@leeivan, were you able to solve this issue? I am trying to use standalone Spark as well, and I am getting the exact same error.