
UniversalSentenceEncoder Tutorial Example: org.tensorflow.exceptions.TFFailedPreconditionException: Table not initialized.


Hi folks, hope you are safe in these COVID times. We are trying to run the sentence similarity tutorial here: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTENCE_SIMILARITY.ipynb#scrollTo=lBggF5P8J1gc

Several other Quick Start guides work fine for us, e.g.:

from sparknlp.pretrained import PretrainedPipeline

# Download a pre-trained pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang='en')

but when we run the UniversalSentenceEncoder, we get a TensorFlow error from this line:

result = light_pipeline.transform(df)

Expected Behavior

I expected this to produce sentence embeddings.

Current Behavior

Here is the full stack trace:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-1405345296431633783.py", line 331, in <module>
    exec(code)
  File "<stdin>", line 2, in <module>
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 1132, in head
    rs = self.head(1)
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 1134, in head
    return self.take(n)
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 504, in take
    return self.limit(num).collect()
  File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 466, in collect
    sock_info = self._jdf.collectToPython()
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o455.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, 10.0.30.161, executor 1): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$dfAnnotate$1: (array<array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>>) => array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.tensorflow.exceptions.TFFailedPreconditionException: Table not initialized.
	 [[{{node module_apply_default/string_to_index_Lookup/hash_table_Lookup}}]]
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:95)
	at org.tensorflow.Session.run(Session.java:666)
	at org.tensorflow.Session.access$100(Session.java:72)
	at org.tensorflow.Session$Runner.runHelper(Session.java:381)
	at org.tensorflow.Session$Runner.run(Session.java:329)
	at com.johnsnowlabs.ml.tensorflow.TensorflowUSE.calculateEmbeddings(TensorflowUSE.scala:47)
	at com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder.annotate(UniversalSentenceEncoder.scala:142)
	at com.johnsnowlabs.nlp.HasSimpleAnnotate$$anonfun$dfAnnotate$1.apply(HasSimpleAnnotate.scala:24)
	at com.johnsnowlabs.nlp.HasSimpleAnnotate$$anonfun$dfAnnotate$1.apply(HasSimpleAnnotate.scala:23)
	... 20 more
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1682)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1670)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1669)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1669)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:862)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:862)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:862)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1903)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1852)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1841)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:652)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2289)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2310)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2329)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:363)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
	at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3207)
	at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3204)
	at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3266)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:78)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3265)
	at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3204)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$dfAnnotate$1: (array<array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>>) => array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
Caused by: org.tensorflow.exceptions.TFFailedPreconditionException: Table not initialized.
	 [[{{node module_apply_default/string_to_index_Lookup/hash_table_Lookup}}]]
	at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:95)
	at org.tensorflow.Session.run(Session.java:666)
	at org.tensorflow.Session.access$100(Session.java:72)
	at org.tensorflow.Session$Runner.runHelper(Session.java:381)
	at org.tensorflow.Session$Runner.run(Session.java:329)
	at com.johnsnowlabs.ml.tensorflow.TensorflowUSE.calculateEmbeddings(TensorflowUSE.scala:47)
	at com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder.annotate(UniversalSentenceEncoder.scala:142)
	at com.johnsnowlabs.nlp.HasSimpleAnnotate$$anonfun$dfAnnotate$1.apply(HasSimpleAnnotate.scala:24)
	at com.johnsnowlabs.nlp.HasSimpleAnnotate$$anonfun$dfAnnotate$1.apply(HasSimpleAnnotate.scala:23)
	... 20 more

Possible Solution

No clue. We could not find any solutions on Stack Overflow, apart from threads about TensorFlow model-saving errors.

Steps to Reproduce

  1. We followed this exact tutorial, but in a Zeppelin/Qubole notebook: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTENCE_SIMILARITY.ipynb#scrollTo=lBggF5P8J1gc

Context

We are trying to recreate this notebook: https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/SENTENCE_SIMILARITY.ipynb#scrollTo=lBggF5P8J1gc

Your Environment

  • Spark NLP version sparknlp.version(): 3.0.2 (also tried 2.7.5)
  • Apache Spark version spark.version: 2.3.2
  • Java version java -version: 1.8.0_201 (Oracle Corporation)
  • Setup and installation (PyPI, Conda, Maven, etc.): Zeppelin-based solution, installing the Python package from inside the notebook:
import subprocess
import sys

# From https://stackoverflow.com/questions/12332975/installing-python-module-within-code
def install(package):
    # Install into the same interpreter that is running this notebook
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

def list_packages():
    # List installed packages to confirm the package is present for this interpreter
    subprocess.check_call([sys.executable, "-m", "pip", "list"])

install("spark-nlp==3.0.2")  # Spark NLP for Spark 3+
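Since both 3.0.2 and 2.7.5 were tried, it can help to confirm which version pip actually left installed for the notebook's interpreter. A minimal sketch using only the standard library (Python 3.8+); the helper name installed_versions is hypothetical. A package supplied by the cluster's own Spark install (the trace points at /usr/lib/spark/python) may not be registered with pip and would then show up as None even though it imports fine.

```python
# Sketch: report installed distribution versions without shelling out to pip.
# installed_versions is a hypothetical helper, not part of Spark NLP.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not registered with pip/importlib metadata in this interpreter
            found[name] = None
    return found


print(installed_versions(["spark-nlp", "pyspark"]))
```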

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

maziyarpanahi commented, Apr 30, 2021 (1 reaction)

Glad to hear that! Thanks for the feedback 👍

franckjay commented, Apr 30, 2021 (1 reaction)

@maziyarpanahi, you solved it! Apologies for being daft: my Spark session set spark.kryoserializer.buffer.max to 1000M but did not set spark.serializer to org.apache.spark.serializer.KryoSerializer.
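The resolution reported here was a configuration gap: spark.kryoserializer.buffer.max was set, but spark.serializer was not set to the Kryo serializer. A minimal sketch, assuming the session's settings can be read as a plain dict, for spotting that gap and rendering both settings as --conf flags for spark-submit or pyspark. The helper names missing_kryo_settings and to_submit_flags are hypothetical, not Spark NLP or PySpark APIs; the key names and the 1000M value come from this thread.

```python
# Sketch: check a Spark config dict for the two Kryo settings from this thread.
from typing import Dict, List

KRYO_SERIALIZER = "org.apache.spark.serializer.KryoSerializer"

REQUIRED_CONF: Dict[str, str] = {
    "spark.serializer": KRYO_SERIALIZER,
    "spark.kryoserializer.buffer.max": "1000M",
}


def missing_kryo_settings(conf: Dict[str, str]) -> List[str]:
    """Return the keys that are missing or wrong.

    spark.serializer must be exactly the Kryo class; for the buffer size we
    only require that some value is set.
    """
    missing: List[str] = []
    if conf.get("spark.serializer") != KRYO_SERIALIZER:
        missing.append("spark.serializer")
    if "spark.kryoserializer.buffer.max" not in conf:
        missing.append("spark.kryoserializer.buffer.max")
    return missing


def to_submit_flags(conf: Dict[str, str]) -> List[str]:
    """Render settings as repeated `--conf key=value` command-line arguments."""
    flags: List[str] = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


# The configuration described in this issue: buffer size set, serializer not.
reported = {"spark.kryoserializer.buffer.max": "1000M"}
print(missing_kryo_settings(reported))  # ['spark.serializer']
print(" ".join(to_submit_flags(REQUIRED_CONF)))
```

In a running PySpark session you could pass dict(spark.sparkContext.getConf().getAll()) to missing_kryo_settings. Because spark.serializer is read when the SparkContext starts, on Zeppelin/Qubole it most likely needs to go into the interpreter or cluster Spark settings (followed by an interpreter restart) rather than being changed from a notebook cell.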


