wordseg_best throws an error: java.io.InvalidClassException: com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; local class incompatible: stream classdesc serialVersionUID
Description
I'm trying wordseg_best for Thai (th), following the example given, but it doesn't work as expected (error message below).
I have tried the other languages (zh, ja, ko), and they all work fine.
Expected Behavior
The output should match the one provided in the example:
+-----------------------------------+---------------------------------------------------------+
|text |result |
+-----------------------------------+---------------------------------------------------------+
|จวนจะถึงร้านที่คุณจองโต๊ะไว้แล้วจ้ะ|[จวน, จะ, ถึง, ร้าน, ที่, คุณ, จอง, โต๊ะ, ไว้, แล้ว, จ้ะ]|
+-----------------------------------+---------------------------------------------------------+
Current Behavior
An error is raised:
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 7.0 failed 4 times, most recent failure: Lost task 6.3 in stage 7.0 (TID 3395) (ip-10-0-4-134.ap-southeast-1.compute.internal executor 2): java.io.InvalidClassException: com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; local class incompatible: stream classdesc serialVersionUID = -7114715142956979922, local class serialVersionUID = 6642857758815297725
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
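The message reports two serialVersionUID values: the "stream classdesc" one is recorded in the serialized bytes being read (here, the downloaded wordseg_best model), and the "local class" one belongs to the AveragedPerceptron class in the Spark NLP jar on the executor's classpath. Java deserialization refuses to proceed when they differ, which generally means the model was saved with a different version of that class than the one being loaded. A minimal sketch for pulling these values out of an executor log, assuming the message format shown above (the names `UidMismatch` and `parse_invalid_class` are hypothetical, not part of Spark NLP):

```python
import re
from typing import NamedTuple, Optional


class UidMismatch(NamedTuple):
    class_name: str
    stream_uid: int  # UID stored in the serialized bytes (the downloaded model)
    local_uid: int   # UID of the class loaded from the local jar


# Matches the java.io.InvalidClassException text as it appears in the trace above
_PATTERN = re.compile(
    r"InvalidClassException: (?P<cls>[\w.$]+); local class incompatible: "
    r"stream classdesc serialVersionUID = (?P<stream>-?\d+), "
    r"local class serialVersionUID = (?P<local>-?\d+)"
)


def parse_invalid_class(message: str) -> Optional[UidMismatch]:
    """Return the class name and both serialVersionUIDs, or None if absent."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return UidMismatch(
        class_name=match.group("cls"),
        stream_uid=int(match.group("stream")),
        local_uid=int(match.group("local")),
    )


log_line = (
    "java.io.InvalidClassException: "
    "com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; "
    "local class incompatible: stream classdesc serialVersionUID = "
    "-7114715142956979922, local class serialVersionUID = 6642857758815297725"
)
print(parse_invalid_class(log_line))
```

A non-None result with differing UIDs suggests a version mismatch between the pretrained model and the installed jar, rather than a problem in the pipeline code itself.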
Possible Solution
Steps to Reproduce
I wrote a unit test that reproduces it:
```python
import unittest

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from sparknlp.annotator import WordSegmenterModel
from sparknlp.base import DocumentAssembler


class TestThaiNlp(unittest.TestCase):
    def setUp(self):
        # Local Spark session with the Spark NLP 3.2.2 jar (Scala 2.12 build)
        self.spark = SparkSession.builder \
            .master('local') \
            .appName('vision') \
            .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:3.2.2") \
            .getOrCreate()
        self.df = self.spark.createDataFrame([['จวนจะถึงร้านที่คุณจองโต๊ะไว้แล้วจ้ะ']], ["text"])

    def test_sparknlp(self):
        # Wrap the raw text column into Spark NLP document annotations
        document_assembler = DocumentAssembler() \
            .setInputCol('text') \
            .setOutputCol('document')
        # Download and load the pretrained Thai word segmenter (this call raises the error)
        word_seg = WordSegmenterModel.pretrained('wordseg_best', 'th') \
            .setInputCols('document') \
            .setOutputCol('token')
        pipeline = Pipeline(stages=[document_assembler, word_seg])
        result = pipeline.fit(self.df).transform(self.df)
        result.show(2, False)

    def tearDown(self):
        self.spark.stop()


if __name__ == '__main__':
    unittest.main()
```
Context
I'm trying to benchmark Spark NLP against pythainlp, but the Spark NLP Thai word segmenter fails with the error above.
Your Environment
- EMR: 6.3.0
- Spark NLP version: 3.2.2
- Apache Spark version: 3.1.1
- Java version:
[hadoop@ip-10-0-4-81 ~]$ java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment Corretto-8.282.08.1 (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM Corretto-8.282.08.1 (build 25.282-b08, mixed mode)
- Setup and installation (PyPI, Conda, Maven, etc.):
apt-get update -y && apt-get install -y default-jre python3-dev
pip install pyspark PyICU pythainlp spark-nlp==3.2.2
- Operating system and version: Amazon Linux release 2 (Karoo)
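Because the error comes from a class-version mismatch, one quick sanity check is to confirm that the jar requested through `spark.jars.packages` and the `spark-nlp` PyPI package pin the same version. The sketch below is not an official Spark NLP tool; the helper names `jar_version` and `versions_match` are hypothetical, and in a live session the Python-side version could come from `sparknlp.version()`:

```python
from typing import Optional


def jar_version(packages: str, artifact_prefix: str = "spark-nlp") -> Optional[str]:
    """Extract the Spark NLP artifact version from a spark.jars.packages value.

    spark.jars.packages is a comma-separated list of Maven coordinates
    of the form groupId:artifactId:version.
    """
    for coordinate in packages.split(","):
        parts = coordinate.strip().split(":")
        if len(parts) == 3 and parts[1].startswith(artifact_prefix):
            return parts[2]
    return None


def versions_match(packages: str, python_version: str) -> bool:
    """True when the JVM artifact and the Python package report the same version."""
    return jar_version(packages) == python_version


# Values taken from this report; in a live session the second argument
# could be sparknlp.version().
packages = "com.johnsnowlabs.nlp:spark-nlp_2.12:3.2.2"
print(jar_version(packages))              # 3.2.2
print(versions_match(packages, "3.2.2"))  # True
```

With the coordinates in this report both sides are 3.2.2, so the check passes, which suggests the incompatible class version lives in the pretrained model rather than in the local setup.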
Thanks @maziyarpanahi @dcecchini, it works now.
Hi @maziyarpanahi, thanks for the quick update. I'll wait for the new version, then.