wordseg_best throws an error: com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; local class incompatible: stream classdesc serialVersionUID


Description

I’m trying wordseg_best, following the example given, but it doesn’t work as expected (the error message is below).

I also tried other languages (zh, ja, ko), and they all work fine.

Expected Behavior

The output should match the one provided in the example:

+-----------------------------------+---------------------------------------------------------+
|text                               |result                                                   |
+-----------------------------------+---------------------------------------------------------+
|จวนจะถึงร้านที่คุณจองโต๊ะไว้แล้วจ้ะ|[จวน, จะ, ถึง, ร้าน, ที่, คุณ, จอง, โต๊ะ, ไว้, แล้ว, จ้ะ]|
+-----------------------------------+---------------------------------------------------------+

Current Behavior

The call fails with the following error:

An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 7.0 failed 4 times, most recent failure: Lost task 6.3 in stage 7.0 (TID 3395) (ip-10-0-4-134.ap-southeast-1.compute.internal executor 2): java.io.InvalidClassException: com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; local class incompatible: stream classdesc serialVersionUID = -7114715142956979922, local class serialVersionUID = 6642857758815297725
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2003)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1850)
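
The message above carries two different serialVersionUID values: one recorded in the serialized stream and one computed for the class on the local classpath. In general, java.io.InvalidClassException with this wording means the data was written by a different build of the class than the one trying to read it. A minimal sketch, using only the Python standard library, for pulling both values out of such a message when triaging logs (the name parse_serial_uid_mismatch is hypothetical and not part of Spark NLP):

```python
import re
from typing import NamedTuple, Optional


class SerialUidMismatch(NamedTuple):
    class_name: str
    stream_uid: int  # UID stored with the serialized data
    local_uid: int   # UID of the class found on the local classpath


# Matches the standard wording of java.io.InvalidClassException for UID mismatches.
_PATTERN = re.compile(
    r"java\.io\.InvalidClassException: (?P<cls>[\w.$]+); "
    r"local class incompatible: "
    r"stream classdesc serialVersionUID = (?P<stream>-?\d+), "
    r"local class serialVersionUID = (?P<local>-?\d+)"
)


def parse_serial_uid_mismatch(message: str) -> Optional[SerialUidMismatch]:
    """Return the class name and both UIDs, or None if the message does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return SerialUidMismatch(
        match["cls"], int(match["stream"]), int(match["local"])
    )


# Sample text copied from the error reported in this issue.
sample = (
    "java.io.InvalidClassException: "
    "com.johnsnowlabs.nlp.annotators.pos.perceptron.AveragedPerceptron; "
    "local class incompatible: "
    "stream classdesc serialVersionUID = -7114715142956979922, "
    "local class serialVersionUID = 6642857758815297725"
)
mismatch = parse_serial_uid_mismatch(sample)
print(mismatch)
```

The two numbers differing confirms only that the stored model and the loaded library disagree about the class definition; it does not say which side needs to change.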

Possible Solution

Steps to Reproduce

I wrote a unit test for it:

import unittest

from pyspark.sql import SparkSession
from sparknlp.annotator import *
from sparknlp.base import DocumentAssembler, Pipeline

class TestThaiNlp(unittest.TestCase):

    def setUp(self):
        self.spark = SparkSession.builder \
            .master('local') \
            .appName('vision') \
            .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:3.2.2") \
            .getOrCreate()
        self.df = self.spark.createDataFrame([['จวนจะถึงร้านที่คุณจองโต๊ะไว้แล้วจ้ะ']], ["text"])

    def test_sparknlp(self):
        document_assembler = DocumentAssembler() \
            .setInputCol('text') \
            .setOutputCol('document')
        word_seg = WordSegmenterModel.pretrained('wordseg_best', 'th') \
            .setInputCols('document') \
            .setOutputCol('token')
        pipeline = Pipeline(stages=[document_assembler, word_seg])
        result = pipeline.fit(self.df).transform(self.df)
        result.show(2, False)

    def tearDown(self):
        self.spark.stop()

Context

I’m trying to benchmark this model against pythainlp, but I can’t because the model fails to load.

Your Environment

  • EMR: 6.3.0
  • spark version: 3.1.1
  • sparknlp version: 3.2.2
[hadoop@ip-10-0-4-81 ~]$ java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment Corretto-8.282.08.1 (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM Corretto-8.282.08.1 (build 25.282-b08, mixed mode)
  • Spark NLP version: 3.2.2
  • Apache Spark version: 3.1.1
  • Java version: openjdk version "1.8.0_282"
  • Setup and installation (Pypi, Conda, Maven, etc.):
apt-get update -y && apt-get install -y default-jre python3-dev
pip install pyspark PyICU pythainlp spark-nlp==3.2.2
  • Operating System and version: Amazon Linux release 2 (Karoo)
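
The setup above names the Spark NLP version in two places: the Maven coordinate passed to spark.jars.packages in the test, and the pip package. A small sketch of a sanity check that the two agree. The helper names are hypothetical, and this check is only an assumption about what is worth ruling out, not the confirmed cause of this issue:

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    group: str
    artifact: str
    scala_version: str  # empty string if the artifact has no Scala suffix
    version: str


def parse_coordinate(coordinate: str) -> MavenCoordinate:
    """Split 'group:artifact_scalaVersion:version' into its parts."""
    group, artifact_with_scala, version = coordinate.split(":")
    artifact, sep, scala_version = artifact_with_scala.rpartition("_")
    if not sep:
        # No Scala suffix present; keep the whole artifact name.
        artifact, scala_version = artifact_with_scala, ""
    return MavenCoordinate(group, artifact, scala_version, version)


def versions_match(coordinate: str, python_package_version: str) -> bool:
    """True when the JAR version equals the Python package version."""
    return parse_coordinate(coordinate).version == python_package_version


# Values taken from the reproduction and environment sections of this issue.
coord = "com.johnsnowlabs.nlp:spark-nlp_2.12:3.2.2"
parsed = parse_coordinate(coord)
print(parsed)
print(versions_match(coord, "3.2.2"))
```

In a live session, the second argument could come from the installed Python package rather than a literal, so that a drift between the JAR and the pip install is caught before any model is downloaded.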

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
jslim89 commented, Sep 22, 2021

Thanks @maziyarpanahi @dcecchini.

It works now.


2 reactions
jslim89 commented, Sep 15, 2021

Hi @maziyarpanahi, thanks for the quick update. I will wait for the new version then.


