Issues loading NorvigSweetingModel pretrained model since 3.1.1

Description

The parameter wordCount cannot be loaded from the pretrained model since commit 7457c0b307aaa2932dbb4f1b4ca537feb2357a22. Checking out any commit prior to that one works fine.

Training a new instance of the model with NorvigSweetingApproach works fine.

The "explain_document_dl" pretrained pipeline is also affected, since it contains the pretrained NorvigSweetingModel.

Steps to Reproduce

Example Pipe

    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import org.apache.spark.ml.Pipeline
    import spark.implicits._

    println(spark.version)
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")

    val spellChecker: NorvigSweetingModel = NorvigSweetingModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")

    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))

    val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
    val result = pipeline.fit(data).transform(data)
    result.select("spell.result").show(false)
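
Since the "explain_document_dl" pretrained pipeline contains the same model, the problem should also be observable when loading that pipeline. A minimal sketch, assuming Spark NLP 3.1.1 is on the classpath, a Spark session is active as in the example above, and the pipeline can be downloaded (the variable names are illustrative only):

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Assumption: same environment as the example above (Spark NLP 3.1.1,
// active SparkSession, network access for the model download).
// Per the description, loading is the step expected to be affected.
val explainPipeline = new PretrainedPipeline("explain_document_dl", lang = "en")

// If loading succeeds, annotate the sample sentence and print each output column.
val annotations = explainPipeline.annotate("spmetimes i wrrite wordz erong.")
annotations.foreach { case (column, values) =>
  println(s"$column: ${values.mkString(", ")}")
}
```

Running the same snippet against 3.1.0 would give a point of comparison between the two releases.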

Your Environment

Spark NLP version sparknlp.version(): 3.1.1
Apache Spark version spark.version: 3.0.2
Java version java -version: 1.8
Setup and installation (Pypi, Conda, Maven, etc.): Maven

Top GitHub Comments

maziyarpanahi commented, Jul 2, 2021

@DevinTDHa Interesting. It would be great if you could check whether the issue is still there with spark-nlp-spark24==3.1.1 and pyspark==2.7.4. I want to be sure it's about us and not the pyspark version. Many thanks.

DevinTDHa commented, Jul 2, 2021

This is getting quite confusing: running it on Databricks with the coordinates com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.1 results in an error, while com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 works fine (Spark version 3.0.1).
