
Issues loading NorvigSweetingModel pretrained model since 3.1.1


Description

The wordCount parameter cannot be loaded from the pretrained model as of commit 7457c0b307aaa2932dbb4f1b4ca537feb2357a22. Checking out any commit before that one works fine.

Training a new instance of the model with NorvigSweetingApproach works fine.
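Because training still works, one possible stopgap is to build the spell checker with NorvigSweetingApproach instead of calling `pretrained()`. The sketch below assumes a local Spark session and writes a tiny, hypothetical dictionary to a temp file purely for illustration; a real setup would point `setDictionary` at a proper word list. It is not an official fix from the maintainers.

```scala
import java.nio.file.Files
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingApproach
import com.johnsnowlabs.nlp.base.DocumentAssembler
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

// Assumption: local Spark session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("norvig-workaround")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy dictionary: whitespace-separated words in a plain text file.
val dictFile = Files.createTempFile("words", ".txt")
Files.write(dictFile, "sometimes i write words wrong".getBytes("UTF-8"))

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Train the spell checker from the dictionary instead of downloading the pretrained model.
val spellChecker = new NorvigSweetingApproach()
  .setInputCols("token")
  .setOutputCol("spell")
  .setDictionary(dictFile.toString)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("spell.result").show(false)
```

With a realistic dictionary, the corrections should be comparable in spirit to the pretrained model, but the word frequencies will differ, so results are not expected to match exactly.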

The "explain_document_dl" pretrained pipeline is also affected, since it contains the pretrained NorvigSweetingModel.
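The pipeline case can be checked with a minimal sketch like the following, which loads explain_document_dl through PretrainedPipeline and annotates the same test sentence. It assumes network access to download the pipeline and a local Spark session. Based on this report, it would be expected to work on 3.1.0 and to fail on 3.1.1 when the bundled NorvigSweetingModel is read.

```scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import org.apache.spark.sql.SparkSession

// Assumption: local Spark session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("explain-document-dl-repro")
  .master("local[*]")
  .getOrCreate()

// Downloads explain_document_dl, which bundles the pretrained NorvigSweetingModel.
val pipeline = PretrainedPipeline("explain_document_dl", lang = "en")

// annotate() runs the pipeline on one string and returns a map of output column -> results.
val annotations = pipeline.annotate("spmetimes i wrrite wordz erong.")
println(annotations)
```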

Steps to Reproduce

Example pipeline

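    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.sql.SparkSession

    // In spark-shell `spark` already exists; otherwise create a local session.
    val spark = SparkSession.builder()
      .appName("norvig-pretrained-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    println(spark.version)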
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")

    // On 3.1.1 the pretrained model's wordCount parameter cannot be loaded.
    val spellChecker: NorvigSweetingModel = NorvigSweetingModel.pretrained()
      .setInputCols("token")
      .setOutputCol("spell")

    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      spellChecker
    ))

    val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
    val result = pipeline.fit(data).transform(data)
    result.select("spell.result").show(false)

Your Environment

  • Spark NLP version sparknlp.version(): 3.1.1
  • Apache Spark version spark.version: 3.0.2
  • Java version java -version: 1.8
  • Setup and installation (PyPI, Conda, Maven, etc.): Maven

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

maziyarpanahi commented, Jul 2, 2021 (1 reaction)

@DevinTDHa Interesting. It would be great if you could check whether the issue is still there with spark-nlp-spark24==3.1.1 and pyspark==2.4.7. I want to be sure it's about us and not the PySpark version. Many thanks.

DevinTDHa commented, Jul 2, 2021

This is getting quite confusing: running it on Databricks (Spark 3.0.1) with the coordinates com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.1 results in an error, while com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 works fine.
