Issues loading NorvigSweetingModel pretrained model since 3.1.1
Description
The parameter wordCount cannot be loaded from the pretrained model since commit 7457c0b307aaa2932dbb4f1b4ca537feb2357a22; checking out commits prior to that one is fine. Training a new instance of the model with NorvigSweetingApproach works fine.
The "explain_document_dl" pretrained Pipeline is also affected, since it contains the pretrained NorvigSweetingModel.
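Since training a new instance with NorvigSweetingApproach is unaffected, that offers a temporary workaround. A minimal sketch, assuming a local word corpus — the "dictionary.txt" path is a placeholder, not from the issue:

```scala
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingApproach
import com.johnsnowlabs.nlp.base.DocumentAssembler
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Train a spell checker from a local corpus instead of loading the
// pretrained model, sidestepping the broken wordCount deserialization.
val spellChecker = new NorvigSweetingApproach()
  .setInputCols("token")
  .setOutputCol("spell")
  .setDictionary("dictionary.txt")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))
```

Fitting this pipeline produces a NorvigSweetingModel without ever calling pretrained(), though its quality depends on the supplied corpus.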
Steps to Reproduce
Example Pipeline:
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.norvig.NorvigSweetingModel
import com.johnsnowlabs.nlp.base.DocumentAssembler
import org.apache.spark.ml.Pipeline
import spark.implicits._

println(spark.version)

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val spellChecker: NorvigSweetingModel = NorvigSweetingModel.pretrained()
  .setInputCols("token")
  .setOutputCol("spell")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  spellChecker
))

val data = Seq("spmetimes i wrrite wordz erong.").toDF("text")
val result = pipeline.fit(data).transform(data)
result.select("spell.result").show(false)
Your Environment
- Spark NLP version (sparknlp.version()): 3.1.1
- Apache Spark version (spark.version): 3.0.2
- Java version (java -version): 1.8
- Setup and installation (Pypi, Conda, Maven, etc.): Maven
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 1
- Comments: 7 (7 by maintainers)
@DevinTDHa Interesting, it would be great if you could check whether the issue is still there with spark-nlp-spark24==3.1.1 and pyspark==2.7.4. We want to be sure it's about us and not the pyspark version. Many thanks.
This is getting quite confusing: running it on Databricks with the coordinates com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.1 results in an error, while com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 is fine (Spark version 3.0.1).
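Since 3.1.0 loads the model correctly, pinning the library back to that version is another stopgap until the regression is fixed. For an sbt build this could look like the following (coordinates as reported above; whether your build uses sbt or Maven is an assumption):

```scala
// build.sbt: pin Spark NLP to the last release that loads
// the pretrained NorvigSweetingModel without the wordCount error
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "3.1.0"
```

On Databricks the equivalent is installing the Maven coordinate com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 on the cluster instead of 3.1.1.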