Cannot load large XlmRoBertaForTokenClassification model in Scala 2.12
Steps to Reproduce
- XlmRoBertaForTokenClassification.loadSavedModel(<large model path>)
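For context, a minimal sketch of the failing call, assuming the two-argument loadSavedModel(path, spark) signature from Spark NLP 3.3.x; the model path below is a placeholder for the actual SavedModel directory:

```scala
import com.johnsnowlabs.nlp.annotators.classifier.dl.XlmRoBertaForTokenClassification
import org.apache.spark.sql.SparkSession

object LoadLargeModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-xlm-roberta-large")
      .master("local[*]")
      .getOrCreate()

    // Fails with java.lang.OutOfMemoryError: Java heap space for the large model.
    // "/models/xlm-roberta-large" is a placeholder path.
    val model = XlmRoBertaForTokenClassification
      .loadSavedModel("/models/xlm-roberta-large", spark)
  }
}
```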
Stack Trace
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.johnsnowlabs.ml.tensorflow.io.ChunkBytes$.readFileInByteChunks(ChunkBytes.scala:44)
    at com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(TensorflowWrapper.scala:436)
    at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadXlmRoBertaForTokenTensorflowModel.loadSavedModel(XlmRoBertaForTokenClassification.scala:311)
    at com.johnsnowlabs.nlp.annotators.classifier.dl.ReadXlmRoBertaForTokenTensorflowModel.loadSavedModel$(XlmRoBertaForTokenClassification.scala:292)
    at com.johnsnowlabs.nlp.annotators.classifier.dl.XlmRoBertaForTokenClassification$.loadSavedModel(XlmRoBertaForTokenClassification.scala:330)
Your Environment
- Spark NLP version: 3.3.1
- Apache Spark version: 3.0.1
- Java version: 1.8.0
- Setup and installation (Pypi, Conda, Maven, etc.): SBT + Scala
- Operating System and version: Ubuntu + macOS
You are right. I added -Xmx15g to the Java process and it's working. Thank you!
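For an SBT setup like the one above, one way to pass that flag is through the build (a sketch; javaOptions only takes effect when the run is forked):

```scala
// build.sbt
fork := true             // run the app in a separate JVM so javaOptions apply
javaOptions += "-Xmx15g" // raise the max heap of that JVM to 15 GB
```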
Thanks, the error clearly indicates there is not enough memory to serialize the XLM-RoBERTa large model in Java. It seems strange, though; 15G should be enough for that model. There might be some setting regarding the Java heap in your classpath, or you may not actually have 15G of free memory.
I just tested xlm-roberta-large on Google Colab, which only has 12G of memory (2G-3G of which was already in use by previous operations), and it worked: https://colab.research.google.com/drive/1p5jFqxMuCnfcWFDGy_JS7yLJDeYGF00f?usp=sharing
Unfortunately, there is not much left to try other than freeing up more memory on that machine.
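As a side note, Spark's own driver-memory setting is the more idiomatic knob for this workload, since loadSavedModel runs on the driver, but it only helps if it is applied before the driver JVM starts. A sketch, with that caveat as a comment:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local-mode session with a 15 GB driver heap. Note: when the JVM
// is already running (e.g. under `sbt run`), spark.driver.memory set here has
// no effect on the heap size, which is why the -Xmx15g flag above was needed.
// It does take effect when passed to spark-submit or set in spark-defaults.conf.
val spark = SparkSession.builder()
  .appName("spark-nlp-xlm-roberta")
  .master("local[*]")
  .config("spark.driver.memory", "15g")
  .getOrCreate()
```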