Unknown environment issue with BioBert
I am using the nlu BioBERT mapper to improve an existing tool called text2term. A few weeks ago I had the tool working on a personal computer (Mac), but when I switched to my new work computer (also a Mac, same OS but with an Apple chip instead of Intel), the program stopped working even with the same source code, Python version, and Java version.
A coworker reproduced the issue on an Apple-chip machine with Python 3.9.5 and Java 17. If you have any insights, please let me know.
Here are my requirements, the versions I used, and the error: Python 3.10.6 (also tried 3.9.13), Java version "1.8.0_341" (also tried Java 16). requirements.txt:
Owlready2==0.36
argparse==1.4.0
pandas==1.4.1
numpy==1.23.2
gensim==4.1.2
scipy==1.8.0
scikit-learn==1.0.2
setuptools==60.9.3
requests==2.27.1
tqdm==4.62.3
sparse_dot_topn==0.3.1
bioregistry==0.4.63
nltk==3.7
rapidfuzz==2.0.5
shortuuid==1.0.9
Error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/clientserver.py", line 516, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
response = connection.send_command(command)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/clientserver.py", line 539, in send_command
raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
[OK!]
Traceback (most recent call last):
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 276, in get_trained_component_for_nlp_model_ref
component.get_pretrained_model(nlp_ref, lang, model_bucket),
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/components/embeddings/sentence_bert/BertSentenceEmbedding.py", line 13, in get_pretrained_model
return BertSentenceEmbeddings.pretrained(name,language,bucket) \
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/annotator/embeddings/bert_sentence_embeddings.py", line 231, in pretrained
return ResourceDownloader.downloadModel(BertSentenceEmbeddings, name, lang, remote_loc)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/pretrained/resource_downloader.py", line 40, in downloadModel
j_obj = _internal._DownloadModel(reader.name, name, language, remote_loc, j_dwn).apply()
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/__init__.py", line 317, in __init__
super(_DownloadModel, self).__init__("com.johnsnowlabs.nlp.pretrained." + validator + ".downloadModel", reader,
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py", line 26, in __init__
self._java_obj = self.new_java_obj(java_obj, *args)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/sparknlp/internal/extended_java_wrapper.py", line 36, in new_java_obj
return self._new_java_obj(java_class, *args)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pyspark/ml/wrapper.py", line 86, in _new_java_obj
return java_obj(*java_args)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/pyspark/sql/utils.py", line 190, in deco
return f(*a, **kw)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/py4j/protocol.py", line 334, in get_return_value
raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/__init__.py", line 234, in load
nlu_component = nlu_ref_to_component(nlu_ref)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 160, in nlu_ref_to_component
resolved_component = get_trained_component_for_nlp_model_ref(lang, nlu_ref, nlp_ref, license_type, model_params)
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/pipe/component_resolution.py", line 287, in get_trained_component_for_nlp_model_ref
raise ValueError(f'Failure making component, nlp_ref={nlp_ref}, nlu_ref={nlu_ref}, lang={lang}, \n err={e}')
ValueError: Failure making component, nlp_ref=sent_biobert_pmc_base_cased, nlu_ref=en.embed_sentence.biobert.pmc_base_cased, lang=en,
err=An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/__main__.py", line 48, in <module>
Text2Term().map_file(arguments.source, arguments.target, output_file=arguments.output, csv_columns=csv_columns,
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 63, in map_file
return self.map(source_terms, target_ontology, source_terms_ids=source_terms_ids, base_iris=base_iris,
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 115, in map
self._do_biobert_mapping(source_terms, target_terms, biobert_file)
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/t2t.py", line 161, in _do_biobert_mapping
biobert = BioBertMapper(ontology_terms)
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/biobert_mapper.py", line 28, in __init__
self.biobert = self.load_biobert()
File "/Users/jason/Documents/GitHub/ontology-mapper/text2term/biobert_mapper.py", line 34, in load_biobert
biobert = nlu.load('en.embed_sentence.biobert.pmc_base_cased')
File "/Users/jason/.pyenv/versions/3.10.6/lib/python3.10/site-packages/nlu/__init__.py", line 249, in load
raise Exception(
Exception: Something went wrong during creating the Spark NLP model_anno_obj for your request = en.embed_sentence.biobert.pmc_base_cased Did you use a NLU Spell?
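For anyone trying to reproduce this without text2term, the failure reduces to the single nlu.load call at the bottom of the traceback. A minimal sketch (assuming nlu and its Spark NLP / pyspark dependencies are installed):

import nlu

# The exact model reference used by text2term's BioBertMapper (biobert_mapper.py, load_biobert).
# On the affected Apple-chip machines this call raises the Py4J/ValueError chain shown above,
# because the Java side dies while downloading the pretrained model.
pipe = nlu.load('en.embed_sentence.biobert.pmc_base_cased')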
Thanks @C-K-Loan
I am sure @DevinTDHa has more to say about this. However, it seems that the M1 is different from the M1 Pro and M1 Max. These are very new, proprietary chips about which little is known. We also don't have any CI/CD support for them at the moment, so we cannot automate builds and test everything thoroughly. We had to build everything ourselves; there is no M1 support in TensorFlow for Java or the other dependencies we need.
Maybe we missed something, and maybe it is possible to build a release for M1 that also supports the Pro and Max (I have no idea about the M2 family), but at the moment it seems only the base Apple M1 is supported by our build.
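For triage, a quick way to report exactly which chip variant a machine has is to query the macOS sysctl key machdep.cpu.brand_string; a small Python sketch (assuming macOS):

import subprocess

# Prints e.g. 'Apple M1', 'Apple M1 Pro', or 'Apple M1 Max' on Apple Silicon Macs.
chip = subprocess.run(['sysctl', '-n', 'machdep.cpu.brand_string'],
                      capture_output=True, text=True).stdout.strip()
print(chip)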
Hi @C-K-Loan, @paynejason
As @maziyarpanahi said, the current build of Spark NLP indeed only supports the base M1 architecture. Other processor variants seem to have a differing instruction set, hence the error.
We hope to broaden support once CI/CD based on M1 is more generally available, but in the meantime we recommend running the pipelines remotely, for example in Google Colab or Databricks (if that's possible for your use case).
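It may also be worth confirming whether the Python interpreter itself is running natively on arm64 or under Rosetta 2, since a translated interpreter changes which wheels and JVMs get picked up. A small sketch (sysctl.proc_translated reports 1 under Rosetta, 0 when native, and the key is absent on Intel Macs):

import platform
import subprocess

# 'arm64' means a native Apple Silicon Python; 'x86_64' means it is running under Rosetta 2.
print('Python architecture:', platform.machine())

# 1 = process translated by Rosetta 2, 0 = native; the key is missing on Intel Macs,
# so an empty result there is expected.
result = subprocess.run(['sysctl', '-n', 'sysctl.proc_translated'],
                        capture_output=True, text=True)
print('Rosetta translated:', result.stdout.strip() or 'n/a')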