MMSplice error and general kipoi questions

Hello,

I’m quite new to kipoi, but I want to thank you up front for such a great resource and concept.

I want to use kipoi to study the predictions of splicing models. To start, I chose MMSplice, but my setup doesn’t seem to be working. I followed the recommendations and created a dedicated environment for all MMSplice modules.

However, when I run the simplest case on a set of variants I get the following error:

(kipoi-MMSplice) pedro.barbosa@lobo-2:~/resources/ kipoi veff score_variants MMSplice/pathogenicity -i variants.vcf.gz -o test.vcf
Already up-to-date.
Using TensorFlow backend.
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator HuberRegressor from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator StandardScaler from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator Pipeline from version 0.19.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
2019-08-08 16:40:29.661278: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-08 16:40:29.668075: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200050000 Hz
2019-08-08 16:40:29.668911: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5657620 executing computations on platform Host. Devices:
2019-08-08 16:40:29.668935: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-08 16:40:29.692338: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
Traceback (most recent call last):
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/bin/kipoi", line 11, in <module>
    sys.exit(main())
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi/__main__.py", line 105, in main
    command_fn(args.command, sys.argv[2:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi_veff/__main__.py", line 11, in cli_main
    kipoi_veff.cli.cli_main(command, raw_args)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi_veff/cli.py", line 458, in cli_main
    command_fn(args.command, raw_args[1:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi_veff/cli.py", line 192, in cli_score_variants
    model_info = kipoi_veff.ModelInfoExtractor(model, Dl)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi_veff/utils/generic.py", line 318, in __init__
    self.seq_fields = _get_seq_fields(model_obj)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-MMSplice/lib/python3.5/site-packages/kipoi_veff/utils/generic.py", line 451, in _get_seq_fields
    raise Exception("Model does not support var_effect_prediction")
Exception: Model does not support var_effect_prediction

Any help?

Additionally, I would like a seamless way to run multiple models within the same environment. I pulled the kipoi/models Docker image, but I’m not sure whether all model dependencies are resolved there. Do you have a tutorial on how to run models or score variants using Docker containers? That would be ideal, since containers are the preferred way to run things on my cluster.

Last but not least, what is the main difference between kipoi predict and kipoi veff score_variants for a model that is designed to predict the effects of genomic variants?

Thanks in advance, Pedro Barbosa

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 33 (16 by maintainers)

Top GitHub Comments

1 reaction
PedroBarbosa commented, Aug 19, 2019

Dear @Avsecz,

Testing HAL on a ClinVar file now gives me a parser error:

Traceback (most recent call last):
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/bin/kipoi", line 10, in <module>
    sys.exit(main())
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi/__main__.py", line 105, in main
    command_fn(args.command, sys.argv[2:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/__main__.py", line 11, in cli_main
    kipoi_veff.cli.cli_main(command, raw_args)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/cli.py", line 458, in cli_main
    command_fn(args.command, raw_args[1:])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/cli.py", line 222, in cli_score_variants
    model_outputs=model_outputs)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/snv_predict.py", line 795, in score_variants
    return_predictions=return_predictions)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/snv_predict.py", line 656, in predict_snvs
    writer(res_here, eval_kwargs["vcf_records"], eval_kwargs["line_id"])
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/utils/io.py", line 328, in __call__
    record_vcf = convert_record(record, self.vcf_reader)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/utils/generic.py", line 180, in convert_record
    info_tag = revert_to_info(input_record.INFO)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/kipoi_veff/utils/generic.py", line 175, in revert_to_info
    return pyvcf_reader._parse_info(u";".join(out_str_elms))
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/vcf/parser.py", line 397, in _parse_info
    val = self._map(float, vals)
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/vcf/parser.py", line 360, in _map
    for x in iterable]
  File "/home/pedro.barbosa/software/miniconda3/envs/kipoi-shared__envs__kipoi-py3-keras2/lib/python3.6/site-packages/vcf/parser.py", line 360, in <listcomp>
    for x in iterable]
ValueError: could not convert string to float: '(0.0'

My VCF contains SIFT and PolyPhen annotations added by VEP, which put string characters in the corresponding fields, for example |tolerated(1)|benign(0.003). I don’t know whether this could be the problem.
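
One way to test this hypothesis is to strip the VEP annotation from the INFO column and rerun the scoring. The sketch below is plain Python and uses no kipoi API. The helper name strip_info_keys is hypothetical, and it assumes that VEP wrote its annotations under the INFO key CSQ (VEP's default), which has not been verified for this file.

```python
def strip_info_keys(vcf_line, keys_to_drop):
    """Return a VCF line with the given INFO keys removed (hypothetical helper)."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return vcf_line  # not a complete data line, leave it alone
    # INFO is the 8th column: semicolon-separated KEY=VALUE pairs or bare flags
    entries = fields[7].split(";")
    kept = [e for e in entries if e.split("=", 1)[0] not in keys_to_drop]
    fields[7] = ";".join(kept) if kept else "."  # "." marks an empty INFO column
    return "\t".join(fields) + "\n"


# Example record; "CSQ" as the VEP key is an assumption about this file
example = "chr1\t100\t.\tA\tG\t.\t.\tAF=0.1;CSQ=G|missense_variant|tolerated(1)|benign(0.003)\n"
cleaned = strip_info_keys(example, {"CSQ"})
print(cleaned, end="")  # the INFO column is now just AF=0.1
```

If scoring succeeds on the stripped file, the annotation field is the likely trigger. If it still fails, the cause is elsewhere.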

1 reaction
PedroBarbosa commented, Aug 14, 2019

@PedroBarbosa There is a ‘Postprocessing’ tag on the website:

Ah, that is useful, thanks. It makes sense now.

Can you double-check your fasta file for the chromosome names? I recommend using the same style (e.g. chr1 vs 1) in the vcf as well as the fasta file.

I checked that. Both have the ‘chr’ prefix, but the problem remains. When I removed ‘chr’ myself, it appeared to work (the VCF file is annotated with KV:kipoi:HAL:DIFF and KV:kipoi:HAL:rID fields). However, only a few variants are scored. I believe that is because HAL can’t make predictions at arbitrary positions across the whole genome, right? Is that the reason I get the following warning?

no intervals found for b'/mnt/nfs/lobo/MCFONSECA-NFS/pedro.barbosa/resources/variants.vcf.gz' at None:107087180-107087339
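
The chromosome-naming check discussed above can be sketched in plain Python. The function names below are hypothetical and nothing here comes from kipoi. The code only compares whether the contig names in a FASTA and in a VCF use the same ‘chr’ style.

```python
def fasta_contigs(fasta_lines):
    """Collect contig names from FASTA header lines (text after '>' up to whitespace)."""
    return [l[1:].split()[0] for l in fasta_lines if l.startswith(">") and l[1:].strip()]


def vcf_contigs(vcf_lines):
    """Collect the distinct CHROM values of a VCF, in order of first appearance."""
    seen = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen


def contig_style(names):
    """Classify names as 'chr', 'no_chr' or 'mixed' by their prefix."""
    prefixed = [n.startswith("chr") for n in names]
    if all(prefixed):
        return "chr"
    if not any(prefixed):
        return "no_chr"
    return "mixed"


# Toy inputs for illustration only
fasta = [">chr1 some description\n", "ACGT\n", ">chr2\n", "TTGA\n"]
vcf = ["##fileformat=VCFv4.2\n", "#CHROM\tPOS\n", "1\t100\t.\tA\tG\n"]
print(contig_style(fasta_contigs(fasta)), contig_style(vcf_contigs(vcf)))
```

With real files, the lists of lines would come from the opened FASTA and the decompressed VCF. A mismatch between the two styles is one thing worth ruling out before looking at the model itself.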

I would like to know more about HAL. In the paper, the authors state that the model predicts the effect of variants (SNPs, indels) on the usage of different isoforms in alternative splicing events (alternative 5’, alternative 3’ and exon skipping events). When I run kipoi predict HAL, which only requires a FASTA and a GTF, I get several predictions for each feature present in my GTF. They seem vague to me: apparently it predicts the PSI of alternative splice acceptor events, but the genomic coordinates (the metadata/ranges/start and metadata/ranges/end columns in the TSV output) don’t seem to match the exon/intron boundaries present in the GTF. What is HAL actually predicting here? Below is an example for one transcript:

metadata/biotype        metadata/geneID metadata/order  metadata/ranges/chr     metadata/ranges/end     metadata/ranges/id      metadata/ranges/start   metadata/ranges/strand  metadata/transcriptID   preds/0
protein_coding  ENSG00000172748.13_3    0       chr8    182347  ENSG00000172748.13_3    182187  +       ENST00000521145.5_2     12.975850868827989
protein_coding  ENSG00000172748.13_3    1       chr8    190987  ENSG00000172748.13_3    190827  +       ENST00000521145.5_2     14.155642398397264
protein_coding  ENSG00000172748.13_3    2       chr8    193093  ENSG00000172748.13_3    192933  +       ENST00000521145.5_2     16.67051892178627
protein_coding  ENSG00000172748.13_3    3       chr8    193885  ENSG00000172748.13_3    193725  +       ENST00000521145.5_2     14.183913541499903
protein_coding  ENSG00000172748.13_3    4       chr8    194781  ENSG00000172748.13_3    194621  +       ENST00000521145.5_2     13.87503416914901
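
A quick check on the rows above: the short Python sketch below computes the width of each reported range from the metadata/ranges/start and metadata/ranges/end columns. The helper name interval_widths is hypothetical. Every range comes out at the same width, which would be consistent with fixed-size windows rather than whole exons, although that reading is an assumption and is not confirmed in this thread.

```python
def interval_widths(ranges):
    """Return end - start for each (start, end) pair."""
    return [end - start for start, end in ranges]


# (start, end) pairs copied from the five rows of the TSV output above
ranges = [
    (182187, 182347),
    (190827, 190987),
    (192933, 193093),
    (193725, 193885),
    (194621, 194781),
]
widths = interval_widths(ranges)
print(widths)  # [160, 160, 160, 160, 160]
```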

In addition, on the model page (http://splicing.cs.washington.edu/), they seem to provide utilities for predicting variants that influence alternative 5’ss and exon skipping, but kipoi refers to 3’ss. Was that done on purpose in the YAML file?

Thanks in advance, Pedro Barbosa
