CUDNN_STATUS_INTERNAL_ERROR
Describe the issue: /opt/deepvariant/bin/run_deepvariant crashes when it starts the GPU stage of call_variants.
Setup
google/deepvariant:0.10.0 Docker
subset of Illumina resequencing data
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
$ nvidia-smi
NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0
GeForce RTX 2070 super.
Workaround
Apparently the GPU module is consuming all my memory (8 GB). Possibly "config.gpu_options.allow_growth = True" is not set in the script?
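One way to try the allow_growth idea without editing the scripts inside the image is TensorFlow's TF_FORCE_GPU_ALLOW_GROWTH environment variable, passed to the container with docker's -e flag. The following is only a sketch: the paths and file names are copied from the command line in this report, and whether the setting avoids this particular cuDNN failure is an assumption, not something confirmed in the thread.

```shell
#!/usr/bin/env bash
# Sketch: the docker invocation from this report plus one environment variable.
# TF_FORCE_GPU_ALLOW_GROWTH=true asks TensorFlow to grow GPU memory on demand
# instead of reserving most of the card up front. That it fixes this crash is
# an assumption.
BIN_VERSION="1.0.0"
DATA_DIR="${PWD}/deepvariant-run/input/data"
OUTPUT_DIR="${PWD}/deepvariant-run/output"

cmd=(
  docker run --gpus 1
  -e TF_FORCE_GPU_ALLOW_GROWTH=true
  -v "${DATA_DIR}:/input"
  -v "${OUTPUT_DIR}:/output"
  "google/deepvariant:${BIN_VERSION}-gpu"
  /opt/deepvariant/bin/run_deepvariant
  --model_type=WGS
  --ref=/input/reftst.fa
  --reads=/input/tst.sorted.bam
  --output_vcf=/output/M10.output.vcf.gz
  --output_gvcf=/output/M10.output.g.vcf.gz
  --intermediate_results_dir /output/intermediate_results_dir
  --num_shards=30
)

# Print the command for review before running it.
printf '%q ' "${cmd[@]}"; echo
```

Once the printed command looks right, it can be executed with sudo "${cmd[@]}".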
Command line
BIN_VERSION="1.0.0"
BASE="${PWD}/deepvariant-run"
INPUT_DIR="${BASE}/input"
REF="10consensus.fasta"
REF2="reftst.fa"
BAM="268_041_m10.sorted.bam"
BAM2="tst.sorted.bam"
OUTPUT_DIR="${BASE}/output"
DATA_DIR="${INPUT_DIR}/data"
OUTPUT_VCF="M10.output.vcf.gz"
OUTPUT_VCF2="TST.output.vcf.gz"
OUTPUT_GVCF="M10.output.g.vcf.gz"
OUTPUT_GVCF2="TST.output.g.vcf.gz"
sudo docker run --gpus 1 -v "${DATA_DIR}":"/input" -v "${OUTPUT_DIR}:/output" google/deepvariant:"${BIN_VERSION}-gpu" /opt/deepvariant/bin/run_deepvariant --model_type=WGS --ref="/input/${REF2}" --reads="/input/${BAM2}" --output_vcf=/output/${OUTPUT_VCF} --output_gvcf=/output/${OUTPUT_GVCF} --intermediate_results_dir /output/intermediate_results_dir --num_shards=30
Error trace …
2020-09-24 03:47:35.386802: W third_party/nucleus/io/sam_reader.cc:534] Could not read base quality scores GWNJ-1012:204:GW191209000:1:1101:22544:2049: Not found: Could not read base quality scores
I0924 03:47:35.394492 139826099087104 make_examples.py:587] Task 28/30: Found 88 candidate variants
I0924 03:47:35.394706 139826099087104 make_examples.py:587] Task 28/30: Created 88 examples
I0924 03:47:35.416212 139915800631040 make_examples.py:587] Task 9/30: Found 74 candidate variants
I0924 03:47:35.416471 139915800631040 make_examples.py:587] Task 9/30: Created 76 examples
I0924 03:47:35.441959 139746083813120 make_examples.py:587] Task 29/30: Found 78 candidate variants
I0924 03:47:35.442209 139746083813120 make_examples.py:587] Task 29/30: Created 78 examples
real 0m5.429s
user 2m1.568s
sys 0m23.089s
***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" --examples "/output/intermediate_results_dir/make_examples.tfrecord@30.gz" --checkpoint "/opt/models/wgs/model.ckpt"
I0924 03:47:37.408303 140325876573952 call_variants.py:335] Shape of input examples: [100, 221, 6]
2020-09-24 03:47:37.413854: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-09-24 03:47:37.437208: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-09-24 03:47:37.440001: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5e41920 executing computations on platform Host. Devices:
2020-09-24 03:47:37.440048: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-09-24 03:47:37.444991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-09-24 03:47:37.554617: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5ea0f10 executing computations on platform CUDA. Devices:
2020-09-24 03:47:37.554679: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5
2020-09-24 03:47:37.556109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:21:00.0
2020-09-24 03:47:37.556612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-24 03:47:37.559375: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-24 03:47:37.561650: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-24 03:47:37.562295: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-24 03:47:37.565509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-24 03:47:37.567974: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-24 03:47:37.574763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-24 03:47:37.576204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-24 03:47:37.576265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-24 03:47:37.577441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-24 03:47:37.577462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-24 03:47:37.577470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-24 03:47:37.578993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6199 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:21:00.0, compute capability: 7.5)
W0924 03:47:37.676500 140325876573952 estimator.py:1821] Using temporary folder as model directory: /tmp/tmp3gvrq0ei
I0924 03:47:37.676881 140325876573952 estimator.py:212] Using config: {'_model_dir': '/tmp/tmp3gvrq0ei', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 100000, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9f898d3630>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0924 03:47:37.677164 140325876573952 call_variants.py:426] Writing calls to /output/intermediate_results_dir/call_variants_output.tfrecord.gz
W0924 03:47:37.681965 140325876573952 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0924 03:47:37.690693 140325876573952 deprecation.py:323] From /tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/data_providers.py:375: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)
instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic
.
W0924 03:47:37.814187 140325876573952 deprecation.py:323] From /tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/data_providers.py:381: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map(map_func, num_parallel_calls)
followed by tf.data.Dataset.batch(batch_size, drop_remainder)
. Static tf.data optimizations will take care of using the fused implementation.
I0924 03:47:38.164505 140325876573952 estimator.py:1147] Calling model_fn.
W0924 03:47:38.168455 140325876573952 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use layer.__call__
method instead.
I0924 03:47:41.667636 140325876573952 estimator.py:1149] Done calling model_fn.
I0924 03:47:42.548214 140325876573952 monitored_session.py:240] Graph was finalized.
2020-09-24 03:47:42.549039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:21:00.0
2020-09-24 03:47:42.549107: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-09-24 03:47:42.549121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-09-24 03:47:42.549131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-09-24 03:47:42.549143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-09-24 03:47:42.549151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-09-24 03:47:42.549164: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-09-24 03:47:42.549174: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-24 03:47:42.549558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-24 03:47:42.549586: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-24 03:47:42.549595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-24 03:47:42.549601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-24 03:47:42.549975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6199 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070 SUPER, pci bus id: 0000:21:00.0, compute capability: 7.5)
I0924 03:47:42.550738 140325876573952 saver.py:1284] Restoring parameters from /opt/models/wgs/model.ckpt
I0924 03:47:43.702764 140325876573952 session_manager.py:500] Running local_init_op.
I0924 03:47:43.766339 140325876573952 session_manager.py:502] Done running local_init_op.
I0924 03:47:44.184749 140325876573952 modeling.py:415] Reloading EMA…
I0924 03:47:44.185623 140325876573952 saver.py:1284] Restoring parameters from /opt/models/wgs/model.ckpt
2020-09-24 03:47:45.236844: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-09-24 03:47:45.652085: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-09-24 03:47:45.654628: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D}}]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D}}]]
[[softmax_tensor_1/_3035]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 491, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/absl_py/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/absl_py/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 481, in main
    use_tpu=FLAGS.use_tpu,
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 433, in call_variants
    prediction = next(predictions)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 640, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/tmp/Bazel.runfiles_dgqnmzud/runfiles/six_archive/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
[[softmax_tensor_1/_3035]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D':
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 491, in <module>
    tf.compat.v1.app.run()
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/absl_py/absl/app.py", line 300, in run
    _run_main(main, args)
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/absl_py/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 481, in main
    use_tpu=FLAGS.use_tpu,
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 433, in call_variants
    prediction = next(predictions)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
    features, None, ModeKeys.PREDICT, self.config)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1148, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/modeling.py", line 914, in model_fn
    is_training=mode == tf.estimator.ModeKeys.TRAIN)
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/modeling.py", line 744, in create
    return self._create(images, num_classes, is_training)
  File "tmp/Bazel.runfiles_dgqnmzud/runfiles/com_google_deepvariant/deepvariant/modeling.py", line 1122, in _create
    images, num_classes, create_aux_logits=False, is_training=is_training)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/nets/inception_v3.py", line 587, in inception_v3
    depth_multiplier=depth_multiplier)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/nets/inception_v3.py", line 117, in inception_v3_base
    net = layers.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/ops/arg_scope.py", line 184, in func_with_args
    return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py", line 1191, in convolution2d
    conv_dims=2)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/ops/arg_scope.py", line 184, in func_with_args
    return func(*args, **current_args)
  File "usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py", line 1089, in convolution
    outputs = layer.apply(inputs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 1695, in apply
    return self.call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/layers/base.py", line 548, in call
    outputs = super(Layer, self).call(inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 847, in call
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
    return converted_call(f, options, args, kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
    return _call_unconverted(f, args, kwargs, options)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
    return f(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 1134, in call
    return self.conv_op(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 639, in call
    return self.call(inp, filter)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 238, in call
    name=self.name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 2010, in conv2d
    name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3360, in create_op
    attrs, op_def, compute_device)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
    op_def=op_def)
  File "usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1751, in init
    self._traceback = tf_stack.extract_stack()
real 0m10.613s
user 0m11.112s
sys 0m4.718s
I0924 03:47:46.482943 140410383501056 run_deepvariant.py:364] None
Traceback (most recent call last):
  File "/opt/deepvariant/bin/run_deepvariant.py", line 369, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/opt/deepvariant/bin/run_deepvariant.py", line 362, in main
    subprocess.check_call(command, shell=True, executable='/bin/bash')
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'time /opt/deepvariant/bin/call_variants --outfile "/output/intermediate_results_dir/call_variants_output.tfrecord.gz" --examples "/output/intermediate_results_dir/make_examples.tfrecord@30.gz" --checkpoint "/opt/models/wgs/model.ckpt"' returned non-zero exit status 1.
Following it with nvidia-smi, it consumes all the memory. Is there a way to limit memory?
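To put numbers on the nvidia-smi observation, GPU memory can be polled while call_variants runs. This is a minimal sketch: gpu_mem_percent and watch_gpu_mem are hypothetical helper names, and the parsing assumes nvidia-smi's "csv,noheader,nounits" output of the form "used, total" in MiB.

```shell
#!/usr/bin/env bash
# Convert one "used, total" CSV line (MiB, no units) into an integer percentage.
gpu_mem_percent() {
  local used total
  IFS=', ' read -r used total <<< "$1"
  echo $(( used * 100 / total ))
}

# Poll the GPU once per second and print usage; stop with Ctrl-C.
watch_gpu_mem() {
  nvidia-smi --query-gpu=memory.used,memory.total \
             --format=csv,noheader,nounits -l 1 |
  while read -r line; do
    echo "GPU memory: ${line} MiB ($(gpu_mem_percent "$line")%)"
  done
}
```

Calling watch_gpu_mem in a second terminal while the pipeline runs shows whether usage jumps to nearly the whole card when the GPU stage starts.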
Cheers.
Issue Analytics
- Created 3 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
Yes, it fixed the issue. Thanks!
I took the intermediate results and ran them on a bigger machine. It worked. Memory consumption rose to 435G!! By far the hungriest step. Thanks.
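For anyone wanting to repeat this, here is a rough sketch of re-running only call_variants from the saved intermediate results. It assumes the intermediate_results_dir from the failed run was copied under OUTPUT_DIR on the larger machine, that the step being re-run is call_variants, and that the non-GPU image tag is used. The call_variants arguments are the ones printed in the log above.

```shell
#!/usr/bin/env bash
# Assumption: OUTPUT_DIR on the larger machine holds the copied
# intermediate_results_dir, including the make_examples shards.
BIN_VERSION="1.0.0"
OUTPUT_DIR="${PWD}/deepvariant-run/output"
INTERMEDIATE="/output/intermediate_results_dir"

call_variants_cmd=(
  docker run
  -v "${OUTPUT_DIR}:/output"
  "google/deepvariant:${BIN_VERSION}"
  /opt/deepvariant/bin/call_variants
  --outfile "${INTERMEDIATE}/call_variants_output.tfrecord.gz"
  --examples "${INTERMEDIATE}/make_examples.tfrecord@30.gz"
  --checkpoint /opt/models/wgs/model.ckpt
)

# Print the command for review before running it.
printf '%q ' "${call_variants_cmd[@]}"; echo
```

After checking the printed command, run it with sudo "${call_variants_cmd[@]}".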