Exit Status 247
Not sure what is causing the issue, but DeepVariant failed upon reaching this step. Any thoughts on how to fix it? I tried to run it in a Python 2.7 environment, but it still seems to be pulling from Python 3.6.
***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/tmp/tmp9_28zx5u/call_variants_output.tfrecord.gz" --examples "/tmp/tmp9_28zx5u/make_examples.tfrecord@1.gz" --checkpoint "/opt/models/wes/model.ckpt"
I0424 15:59:50.266534 139872277903104 call_variants.py:316] Set KMP_BLOCKTIME to 0
2020-04-24 15:59:50.321136: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel® MKL-DNN to use the following CPU instructions in performance critical operations: AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-04-24 15:59:50.376605: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2020-04-24 15:59:50.378224: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56a1fd0 executing computations on platform Host. Devices:
2020-04-24 15:59:50.378283: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2020-04-24 15:59:50.380979: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
I0424 15:59:50.447775 139872277903104 modeling.py:563] Initializing model with random parameters
W0424 15:59:50.449538 139872277903104 estimator.py:1821] Using temporary folder as model directory: /tmp/tmp3bl4tsmc
I0424 15:59:50.450443 139872277903104 estimator.py:212] Using config: {'_model_dir': '/tmp/tmp3bl4tsmc', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 100000, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3659263518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0424 15:59:50.451262 139872277903104 call_variants.py:384] Writing calls to /tmp/tmp9_28zx5u/call_variants_output.tfrecord.gz
W0424 15:59:50.467876 139872277903104 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
I0424 15:59:50.501495 139872277903104 data_providers.py:369] self.input_read_threads=8
W0424 15:59:50.501965 139872277903104 deprecation.py:323] From /tmp/Bazel.runfiles_sszxydhb/runfiles/com_google_deepvariant/deepvariant/data_providers.py:374: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
I0424 15:59:50.681574 139872277903104 data_providers.py:376] self.input_map_threads=48
W0424 15:59:50.681832 139872277903104 deprecation.py:323] From /tmp/Bazel.runfiles_sszxydhb/runfiles/com_google_deepvariant/deepvariant/data_providers.py:381: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map(map_func, num_parallel_calls) followed by tf.data.Dataset.batch(batch_size, drop_remainder). Static tf.data optimizations will take care of using the fused implementation.
I0424 15:59:51.794167 139872277903104 estimator.py:1147] Calling model_fn.
W0424 15:59:51.800228 139872277903104 deprecation.py:323] From /tmp/Bazel.runfiles_sszxydhb/runfiles/com_google_deepvariant/deepvariant/modeling.py:885: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0424 15:59:51.806498 139872277903104 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use layer.__call__ method instead.
I0424 16:00:02.682547 139872277903104 estimator.py:1149] Done calling model_fn.
I0424 16:00:06.021238 139872277903104 monitored_session.py:240] Graph was finalized.
I0424 16:00:06.037272 139872277903104 saver.py:1284] Restoring parameters from /opt/models/wes/model.ckpt
I0424 16:00:10.817819 139872277903104 session_manager.py:500] Running local_init_op.
I0424 16:00:11.060626 139872277903104 session_manager.py:502] Done running local_init_op.
I0424 16:00:12.403780 139872277903104 modeling.py:413] Reloading EMA…
I0424 16:00:12.405867 139872277903104 saver.py:1284] Restoring parameters from /opt/models/wes/model.ckpt
I0424 16:00:48.634510 139872277903104 call_variants.py:402] Processed 1 examples in 1 batches [5816.472 sec per 100]
real 4m2.970s
user 5m54.674s
sys 1m14.107s
I0424 16:03:48.557898 140277446174464 run_deepvariant.py:321] None
Traceback (most recent call last):
  File "/opt/deepvariant/bin/run_deepvariant.py", line 332, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/opt/deepvariant/bin/run_deepvariant.py", line 319, in main
    subprocess.check_call(command, shell=True, executable='/bin/bash')
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'time /opt/deepvariant/bin/call_variants --outfile "/tmp/tmp9_28zx5u/call_variants_output.tfrecord.gz" --examples "/tmp/tmp9_28zx5u/make_examples.tfrecord@1.gz" --checkpoint "/opt/models/wes/model.ckpt"' returned non-zero exit status 247.
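The log does not say why the status is 247. One way to read such a number, sketched below, is to check it against two conventions: the shell's 128 + N for a child killed by signal N, and a negative return code -N truncated to 8 bits (256 - N). The helper names `describe_exit_status` and `_signal_name` are made up for this illustration, and the second convention is an assumption about how the wrapper exits, not something the traceback confirms.

```python
import signal


def _signal_name(number):
    """Return the signal's name, or None if the number is not a known signal."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return None


def describe_exit_status(status):
    """Best-guess description of an exit status in the range 0-255."""
    if status == 0:
        return "success"
    # Shell convention: a child killed by signal N is reported as 128 + N.
    if status > 128:
        name = _signal_name(status - 128)
        if name:
            return f"128 + {status - 128}: killed by {name} (shell convention)"
    # Assumption: a wrapper called exit() with a negative return code -N,
    # which is truncated to 8 bits and shows up as 256 - N.
    number = 256 - status
    name = _signal_name(number)
    if name:
        return f"256 - {number}: possibly return code -{number} ({name}) truncated to 8 bits"
    return f"application-defined exit status {status}"


print(describe_exit_status(247))
print(describe_exit_status(137))
```

Under that reading, 247 would correspond to -9, that is SIGKILL, which is what the kernel's out-of-memory killer sends. Treat this as a hint to look at memory limits, not as a diagnosis.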
Top GitHub Comments
@ptrebert Glad it worked 😃 DeepVariant is nice, but it is more complex than it has to be, and adding Docker/Singularity on top injects many more layers of complexity (not easily exposed), creating opportunities for heisenbugs. Docker/Singularity are really meant for smaller applications, since their interaction with the kernel becomes multiplicative rather than additive for larger applications, which you noticed indirectly via the memory resource requirements.
@pgrosu A substantial increase (almost double) in the memory available to the DeepVariant cluster jobs worked; both jobs have now succeeded. Thanks for your help.
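For anyone sizing a memory request after hitting the same failure, a rough way to see how much a step actually needs is to run it once on a small input and read the peak resident set size of the child process. This is a minimal sketch using only the Python standard library on a Unix system; `run_and_report_peak_rss` is a hypothetical helper, not part of DeepVariant, and the placeholder command at the bottom stands in for the real pipeline step.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(command):
    """Run a command and return (exit status, peak RSS of waited-for children).

    ru_maxrss is the largest peak among child processes that have been
    waited for so far. It is in kilobytes on Linux and bytes on macOS.
    """
    completed = subprocess.run(command)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    # Placeholder workload: allocate about 50 MB in a child Python process.
    status, peak = run_and_report_peak_rss(
        [sys.executable, "-c", "data = bytearray(50_000_000)"]
    )
    print(f"exit status {status}, peak RSS {peak} (kilobytes on Linux)")
```

The number only covers processes that are direct or waited-for descendants of the script. With Docker the container runs under the daemon rather than as a child, so the figure would not reflect the container's memory use there; the scheduler's own accounting is the more reliable source on a cluster.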