GPU training: program is "killed" after "XLA compilation"
I tried training the code on a GPU, after including the changes made earlier today, and I am running into a memory issue. Just after this line is logged:

2020-01-08 16:11:33.715292: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.

the program is terminated with a Killed message.
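A bare Killed with no Python traceback or CUDA out-of-memory error usually means the Linux OOM killer reclaimed host RAM. A minimal sketch for confirming this while the trainer runs (not part of the original run; the script name and 5-second polling interval are just placeholders, and it reads /proc, so it is Linux-only):

```python
#!/usr/bin/env python3
"""Poll the resident memory (VmRSS) of a running process on Linux.

Usage (hypothetical): python watch_rss.py <pid-of-t5_mesh_transformer>
If RSS climbs toward the ~63 GiB of host RAM before the trainer dies,
the bare "Killed" came from the kernel OOM killer, not from GPU memory.
"""
import sys
import time


def rss_gib(pid: int) -> float:
    # /proc/<pid>/status reports VmRSS in kB.
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / (1024 * 1024)
    return 0.0


if __name__ == "__main__":
    pid = int(sys.argv[1])  # PID of the training process
    while True:
        try:
            print(f"RSS: {rss_gib(pid):.2f} GiB", flush=True)
        except FileNotFoundError:
            print("process exited")
            break
        time.sleep(5)
```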
Here is an extended log for reference:
> $ t5_mesh_transformer --model_dir="danielk-files/models" --t5_tfds_data_dir="danielk-files" --gin_file="dataset.gin" --gin_param="utils.run.mesh_shape = 'model:2,batch:1'" --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1']" --gin_param="MIXTURE_NAME = 'glue_mrpc_v002'" --gin_file="gs://t5-data/pretrained_models/small/operative_config.gin"
.
.
.
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/shape (Const)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/min (Const)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/max (Const)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/sub (Sub)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform/mul (Mul)
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Initializer/random_uniform (Add)
decoder/block_005/layer_001/EncDecAttention/o_slice_1 (VariableV2) /device:GPU:1
decoder/block_005/layer_001/EncDecAttention/o_slice_1/Assign (Assign) /device:GPU:1
decoder/block_005/layer_001/EncDecAttention/o_slice_1/read (Identity) /device:GPU:1
decoder/block_005/layer_001/EncDecAttention/o_1/parallel_1_1/Assign (Assign) /device:GPU:1
assign_1/parallel_1_96/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.215915: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/shape (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/min (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/max (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/sub (Sub)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform/mul (Mul)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Initializer/random_uniform (Add)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0 (VariableV2) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/Assign (Assign) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_0/read (Identity) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wi/kernel_1/parallel_0_1/Assign (Assign) /device:GPU:0
assign_1/parallel_0_97/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.216672: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/shape (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/min (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/max (Const)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/sub (Sub)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform/mul (Mul)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Initializer/random_uniform (Add)
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1 (VariableV2) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/Assign (Assign) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wi/kernel_slice_1/read (Identity) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wi/kernel_1/parallel_1_1/Assign (Assign) /device:GPU:1
assign_1/parallel_1_97/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.217428: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/shape (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/min (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/max (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/sub (Sub)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform/mul (Mul)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Initializer/random_uniform (Add)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0 (VariableV2) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/Assign (Assign) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_0/read (Identity) /device:GPU:0
decoder/block_005/layer_002/DenseReluDense/wo/kernel_1/parallel_0_1/Assign (Assign) /device:GPU:0
assign_1/parallel_0_98/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.218184: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/shape (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/min (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/max (Const)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/sub (Sub)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform/mul (Mul)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Initializer/random_uniform (Add)
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1 (VariableV2) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/Assign (Assign) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wo/kernel_slice_1/read (Identity) /device:GPU:1
decoder/block_005/layer_002/DenseReluDense/wo/kernel_1/parallel_1_1/Assign (Assign) /device:GPU:1
assign_1/parallel_1_98/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.254151: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/shape (Const)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/min (Const)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/max (Const)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/sub (Sub)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform/mul (Mul)
stacked/shared/embedding_slot_vr_slice_0/Initializer/random_uniform (Add)
stacked/shared/embedding_slot_vr_slice_0 (VariableV2) /device:GPU:0
stacked/shared/embedding_slot_vr_slice_0/Assign (Assign) /device:GPU:0
stacked/shared/embedding_slot_vr_slice_0/read (Identity) /device:GPU:0
stacked/shared/embedding_slot_vr/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.254968: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/shape (Const)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/min (Const)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/max (Const)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/sub (Sub)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform/mul (Mul)
stacked/shared/embedding_slot_vr_slice_1/Initializer/random_uniform (Add)
stacked/shared/embedding_slot_vr_slice_1 (VariableV2) /device:GPU:1
stacked/shared/embedding_slot_vr_slice_1/Assign (Assign) /device:GPU:1
stacked/shared/embedding_slot_vr_slice_1/read (Identity) /device:GPU:1
stacked/shared/embedding_slot_vr/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.255704: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/shape (Const)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/min (Const)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/max (Const)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/sub (Sub)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform/mul (Mul)
shared/embedding_slot_vc_slice_0/Initializer/random_uniform (Add)
shared/embedding_slot_vc_slice_0 (VariableV2) /device:GPU:0
shared/embedding_slot_vc_slice_0/Assign (Assign) /device:GPU:0
shared/embedding_slot_vc_slice_0/read (Identity) /device:GPU:0
shared/embedding_slot_vc_1/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0_1/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.256476: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/shape (Const)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/min (Const)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/max (Const)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/sub (Sub)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform/mul (Mul)
shared/embedding_slot_vc_slice_1/Initializer/random_uniform (Add)
shared/embedding_slot_vc_slice_1 (VariableV2) /device:GPU:1
shared/embedding_slot_vc_slice_1/Assign (Assign) /device:GPU:1
shared/embedding_slot_vc_slice_1/read (Identity) /device:GPU:1
shared/embedding_slot_vc_1/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1_1/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.257669: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0 (VariableV2) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/Assign (Assign) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_0/read (Identity) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0_2/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.258527: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1 (VariableV2) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/Assign (Assign) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr_slice_1/read (Identity) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/q_slot_vr/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1_2/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.260295: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0 (VariableV2) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/Assign (Assign) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_0/read (Identity) /device:GPU:0
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0_3/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.261051: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1 (VariableV2) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/Assign (Assign) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v_slice_1/read (Identity) /device:GPU:1
stacked/encoder/block_000/layer_000/SelfAttention/relative_attention_bias_slot_v/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1_3/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.262048: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0 (VariableV2) /device:GPU:0
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/Assign (Assign) /device:GPU:0
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_0/read (Identity) /device:GPU:0
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0_4/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.262775: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/shape (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/min (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/max (Const)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/sub (Sub)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform/mul (Mul)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Initializer/random_uniform (Add)
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1 (VariableV2) /device:GPU:1
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/Assign (Assign) /device:GPU:1
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc_slice_1/read (Identity) /device:GPU:1
stacked/encoder/block_000/layer_001/DenseReluDense/wi/kernel_slot_vc/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1_4/Assign (Assign) /device:GPU:1
2020-01-08 16:10:35.288719: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:0' assigned_device_name_='' resource_device_name_='/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/shape (Const)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/min (Const)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/max (Const)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/sub (Sub)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform/mul (Mul)
decoder/final_layer_norm/scale_slot_v_slice_0/Initializer/random_uniform (Add)
decoder/final_layer_norm/scale_slot_v_slice_0 (VariableV2) /device:GPU:0
decoder/final_layer_norm/scale_slot_v_slice_0/Assign (Assign) /device:GPU:0
decoder/final_layer_norm/scale_slot_v_slice_0/read (Identity) /device:GPU:0
decoder/final_layer_norm/scale_slot_v_1/parallel_0_1/Assign (Assign) /device:GPU:0
assign/parallel_0_5/Assign (Assign) /device:GPU:0
2020-01-08 16:10:35.289603: W tensorflow/core/common_runtime/colocation_graph.cc:983] Failed to place the graph without changing the devices of some resources. Some of the operations (that had to be colocated with resource generating operations) are not supported on the resources' devices. Current candidate devices are [
/job:localhost/replica:0/task:0/device:CPU:0].
See below for details of this colocation group:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=-1 requested_device_name_='/device:GPU:1' assigned_device_name_='' resource_device_name_='/device:GPU:1' supported_device_types_=[CPU] possible_devices_=[]
Assign: CPU
RandomUniform: CPU XLA_CPU XLA_GPU
Const: CPU XLA_CPU XLA_GPU
Mul: CPU XLA_CPU XLA_GPU
Sub: CPU XLA_CPU XLA_GPU
Add: CPU XLA_CPU XLA_GPU
Identity: CPU XLA_CPU XLA_GPU
VariableV2: CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/shape (Const)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/min (Const)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/max (Const)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/RandomUniform (RandomUniform)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/sub (Sub)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform/mul (Mul)
decoder/final_layer_norm/scale_slot_v_slice_1/Initializer/random_uniform (Add)
decoder/final_layer_norm/scale_slot_v_slice_1 (VariableV2) /device:GPU:1
decoder/final_layer_norm/scale_slot_v_slice_1/Assign (Assign) /device:GPU:1
decoder/final_layer_norm/scale_slot_v_slice_1/read (Identity) /device:GPU:1
decoder/final_layer_norm/scale_slot_v_1/parallel_1_1/Assign (Assign) /device:GPU:1
assign/parallel_1_5/Assign (Assign) /device:GPU:1
INFO:tensorflow:Running local_init_op.
I0108 16:10:37.353017 140685207750400 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0108 16:10:37.880563 140685207750400 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Before copy master to slices.
I0108 16:10:38.634351 140685207750400 ops.py:5541] Before copy master to slices.
INFO:tensorflow:Done with copy master to slices.
I0108 16:10:39.607687 140685207750400 ops.py:5543] Done with copy master to slices.
INFO:tensorflow:Saving checkpoints for 0 into danielk-files/models/model.ckpt.
I0108 16:10:51.266983 140685207750400 basic_session_run_hooks.py:606] Saving checkpoints for 0 into danielk-files/models/model.ckpt.
INFO:tensorflow:Before Save.
I0108 16:10:51.276291 140685207750400 ops.py:5516] Before Save.
INFO:tensorflow:About to write a checkpoint
I0108 16:10:52.409570 140685207750400 ops.py:5518] About to write a checkpoint
INFO:tensorflow:danielk-files/models/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
I0108 16:10:53.351364 140685207750400 checkpoint_management.py:95] danielk-files/models/model.ckpt-0 is not in all_model_checkpoint_paths. Manually adding it.
INFO:tensorflow:Done writing checkpoint.
I0108 16:10:55.473980 140685207750400 ops.py:5521] Done writing checkpoint.
import feature targets[[[7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][59 834 15 1169 15592 1 7072 1 7072 1 59 834 15 1169 15592 1 7072 1 7072 1 7072 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]import feature targets_segmentation[[[1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 1 1 1 1 2 2 3 3 4 4 4 4 4 4 5 5 6 6 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...]]...]
import feature inputs[[[3 51 52 102 75 7142 536 10 86 3 9 2493 865 3 6 3 88 243 34 4283 112 596 141 9717 3 9 720 710 3 5 7142 357 10 3969 40 18276 28325 26 16 3 9 2493 4683 2818 24 112 563 164 43 9717 96 3 9 720 710 3 5 96 1 3 51 52 102 75 7142 536 10 96 30628 65 2994 186 203 13 3014 3 233 581 385 42 150 2259 3 6 96 10261 1836 243 16 8 2493 3 5 7142 357 10 96 30628 65 2994 186 203 13 3014 3190 4148 8852 581 385 42 150 2259 3 6 96 243 3 5494 2146 15702 6187 10261 1836 3 5 1 3 51 52 102 75 7142 536 10 25103 3 6 8 1113 13 1473 7 6914 19 5657 30587 7 190 8 1719 3 5 7142 357 10 25103 3 6 1473 3 31 7 6914 33 5657 30587 7 190 8 1719 3 5 1 3 51 52 102 75 7142 536 10 907 641 65 1866 8 690 1514 6154 770 16 17524 21 3586 8 166 1751 13 4311 8874 3 5 7142 357 10 907 65 1866 1514 6154 770 16 17524 21 12385 12 942 4311 8874 3 5 1 3 51 52 102 75 7142 536 10 216 3 9925 38 46 1038 4297 8211 30 2645 6834 7 12 36 3 9 14625 2378 11 37 101 1639 222 7505 3 5 7142 357 10 71 9396 1424 8211 113 1279 30 1267 379 2645 6834 7 304 493 3 9 14625 2378 3 58 1 3 51 52 102 75 7142 536 10 23066 43 4313 10209 12778 13485 30 3 10363 3972 7159 1296 24 164 554 10475 32 7 15 6917 12 824 6716 3 5 7142 357 10 9864 11 112 372 43 4313 10209 12778 13485 24 9296 1137 42 554 10475 32 7 15 6917 12 824 6716 3 5 1 3 51 52 102 75 7142 536 10 12394 4794 25394 11385 7 16 1798 3370 2213 4599 3 31 37 29210 127 3 31 18786 21 8 3 4060 189 2041 18050 3 31 7 29952 21670 13 1718 5396 6751 262 1014 21537 3 5 7142 357 10 1881 18 279 2741 7 2213 4599 3 31 8 29210 127 3 31 18786 5978 7 16 8 18050 3 31 7 29952 21670 13 1718 5396 6751 262 1014 21537 1701 3 5 1 3 51 52 102 75 7142 536 10 86 119 1234 3 6 17240 6610 19 3 30273 26 12 726 21 3 476 3205 3426 3 31 7 9953 581 2900 20055 17240 3 5 7142 357 10 9046 6402 3 6 17240 6610 56 36 2418 53 8 2876 21 3 476 3205 3 31 9953 581 2900 20055 17240 3 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][3 51 52 102 75 7142 536 10 71 903 13 8 20 900 6682 646 1187 116 3 9 388 10719 112 443 190 3 9 3 17208 7208 512 16 4625 20268 30 2875 3 5 7142 357 10 37 643 13 3 9 7584 7797 365 3 9 4459 3 2046 102 16 851 13 3 9 443 24 3 102 22411 190 3 9 3 17208 7208 512 16 4625 20268 3 5 1 3 51 52 102 75 7142 536 10 1363 1793 23 19001 23 3 60 3007 5100 1363 2974 32 5208 3 6 2145 112 21029 130 96 21001 96 3 5 7142 357 10 1793 23 19001 23 1219 19644 44 112 828 24 2974 32 5208 3 31 7 21029 130 96 21001 3 5 96 1 3 51 52 102 75 7142 536 10 24583 4300 1390 12 7464 19852 1213 2747 23620 13335 21 81 1514 3 4060 770 16 1723 3 6 8 688 243 1701 3 5 7142 357 10 24583 4300 10052 5 19 3 19031 2747 23620 13335 3937 3 5 3 6 3 9 19852 1213 8106 13 331 18 20393 889 3 6 21 3241 1514 3 4060 770 3 6 8 688 243 1701 3 5 1 3 51 52 102 75 7142 536 10 25394 243 30 2875 24 66 898 4627 724 13928 11 133 36 5285 21 4845 11 4798 3 5 7142 357 10 216 243 8 20395 3 31 7 4627 56 36 19257 11 5285 21 4845 11 4798 3 5 1 3 51 52 102 75 7142 536 10 37 29 8 5015 54 1520 430 356 13 452 3507 7 30 7954 3 287 4246 53 1358 3 6 8 5191 243 3 5 7142 357 10 299 8 5191 3 6 5181 1060 3 6 243 8 5015 54 1520 430 356 13 452 3507 7 30 7954 3 287 4246 53 1358 227 2239 3 5 1 3 51 52 102 75 7142 536 10 486 709 2838 797 12673 43 118 4792 16 1041 437 8905 10126 779 4719 147 30 932 209 3 5 7142 357 10 886 386 9611 797 11 2390 12673 43 118 4792 437 8905 10126 779 4719 147 16 7457 30 932 209 3 5 1 3 51 52 102 75 7142 536 10 37 5923 3271 7 13 1473 3 6 4623 11 662 2069 6578 9352 43 1736 16 14465 11 3754 3 9 1487 21 70 3518 1034 563 53 3 5 7142 357 10 37 3427 6323 7 13 1473 3 6 4623 11 662 2808 6578 1440 
3814 10663 30 2818 24 356 1390 16 4644 21 3 9 307 18 9 13106 3518 1181 18 14389 2050 16 412 172 346 2168 5627 3 5 1 0 0 0 0 0 0 0 0 0 0...]]...]
import feature inputs_segmentation[[[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0][1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 0 0 0 0 0 0 0 0 0 0...]]...]
2020-01-08 16:11:33.715292: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Killed
My GPU specs:
Wed Jan 8 16:25:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01 Driver Version: 418.87.01 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro GV100 On | 00000000:01:00.0 Off | Off |
| 29% 41C P2 25W / 250W | 0MiB / 32478MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 On | 00000000:02:00.0 Off | Off |
| 33% 28C P8 11W / 260W | 0MiB / 48571MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Memory info:
top - 16:29:15 up 47 days, 7:31, 4 users, load average: 0.11, 0.54, 1.95
Tasks: 648 total, 1 running, 647 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.7 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
GiB Mem : 62.842 total, 61.903 free, 0.506 used, 0.433 buff/cache
GiB Swap: 8.000 total, 7.730 free, 0.270 used. 61.731 avail Mem
and pip packages:
$ pip list
Package Version
----------------------------- ----------
absl-py 0.9.0
alabaster 0.7.12
allennlp 0.9.0
astor 0.8.1
attrs 19.3.0
Babel 2.8.0
blis 0.2.4
boto 2.49.0
boto3 1.10.49
botocore 1.13.49
cachetools 4.0.0
certifi 2019.11.28
chardet 3.0.4
Click 7.0
conllu 1.3.1
cycler 0.10.0
cymem 2.0.3
dill 0.3.1.1
distro 1.4.0
docutils 0.15.2
editdistance 0.5.3
flaky 3.6.1
Flask 1.1.1
Flask-Cors 3.0.8
ftfy 5.6
future 0.18.2
gast 0.2.2
gevent 1.4.0
gin-config 0.3.0
google-api-core 1.15.0
google-api-python-client 1.7.11
google-auth 1.10.0
google-auth-httplib2 0.0.3
google-cloud-core 1.1.0
google-cloud-storage 1.24.1
google-compute-engine 2.8.13
google-pasta 0.1.8
google-resumable-media 0.5.0
googleapis-common-protos 1.6.0
greenlet 0.4.15
grpcio 1.26.0
h5py 2.10.0
httplib2 0.15.0
idna 2.8
imagesize 1.2.0
importlib-metadata 1.3.0
itsdangerous 1.1.0
Jinja2 2.10.3
jmespath 0.9.4
joblib 0.14.1
jsonnet 0.14.0
jsonpickle 1.2
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
kiwisolver 1.1.0
Markdown 3.1.1
MarkupSafe 1.1.1
matplotlib 3.1.2
mesh-tensorflow 0.1.9
more-itertools 8.0.2
murmurhash 1.0.2
nltk 3.4.5
numpy 1.18.1
numpydoc 0.9.2
oauth2client 4.1.3
opt-einsum 3.1.0
overrides 2.8.0
packaging 20.0
pandas 0.25.3
parsimonious 0.8.1
pip 19.3.1
plac 0.9.6
pluggy 0.13.1
portalocker 1.5.2
preshed 2.0.1
promise 2.3
protobuf 3.11.2
py 1.8.1
pyasn1 0.4.8
pyasn1-modules 0.2.7
Pygments 2.5.2
pyparsing 2.4.6
pytest 5.3.2
python-dateutil 2.8.1
pytorch-pretrained-bert 0.6.2
pytorch-transformers 1.1.0
pytz 2019.3
regex 2020.1.8
requests 2.22.0
responses 0.10.9
rouge-score 0.0.3
rsa 4.0
s3transfer 0.2.1
sacrebleu 1.4.3
scikit-learn 0.22.1
scipy 1.4.1
sentencepiece 0.1.85
setuptools 44.0.0
six 1.13.0
snowballstemmer 2.0.0
spacy 2.1.9
Sphinx 2.3.1
sphinxcontrib-applehelp 1.0.1
sphinxcontrib-devhelp 1.0.1
sphinxcontrib-htmlhelp 1.0.2
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.2
sphinxcontrib-serializinghtml 1.1.3
sqlparse 0.3.0
srsly 1.0.1
t5 0.1.7
tensorboard 1.15.0
tensorboardX 2.0
tensorflow 1.15.0
tensorflow-datasets 1.3.2
tensorflow-estimator 1.15.1
tensorflow-metadata 0.21.0
tensorflow-text 1.15.0rc0
termcolor 1.1.0
thinc 7.0.8
torch 1.3.1
tqdm 4.41.1
typing 3.7.4.1
Unidecode 1.1.1
uritemplate 3.0.1
urllib3 1.25.7
wasabi 0.6.0
wcwidth 0.1.8
Werkzeug 0.16.0
wheel 0.33.6
word2number 1.1
wrapt 1.11.2
zipp 0.6.0
It’s looking for libcu*.so.10.0, but (according to your nvidia-smi printout at least) you have v10.1, which probably names the files libcu*.so.10.1. Have a look at https://github.com/tensorflow/tensorflow/issues/26182
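One way to confirm the mismatch is to ask the dynamic linker directly from Python; this is a minimal sketch, under the assumption that you are on the stock TensorFlow 1.15 pip wheel, which links against CUDA 10.0:

```python
"""Check which CUDA runtime versions the dynamic linker can resolve."""
import ctypes

for ver in ("10.0", "10.1"):
    name = f"libcudart.so.{ver}"
    try:
        ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
        print(f"{name}: found")
    except OSError as err:
        print(f"{name}: NOT found ({err})")
```

If libcudart.so.10.0 is not found, installing the CUDA 10.0 runtime alongside 10.1 (or switching to a TensorFlow build that matches CUDA 10.1) should resolve that particular error.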
This is a bit tricky because t5 has an explicit requirement on earlier tensorflow versions.
For those who have the same issue, I did use a conda environment to install the following packages:
and
Now after starting the code I see: