Issue when enabling full_tensorboard_log
When enabling full_tensorboard_log, an error is raised after the first update of the model. Here is the traceback:
Traceback (most recent call last):
  File "sb_train.py", line 240, in <module>
    model.learn(n_timesteps, callback=callback_learning, seed=None, log_interval=args.log_interval)
  File "/data_GPU/hassan/SurgerySimulator/Local_RL_Cataract/ml-agents-gym/stable_baselines/ppo2/ppo2.py", line 362, in learn
    update=timestep, cliprange_vf=cliprange_vf_now))
  File "/data_GPU/hassan/SurgerySimulator/Local_RL_Cataract/ml-agents-gym/stable_baselines/ppo2/ppo2.py", line 294, in _train_step
    writer.add_run_metadata(run_metadata, 'step%d' % (update * update_fac))
  File "/opt/conda/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py", line 262, in add_run_metadata
    raise ValueError("The provided tag was already used for this event type")
ValueError: The provided tag was already used for this event type
Code example
I initialized PPO2 with full_tensorboard_log enabled, something like:
model = PPO2(env=env, tensorboard_log=tensorboard_log, verbose=1, **hyperparams,
full_tensorboard_log=True)
I added a print to double-check what is happening. In ppo2.py:
print('step %d * %d = %d' % (update, update_fac, (update * update_fac)))
writer.add_run_metadata(run_metadata, 'step%d' % (update * update_fac))
Here is an example of the output:
noptepochs 10
n_batch 2048
batch_size 32
num_timesteps 2048
update_fac 4.000000
epoch_num 0
step 1159 * 4 = 4636
step 1169 * 4 = 4676
step 1179 * 4 = 4716
step 1189 * 4 = 4756
step 1199 * 4 = 4796
step 1209 * 4 = 4836
epoch_num 1
step 1219 * 4 = 4876
step 1229 * 4 = 4916
step 1239 * 4 = 4956
step 1249 * 4 = 4996
step 1259 * 4 = 5036
step 1269 * 4 = 5076
step 1279 * 4 = 5116
epoch_num 2
step 1289 * 4 = 5156
step 1299 * 4 = 5196
step 1309 * 4 = 5236
step 1319 * 4 = 5276
step 1329 * 4 = 5316
step 1339 * 4 = 5356
epoch_num 3
step 1349 * 4 = 5396
step 1359 * 4 = 5436
step 1369 * 4 = 5476
step 1379 * 4 = 5516
step 1389 * 4 = 5556
step 1399 * 4 = 5596
epoch_num 4
step 1409 * 4 = 5636
step 1419 * 4 = 5676
step 1429 * 4 = 5716
step 1439 * 4 = 5756
step 1449 * 4 = 5796
step 1459 * 4 = 5836
step 1469 * 4 = 5876
epoch_num 5
step 1479 * 4 = 5916
step 1489 * 4 = 5956
step 1499 * 4 = 5996
step 1509 * 4 = 6036
step 1519 * 4 = 6076
step 1529 * 4 = 6116
epoch_num 6
step 1539 * 4 = 6156
step 1549 * 4 = 6196
step 1559 * 4 = 6236
step 1569 * 4 = 6276
step 1579 * 4 = 6316
step 1589 * 4 = 6356
step 1599 * 4 = 6396
epoch_num 7
step 1609 * 4 = 6436
step 1619 * 4 = 6476
step 1629 * 4 = 6516
step 1639 * 4 = 6556
step 1649 * 4 = 6596
step 1659 * 4 = 6636
epoch_num 8
step 1669 * 4 = 6676
step 1679 * 4 = 6716
step 1689 * 4 = 6756
step 1699 * 4 = 6796
step 1709 * 4 = 6836
step 1719 * 4 = 6876
epoch_num 9
step 1729 * 4 = 6916
step 1739 * 4 = 6956
step 1749 * 4 = 6996
step 1759 * 4 = 7036
step 1769 * 4 = 7076
step 1779 * 4 = 7116
step 1789 * 4 = 7156
------------------------------------
| approxkl | 0.11996295 |
| clipfrac | 0.3465332 |
| ep_len_mean | 589 |
| ep_reward_mean | -0.959 |
| explained_variance | 0.0642 |
| fps | 13 |
| n_updates | 1 |
| policy_entropy | 9.949189 |
| policy_loss | -0.04627157 |
| serial_timesteps | 2048 |
| time_elapsed | 2.62e-05 |
| total_timesteps | 2048 |
| value_loss | 0.2877953 |
------------------------------------
batch_size 32
num_timesteps 4096
update_fac 4.000000
epoch_num 0
step 1669 * 4 = 6676 #THIS KEY ALREADY EXISTS.. THIS IS THE ISSUE
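The collision can be reproduced with plain arithmetic, without TensorFlow. The sketch below is only a reconstruction: the formula for `update` and the "log every 10th update" condition are assumptions chosen because they reproduce the values printed above (1159 ... 1789 for the first update, then 1669 again), and they have not been checked against the modified ppo2.py used here. The function name `logged_updates` is hypothetical.

```python
def logged_updates(num_timesteps, n_batch=2048, nminibatches=64, noptepochs=10):
    """Return the update indices that would be logged for one call to learn's inner loop.

    Assumed reconstruction of the index arithmetic; it matches the printed output
    for n_batch=2048, batch_size=32, noptepochs=10.
    """
    batch_size = n_batch // nminibatches                      # 32
    update_fac = n_batch // nminibatches // noptepochs + 1    # 4
    logged = []
    for epoch_num in range(noptepochs):
        for start in range(0, n_batch, batch_size):
            update = num_timesteps // update_fac + (
                (noptepochs * n_batch + epoch_num * n_batch + start) // batch_size
            )
            # Assumed condition: metadata is written on every 10th update.
            if (1 + update) % 10 == 0:
                logged.append(update)
    return logged, update_fac


first, fac = logged_updates(num_timesteps=2048)    # first model update
second, _ = logged_updates(num_timesteps=4096)     # second model update

seen = set(first)
first_clash = next(u for u in second if u in seen)
print("update_fac:", fac)
print("first update logs:", first[0], "...", first[-1])
print("first repeated index in second update:", first_clash, "-> tag step%d" % (first_clash * fac))
```

Under these assumptions the index ranges of consecutive model updates overlap: the second update starts at an index that the first update already reached in a later epoch, so the tag 'step6676' is requested twice.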
PPO parameter file:
n_agents: 1
n_timesteps: !!float 1e8
policy: 'CnnPolicy'
normalize: true
n_steps: 2048 # 2048 / number of agents
nminibatches: 64
lam: 0.95
gamma: 0.99
noptepochs: 10
ent_coef: 0.0
learning_rate: 0.001
cliprange: 0.2
System Info
- How the library was installed: I'm using the source code directly so that I can modify it and add things to it.
- GPU models and configuration: GTX 1080 Ti
- Python version 3.5
- Tensorflow version 1.8
I know that I can add something like the date and time to make this key unique. Whether something is missing on my side or it is a bug in the library, I'm still looking for a solution. If anyone has a quick fix, that would be great. Thanks.
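One possible form of the "make the key unique" workaround is sketched below. It uses a per-tag counter instead of the date and time so that the result is deterministic. The class name `UniqueTagger` is hypothetical and not part of stable_baselines or TensorFlow; this only avoids the ValueError and does not address why the step indices repeat.

```python
class UniqueTagger:
    """Return a tag unchanged the first time it is seen, and suffixed afterwards."""

    def __init__(self):
        self._counts = {}

    def __call__(self, base_tag):
        n = self._counts.get(base_tag, 0)
        self._counts[base_tag] = n + 1
        # First use keeps the original tag; repeats get "_1", "_2", ...
        return base_tag if n == 0 else "%s_%d" % (base_tag, n)


unique_tag = UniqueTagger()

# Assumed usage inside _train_step (not executed here, needs a TF summary writer):
# writer.add_run_metadata(run_metadata, unique_tag('step%d' % (update * update_fac)))

print(unique_tag("step6676"))  # step6676
print(unique_tag("step6676"))  # step6676_1
print(unique_tag("step6716"))  # step6716
```

The tagger instance has to outlive a single training step (for example, stored on the model object), otherwise the counter resets and the duplicate tag comes back.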
Issue Analytics
- State:
- Created 4 years ago
- Comments:7
Top GitHub Comments
Hi, just wanted to mention that I also faced this issue while using PPO2 with full_tensorboard_log=True on a custom environment.
Okay, thanks for the fast reply 😃