tensorflow.python.framework.errors_impl.InvalidArgumentError message while trying to train a model
(musika) C:\Users\ПК>python X:\musika\musika_train.py --train_path X:\musika\encodings --log_path X:\logs --mixed_precision False
Using GPU without mixed precision...
Calculating total number of samples in data folder...
Found 1720 total samples
Dataset is ready!
Checking if models are already available...
Models are available!
X:\cnd\envs\musika\lib\site-packages\keras\initializers\initializers_v2.py:120: UserWarning: The initializer HeUniform is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
warnings.warn(
Encoders/Decoders loaded from checkpoints/ae
Networks initialized
Critic params: 20786689
Generator params: 15499530
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
CLICK ON LINK BELOW TO OPEN GRADIO INTERFACE
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
--------------------------------
--------------------------------
--------------------------------
--------------------------------
--------------------------------
Traceback (most recent call last):
File "X:\musika\musika_train.py", line 31, in <module>
T.train(ds, models_ls)
File "X:\musika\train.py", line 134, in train
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 559, in create_file_writer_v2
return _ResourceSummaryWriter(
File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\summary_ops_v2.py", line 311, in __init__
self._init_op = init_op_fn(self._resource)
File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\ops\gen_summary_ops.py", line 145, in create_summary_file_writer
_ops.raise_from_not_ok_status(e, name)
File "X:\cnd\envs\musika\lib\site-packages\tensorflow\python\framework\ops.py", line 7209, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__CreateSummaryFileWriter_device_/job:localhost/replica:0/task:0/device:CPU:0}} Failed to create a NewWriteableFile: X:\logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221119-234355/train/events.out.tfevents.1668890635.??-??.6756.0.v2 : ?????????????? ?????? ? ????? ?????, ????? ????? ??? ????? ????.
; no protocol option
Creating writable file X:\logs/MUSIKA_latlen_256_latdepth_64_sr_44100/20221119-234355/train/events.out.tfevents.1668890635.??-??.6756.0.v2
Could not initialize events writer. [Op:CreateSummaryFileWriter]
Are there any options to solve this issue?
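The unreadable part of the failing path, ??-??, sits where TensorFlow places the machine's hostname in event-file names (events.out.tfevents.<timestamp>.<hostname>.<pid>...). One plausible explanation, which is an assumption and is not confirmed in this thread, is that the computer name contains non-ASCII characters (the user folder is ПК, so a Cyrillic name seems likely) and the resulting file name is rejected by Windows. A minimal stdlib-only check; the helper name hostname_problems is hypothetical and not part of TensorFlow or musika:

```python
import socket
from typing import List

# Characters Windows reserves and does not allow in file names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def hostname_problems(hostname: str) -> List[str]:
    """Return reasons why `hostname` may be unsafe inside a Windows file name.

    Hypothetical diagnostic helper for illustration only.
    """
    problems = []
    if not hostname.isascii():
        problems.append("contains non-ASCII characters")
    bad = sorted(set(hostname) & WINDOWS_FORBIDDEN)
    if bad:
        problems.append(
            "contains characters forbidden in Windows file names: " + "".join(bad)
        )
    return problems


if __name__ == "__main__":
    name = socket.gethostname()
    issues = hostname_problems(name)
    print(repr(name), "->", issues or "looks safe")
```

If the check reports non-ASCII characters, renaming the computer to an ASCII-only name is one thing worth trying; this is a hypothesis to test, not a confirmed fix.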
Issue Analytics
- Created 10 months ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
I will have to investigate this. If anyone else reading this thread is experiencing the same problem, please report it!
In the meantime, as a temporary solution, you can manually comment out the two instances of train_summary_writer in train.py:
train_summary_writer = tf.summary.create_file_writer(train_log_dir)
and
You will not be able to use TensorBoard to track losses, but if training collapses you will still notice it from the loss values (NaN) in the tqdm bar.
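Instead of deleting the writer lines, the same workaround could be made conditional: try to create the writer and fall back to a do-nothing writer if creation fails, so TensorBoard logging keeps working on machines where it can. A minimal sketch; the helper name create_writer_or_fallback and its errors parameter are hypothetical, and the function takes plain callables so it does not import TensorFlow itself:

```python
import logging
from typing import Callable, Tuple, Type, TypeVar

W = TypeVar("W")


def create_writer_or_fallback(
    create: Callable[[], W],
    fallback: Callable[[], W],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> W:
    """Call `create()`; if it raises one of `errors`, log a warning and return `fallback()`."""
    try:
        return create()
    except errors as exc:
        logging.warning(
            "Could not create summary writer, TensorBoard logging disabled: %s", exc
        )
        return fallback()


# Possible use in train.py (assumes TensorFlow 2.x imported as tf, where
# tf.summary.create_noop_writer and tf.errors.InvalidArgumentError exist):
# train_summary_writer = create_writer_or_fallback(
#     lambda: tf.summary.create_file_writer(train_log_dir),
#     tf.summary.create_noop_writer,
#     errors=(tf.errors.InvalidArgumentError,),
# )
```

The no-op writer should expose the same as_default() context manager as a file writer, so code that later uses train_summary_writer would likely not need to change; that is worth verifying against the actual train.py.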
This worked for me, but there was a problem with XLA (with --xla False, musika_train.py works fine). First there was the common error InternalError: libdevice not found at ./libdevice.10.bc, which was solved by adding a new system environment variable XLA_FLAGS with the value --xla_gpu_cuda_data_dir=X:\cnd\envs\musika. However, after that there was a problem with ptxas:
I can say for sure that there is enough disk space. I also installed cuda-nvcc in the conda environment by running:
conda install -c nvidia cuda-nvcc
Could this be an issue with the versions of tensorflow, cudnn, cudatoolkit, or cuda-nvcc, or a path issue?
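To narrow down whether this is a path issue, it can help to check the two toolchain pieces XLA's GPU backend relies on: libdevice, which it looks for under <dir>/nvvm/libdevice/ of the directory given by --xla_gpu_cuda_data_dir, and a ptxas executable. A minimal stdlib-only sketch, run inside the activated conda environment; the helper names are hypothetical, and because the exact folder layout of a conda CUDA install on Windows is an assumption here, the code searches for libdevice.10.bc instead of hard-coding a location:

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def find_cuda_data_dirs(root: str) -> List[Path]:
    """Return directories D under `root` where D/nvvm/libdevice/libdevice.10.bc exists.

    Each D is a candidate value for --xla_gpu_cuda_data_dir.
    """
    hits = Path(root).rglob("libdevice.10.bc")
    return sorted(
        p.parents[2]  # D/nvvm/libdevice/libdevice.10.bc -> D
        for p in hits
        if p.parent.name == "libdevice" and p.parent.parent.name == "nvvm"
    )


def find_ptxas() -> Optional[str]:
    """Return the path of the first ptxas executable found on PATH, or None."""
    return shutil.which("ptxas")


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the current directory.
    env_root = os.environ.get("CONDA_PREFIX", ".")
    for d in find_cuda_data_dirs(env_root):
        print(f"candidate: XLA_FLAGS=--xla_gpu_cuda_data_dir={d}")
    print("ptxas:", find_ptxas() or "not found on PATH")
```

If no candidate is printed, or ptxas is not found on PATH, the problem is more likely a path issue than a version mismatch; if both are present, comparing the ptxas version with the CUDA version TensorFlow was built against would be a reasonable next step.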