How to train with multiple videos in my custom dataset?

Hello, I’ve tried to train with my own dataset, whose folder layout is as below:

    /dataset_folder/HR/video_num/*.png
    /dataset_folder/LR/X4/video_num/*.png

And I’ve organized them following the instructions in Data/datasets.yaml:

    Root: /home/user

    Path:
      CUSTOM-TRAINHR[video]: dataset_folder/HR
      CUSTOM-TRAINLR[video]: dataset_folder/LR/X4

    Dataset:
      CUSTOM[video]:
        train:
          hr: CUSTOM_TRAINHR
          lr: CUSTOM_TRAINLR

My validation set is organized in the same format.
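With a layout like this, one quick sanity check before training is to confirm that the HR and LR trees contain the same video folders and the same number of frames per video. The sketch below uses only the Python standard library and does not touch the VSR loader. The function name `compare_video_folders` is hypothetical, and the idea that HR and LR frames should share file names is an assumption about how the pairs were produced, not something the project documents here.

```python
from pathlib import Path


def compare_video_folders(hr_root, lr_root, pattern="*.png"):
    """Return a list of human-readable mismatches between an HR and an LR tree.

    Assumes the layout from the post: <root>/<video_num>/*.png on both sides.
    """
    hr_root, lr_root = Path(hr_root), Path(lr_root)
    hr_videos = {p.name for p in hr_root.iterdir() if p.is_dir()}
    lr_videos = {p.name for p in lr_root.iterdir() if p.is_dir()}

    problems = []
    # Video folders that exist on one side only.
    for name in sorted(hr_videos - lr_videos):
        problems.append(f"{name}: present in HR only")
    for name in sorted(lr_videos - hr_videos):
        problems.append(f"{name}: present in LR only")

    # For videos present on both sides, compare the frame lists.
    for name in sorted(hr_videos & lr_videos):
        hr_frames = sorted(f.name for f in (hr_root / name).glob(pattern))
        lr_frames = sorted(f.name for f in (lr_root / name).glob(pattern))
        if not hr_frames:
            problems.append(f"{name}: no frames found")
        elif len(hr_frames) != len(lr_frames):
            problems.append(
                f"{name}: {len(hr_frames)} HR frames vs {len(lr_frames)} LR frames"
            )
        elif hr_frames != lr_frames:
            # Assumption: paired frames share the same file name.
            problems.append(f"{name}: frame file names differ")
    return problems
```

For the paths in the post, this would be called as `compare_video_folders("/home/user/dataset_folder/HR", "/home/user/dataset_folder/LR/X4")`. An empty list means the two trees line up under the assumptions above.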

However, a traceback was thrown when I tried the command:

    python train.py sofvsr --dataset custom --epochs 100 --cuda

which is:

    Traceback (most recent call last):
      File "train.py", line 99, in <module>
        main()
      File "train.py", line 93, in main
        t.fit([lt, lv], config)
      File "/home/zp/VideoSuperResolution-master/VSR/Backend/Torch/Framework/Trainer.py", line 110, in fit
        memory_limit=mem)
      File "/home/zp/VideoSuperResolution-master/VSR/DataLoader/Loader.py", line 322, in make_one_shot_iterator
        raise fs.exception()
      File "/home/user/.conda/envs/zp/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/home/zp/VideoSuperResolution-master/VSR/DataLoader/Loader.py", line 393, in _prefecth_chunk
        self.cache['hr'].append(img.read_frame(img.frames))
      File "/home/zp/VideoSuperResolution-master/VSR/DataLoader/VirtualFile.py", line 362, in read_frame
        image_bytes = [BytesIO(self.read()) for _ in range(frames)]
      File "/home/zp/VideoSuperResolution-master/VSR/DataLoader/VirtualFile.py", line 362, in <listcomp>
        image_bytes = [BytesIO(self.read()) for _ in range(frames)]
      File "/home/zp/VideoSuperResolution-master/VSR/DataLoader/VirtualFile.py", line 129, in read
        raise EOFError(f'End of File! {self.name}')
    EOFError: End of File! 068

It usually occurs after one epoch, but sometimes after 2 or 3. Also, my training set has 240 videos, i.e. 240*100 = 24000 frames in total, yet the loader reads only 200 batches when my batch_size is set to 4.

I just wonder how to train with multiple videos as described above, because when I use a single video folder containing 100 frames, everything is OK.

Thx.

Issue Analytics

Created 3 years ago
Comments:17 (9 by maintainers)

Top GitHub Comments

LoSealL commented, Apr 24, 2020

@iPrayerr Affirmative. It’s a bug. To work around it, use --threads=1. It will take me some time to fix this 😦

LoSealL commented, Apr 17, 2020

There are a few usual ways to debug:

1. Record the training image patches (through SummaryWriter/tensorboard, or just save them to disk). Check the training pairs to see whether they match as desired.
2. Check the hyper-parameters, especially the learning rate and batch size. Convergence may be very sensitive to them.
3. Train from pre-trained weights; it will be easier to converge.
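As a rough illustration of checking that a training pair matches, the sketch below box-averages an HR patch down by the scale factor and measures how far the result is from the LR patch. It is not part of the VSR code base. The helper names `box_downsample` and `pair_mismatch` are hypothetical, patches are plain 2-D lists of grayscale values to keep the example dependency-free, and the box filter is only a stand-in for whatever degradation produced the LR frames. A correctly paired patch should therefore score low, not necessarily zero, while a misaligned or mismatched pair should score noticeably higher.

```python
def box_downsample(image, scale):
    """Average each scale x scale block of a 2-D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    if height % scale or width % scale:
        raise ValueError("image size must be divisible by scale")
    out = []
    for y in range(0, height, scale):
        row = []
        for x in range(0, width, scale):
            block = [
                image[y + dy][x + dx]
                for dy in range(scale)
                for dx in range(scale)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def pair_mismatch(lr, hr, scale=4):
    """Mean absolute difference between an LR patch and the box-downsampled HR patch."""
    # The HR patch must be exactly `scale` times larger in both dimensions.
    if len(hr) != len(lr) * scale or len(hr[0]) != len(lr[0]) * scale:
        raise ValueError("HR patch is not `scale` times the LR patch size")
    down = box_downsample(hr, scale)
    diffs = [
        abs(a - b)
        for row_down, row_lr in zip(down, lr)
        for a, b in zip(row_down, row_lr)
    ]
    return sum(diffs) / len(diffs)
```

In practice the same check is usually done with an array library on the patches dumped from the loader; comparing the score of a known-good pair against the scores seen during training gives a reference point.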