no training: Total training time: 0:00:00
I debugged the training process. The dataset loads without problems, but when execution reaches line 205 of /home/zhaojing/BCNet/detectron2/engine/, training ends:
data = next(self._data_loader_iter)
How can I solve this problem? Thank you!
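One way to narrow this down is to pull a single batch outside the trainer and see whether the iterator is empty or raises. The following is a minimal sketch, assuming only that the data loader is an ordinary Python iterable; `probe_first_batch` is a hypothetical helper, not part of detectron2 or BCNet.

```python
def probe_first_batch(loader):
    """Try to fetch one item from `loader` and report what happened.

    Returns a (status, payload) tuple:
      ("ok", batch)    the first batch was produced
      ("empty", None)  the iterator finished without yielding anything
      ("error", exc)   fetching the batch raised an exception
    """
    try:
        iterator = iter(loader)
        batch = next(iterator)
    except StopIteration:
        # Must be caught before Exception, since StopIteration subclasses it.
        return ("empty", None)
    except Exception as exc:
        return ("error", exc)
    return ("ok", batch)
```

Calling `probe_first_batch(data_loader)` on the loader the trainer receives (the variable name is an assumption) shows whether the stop comes from an empty iterator or from an exception raised while loading.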
The log is as follows:
…
[07/15 18:54:16]: Loading /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_train2017bcnet.json takes 16.16 seconds.
[07/15 18:54:17]: Loaded 7680 images in COCO format from /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_train2017bcnet.json
[07/15 19:00:58]: Removed 0 images with no usable annotations. 7680 images left.
[07/15 19:01:12]: Distribution of instances among all 1 categories:
| category | #instances |
| building | 310262 |
[07/15 19:01:16]: Serializing 7680 elements to byte tensors and concatenating them all …
[07/15 19:01:20]: Serialized dataset takes 253.14 MiB
[07/15 19:02:52]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(600,), max_size=900, sample_style='choice'), RandomFlip()]
[07/15 19:03:31]: Using training sampler TrainingSampler
[07/15 19:05:52 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/zhaojing/BCNet/pretrainmodel/R-101.pkl …
[07/15 19:05:52 d2.checkpoint.c2_model_loading]: Remapping C2 weights …
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.bias loaded from res2_0_branch2a_bn_beta of shape (64,)
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.running_mean loaded from res2_0_branch2a_bn_running_mean of shape (64,)
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.running_var loaded from res2_0_branch2a_bn_running_var of shape (64,)
…
[07/15 16:52:25 d2.checkpoint.c2_model_loading]: The checkpoint state_dict contains keys that are not used by the model: fc1000_b fc1000_w
[07/15 16:52:25 d2.engine.train_loop]: Starting training from iteration 0
[07/15 16:52:25 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[07/15 16:52:26]: Loaded 896 images in COCO format from /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_test2017bcnet.json
[07/15 16:52:26]: Distribution of instances among all 1 categories:
| category | #instances |
| building | 24384 |
[07/15 16:52:26]: Serializing 896 elements to byte tensors and concatenating them all …
[07/15 16:52:26]: Serialized dataset takes 7.96 MiB
[07/15 16:52:27 d2.evaluation.evaluator]: Start inference on 896 images
[07/15 16:52:28 d2.evaluation.evaluator]: Inference done 11/896. 0.0775 s / img. ETA=0:01:30
[07/15 16:52:33 d2.evaluation.evaluator]: Inference done 61/896. 0.0761 s / img. ETA=0:01:24
Top GitHub Comments
I inserted the following code in …. This solved the problem.

I had the same bug on Windows 10. I found that it is caused by PyTorch version 1.4, and it has been reported on the PyTorch GitHub before.
One solution is to upgrade to PyTorch 1.8, together with a matching torchvision. BCNet has many compatibility issues with PyTorch 1.8, but after step-by-step debugging I finally got it working.
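To check whether an environment is still on the older release mentioned above, a version string can be compared against 1.8.0. This is a minimal sketch using only the standard library; `version_tuple` and `is_older_than` are hypothetical helper names, and the 1.8.0 threshold is simply the version this comment reports as working.

```python
import re


def version_tuple(version):
    """Turn a string like '1.8.1+cu111' into (1, 8, 1).

    The local build suffix after '+' is dropped, and missing
    components are padded with zeros so '1.8' equals '1.8.0'.
    """
    numbers = re.findall(r"\d+", version.split("+")[0])
    padded = (numbers + ["0", "0", "0"])[:3]
    return tuple(int(n) for n in padded)


def is_older_than(installed, minimum):
    """Return True if `installed` is strictly older than `minimum`."""
    return version_tuple(installed) < version_tuple(minimum)
```

With PyTorch installed, `is_older_than(torch.__version__, "1.8.0")` would indicate whether the upgrade described here applies.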