no training: Total training time: 0:00:00
I debugged the training process. There is no problem loading the dataset, but when execution reaches line 205 of /home/zhaojing/BCNet/detectron2/engine/train_loop.py, the training ends:
data = next(self._data_loader_iter)
How can I solve this problem? Thank you!
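One way to see why that line ends training is to log the exception at the moment the batch is fetched. The sketch below assumes that the training loop runs its cleanup hooks before any error from the data loader is shown, which would explain a log that jumps from "Starting training" straight to evaluation. `fetch_batch` and `failing_loader` are hypothetical helper names and are not part of detectron2 or BCNet.

```python
import logging
import traceback

logger = logging.getLogger("debug_loader")


def fetch_batch(data_loader_iter):
    """Fetch one batch and log the full traceback immediately if fetching fails."""
    try:
        return next(data_loader_iter)
    except StopIteration:
        # The iterator ended; a training loader is usually expected to be endless.
        logger.error("Data loader is exhausted: it yielded no more batches.")
        raise
    except Exception:
        # Log now, before any cleanup code elsewhere gets a chance to run.
        logger.error("Data loader raised:\n%s", traceback.format_exc())
        raise


def failing_loader():
    # Hypothetical stand-in for a loader that fails on its first batch.
    raise RuntimeError("worker failed")
    yield  # unreachable, but makes this function a generator
```

To try it, the line `data = next(self._data_loader_iter)` could temporarily be replaced with `data = fetch_batch(self._data_loader_iter)`. The exception is re-raised, so the behavior of the loop is otherwise unchanged.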
The log is as follows:
…
[07/15 18:54:16 d2.data.datasets.coco]: Loading /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_train2017bcnet.json takes 16.16 seconds.
[07/15 18:54:17 d2.data.datasets.coco]: Loaded 7680 images in COCO format from /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_train2017bcnet.json
[07/15 19:00:58 d2.data.build]: Removed 0 images with no usable annotations. 7680 images left.
[07/15 19:01:12 d2.data.build]: Distribution of instances among all 1 categories:
category | #instances |
---|---|
building | 310262 |
[07/15 19:01:16 d2.data.common]: Serializing 7680 elements to byte tensors and concatenating them all …
[07/15 19:01:20 d2.data.common]: Serialized dataset takes 253.14 MiB
[07/15 19:02:52 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(600,), max_size=900, sample_style='choice'), RandomFlip()]
[07/15 19:03:31 d2.data.build]: Using training sampler TrainingSampler
[07/15 19:05:52 fvcore.common.checkpoint]: [Checkpointer] Loading from /home/zhaojing/BCNet/pretrainmodel/R-101.pkl …
[07/15 19:05:52 d2.checkpoint.c2_model_loading]: Remapping C2 weights …
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.bias loaded from res2_0_branch2a_bn_beta of shape (64,)
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.running_mean loaded from res2_0_branch2a_bn_running_mean of shape (64,)
[07/15 19:05:54 d2.checkpoint.c2_model_loading]: backbone.bottom_up.res2.0.conv1.norm.running_var loaded from res2_0_branch2a_bn_running_var of shape (64,)
…
[07/15 16:52:25 d2.checkpoint.c2_model_loading]: The checkpoint state_dict contains keys that are not used by the model: fc1000_b fc1000_w
[07/15 16:52:25 d2.engine.train_loop]: Starting training from iteration 0
[07/15 16:52:25 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[07/15 16:52:26 d2.data.datasets.coco]: Loaded 896 images in COCO format from /repository01/nucleus_seg_data/instanceSeg/MoNuSeg/annotations/instances_test2017bcnet.json
[07/15 16:52:26 d2.data.build]: Distribution of instances among all 1 categories:
category | #instances |
---|---|
building | 24384 |
[07/15 16:52:26 d2.data.common]: Serializing 896 elements to byte tensors and concatenating them all …
[07/15 16:52:26 d2.data.common]: Serialized dataset takes 7.96 MiB
[07/15 16:52:27 d2.evaluation.evaluator]: Start inference on 896 images
[07/15 16:52:28 d2.evaluation.evaluator]: Inference done 11/896. 0.0775 s / img. ETA=0:01:30
[07/15 16:52:33 d2.evaluation.evaluator]: Inference done 61/896. 0.0761 s / img. ETA=0:01:24
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
bzip2 1.0.8 h7b6447c_0
ca-certificates 2021.7.5 h06a4308_1
certifi 2021.5.30 py37h06a4308_0
cffi 1.14.6 py37h400218f_0
charset-normalizer 2.0.1 pypi_0 pypi
cloudpickle 1.6.0 pypi_0 pypi
cudatoolkit 10.0.130 0 nvidia
cycler 0.10.0 pypi_0 pypi
cython 3.0.0a8 pypi_0 pypi
detectron2 0.1 dev_0 <develop>
ffmpeg 4.3 hf484d3e_0 pytorch
freetype 2.10.4 h5ab3b9f_0
future 0.18.2 pypi_0 pypi
gmp 6.2.1 h2531618_2
gnutls 3.6.15 he1e5248_0
idna 3.2 pypi_0 pypi
intel-openmp 2021.2.0 h06a4308_610
iopath 0.1.9 pypi_0 pypi
jpeg 9b 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
kiwisolver 1.3.1 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
libiconv 1.14 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
libidn2 2.3.1 h27cfd23_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtasn1 4.16.0 h27cfd23_0
libtiff 4.2.0 h85742a9_0
libunistring 0.9.10 h27cfd23_0
libuv 1.40.0 h7b6447c_0
libwebp-base 1.2.0 h27cfd23_0
lz4-c 1.9.3 h2531618_0
mkl 2021.2.0 h06a4308_296
mkl-service 2.4.0 py37h7f8727e_0
mkl_fft 1.3.0 py37h42c9631_2
mkl_random 1.2.1 py37ha9443f7_2
ncurses 6.2 he6710b0_1
nettle 3.7.3 hbbd107a_1
ninja 1.7.2 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
numpy 1.20.2 py37h2d18471_0
numpy-base 1.20.2 py37hfae3a4d_0
olefile 0.46 py37_0
openh264 2.1.0 hd408876_0
openjpeg 2.3.0 h05c96fa_1
openssl 1.1.1k h27cfd23_0
pillow 6.2.2 pypi_0 pypi
pip 21.1.3 py37h06a4308_0
portalocker 2.3.0 pypi_0 pypi
pycocotools 2.0 pypi_0 pypi
pycparser 2.20 py_2
pydot 1.4.2 pypi_0 pypi
pyparsing 3.0.0b2 pypi_0 pypi
python 3.7.10 h12debd9_4
python-dateutil 2.8.1 pypi_0 pypi
pytorch 1.4.0 py3.7_cuda10.0.130_cudnn7.6.3_0 pytorch
pywavelets 1.1.1 py37h7b6447c_2
pyyaml 5.4.1 pypi_0 pypi
readline 8.1 h27cfd23_0
requests 2.26.0 pypi_0 pypi
scipy 1.6.2 py37had2a1c9_1
setuptools 52.0.0 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
tabulate 0.8.9 pypi_0 pypi
tensorboard 2.5.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
tk 8.6.10 hbc83047_0
torchaudio 0.4.0 py37 pytorch
torchvision 0.5.0 py37_cu100 pytorch
tqdm 4.61.2 pypi_0 pypi
typing 3.10.0.0 py37h06a4308_0
typing_extensions 3.10.0.0 pyh06a4308_0
urllib3 1.26.6 pypi_0 pypi
werkzeug 2.0.1 pypi_0 pypi
wheel 0.36.2 pypi_0 pypi
xz 5.2.5 h7b6447c_0
yacs 0.1.8 pypi_0 pypi
zlib 1.2.11 0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
zstd 1.4.9 haebb681_0
Top GitHub Comments
I solved this problem by inserting code into BCNet/detectron2/engine/train_loop.py.

I had the same bug on Windows 10. I found that it is caused by PyTorch 1.4, and it has been reported on the PyTorch GitHub repository before.
One solution is to upgrade to PyTorch 1.8 together with a matching torchvision. BCNet has many compatibility issues with PyTorch 1.8, but after debugging them step by step it finally works.
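To check whether an environment matches the version named in the comment above, the installed PyTorch version can be compared against 1.4. This is only a minimal sketch: `major_minor` and `matches_reported_version` are hypothetical helper names, and the link between PyTorch 1.4 and this failure is the commenter's report, not something the sketch verifies.

```python
import re


def major_minor(version):
    """Return (major, minor) from a version string such as '1.8.1+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_reported_version(version):
    # The comment above attributes the failure to PyTorch 1.4.
    return major_minor(version) == (1, 4)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, matches_reported_version(torch.__version__))
```

With the environment listed in the question (pytorch 1.4.0, torchvision 0.5.0), `matches_reported_version("1.4.0")` is true, so the upgrade described above would be worth trying.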