Training speed of mxnet-ssd slows down?
I have been training the old-style SSD from a record file (VOC07+12) at about 40 images/s. When I try the new train_ssd.py in GluonCV, the speed drops to about 25 images/s.
I replaced the original file-based dataset in the new SSD code with a rec dataset plus a transform (roughly as sketched below). But when I set **num-workers=4**, the gdata.DetectionDataLoader fails, while with **num-workers=1** it works, but the speed is almost as slow as the original data-reading method.
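Roughly what I mean (a minimal sketch; the file name, input size, and batch size are placeholders, and in my actual code the dataset is my own wrapper around ImageRecordDataset, but GluonCV's RecordFileDetection does the same job):

```python
# Sketch of the setup described above (file name, input size and batch size are placeholders).
from gluoncv import data as gdata
from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform

# Detection dataset backed by a single .rec file (VOC07+12 packed with im2rec).
train_dataset = gdata.RecordFileDetection('voc0712_train.rec')

width, height = 512, 512
train_loader = gdata.DetectionDataLoader(
    train_dataset.transform(SSDDefaultTrainTransform(width, height)),
    batch_size=32, shuffle=True, last_batch='rollover',
    num_workers=4)  # num_workers=4 crashes; num_workers=1 works but is slow
```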
The error information is as follows:
```
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataloader.py", line 134, in worker_loop
    batch = batchify_fn([dataset[i] for i in samples])
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataset.py", line 126, in __getitem__
    item = self._data[idx]
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/vision/datasets.py", line 257, in __getitem__
    record = super(ImageRecordDataset, self).__getitem__(idx)
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/gluon/data/dataset.py", line 180, in __getitem__
    return self._record.read_idx(self._record.keys[idx])
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/recordio.py", line 265, in read_idx
    return self.read()
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/recordio.py", line 163, in read
    ctypes.byref(size)))
  File "/home/deep/workssd/mxnet/incubator-mxnet/python/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [16:12:48] src/recordio.cc:65: Check failed: header[0] == RecordIOWriter::kMagic Invalid RecordIO File
```
It seems to be a multi-process problem with the old rec-file dataset: the record file handle opened in the parent process appears to be shared by the forked DataLoader workers, so concurrent reads return corrupted records?
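If that is the cause, one possible workaround would be to re-open the record file inside each worker process. A rough sketch of the idea (ForkSafeImageRecordDataset is just a hypothetical name, not an existing class):

```python
import os
from mxnet import recordio
from mxnet.gluon.data.vision import ImageRecordDataset

class ForkSafeImageRecordDataset(ImageRecordDataset):
    """Hypothetical workaround: give every DataLoader worker its own
    record reader instead of sharing the handle opened in the parent."""

    def __init__(self, filename, flag=1, transform=None):
        super(ForkSafeImageRecordDataset, self).__init__(filename, flag, transform)
        self._rec_file = filename
        self._idx_file = os.path.splitext(filename)[0] + '.idx'
        self._pid = os.getpid()

    def __getitem__(self, idx):
        if os.getpid() != self._pid:
            # We are inside a forked DataLoader worker: open a private reader.
            self._record = recordio.MXIndexedRecordIO(
                self._idx_file, self._rec_file, 'r')
            self._pid = os.getpid()
        return super(ForkSafeImageRecordDataset, self).__getitem__(idx)
```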
Issue Analytics
- Created 5 years ago
- Comments: 16 (9 by maintainers)
@WalterMa Yes, this bug should be easy to fix, but we need to be careful not to change the current API, so we are still discussing it.
A temporary workaround has been added to RecordFileDetection so that multiple workers can be enabled. I am closing this due to lack of activity. Feel free to ping me to reopen.
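With that workaround in place, something along these lines should work with num_workers > 1 (the file name, input size, and batch size are placeholders):

```python
# Quick check that multiple workers can read a .rec detection dataset.
from mxnet.gluon.data import DataLoader
from gluoncv.data import RecordFileDetection
from gluoncv.data.batchify import Tuple, Stack, Pad
from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform

dataset = RecordFileDetection('voc0712_train.rec')
loader = DataLoader(dataset.transform(SSDDefaultTrainTransform(300, 300)),
                    batch_size=8, shuffle=True, num_workers=4,
                    batchify_fn=Tuple(Stack(), Pad(pad_val=-1)))
for images, labels in loader:
    break  # one batch is enough to confirm the workers can read the file
```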