Custom action error
Hi,
I'm trying to train a custom action. I followed the instructions provided in:
but I got an error, and I don't know exactly what is happening; it's not really clear to me:
My working directory has the following structure:
data
├── custom_dataset
│   ├── rawframes
│   │   ├── train
│   │   │   ├── video1
│   │   │   │   ├── 00001.jpg
│   │   │   │   └── ...
│   │   │   └── video2
│   │   │       ├── 00001.jpg
│   │   │       └── ...
│   │   └── val
│   │       ├── video1
│   │       │   ├── 00001.jpg
│   │       │   └── ...
│   │       └── video2
│   │           ├── 00001.jpg
│   │           └── ...
│   ├── val.txt
│   └── train.txt
└── mean_std_list.txt
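A quick way to sanity-check this layout before launching training is to confirm that every frame directory actually contains files matching the frame-name template. The sketch below assumes the `filename_tmpl='{:05d}.jpg'` value from the config printed later in the log and that frames are numbered consecutively from 1; the helper name `check_frame_dir` and the root path are hypothetical.

```python
import os


def check_frame_dir(frame_dir, tmpl="{:05d}.jpg"):
    """Return the number of consecutive frames found in frame_dir.

    Assumes frames are numbered from 1 and named with `tmpl`
    (00001.jpg, 00002.jpg, ...). Counting stops at the first gap.
    """
    if not os.path.isdir(frame_dir):
        raise FileNotFoundError(f"missing frame directory: {frame_dir}")
    count = 0
    while os.path.isfile(os.path.join(frame_dir, tmpl.format(count + 1))):
        count += 1
    if count == 0:
        raise ValueError(f"no frames named like {tmpl.format(1)} in {frame_dir}")
    return count


if __name__ == "__main__":
    # Hypothetical root; adjust to where the frames were actually extracted.
    root = "data/custom_dataset/rawframes"
    for split in ("train", "val"):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        for video in sorted(os.listdir(split_dir)):
            video_dir = os.path.join(split_dir, video)
            if os.path.isdir(video_dir):  # skip stray files
                print(f"{split}/{video}: {check_frame_dir(video_dir)} frames")
```

The frame count printed per video can then be compared against the frame ranges used in the annotation files.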
My train.txt and val.txt files look like this:
train.txt
data/custom_dataset/rawframes/train/video1 0 1 3821 1343 1397 23.98
data/custom_dataset/rawframes/train/video1 0 1 3821 1398 1484 23.98
data/custom_dataset/rawframes/train/video1 0 1 3821 1485 1563 23.98
data/custom_dataset/rawframes/train/video1 0 1 3821 1564 1627 23.98
data/custom_dataset/rawframes/train/video2 0 1 2458 1 170 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 300 467 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 577 724 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 725 896 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 1032 1177 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 1255 1361 29.97
data/custom_dataset/rawframes/train/video2 0 1 2458 1442 1589 29.97
...
val.txt
data/custom_dataset/rawframes/val/video1 0 1 6471 1263 1327 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 1384 1450 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 1561 1618 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 1723 1828 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 1908 2003 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 2076 2132 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 2133 2198 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 2199 2258 29.97
data/custom_dataset/rawframes/val/video1 0 1 6471 2259 2338 29.97
...
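The two annotation files can also be checked programmatically. The sketch below assumes each line holds seven whitespace-separated fields in the order `path label video_start video_end clip_start clip_end fps`; that field order is inferred from the samples above rather than taken from the tool's documentation, and `parse_annotations` / `label_histogram` are hypothetical helpers. Counting distinct labels is relevant here because the traceback further down fails on `assert num_classes >= 2`, while the "Train datasets" table in the log reports only 1 label.

```python
import os
from collections import Counter


def parse_annotations(lines):
    """Parse annotation lines into dicts (assumed 7-field format)."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"line {lineno}: expected 7 fields, got {len(fields)}")
        path, label, v_start, v_end, c_start, c_end, fps = fields
        rec = dict(path=path, label=int(label),
                   video_start=int(v_start), video_end=int(v_end),
                   clip_start=int(c_start), clip_end=int(c_end),
                   fps=float(fps))
        # Under the assumed format, a clip must lie inside its video's frame range.
        if not (rec["video_start"] <= rec["clip_start"]
                <= rec["clip_end"] <= rec["video_end"]):
            raise ValueError(f"line {lineno}: clip range outside video range")
        records.append(rec)
    return records


def label_histogram(records):
    """Count how many clips carry each label id."""
    return Counter(r["label"] for r in records)


if __name__ == "__main__":
    for ann_file in ("train.txt", "val.txt"):  # assumed locations
        if not os.path.isfile(ann_file):
            continue
        with open(ann_file) as f:
            hist = label_histogram(parse_annotations(f))
        print(ann_file, dict(hist))
        if len(hist) < 2:
            print(f"{ann_file}: only {len(hist)} distinct label(s); "
                  "the classification head asserts num_classes >= 2")
```

With the sample lines shown above, every clip carries label `0`, so the histogram has a single entry, which matches the "# labels 1" row in the training log.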
python train.py --load-weights ${WORK_DIR}/snapshot.pth --train-ann-files ${TRAIN_ANN_FILE} --train-data-roots ${TRAIN_DATA_ROOT} --val-ann-files ${VAL_ANN_FILE} --val-data-roots ${VAL_DATA_ROOT} --save-checkpoints-to ${WORK_DIR}/outputs
INFO:root:Commandline:
train.py --load-weights /home/user/training/picking/working_dir/snapshot.pth --train-ann-files train.txt --train-data-roots /home/user/training/picking/working_dir/data --val-ann-files val.txt --val-data-roots /home/user/training/picking/working_dir/data --save-checkpoints-to /home/user/training/picking/working_dir/outputs
INFO:root:Training started ...
INFO:root:Training on GPUs started ...
WARNING:root:available_gpu_num < args.gpu_num: 1 < 2
WARNING:root:decreased number of gpu to: 1
/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/backbones/resnet_tin.py:12: UserWarning: Please install mmcv-full to support "tin_shift"
warnings.warn('Please install mmcv-full to support "tin_shift"')
2022-02-03 16:36:09,408 - mmaction - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.6.9 (default, Dec 8 2021, 21:08:43) [GCC 8.4.0]
CUDA available: True
GPU 0: NVIDIA GeForce GTX 1050
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.1+cu102
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2+cu102
OpenCV: 4.5.3-openvino
MMCV: 1.3.9
MMCV Compiler: n/a
MMCV CUDA Compiler: n/a
MMAction2: 0.6.0+496cec3
------------------------------------------------------------
2022-02-03 16:36:09,408 - mmaction - INFO - Distributed training: True
2022-02-03 16:36:09,408 - mmaction - INFO - Config: /home/user/training/picking/working_dir/model.py
# global parameters
num_videos_per_gpu = 12
num_workers_per_gpu = 3
train_sources = ('custom_dataset', )
test_sources = ('custom_dataset', )
root_dir = 'data'
work_dir = None
load_from = None
resume_from = None
reset_layer_prefixes = ['cls_head']
reset_layer_suffixes = None
# model settings
input_img_size = 224
input_clip_length = 16
input_frame_interval = 2
# training settings
enable_clip_mixing = False
num_train_clips = 2 if enable_clip_mixing else 1
# model definition
model = dict(
type='Recognizer3D',
backbone=dict(
type='MobileNetV3_S3D',
num_input_layers=3,
mode='large',
pretrained=None,
pretrained2d=False,
width_mult=1.0,
pool1_stride_t=1,
# block ids: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
temporal_strides=(1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1),
temporal_kernels=(5, 3, 3, 3, 3, 5, 5, 3, 3, 5, 3, 3, 3, 3, 3),
use_temporal_avg_pool=True,
input_bn=False,
out_conv=True,
out_attention=False,
weight_norm='none',
center_conv_weight=None,
dropout_cfg=dict(
dist='gaussian',
p=0.1,
mu=0.1,
sigma=0.03,
),
),
reducer=dict(
type='AggregatorSpatialTemporalModule',
modules=[
dict(type='AverageSpatialTemporalModule',
temporal_size=4,
spatial_size=7),
],
),
cls_head=dict(
type='ClsHead',
num_classes=700,
temporal_size=1,
spatial_size=1,
dropout_ratio=None,
in_channels=960,
embedding=True,
embd_size=256,
num_centers=1,
st_scale=10.0,
reg_weight=1.0,
reg_threshold=0.1,
enable_sampling=False,
adaptive_sampling=False,
sampling_angle_std=3.14 / 2 / 5,
enable_class_mixing=False,
class_mixing_alpha=0.2,
loss_cls=dict(
type='AMSoftmaxLoss',
target_loss='ce',
scale_cfg=dict(
type='PolyScalarScheduler',
start_scale=30.0,
end_scale=5.0,
power=1.2,
num_epochs=40.0,
),
pr_product=False,
margin_type='cos',
margin=0.35,
gamma=0.0,
t=1.0,
conf_penalty_weight=0.085,
filter_type='positives',
top_k=None,
enable_class_weighting=False,
enable_adaptive_margins=False,
),
),
)
# model training and testing settings
train_cfg = dict(
self_challenging=dict(enable=False, drop_p=0.33),
clip_mixing=dict(enable=enable_clip_mixing, mode='logits', num_clips=num_train_clips,
scale=10.0, weight=0.2),
loss_norm=dict(enable=False, gamma=0.9),
sample_filtering=dict(enable=False, warmup_epochs=1),
)
test_cfg = dict(
average_clips=None
)
# dataset settings
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_bgr=False
)
train_pipeline = [
dict(type='SampleFrames',
clip_len=input_clip_length,
frame_interval=input_frame_interval,
num_clips=num_train_clips,
temporal_jitter=True,
enable_negatives=False),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='RandomRotate', delta=10, prob=0.5),
dict(type='MultiScaleCrop',
input_size=input_img_size,
scales=(1, 0.875, 0.75, 0.66),
random_crop=False,
max_wh_scale_gap=1),
dict(type='Resize', scale=(input_img_size, input_img_size), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='PhotometricDistortion',
brightness_range=(65, 190),
contrast_range=(0.6, 1.4),
saturation_range=(0.7, 1.3),
hue_delta=18),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label', 'dataset_id'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label', 'dataset_id'])
]
val_pipeline = [
dict(type='SampleFrames',
clip_len=input_clip_length,
frame_interval=input_frame_interval,
num_clips=1,
test_mode=True),
dict(type='RawFrameDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=(input_img_size, input_img_size)),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=num_videos_per_gpu,
workers_per_gpu=num_workers_per_gpu,
train_dataloader=dict(
drop_last=True,
num_instances_per_batch=None,
),
shared=dict(
type='RawframeDataset',
data_subdir='rawframes',
filename_tmpl='{:05d}.jpg'
),
train=dict(
source=train_sources,
ann_file='train.txt',
pipeline=train_pipeline,
),
val=dict(
source=test_sources,
ann_file='val.txt',
pipeline=val_pipeline
),
test=dict(
source=test_sources,
ann_file='val.txt',
pipeline=val_pipeline
)
)
# optimizer
optimizer = dict(
type='SGD',
lr=1e-3,
momentum=0.9,
weight_decay=1e-4
)
optimizer_config = dict(
grad_clip=dict(
max_norm=40,
norm_type=2
)
)
# parameter manager
params_config = dict(
type='FreezeLayers',
epochs=5,
open_layers=['cls_head']
)
# learning policy
lr_config = dict(
policy='customstep',
step=[30, 50],
gamma=0.1,
fixed='constant',
fixed_epochs=5,
fixed_ratio=10.0,
warmup='cos',
warmup_epochs=5,
warmup_ratio=1e-2,
)
total_epochs = 65
# workflow
workflow = [('train', 1)]
checkpoint_config = dict(
interval=1
)
evaluation = dict(
interval=1,
metrics=['top_k_accuracy', 'mean_class_accuracy', 'ranking_mean_average_precision'],
topk=(1, 5),
)
log_level = 'INFO'
log_config = dict(
interval=10,
hooks=[
dict(type='TextLoggerHook'),
dict(type='TensorboardLoggerHook'),
]
)
# runtime settings
dist_params = dict(
backend='nccl'
)
find_unused_parameters = True
2022-02-03 16:36:09,413 - mmaction - INFO - Pipeline:
Compose(
SampleFrames(clip_len=16, frame_interval=2, num_clips=1)
RawFrameDecode(io_backend=disk, decoding_backend=cv2)
Resize(scale=(inf, 256), keep_ratio=True, interpolation=['bilinear'], lazy=False)
RandomRotate(delta=10.0, prob=0.5)
MultiScaleCrop(input_size=(224, 224), scales=(1, 0.875, 0.75, 0.66), max_wh_scale_gap=1, random_crop=False, num_fixed_crops=5, lazy=False)
Resize(scale=(224, 224), keep_ratio=False, interpolation=['bilinear'], lazy=False)
Flip(flip_ratio=0.5, direction=horizontal, lazy=False)
PhotometricDistortion (brightness_range=[65, 190], contrast_range=[0.6, 1.4], hue_delta=18, saturation_range=[0.7, 1.3], )
Normalize(mean=[123.675 116.28 103.53 ], std=[58.395 57.12 57.375], to_bgr=False, adjust_magnitude=False)
FormatShape(input_format='NCTHW')
Collect(keys=['imgs', 'label', 'dataset_id'], meta_keys=[])
ToTensor(keys=['imgs', 'label', 'dataset_id'])
)
2022-02-03 16:36:09,414 - mmaction - INFO - Train datasets:
+----------------+----------+---------+-----------+
| name | # labels | # items | imbalance |
+----------------+----------+---------+-----------+
| custom_dataset | 1 | 38 | 1.00 |
| total | 1 | 38 | |
+----------------+----------+---------+-----------+
Traceback (most recent call last):
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/heads/cls_head.py", line 92, in __init__
st_scale, reg_weight, reg_threshold)
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/core/ops/linear.py", line 22, in __init__
assert num_classes >= 2
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/mmcv/utils/registry.py", line 51, in build_from_cfg
return obj_cls(**args)
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/recognizers/base.py", line 96, in __init__
self.cls_head = builder.build_head(cls_head, class_sizes)
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 54, in build_head
heads = [build(cfg, HEADS, dict(class_sizes=cs)) for cs in class_sizes]
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 54, in <listcomp>
heads = [build(cfg, HEADS, dict(class_sizes=cs)) for cs in class_sizes]
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 30, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: ClsHead:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/training/picking/training_extensions/external/mmaction2/tools/train.py", line 232, in <module>
main()
File "/home/user/training/picking/training_extensions/external/mmaction2/tools/train.py", line 206, in main
class_maps=datasets[0].class_maps
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 90, in build_model
return build_recognizer(cfg, train_cfg, test_cfg, class_sizes, class_maps)
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 65, in build_recognizer
dict(train_cfg=train_cfg, test_cfg=test_cfg, class_sizes=class_sizes, class_maps=class_maps))
File "/home/user/training/picking/training_extensions/external/mmaction2/mmaction/models/builder.py", line 30, in build
return build_from_cfg(cfg, registry, default_args)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: Recognizer3D: ClsHead:
/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
FutureWarning,
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 23338) of binary: /home/user/training/picking/training_extensions/models/action_recognition/venv/bin/python
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/training/picking/training_extensions/models/action_recognition/venv/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/user/training/picking/training_extensions/external/mmaction2/tools/train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-02-03_16:36:14
host : Nitro
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 23338)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
INFO:root:... training on GPUs completed.
INFO:root:... training completed.
Are my train and val files wrong, or is it maybe the Python library that fails here:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 23338) of binary: /home/user/training/picking/training_extensions/models/action_recognition/venv/bin/python
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hi @morkovka1337,
Thanks! It's working now.
I'm closing this since the problem was successfully solved.