'train_acc': -1, 'valid_acc': -1
The following error occurred when I was using the official documentation tutorial:
env: torch==1.12.0+cu113, torchvision==0.13.0+cu113, torchtext==0.13.0 (same as the official documentation), Windows + Python 3.8.5
time_limit=auto
set to time_limit=7200
.
Reset labels to [0, 1, 2, 3]
Randomly split train_data into train[720]/validation[80] splits.
The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
Starting fit without HPO
modified configs(<old> != <new>): {
root.img_cls.model resnet101 != resnet50
root.train.epochs 200 != 50
root.train.early_stop_baseline 0.0 != -inf
root.train.batch_size 32 != 16
root.train.early_stop_max_value 1.0 != inf
root.train.early_stop_patience -1 != 10
root.misc.seed 42 != 428
root.misc.num_workers 4 != 12
}
Saved config to C:\Users\HP\Desktop\autogloun\2b000416.trial_0\config.yaml
Model resnet50 created, param count: 23516228
AMP not enabled. Training in float32.
Disable EMA as it is not supported for now.
Start training from [Epoch 0]
time_limit=auto
set to time_limit=7200
.
Reset labels to [0, 1, 2, 3]
Randomly split train_data into train[720]/validation[80] splits.
The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
Starting fit without HPO
modified configs(<old> != <new>): {
root.img_cls.model resnet101 != resnet50
root.misc.num_workers 4 != 12
root.misc.seed 42 != 204
root.train.early_stop_patience -1 != 10
data/
├── test/
└── train/
root.train.early_stop_max_value 1.0 != inf
root.train.batch_size 32 != 16
root.train.early_stop_baseline 0.0 != -inf
root.train.epochs 200 != 50
}
Saved config to C:\Users\HP\Desktop\autogloun\1444b6bf.trial_0\config.yaml
Model resnet50 created, param count: 23516228
AMP not enabled. Training in float32.
Disable EMA as it is not supported for now.
Start training from [Epoch 0]
Finished, total runtime is 1.50 s
{ 'best_config': { 'batch_size': 16,
'dist_ip_addrs': None,
'early_stop_baseline': -inf,
'early_stop_max_value': inf,
'early_stop_patience': 10,
'epochs': 50,
'final_fit': False,
'gpus': [0],
'lr': 0.01,
'model': 'resnet50',
'ngpus_per_trial': 8,
'nthreads_per_trial': 128,
'num_workers': 12,
'searcher': 'random',
'seed': 204,
'time_limits': 7200},
'total_time': 1.4981746673583984,
'train_acc': -1,
'valid_acc': -1}
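For anyone who hits the same output, here is a minimal sketch of a check that flags this kind of result automatically. It assumes that -1 is a placeholder meaning no accuracy was ever recorded, and that a total runtime of a few seconds means no epoch actually ran. Both are guesses based on the log above, not documented behaviour. The helper name `looks_like_failed_fit` and the 5-second threshold are hypothetical.

```python
import math


def looks_like_failed_fit(summary, min_runtime_s=5.0):
    """Return True if a fit summary suggests that no training epoch completed.

    Hypothetical helper, not part of any library.
    """
    accs = [summary.get("train_acc"), summary.get("valid_acc")]
    # Assumption: a missing or negative accuracy means nothing was recorded.
    no_accuracy = all(a is None or a < 0 for a in accs)
    # Assumption: a run that ends within a few seconds never trained.
    # A summary without 'total_time' is not treated as too fast.
    too_fast = summary.get("total_time", math.inf) < min_runtime_s
    return no_accuracy and too_fast


# Values copied from the summary printed above.
summary = {"total_time": 1.4981746673583984, "train_acc": -1, "valid_acc": -1}
print(looks_like_failed_fit(summary))
```

A check like this only reports that training never started; it does not explain why it failed on Windows.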
Thank you, I can successfully run the tutorial on Colab.
I agree, please let me know how it goes on Colab. I can also locate a Windows environment later to look into this issue.