'train_acc': -1, 'valid_acc': -1

I ran into the following problem while running the official documentation tutorial: training stops after about 1.5 seconds and reports 'train_acc': -1 and 'valid_acc': -1.

Environment: torch==1.12.0+cu113, torchvision==0.13.0+cu113, torchtext==0.13.0 (the same versions as in the official documentation), Windows, Python 3.8.5.
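The versions listed above can be collected in one step, which makes it easier to confirm that the environment really matches the documentation. This is a minimal sketch using only the standard library; `collect_env` is a hypothetical helper, and `autogluon` is added to the package list as an assumption because its version is not stated above.

```python
import platform
from importlib import metadata

# Packages from the environment description, plus autogluon
# (added here as an assumption; its version was not reported).
PACKAGES = ("torch", "torchvision", "torchtext", "autogluon")


def collect_env(packages=PACKAGES):
    """Return OS, Python version, and installed package versions (None if missing)."""
    info = {"os": platform.system(), "python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```

`importlib.metadata` is available from Python 3.8, so this runs on the Python 3.8.5 setup described above.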

Log output:

    time_limit=auto set to time_limit=7200.
    Reset labels to [0, 1, 2, 3]
    Randomly split train_data into train[720]/validation[80] splits.
    The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
    Starting fit without HPO
    modified configs(<old> != <new>): {
        root.img_cls.model resnet101 != resnet50
        root.train.epochs 200 != 50
        root.train.early_stop_baseline 0.0 != -inf
        root.train.batch_size 32 != 16
        root.train.early_stop_max_value 1.0 != inf
        root.train.early_stop_patience -1 != 10
        root.misc.seed 42 != 428
        root.misc.num_workers 4 != 12
    }
    Saved config to C:\Users\HP\Desktop\autogloun\2b000416.trial_0\config.yaml
    Model resnet50 created, param count: 23516228
    AMP not enabled. Training in float32.
    Disable EMA as it is not supported for now.
    Start training from [Epoch 0]
    time_limit=auto set to time_limit=7200.
    Reset labels to [0, 1, 2, 3]
    Randomly split train_data into train[720]/validation[80] splits.
    The number of requested GPUs is greater than the number of available GPUs.Reduce the number to 1
    Starting fit without HPO
    modified configs(<old> != <new>): {
        root.img_cls.model resnet101 != resnet50
        root.misc.num_workers 4 != 12
        root.misc.seed 42 != 204
        root.train.early_stop_patience -1 != 10
        root.train.early_stop_max_value 1.0 != inf
        root.train.batch_size 32 != 16
        root.train.early_stop_baseline 0.0 != -inf
        root.train.epochs 200 != 50
    }
    Saved config to C:\Users\HP\Desktop\autogloun\1444b6bf.trial_0\config.yaml
    Model resnet50 created, param count: 23516228
    AMP not enabled. Training in float32.
    Disable EMA as it is not supported for now.
    Start training from [Epoch 0]
    Finished, total runtime is 1.50 s
    {
        'best_config': {
            'batch_size': 16,
            'dist_ip_addrs': None,
            'early_stop_baseline': -inf,
            'early_stop_max_value': inf,
            'early_stop_patience': 10,
            'epochs': 50,
            'final_fit': False,
            'gpus': [0],
            'lr': 0.01,
            'model': 'resnet50',
            'ngpus_per_trial': 8,
            'nthreads_per_trial': 128,
            'num_workers': 12,
            'searcher': 'random',
            'seed': 204,
            'time_limits': 7200
        },
        'total_time': 1.4981746673583984,
        'train_acc': -1,
        'valid_acc': -1
    }
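The final summary pairs 'epochs': 50 with a total_time of about 1.5 s, so it looks as if no epoch ever completed. Below is a minimal sketch for flagging this kind of result automatically. It assumes the summary is available as a Python dict shaped like the printed one, and it assumes that -1 is a placeholder for a metric that was never computed (this is not documented AutoGluon behavior). `diagnose_fit_summary` and its `min_seconds_per_epoch` threshold are hypothetical and purely illustrative.

```python
# Hypothetical helper: inspects a fit summary shaped like the one in the log.
def diagnose_fit_summary(summary, min_seconds_per_epoch=1.0):
    """Return human-readable reasons why training looks like it never ran.

    Assumptions: -1 means "accuracy never computed", and any run faster than
    epochs * min_seconds_per_epoch (an arbitrary threshold) is suspicious.
    """
    reasons = []
    for key in ("train_acc", "valid_acc"):
        if summary.get(key) == -1:
            reasons.append(f"{key} is -1 (assumed placeholder, never computed)")
    epochs = summary.get("best_config", {}).get("epochs")
    total = summary.get("total_time")
    if epochs and total is not None and total < epochs * min_seconds_per_epoch:
        reasons.append(f"total_time {total:.2f} s is too short for {epochs} epochs")
    return reasons


# Values copied from the summary at the end of the log above.
summary = {
    "best_config": {"epochs": 50, "batch_size": 16, "num_workers": 12},
    "total_time": 1.4981746673583984,
    "train_acc": -1,
    "valid_acc": -1,
}

for reason in diagnose_fit_summary(summary):
    print(reason)
```

With the values from this log, all three checks fire. A normal run with real accuracies and a runtime of several minutes returns an empty list.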