Semantic Segmentation Task Trainer
Hello, firstly I apologise for the vagueness of this question: having dug into the issue for a couple of days, I haven't been able to understand it very well, so defining it as clearly as I would like has been difficult. I would like to know whether there is a mistake in my training setup that is causing the behaviour outlined below, or whether my understanding of the relationship between the sampler, dataloader and trainer is incorrect.
Task background: I have a single-channel input layer and a single-channel mask output layer. I am using a `RandomBatchGeoSampler` to draw training samples, which I have tested by printing/plotting samples obtained from my dataloader. As a simple test case to see if I can get my trainer to run at all, I am currently using `batch_size=2` and `length=2` (one batch per epoch); in this case I receive 2 samples in total, as expected.
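For reference, here is a minimal sketch of the sampler/dataloader wiring described above; the patch size of 256 and the `dataset` object are assumptions, not values taken from the issue:

```python
from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomBatchGeoSampler

# `dataset` is assumed to be a torchgeo GeoDataset yielding a
# single-channel image and a single-channel mask per sample.
sampler = RandomBatchGeoSampler(dataset, size=256, batch_size=2, length=2)
dataloader = DataLoader(dataset, batch_sampler=sampler, collate_fn=stack_samples)

# With length=2 and batch_size=2 this yields one batch of 2 samples
# per epoch, matching the test case described above.
for batch in dataloader:
    print(batch["image"].shape, batch["mask"].shape)
```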
I have then defined the task and trainer objects as follows:
```python
from torchgeo.trainers import SemanticSegmentationTask

model = SemanticSegmentationTask(
    segmentation_model="unet",
    encoder_name="resnet18",
    encoder_weights="imagenet",
    in_channels=1,
    num_classes=2,
    num_filters=32,
    loss="ce",
    ignore_zeros=False,
    learning_rate=0.1,
    learning_rate_schedule_patience=5,
)
```
```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(
    project="Wildfires", log_model="all", name="simple_land_simple_dataloader"
)
trainer = Trainer(
    gpus=1,
    logger=wandb_logger,
    callbacks=callbacks,  # callbacks defined elsewhere
    max_epochs=1,
    precision=16,
    log_every_n_steps=1,
    max_steps=1,
)
wandb_logger.watch(model)
trainer.fit(model=model, datamodule=datamodule)  # datamodule defined elsewhere
```
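The `datamodule` passed to `fit` above is not shown in the issue; purely for illustration, here is a minimal sketch of how such a datamodule might wrap the sampler from the first snippet (the class name, patch size and defaults are hypothetical):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomBatchGeoSampler

class WildfireDataModule(pl.LightningDataModule):  # hypothetical name
    def __init__(self, dataset, patch_size=256, batch_size=2, length=2):
        super().__init__()
        self.dataset = dataset
        self.patch_size = patch_size
        self.batch_size = batch_size
        self.length = length

    def train_dataloader(self):
        sampler = RandomBatchGeoSampler(
            self.dataset,
            size=self.patch_size,
            batch_size=self.batch_size,
            length=self.length,
        )
        # Passing the sampler via batch_sampler= means that
        # len(dataloader) == length // batch_size, which is the count
        # the Trainer uses when sizing the progress bar.
        return DataLoader(
            self.dataset, batch_sampler=sampler, collate_fn=stack_samples
        )
```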
Expected behaviour: I expect the trainer to run for a very short time, given that only one batch of 2 samples is used and only 1 training step will be completed. I also expect the duration of each training step to be independent of the size of the dataset, since the random samples drawn always have the same patch and batch sizes. My current understanding is that the progress total (6081025 in the screenshot attached to the original issue) should be the total number of batches (or something similar).
Observed behaviour: The single training step produces a progress bar whose total is measured in the millions (while actual progress increases slowly in increments of 20), resulting in a massively long training process. Furthermore, the total grows larger and larger as I add more data to my dataset. As far as I can tell, the amount of data in my dataset is the only thing that affects the size of this progress total; changing the segmentation model, for example, makes no difference. If I set `max_steps=0`, the training loop does not run at all, as expected.
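Since Lightning derives the progress-bar total from the length of the train dataloader, one quick sanity check of what the datamodule actually reports (a sketch, assuming the `datamodule` object passed to `fit` above):

```python
datamodule.setup("fit")
train_loader = datamodule.train_dataloader()
# With batch_size=2 and length=2 this should print 1 (one batch per
# epoch); a value in the millions that grows with the dataset would
# reproduce the progress bar described above.
print(len(train_loader))
```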
Top GitHub Comments
All resolved! @calebrob6's suggestion did the trick.
@Hamish-Cam are you still having issues or did you figure this one out?