
Error training model - band regex?


Hello!

I’ve got a custom datamodule for Landcover / MODIS / Sentinel data. The datamodule works fine when I sample from it directly with a dataloader (I can plot the mask and the 4 image channels).

The issue comes when I try to train a binary semantic segmentation task with this datamodule: the raster dataset code in datasets/geo.py fails on the bands with 'IndexError: no such group'. I’ve looked at the source code in geo.py, but I’m not clear how to solve it.

It seems to be an issue with the band group in the regex: the date seems to match OK, but the band may not. However, I copied the form of the band regex from the torchgeo Sentinel2 class.
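To test this hypothesis outside the training loop, the regex can be run against a filename directly. This is a minimal sketch: the filename 202001_T10B03.tif is hypothetical (substitute one of your real files), and the band swap only mimics what the geo.py lines in the traceback below do. Those lines call match.start("band") whenever the match has a date group, so any filename_regex with a date group but no band group raises this same IndexError.

```python
import re

# The band regex from the question, as a raw string with the dot escaped.
FILENAME_REGEX = r"^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA])\.tif$"

# Hypothetical filename that fits the pattern; replace it with a real one.
name = "202001_T10B03.tif"

match = re.match(FILENAME_REGEX, name)
if match is None:
    raise ValueError(f"regex did not match {name!r}")
groups = match.groupdict()
print(groups)

# Swap in another band, the way the geo.py lines in the traceback rebuild filenames.
start, end = match.start("band"), match.end("band")
swapped = name[:start] + "B08" + name[end:]
print(swapped)

# A regex with a date group but no band group reproduces the error.
date_only = re.match(r"^(?P<date>\d{6})", name)
try:
    date_only.start("band")
    error_message = None
except IndexError as err:
    error_message = str(err)
print(error_message)
```

This prints {'date': '202001', 'band': 'B03'}, then 202001_T10B08.tif, then no such group. If your real filenames match and the band group is found, the regex for this class is probably fine, and the regex of any other raster dataset the datamodule loads is worth checking the same way.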

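import pytorch_lightning as pl
from pytorch_lightning import Trainer
from torchgeo.datasets import Sentinel2
from torchgeo.samplers import Units

# MODISJDLandcoverSimpleDataModule and BinarySemanticSegmentationTask are not
# shown in this issue; they must be defined or imported before main() runs.

# Subclass (reusing the name Sentinel2) that overrides how band files are matched.
class Sentinel2(Sentinel2):
    filename_glob = '*B03.tif'
    filename_regex = r'^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA])\.tif$'
    date_format = '%Y%m'
    all_bands = ['B03', 'B08', 'B11']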

def main():
    datamodule = MODISJDLandcoverSimpleDataModule(
        modis_root_dir="MODIS/",
        landcover_root_dir="landcover/Classified/",
        sentinel_root_dir="sentinel/",
        patch_size=250,
        batch_size=10,
        length=10,
        num_workers=0,
        one_hot_encode=False,
        balance_samples=False,
        burn_prop=0,
        grid_sampler=False,
        units=Units.PIXELS,
    )

    # ignore_zeros=True corresponds to ignoring the background class
    # in metrics evaluation
    model = BinarySemanticSegmentationTask(
        segmentation_model="unet",
        encoder_name="resnet18",
        encoder_weights=None,  # or "imagenet"
        in_channels=4,
        num_filters=64,
        num_classes=2,
        loss="jaccard",
        # tversky_alpha=0.7,
        # tversky_beta=0.3,
        # tversky_gamma=1.0,
        learning_rate=0.1,
        ignore_zeros=False,
        learning_rate_schedule_patience=5,
    )

    trainer = Trainer(gpus=1, fast_dev_run=True)

    # tune() runs the learning-rate finder only if auto_lr_find is set on the Trainer
    trainer.tune(model, datamodule)
    trainer.fit(model, datamodule)


if __name__ == "__main__":

    # set random seed for reproducibility
    pl.seed_everything(0)

    # TRAIN
    main()

Global seed set to 0
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type             | Params
---------------------------------------------------
0 | model         | Unet             | 14.3 M
1 | loss          | JaccardLoss      | 0     
2 | train_metrics | MetricCollection | 0     
3 | val_metrics   | MetricCollection | 0     
4 | test_metrics  | MetricCollection | 0     
---------------------------------------------------
14.3 M    Trainable params
0         Non-trainable params
14.3 M    Total params
57.325    Total estimated model params size (MB)
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/data_loading.py:433: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
Epoch 0: 0% 0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-33-74529b3a24c6> in <module>()
     72 
     73     # TRAIN
---> 74     main()

26 frames
/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py in __getitem__(self, query)
    415                     if match:
    416                         if "date" in match.groupdict():
--> 417                             start = match.start("band")
    418                             end = match.end("band")
    419                             filename = filename[:start] + band + filename[end:]

IndexError: no such group

Issue analytics: created 2 years ago, 5 comments

Top GitHub Comments

graceebc9 commented, Mar 16, 2022 (1 reaction)

yes that worked, thanks so much!!!

adamjstewart commented, Mar 16, 2022 (0 reactions)

This seems related to the following bug reports. Basically, the U-Net that comes with SMP (segmentation_models_pytorch) requires input images whose height and width are divisible by 32, so patch_size must be a multiple of 32. Can you try switching patch_size from 250 to 256 and see if that solves your issue?
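A minimal sketch of why 250 fails and 256 works. It assumes a depth-5 encoder (the SMP U-Net default) in which every stride-2 stage maps a size n to ceil(n / 2), which is an assumption about padded convolution and pooling layers rather than a reading of SMP's code. The helpers encoder_sizes, decoder_mismatches and next_valid_patch_size are hypothetical, written only for this illustration.

```python
import math
from typing import List, Tuple


def encoder_sizes(size: int, depth: int = 5) -> List[int]:
    """Spatial size after each of `depth` stride-2 encoder stages.

    Assumes each stage maps n -> ceil(n / 2), as a padded stride-2
    convolution or pooling layer typically does.
    """
    sizes = [size]
    for _ in range(depth):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


def decoder_mismatches(size: int, depth: int = 5) -> List[Tuple[int, int]]:
    """(upsampled, skip) pairs where doubling a deeper feature map does not
    reproduce the size of the shallower map it would be concatenated with."""
    sizes = encoder_sizes(size, depth)
    return [
        (2 * deeper, shallower)
        for shallower, deeper in zip(sizes[:-1], sizes[1:])
        if 2 * deeper != shallower
    ]


def next_valid_patch_size(size: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= size."""
    return math.ceil(size / multiple) * multiple


for patch_size in (250, 256):
    print(patch_size, encoder_sizes(patch_size), decoder_mismatches(patch_size))
print(next_valid_patch_size(250))
```

With 250 the sizes go 250, 125, 63, 32, 16, 8, so upsampling 32 back to 64 no longer lines up with the 63-pixel skip connection (and 63 doubled is 126, not 125). With 256 every halving is exact because 256 = 8 × 32 and 32 = 2^5, which is why any multiple of 32 avoids the mismatch.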