Error training model - band regex?

Hello!

I’ve got a custom datamodule for Landcover / MODIS / Sentinel data. The datamodule works fine when used directly: I can sample from it with a DataLoader and plot the mask (1 channel) and the image (4 channels).

The issue comes when I try to use this datamodule for a binary semantic segmentation task: the raster dataset in torchgeo's datasets/geo.py raises "IndexError: no such group" while handling the bands. I’ve looked at the source code in datasets/geo.py, but I’m not clear on how to solve it.

It seems to be an issue with the band part of the regex: the date seems to match OK, but the band may not. However, I copied the form of the band regex from torchgeo's Sentinel2 class.
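
One way to test that hypothesis is to run the regex on its own against a filename, outside torchgeo. The sketch below uses a hypothetical filename, `202003_S2_B03.tif`, chosen to fit the pattern (6-digit date, 4 arbitrary characters, band). It also escapes the `.` before `tif`, which in the original pattern matches any character. It then mimics the band substitution visible in the traceback (geo.py lines 417-419). That torchgeo compiles the pattern with `re.VERBOSE` is an assumption; this pattern contains no whitespace, so the flag makes no difference here.

```python
import re

# Regex from the question as a raw string, with the "." before "tif" escaped.
FILENAME_REGEX = r"^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA])\.tif$"
ALL_BANDS = ["B03", "B08", "B11"]

# Hypothetical filename: 6-digit date, 4 arbitrary non-space characters, band.
filename = "202003_S2_B03.tif"

# re.VERBOSE is an assumption about how torchgeo compiles the pattern;
# it has no effect here because the pattern contains no whitespace or "#".
pattern = re.compile(FILENAME_REGEX, re.VERBOSE)
match = pattern.match(filename)
if match is None:
    raise SystemExit(f"{filename!r} does not match {FILENAME_REGEX!r}")

groups = match.groupdict()
print(groups)  # {'date': '202003', 'band': 'B03'}

# Same idea as the traceback's geo.py lines 417-419: splice each band name
# into the filename at the position where the "band" group matched.
start, end = match.start("band"), match.end("band")
band_files = [filename[:start] + band + filename[end:] for band in ALL_BANDS]
print(band_files)  # ['202003_S2_B03.tif', '202003_S2_B08.tif', '202003_S2_B11.tif']
```

If this check passes on a real filename from the `sentinel/` directory, the Sentinel2 regex itself is probably not the source of the error.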

```python
import pytorch_lightning as pl
from pytorch_lightning import Trainer
from torchgeo.datasets import Sentinel2
from torchgeo.samplers import Units

# MODISJDLandcoverSimpleDataModule and BinarySemanticSegmentationTask are
# custom classes from this project; their definitions are not shown in the issue.


class Sentinel2(Sentinel2):  # subclasses (and shadows) torchgeo's Sentinel2
    filename_glob = "*B03.tif"
    # raw string, and "\." so the dot before "tif" is matched literally
    filename_regex = r"^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA])\.tif$"
    date_format = "%Y%m"
    all_bands = ["B03", "B08", "B11"]


def main():
    datamodule = MODISJDLandcoverSimpleDataModule(
        modis_root_dir="MODIS/",
        landcover_root_dir="landcover/Classified/",
        sentinel_root_dir="sentinel/",
        patch_size=250,
        batch_size=10,
        length=10,
        num_workers=0,
        one_hot_encode=False,
        balance_samples=False,
        burn_prop=0,
        grid_sampler=False,
        units=Units.PIXELS,
    )

    # ignore_zeros=True corresponds to ignoring the background class
    # in metrics evaluation
    model = BinarySemanticSegmentationTask(
        segmentation_model="unet",
        encoder_name="resnet18",
        encoder_weights=None,  # "imagenet",
        in_channels=4,
        num_filters=64,
        num_classes=2,
        loss="jaccard",
        # tversky_alpha=0.7,
        # tversky_beta=0.3,
        # tversky_gamma=1.0,
        learning_rate=0.1,
        ignore_zeros=False,
        learning_rate_schedule_patience=5,
    )

    trainer = Trainer(gpus=1, fast_dev_run=True)

    # used to automatically find the learning rate; note that trainer.tune()
    # only runs the LR finder if Trainer(auto_lr_find=True) is set
    trainer.tune(model, datamodule)
    trainer.fit(model, datamodule)


if __name__ == "__main__":
    # set random seed for reproducibility
    pl.seed_everything(0)

    # TRAIN
    main()
```

```text
Global seed set to 0
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
Running in fast_dev_run mode: will run a full train, val, test and prediction loop using 1 batch(es).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name          | Type             | Params
---------------------------------------------------
0 | model         | Unet             | 14.3 M
1 | loss          | JaccardLoss      | 0     
2 | train_metrics | MetricCollection | 0     
3 | val_metrics   | MetricCollection | 0     
4 | test_metrics  | MetricCollection | 0     
---------------------------------------------------
14.3 M    Trainable params
0         Non-trainable params
14.3 M    Total params
57.325    Total estimated model params size (MB)
/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/data_loading.py:433: UserWarning: The number of training samples (1) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  f"The number of training samples ({self.num_training_batches}) is smaller than the logging interval"
Epoch 0: 0% 0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-33-74529b3a24c6> in <module>()
     72 
     73     # TRAIN
---> 74     main()

... 26 frames ...

/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py in __getitem__(self, query)
    415                     if match:
    416                         if "date" in match.groupdict():
--> 417                             start = match.start("band")
    418                             end = match.end("band")
    419                             filename = filename[:start] + band + filename[end:]

IndexError: no such group
```
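
The traceback itself points at the mechanism: line 416 checks for a `date` group, but line 417 then asks for a `band` group. Python's `re` raises `IndexError: no such group` whenever `Match.start()` is given a group name the pattern does not define. The sketch below reproduces this with a hypothetical regex that has a `date` group but no `band` group. The issue does not show which dataset in the datamodule (if any) uses such a regex, so treat this as one plausible cause rather than a diagnosis.

```python
import re

# Hypothetical pattern with a "date" group but no "band" group, standing in
# for whichever dataset regex might reach geo.py line 417.
regex_without_band = re.compile(r"^(?P<date>\d{7})_\w+\.tif$")
match = regex_without_band.match("2020061_tile.tif")
print(match.groupdict())  # {'date': '2020061'}

error_message = None
if "date" in match.groupdict():  # same guard as geo.py line 416
    try:
        match.start("band")  # same call as geo.py line 417
    except IndexError as err:
        error_message = str(err)
print(error_message)  # no such group

# A compiled pattern's named groups can be checked up front via Pattern.groupindex.
regex_with_band = re.compile(r"^(?P<date>\d{6})\S{4}(?P<band>B[018][\dA])\.tif$")
for name, pattern in [("without band", regex_without_band), ("with band", regex_with_band)]:
    print(name, "band" in pattern.groupindex)
```

Checking `groupindex` for each custom dataset class's `filename_regex` is a quick way to find any pattern that defines `date` without `band`.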

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
graceebc9 commented, Mar 16, 2022

Yes, that worked, thanks so much!!!

0 reactions
adamjstewart commented, Mar 16, 2022

This seems related to the following bug reports. Basically, the UNet that comes with SMP (segmentation_models_pytorch) requires input images whose height and width are divisible by 32, so patch_size must be a multiple of 32. Can you try switching from 250 to 256 and see if that solves your issue?
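
A rough way to see why 250 fails and 256 works: the encoder halves the spatial size five times (2^5 = 32). The U-Net decoder then doubles it back, concatenating each upsampled feature map with the encoder feature map of the same level. The sketch below is a simplified shape model, not SMP's code. It assumes every stride-2 stage maps n to ceil(n / 2), and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple


def encoder_sizes(size: int, depth: int = 5) -> List[int]:
    """Spatial size at each encoder level, assuming each stride-2 stage maps n -> ceil(n / 2)."""
    sizes = [size]
    for _ in range(depth):
        sizes.append(-(-sizes[-1] // 2))  # ceiling division
    return sizes


def first_skip_mismatch(size: int, depth: int = 5) -> Optional[Tuple[int, int, int]]:
    """Return (level, upsampled, skip) for the first decoder level where doubling the
    deeper feature map no longer matches the encoder skip it is concatenated with."""
    sizes = encoder_sizes(size, depth)
    for level in range(depth - 1, 0, -1):
        upsampled = 2 * sizes[level + 1]
        if upsampled != sizes[level]:
            return level, upsampled, sizes[level]
    return None


def round_up_to_multiple(size: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= size."""
    return -(-size // multiple) * multiple


print(encoder_sizes(250))         # [250, 125, 63, 32, 16, 8]
print(first_skip_mismatch(250))   # (2, 64, 63)
print(round_up_to_multiple(250))  # 256
print(first_skip_mismatch(256))   # None
```

Under this model, every multiple of 32 passes cleanly, which is consistent with the suggestion to use 256.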
