PatchCore results are much worse than reported
Describe the bug
To Reproduce
Steps to reproduce the behavior:
- Go to the Main directory
- Run `python tools/train.py --model patchcore`
Expected behavior
The image AUROC should be around 0.98 for the carpet category of the MVTec dataset, but it is very low. FastFlow works as expected, so the problem seems to be PatchCore.
Hardware and Software Configuration
- OS: [Ubuntu]
- NVIDIA Driver Version [470.141.03]
- CUDA Version [11.4]
- CUDNN Version [e.g. v11.4.120]
Log
WARNING: CPU random generator seem to be failing, disabling hardware random number generation
WARNING: RDRND generated: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
----------------------------------/anomalib/config/config.py:166: UserWarning: config.project.unique_dir is set to False. This does not ensure that your results will be written in an empty directory and you may overwrite files.
warn(
2022-11-16 11:52:49,662 - anomalib.data - INFO - Loading the datamodule
2022-11-16 11:52:49,662 - anomalib.pre_processing.pre_process - WARNING - Transform configs has not been provided. Images will be normalized using ImageNet statistics.
2022-11-16 11:52:49,663 - anomalib.pre_processing.pre_process - WARNING - Transform configs has not been provided. Images will be normalized using ImageNet statistics.
2022-11-16 11:52:49,663 - anomalib.models - INFO - Loading the model.
2022-11-16 11:52:49,667 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpk5fh8j6r
2022-11-16 11:52:49,667 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpk5fh8j6r/_remote_module_non_scriptable.py
2022-11-16 11:52:49,674 - anomalib.models.components.base.anomaly_module - INFO - Initializing PatchcoreLightning model.
/home/-/code/anomalib/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric PrecisionRecallCurve will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2022-11-16 11:52:50,882 - timm.models.helpers - INFO - Loading pretrained weights from url (https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/wide_resnet50_racm-8234f177.pth)
2022-11-16 11:52:51,009 - anomalib.utils.loggers - INFO - Loading the experiment logger(s)
2022-11-16 11:52:51,009 - anomalib.utils.callbacks - INFO - Loading the callbacks
/home/-/code/anomalib/src/anomalib/anomalib/utils/callbacks/__init__.py:141: UserWarning: Export option: None not found. Defaulting to no model export
warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True, used: True
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used…
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used…
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - Trainer(limit_test_batches=1.0) was configured so 100% of the batches will be used…
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - Trainer(limit_predict_batches=1.0) was configured so 100% of the batches will be used…
2022-11-16 11:52:51,012 - pytorch_lightning.utilities.rank_zero - INFO - Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch…
2022-11-16 11:52:51,012 - anomalib - INFO - Training the model.
2022-11-16 11:52:51,016 - anomalib.data.mvtec - INFO - Found the dataset.
2022-11-16 11:52:51,018 - anomalib.data.mvtec - INFO - Setting up train, validation, test and prediction datasets.
/-/-/code/anomalib/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric ROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2022-11-16 11:52:52,479 - pytorch_lightning.accelerators.gpu - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/-/-/code/anomalib/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py:183: UserWarning: `LightningModule.configure_optimizers` returned None, this fit will run with no optimizer
  rank_zero_warn(
2022-11-16 11:52:52,482 - pytorch_lightning.callbacks.model_summary - INFO -
| Name | Type | Params
0 | image_threshold | AnomalyScoreThreshold | 0
1 | pixel_threshold | AnomalyScoreThreshold | 0
2 | model | PatchcoreModel | 24.9 M
3 | image_metrics | AnomalibMetricCollection | 0
4 | pixel_metrics | AnomalibMetricCollection | 0
5 | normalization_metrics | MinMax | 0
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
Epoch 0:   8%|▊         | 1/13 [00:01<00:16, 1.37s/it, loss=nan]/-/-/code/anomalib/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:137: UserWarning: `training_step` returned None. If this was on purpose, ignore this warning…
  self.warning_cache.warn("`training_step` returned None. If this was on purpose, ignore this warning…")
Epoch 0:  69%|██████▉   | 9/13 [00:01<00:00, 4.67it/s, loss=nan]
Validation: 0it [00:00, ?it/s]2022-11-16 11:52:54,414 - anomalib.models.patchcore.lightning_model - INFO - Aggregating the embedding extracted from the training set.
2022-11-16 11:52:54,415 - anomalib.models.patchcore.lightning_model - INFO - Applying core-set subsampling to get the embedding.
Epoch 0:  69%|██████▉   | 9/13 [00:20<00:08, 2.22s/it, loss=nan]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:  25%|██▌       | 1/4 [00:00<00:00, 4.13it/s]
Epoch 0:  77%|███████▋  | 10/13 [00:59<00:17, 5.94s/it, loss=nan]
Validation DataLoader 0:  50%|█████     | 2/4 [00:00<00:00, 4.04it/s]
Epoch 0:  85%|████████▍ | 11/13 [00:59<00:10, 5.42s/it, loss=nan]
Validation DataLoader 0:  75%|███████▌  | 3/4 [00:00<00:00, 4.02it/s]
Epoch 0:  92%|█████████▏| 12/13 [00:59<00:04, 4.99s/it, loss=nan]
Validation DataLoader 0: 100%|██████████| 4/4 [00:00<00:00, 4.63it/s]
Epoch 0: 100%|██████████| 13/13 [01:00<00:00, 4.67s/it, loss=nan, pixel_F1Score=0.548, pixel_AUROC=0.986]
Epoch 0: 100%|██████████| 13/13 [01:01<00:00, 4.69s/it, loss=nan, pixel_F1Score=0.548, pixel_AUROC=0.986]
2022-11-16 11:53:53,628 - anomalib.utils.callbacks.timer - INFO - Training took 61.15 seconds
2022-11-16 11:53:53,628 - anomalib - INFO - Loading the best model weights.
2022-11-16 11:53:53,628 - anomalib - INFO - Testing the model.
2022-11-16 11:53:53,632 - anomalib.data.mvtec - INFO - Found the dataset.
2022-11-16 11:53:53,633 - anomalib.data.mvtec - INFO - Setting up train, validation, test and prediction datasets.
/-/code/anomalib/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric ROC will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2022-11-16 11:53:53,716 - pytorch_lightning.accelerators.gpu - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
2022-11-16 11:53:53,718 - anomalib.utils.callbacks.model_loader - INFO - Loading the model from /home/-/code/anomalib/src/anomalib/results/patchcore/mvtec/carpet/run/weights/model.ckpt
Testing DataLoader 0: 100%|██████████| 4/4 [00:19<00:00, 4.65s/it]2022-11-16 11:54:14,762 - anomalib.utils.callbacks.timer - INFO - Testing took 20.9255051612854 seconds
Throughput (batch_size=32) : 5.591262867883519 FPS
Testing DataLoader 0: 100%|██████████| 4/4 [00:19<00:00, 4.97s/it]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
       Test metric             DataLoader 0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        image_AUROC         0.4036917984485626
       image_F1Score        0.8640776872634888
        pixel_AUROC         0.9860672950744629
       pixel_F1Score        0.5481611490249634
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Process finished with exit code 0
Issue Analytics
- Created 10 months ago
- Comments: 13 (8 by maintainers)
Top GitHub Comments
Thank you very much! I can confirm that the results are definitely better!
Just in case anyone is interested, I did a few experiments during the last few days which I wanted to share. But feel free to skip the rest of this comment.
I trained different settings twice over all categories with the seeds `0` and `42`. I had an average image AUROC of 0.944, whereas the paper states 0.990. Using the newest fix #791 and cropping, I get 0.987, so I'd say it's close enough. The settings are as follows:
- `main` branch a few days ago with my fix, but without the latest one
- `PatchcoreModel.generate_embedding` with the algorithm from the original PatchCore implementation (although I think it has two errors (or I messed up the parameters 🤷), it still returns good results)

We managed to narrow down the performance deterioration to a small change in the average pooling layer that was made some time ago, and we've reverted that commit for now. I ran a quick experiment on a few MVTec categories where I compared our numbers to those obtained by running the original implementation. These are some results:
(The seed was set to 0 in both implementations, and all other parameters were kept at their defaults, in case anyone would like to reproduce these numbers.)
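For anyone curious what the embedding step under discussion looks like, here is a minimal NumPy sketch of the general PatchCore-style aggregation: 3x3 average pooling (stride 1, zero padding, with the padded zeros counted in the mean, as in torch's `AvgPool2d` default), nearest-neighbour upsampling of the deeper feature map, and channel concatenation. This is an illustration, not the actual `PatchcoreModel.generate_embedding` code; the layer names and shapes are assumptions.

```python
import numpy as np

def avg_pool_3x3(fmap):
    """3x3 average pooling, stride 1, zero padding 1, on a CHW feature map.

    Zeros in the padding are included in the mean, matching torch
    AvgPool2d's default count_include_pad=True behaviour.
    """
    c, h, w = fmap.shape
    padded = np.zeros((c, h + 2, w + 2), dtype=float)
    padded[:, 1:-1, 1:-1] = fmap
    out = np.empty((c, h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = padded[:, i:i + 3, j:j + 3].mean(axis=(1, 2))
    return out

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsampling of a CHW feature map to (H, W)."""
    c, h, w = fmap.shape
    target_h, target_w = size
    rows = np.arange(target_h) * h // target_h
    cols = np.arange(target_w) * w // target_w
    return fmap[:, rows][:, :, cols]

def generate_embedding(features):
    """Pool each layer locally, upsample the deeper one, concatenate channels."""
    embedding = avg_pool_3x3(features["layer2"])
    layer3 = upsample_nearest(avg_pool_3x3(features["layer3"]),
                              embedding.shape[1:])
    return np.concatenate([embedding, layer3], axis=0)
```

With WideResNet-50 features at a 224x224 input, `layer2` is (512, 28, 28) and `layer3` is (1024, 14, 14), so the concatenated embedding would be (1536, 28, 28). A subtle point the pooling sketch makes visible: with zero padding counted in the mean, border patch features are attenuated, which is exactly the kind of detail where a small change can shift scores.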
There is still a small difference in the `grid` category, but this could possibly be attributed to the absence of center-cropping in Anomalib. The original implementation first resizes the images to 256x256 and then center-crops to 224x224, while we directly resize to 224x224. When I increase the image_size to 256x256 in the Anomalib config, the numbers are already much closer:

So I believe that our PatchCore model is now on par with the original implementation 🙂
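To make the preprocessing difference concrete, here is a small sketch (plain NumPy, not the Anomalib transforms) of what the original pipeline's center crop does to the field of view:

```python
import numpy as np

def center_crop(img, size):
    """Crop the central size x size window from an HWC image array."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

# Original PatchCore pipeline: resize to 256x256, then center-crop to 224x224.
# The crop discards a 16-pixel border on every side, keeping only
# 224*224 / (256*256) = 0.765625 (~76.6%) of the resized image area,
# whereas a direct resize to 224x224 keeps the full field of view at a
# coarser resolution.
resized = np.zeros((256, 256, 3))
cropped = center_crop(resized, 224)
print(cropped.shape)          # (224, 224, 3)
print(224 * 224 / 256 ** 2)  # 0.765625
```

This is why raising image_size to 256x256 in the config narrows the gap: it roughly restores the per-pixel resolution the original implementation sees inside its crop, even though the border region still differs.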