Training on new custom dataset
Hello! Thank you for the code.
I am trying to train the model on my own custom dataset (a folder of images), but it's a bit hard to do. I tried to follow this:
> Create a new python script in the datasets package. Implement a dataset that inherits the `fcdd.datasets.bases.TorchvisionDataset` class. Your implementation needs to process all parameters of the `fcdd.datasets.bases.load_dataset` function in its initialization. You can use the `preproc` parameter to switch between different data preprocessing pipelines (augmentation, etc.). In the end, your implementation needs to have at least all attributes defined in the `fcdd.datasets.bases.BaseADDataset` class. Most importantly, the `_train_set` attribute and the `_test_set` attribute containing the corresponding torchvision-style datasets. Have a look at the already available implementations.
>
> Add a name for your dataset to the `fcdd.datasets.__init__.DS_CHOICES` variable. Add your dataset to the "switch-case" in the `fcdd.datasets.__init__.load_dataset` function. Add the number of available classes for your dataset to the `fcdd.datasets.__init__.no_classes` function and add the class names to `fcdd.datasets.__init__.str_labels`.
But I am finding it hard to do. I can't see where exactly I should put the images folder or how to prepare those scripts. Can anyone help?
Top GitHub Comments
Hi. You can put your image folder anywhere you want; in your implementation you just need to point to the correct location. Let's go through this step by step. Consider your dataset being named `foodata`.
Create a new python script in `fcdd.datasets`. Implement a typical torchvision-style dataset for your image folder; you can reuse the PyTorch default implementation `torchvision.datasets.ImageFolder`. Read https://pytorch.org/docs/1.4.0/torchvision/datasets.html#datasetfolder for how your folder needs to be structured for this. Then add your dataset to `fcdd.datasets.__init__` so that it can be loaded by the trainer. So I imagine something like the two sketches below; I simplified things here a bit and ignored some optional arguments (such as `oe_limit`).
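A minimal sketch of what the dataset script could look like. The class name `ADFooData`, the exact parameter list, and the base-class signature are assumptions; check `fcdd.datasets.bases.load_dataset` and the already available dataset implementations for the authoritative versions:

```python
# fcdd/datasets/foodata.py -- hypothetical new script
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from fcdd.datasets.bases import TorchvisionDataset


class ADFooData(TorchvisionDataset):
    def __init__(self, root: str, normal_class: int = 0, preproc: str = 'aug1',
                 nominal_label: int = 0, supervise_mode: str = 'unsupervised',
                 noise_mode: str = 'imagenet21k', oe_limit: int = None, logger=None):
        # assumed base-class signature; adapt to fcdd.datasets.bases
        super().__init__(root, logger=logger)

        self.shape = (3, 224, 224)  # (channels, height, width) the net expects
        self.normal_classes = (normal_class,)
        self.nominal_label = nominal_label

        # deliberately plain preprocessing; a real implementation would
        # branch on `preproc` to select different augmentation pipelines
        transform = transforms.Compose([
            transforms.Resize((self.shape[1], self.shape[2])),
            transforms.ToTensor(),
        ])

        # torchvision ImageFolder layout: one subfolder per class
        self._train_set = ImageFolder(f'{root}/foodata/train', transform=transform)
        self._test_set = ImageFolder(f'{root}/foodata/test', transform=transform)
```

And the registration in `fcdd.datasets.__init__`, again only sketched (existing entries abbreviated, and the real `load_dataset` takes a few more parameters):

```python
# fcdd/datasets/__init__.py -- additions for the new dataset
from fcdd.datasets.foodata import ADFooData

DS_CHOICES = ('mnist', 'cifar10', 'foodata')  # existing entries abbreviated


def no_classes(dataset_name: str) -> int:
    return {
        'foodata': 1,  # number of available classes in your data
        # ... entries for the other datasets ...
    }[dataset_name]


def str_labels(dataset_name: str):
    return {
        'foodata': ['class_0'],
        # ... entries for the other datasets ...
    }[dataset_name]


def load_dataset(dataset_name, data_path, normal_class, preproc,
                 nominal_label, supervise_mode, noise_mode, oe_limit, logger=None):
    assert dataset_name in DS_CHOICES
    if dataset_name == 'foodata':
        return ADFooData(root=data_path, normal_class=normal_class, preproc=preproc,
                         nominal_label=nominal_label, supervise_mode=supervise_mode,
                         noise_mode=noise_mode, oe_limit=oe_limit, logger=logger)
    # ... existing "switch-case" branches for the other datasets ...
```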
To train on your new data, just run:

```
python runners/run_imagenet.py --dataset foodata --net FCDD_CNN224_VGG --datadir PATH_TO_YOUR_DATA
```
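With the linked ImageFolder layout, the data under `PATH_TO_YOUR_DATA` would be organized something like this (folder and file names are placeholders):

```
PATH_TO_YOUR_DATA/
└── foodata/
    ├── train/
    │   └── class_0/
    │       ├── sample_x.png
    │       └── ...
    └── test/
        ├── class_0/
        │   └── ...
        └── class_1/
            └── ...
```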
The `--datadir` argument should point to your data (for example, `--datadir /home/salmiachraf/fcdd/data/datasets` for your data being in `/home/salmiachraf/fcdd/data/datasets/foodata/train/class_0/sample_x`). The runner uses an imagenet-pre-trained network applicable for input images of shape 224x224 (i.e., you need to have set the shape in `ADFooData` to 224x224) and imagenet21k data as outlier exposure. Using the imagenet runner might seem a bit confusing here, but the runners are all the same; they just use a different set of default parameters. However, you can add your own runner with your own default parameters for more convenience.

Does this help you? Otherwise, feel free to provide more information. I'm glad to help.
It would be nicer to provide a more general data loader. I also think an external config file would be more convenient: all the variable parameters could be modified in that one file to run our customized experiments. Sometimes I just want to test the algorithm on my dataset quickly to get a rough conclusion. Please consider me as a user.
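For illustration, the kind of config-driven entry point being asked for could be as small as the following sketch. It is purely hypothetical (no such file exists in the repository) and only forwards the command-line flags shown above:

```python
# run_from_config.py -- hypothetical convenience wrapper
import json
import subprocess
import sys


def run_from_config(path: str):
    """Read experiment parameters from a JSON file and forward them
    to the existing runner, which already understands these flags."""
    with open(path) as f:
        cfg = json.load(f)
    subprocess.run([
        sys.executable, 'runners/run_imagenet.py',
        '--dataset', cfg['dataset'],   # e.g. "foodata"
        '--net', cfg['net'],           # e.g. "FCDD_CNN224_VGG"
        '--datadir', cfg['datadir'],   # path to the data root
    ], check=True)


if __name__ == '__main__':
    run_from_config(sys.argv[1] if len(sys.argv) > 1 else 'experiment.json')
```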