A minimal example with toy data set

I was trying to use the toy datasets, but I got errors saying that "train" doesn't exist when looping through the batches. Could we have a tiny minimal example that loops through the data for the toy datasets?

My attempt

from torchmeta.toy import Sinusoid
#from torchmeta.datasets.helpers import omniglot
from torchmeta.utils.data import BatchMetaDataLoader

from tqdm import tqdm

num_samples_per_task = 10
dataset = Sinusoid(num_samples_per_task, num_tasks=10, noise_std=None,
    transform=None, target_transform=None, dataset_transform=None)
#dataset = omniglot("data", ways=5, shots=5, test_shots=15, meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=5, num_workers=4)
print(f'len(dataset) = {len(dataset)}')
print(f'len(dataloader) = {len(dataloader)}')
for batch in dataloader:
    train_inputs, train_targets = batch["train"]

Another odd thing was that the tensors had size 16 even though my meta-batch size was 5…

Top GitHub Comments

tristandeleu commented, Jul 21, 2020

MiniImagenet does not have a num_samples_per_task argument (this is specific to toy regression datasets). But you can indeed see this as being similar to the 600 images per class: it corresponds to the number of possible examples to sample from for this task. In the case of toy regression tasks, this is simply the number of support + number of query examples (5 + 10 here).
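
The arithmetic above (15 samples per task = 5 support + 10 query) can be illustrated with a plain-Python sketch. This is not torchmeta's implementation: the helper names sample_sinusoid_task and split_task are hypothetical, and the amplitude, phase, and input ranges are assumptions chosen only for illustration.

```python
import math
import random

def sample_sinusoid_task(num_samples_per_task, rng):
    """Draw one toy regression task: points on a random sine wave."""
    amplitude = rng.uniform(0.1, 5.0)   # assumed range, for illustration only
    phase = rng.uniform(0.0, math.pi)   # assumed range, for illustration only
    inputs = [rng.uniform(-5.0, 5.0) for _ in range(num_samples_per_task)]
    targets = [amplitude * math.sin(x - phase) for x in inputs]
    return list(zip(inputs, targets))

def split_task(samples, num_train, num_test, rng):
    """Shuffle the samples, then cut them into support and query sets."""
    if num_train + num_test > len(samples):
        raise ValueError("not enough samples for the requested split")
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return {
        "train": shuffled[:num_train],
        "test": shuffled[num_train:num_train + num_test],
    }

rng = random.Random(0)
samples = sample_sinusoid_task(num_samples_per_task=15, rng=rng)
task = split_task(samples, num_train=5, num_test=10, rng=rng)
print(len(task["train"]), len(task["test"]))  # 5 10
```

In this sketch, asking for more support plus query examples than the task holds raises an error, which is why the number of samples per task has to cover both sets.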

tristandeleu commented, Jul 10, 2020

Dataset transforms (like ClassSplitter) can be used either as a dataset_transform argument or as a wrapper (the wrapper is just syntactic sugar). The following two are equivalent:

ClassSplitter as a dataset_transform argument

from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15,
    dataset_transform=ClassSplitter(num_train_per_class=5, num_test_per_class=10))

task = dataset.sample_task()
print(task)  # OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x11ba07dd8>), ('test', <torchmeta.utils.data.task.SubsetTask object at 0x11ba10240>)])

ClassSplitter as a wrapper

from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15)
dataset = ClassSplitter(dataset, num_train_per_class=5, num_test_per_class=10)

task = dataset.sample_task()
print(task)  # OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x12078eda0>), ('test', <torchmeta.utils.data.task.SubsetTask object at 0x120797208>)])
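
To connect this back to the loop in the question, here is a rough plain-Python sketch of why a "train" key only appears once each task has been split, and how split tasks might be stacked into a meta-batch. It is a conceptual illustration, not torchmeta's BatchMetaDataLoader: the helpers make_split_task and collate are hypothetical, and nested lists stand in for tensors.

```python
import math
import random

def make_split_task(num_train, num_test, rng):
    """One toy sine task, already split into 'train' and 'test' pairs."""
    amplitude, phase = rng.uniform(0.1, 5.0), rng.uniform(0.0, math.pi)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(num_train + num_test)]
    pairs = [(x, amplitude * math.sin(x - phase)) for x in xs]
    return {"train": pairs[:num_train], "test": pairs[num_train:]}

def collate(tasks):
    """Stack tasks into one meta-batch: split name -> (inputs, targets)."""
    batch = {}
    for split in ("train", "test"):
        inputs = [[x for x, _ in task[split]] for task in tasks]
        targets = [[y for _, y in task[split]] for task in tasks]
        batch[split] = (inputs, targets)
    return batch

rng = random.Random(0)
tasks = [make_split_task(5, 10, rng) for _ in range(10)]  # 10 tasks in total
batch_size = 5
for start in range(0, len(tasks), batch_size):
    batch = collate(tasks[start:start + batch_size])
    train_inputs, train_targets = batch["train"]
    test_inputs, test_targets = batch["test"]
    # Leading dimension is the meta-batch size; the next is samples per split.
    print(len(train_inputs), len(train_inputs[0]), len(test_inputs[0]))  # 5 5 10
```

With 10 tasks and a meta-batch size of 5, the loop runs twice, and every batch holds 5 tasks with 5 support and 10 query examples each.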
