A minimal example with toy data set

I was trying to use the toy datasets, but I got errors like "train doesn't exist" when looping through the batches. Can we have a tiny minimal example that loops through the data for the toy datasets?

My attempt

from torchmeta.toy import Sinusoid
#from torchmeta.datasets.helpers import omniglot
from torchmeta.utils.data import BatchMetaDataLoader

from tqdm import tqdm

num_samples_per_task = 10
dataset = Sinusoid(num_samples_per_task, num_tasks=10, noise_std=None,
    transform=None, target_transform=None, dataset_transform=None)
#dataset = omniglot("data", ways=5, shots=5, test_shots=15, meta_train=True, download=True)
dataloader = BatchMetaDataLoader(dataset, batch_size=5, num_workers=4)
print(f'len(dataset) = {len(dataset)}')
print(f'len(dataloader) = {len(dataloader)}')
for batch in dataloader:
    train_inputs, train_targets = batch["train"]

Another odd thing was that the tensors had size 16 even though my meta-batch size was 5…

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

1 reaction
tristandeleu commented, Jul 21, 2020

MiniImagenet does not have a num_samples_per_task argument (this is specific to toy regression datasets). But you can indeed see this as being similar to the 600 images per class: it corresponds to the number of possible examples to sample from for this task. In the case of toy regression tasks, this is simply the number of support + number of query examples (5 + 10 here).
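To make the "support + query" arithmetic concrete, here is a rough pure-Python sketch of one toy regression task with 15 samples split into 5 train and 10 test examples. This is not Torchmeta's implementation: `make_sinusoid_task`, `split_task`, and the amplitude, phase, and input ranges are hypothetical and chosen only for illustration.

```python
import math
import random


def make_sinusoid_task(num_samples_per_task, rng):
    """Draw one sinusoid task as a list of (input, target) pairs.

    The amplitude, phase, and input ranges are illustrative assumptions.
    """
    amplitude = rng.uniform(0.1, 5.0)
    phase = rng.uniform(0.0, math.pi)
    inputs = [rng.uniform(-5.0, 5.0) for _ in range(num_samples_per_task)]
    return [(x, amplitude * math.sin(x - phase)) for x in inputs]


def split_task(samples, num_train, num_test, rng):
    """Shuffle the samples of one task and split them into train/test."""
    if num_train + num_test > len(samples):
        raise ValueError("not enough samples in the task for this split")
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return {
        "train": shuffled[:num_train],
        "test": shuffled[num_train:num_train + num_test],
    }


rng = random.Random(0)
task = make_sinusoid_task(num_samples_per_task=15, rng=rng)
split = split_task(task, num_train=5, num_test=10, rng=rng)
print(len(split["train"]), len(split["test"]))  # prints "5 10"
```

Asking for more train + test examples than the task holds raises an error in this sketch, which is why the number of samples per task has to cover both parts of the split.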

1 reaction
tristandeleu commented, Jul 10, 2020

Data transforms (like ClassSplitter) can be used either as a dataset_transform or as a wrapper (the wrapper is just syntactic sugar). The following two are equivalent:

  • ClassSplitter as a dataset_transform argument
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15,
    dataset_transform=ClassSplitter(num_train_per_class=5, num_test_per_class=10))

task = dataset.sample_task()
print(task)  # OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x11ba07dd8>), ('test', <torchmeta.utils.data.task.SubsetTask object at 0x11ba10240>)])
  • ClassSplitter as a wrapper
from torchmeta.toy import Sinusoid
from torchmeta.transforms import ClassSplitter

dataset = Sinusoid(num_samples_per_task=15)
dataset = ClassSplitter(dataset, num_train_per_class=5, num_test_per_class=10)

task = dataset.sample_task()
print(task)  # OrderedDict([('train', <torchmeta.utils.data.task.SubsetTask object at 0x12078eda0>), ('test', <torchmeta.utils.data.task.SubsetTask object at 0x120797208>)])