Improving ImageNet-1k support

The current support for ImageNet-1k can be improved in a few ways:

First, let’s start leveraging TensorFlow Datasets (TFDS). It significantly reduces the work a user has to do. Let’s walk through an example.

To begin, the user needs to place the ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar archives at this path: gs://[BUCKET-NAME]/tensorflow_datasets/downloads/manual.
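Before running the build step, it can help to confirm that both archives sit where TFDS is expected to look for them. The following is a minimal sketch, not part of TFDS or KerasCV: `manual_dir`, `missing_archives`, and the `exists` parameter are hypothetical names introduced here for illustration, and `exists` is assumed to be a callable such as `tf.io.gfile.exists`, which understands `gs://` paths.

```python
import posixpath

# Archive names taken from the walkthrough above.
ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def manual_dir(data_dir):
    """Return the manual-download directory under a TFDS data_dir."""
    return posixpath.join(data_dir, "downloads", "manual")


def missing_archives(data_dir, exists):
    """Return the expected archive paths for which exists(path) is false.

    `exists` is any callable that takes a path and returns a bool
    (hypothetical parameter; e.g. tf.io.gfile.exists for gs:// paths).
    """
    expected = [posixpath.join(manual_dir(data_dir), name) for name in ARCHIVES]
    return [path for path in expected if not exists(path)]
```

With TensorFlow installed, calling `missing_archives("gs://[BUCKET-NAME]/tensorflow_datasets", tf.io.gfile.exists)` should return an empty list once both archives have been uploaded.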

Once this is done, the user does the following:

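import tensorflow_datasets as tfds

# Root TFDS directory; the archives are read from <data_dir>/downloads/manual.
data_dir = "gs://[BUCKET-NAME]/tensorflow_datasets"
builder = tfds.builder("imagenet2012", data_dir=data_dir)
# Process the archives and write the prepared dataset under data_dir.
builder.download_and_prepare()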

builder.download_and_prepare() takes some time, but less than the current process of obtaining the initial TFRecords.

Then the user can load the ImageNet-1k dataset with tfds.load("imagenet2012", data_dir=data_dir), and that is it.

The two steps above assume the user already has access to the GCS bucket and the privileges needed to write data to it.
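Once the dataset is prepared, the loaded split still has to be turned into batches for training. The sketch below shows one possible way to do that, assuming the dataset yields (image, label) pairs, as `tfds.load` does with `as_supervised=True`. The `prepare` function and its `image_size`, `batch_size`, and `training` parameters are hypothetical names chosen for this illustration, and the plain resize is a placeholder rather than a recommended preprocessing recipe.

```python
import tensorflow as tf


def prepare(ds, image_size=224, batch_size=256, training=False):
    """Resize, batch, and prefetch a dataset of (image, label) pairs."""

    def resize(image, label):
        # ImageNet images vary in size; resize so they can be batched.
        image = tf.image.resize(image, (image_size, image_size))
        return image, label

    if training:
        # Shuffle individual examples within a bounded buffer.
        ds = ds.shuffle(10_000)
    ds = ds.map(resize, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Usage (assumes tensorflow_datasets is installed and the dataset
# has already been prepared under data_dir):
#
#   import tensorflow_datasets as tfds
#   train_ds = tfds.load("imagenet2012", split="train", data_dir=data_dir,
#                        as_supervised=True, shuffle_files=True)
#   train_ds = prepare(train_ds, training=True)
```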

General recommendations

With respect to

https://github.com/keras-team/keras-cv/blob/e607e05e7d73dda4fcc1f11eca11d1f71d83bee4/keras_cv/datasets/imagenet/load.py#L92

enable interleaved reading of the input files by setting num_parallel_reads=tf.data.AUTOTUNE.

With respect to

https://github.com/keras-team/keras-cv/blob/e607e05e7d73dda4fcc1f11eca11d1f71d83bee4/keras_cv/datasets/imagenet/load.py#L113

enable prefetching with dataset.prefetch(tf.data.AUTOTUNE) so that the accelerator doesn’t have to wait for the next batches.
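The two recommendations above can be combined in a single input pipeline. The following is a minimal sketch and not the actual KerasCV loader: `build_dataset`, `file_pattern`, and `parse_fn` are hypothetical names, and the shards are assumed to be plain TFRecord files matching a glob pattern.

```python
import tensorflow as tf


def build_dataset(file_pattern, batch_size, parse_fn=None):
    """Read TFRecord shards in parallel, batch them, and prefetch."""
    # Enumerate the shard files that match the glob pattern.
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    # Read several shards concurrently; records from different files are
    # interleaved instead of being read one file at a time.
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    if parse_fn is not None:
        # Optional per-record decoding, also parallelized.
        dataset = dataset.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.batch(batch_size)
    # Prepare upcoming batches while the accelerator works on the current one.
    return dataset.prefetch(tf.data.AUTOTUNE)
```

Passing `tf.data.AUTOTUNE` in both places lets the tf.data runtime choose the degree of parallelism and the prefetch buffer size, so neither value has to be tuned by hand for each machine.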

Top GitHub Comments

1 reaction

sayakpaul commented, Aug 31, 2022

Agreed with TFDS approach for simplicity.

I think it’s also possible to use a local path instead of a GCS bucket.

Yes, it’s possible. However, keeping the data inside a GCS bucket is necessary to leverage TPU-based training runs, so the two options serve different purposes.

0 reactions

tanzhenyu commented, Oct 27, 2022

tfds still requires you to download the dataset manually. Are you referring to the process of converting from .tar.gz to TFRecords?
