Improving ImageNet-1k support

With respect to the current ImageNet-1k support, there are a few things we can improve:

  • First, let’s start leveraging TFDS (TensorFlow Datasets). It significantly reduces the work a user has to do. Let’s walk through an example.

First, the user needs to place the ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar archives at this path: gs://[BUCKET-NAME]/tensorflow_datasets/downloads/manual.

  • Once this is done, the user runs the following:
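import tensorflow_datasets as tfds

# The archives must already be in <data_dir>/downloads/manual.
data_dir = "gs://[BUCKET-NAME]/tensorflow_datasets"
builder = tfds.builder("imagenet2012", data_dir=data_dir)
# Reads the manually placed archives and writes the prepared dataset under data_dir.
builder.download_and_prepare()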

builder.download_and_prepare() takes some time, but less than the current process of obtaining the initial TFRecords.

  • Then the user can load the ImageNet-1k dataset with tfds.load("imagenet2012", data_dir=data_dir), and that’s it.

The steps above assume the user already has access to the GCS bucket and the permissions needed to write data into it.
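Because download_and_prepare() can run for a long time, it may be worth confirming first that both archives are where TFDS looks for them. A minimal sketch, assuming the manual directory is <data_dir>/downloads/manual as described above; manual_archive_paths and missing_archives are hypothetical helpers, not part of TFDS.

```python
import posixpath

# Archive names from the steps above. TFDS's imagenet2012 builder looks for
# them in <data_dir>/downloads/manual.
IMAGENET_ARCHIVES = ("ILSVRC2012_img_train.tar", "ILSVRC2012_img_val.tar")


def manual_archive_paths(data_dir):
    """Return the paths where the ImageNet archives should be placed.

    posixpath.join is used instead of os.path.join so that "gs://" URIs are
    always joined with forward slashes, regardless of the local OS.
    """
    manual_dir = posixpath.join(data_dir, "downloads", "manual")
    return [posixpath.join(manual_dir, name) for name in IMAGENET_ARCHIVES]


def missing_archives(data_dir, exists):
    """Return the expected archive paths for which exists(path) is False."""
    return [path for path in manual_archive_paths(data_dir) if not exists(path)]


data_dir = "gs://[BUCKET-NAME]/tensorflow_datasets"
print(manual_archive_paths(data_dir))
```

In practice, exists could be tf.io.gfile.exists, which also understands gs:// paths, and the check can run right before builder.download_and_prepare().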

General recommendations

With respect to

https://github.com/keras-team/keras-cv/blob/e607e05e7d73dda4fcc1f11eca11d1f71d83bee4/keras_cv/datasets/imagenet/load.py#L92

enable interleaved reading of the input files by setting num_parallel_reads=tf.data.AUTOTUNE.
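A minimal sketch of what this setting does, using a few toy TFRecord shards written to a local temporary directory. The shard names, record contents, and write_toy_shards helper are hypothetical; this is not the keras-cv loader itself. With num_parallel_reads set, tf.data reads several files concurrently and interleaves their records instead of reading the shards one after another.

```python
import os
import tempfile

import tensorflow as tf


def write_toy_shards(directory, num_shards=4, records_per_shard=5):
    """Write a few small TFRecord shards (hypothetical test data)."""
    paths = []
    for shard in range(num_shards):
        path = os.path.join(directory, f"toy-{shard:05d}.tfrecord")
        with tf.io.TFRecordWriter(path) as writer:
            for i in range(records_per_shard):
                writer.write(f"shard{shard}-record{i}".encode("utf-8"))
        paths.append(path)
    return paths


def read_shards(filenames):
    """Read TFRecord shards, letting tf.data pick how many files to read at once."""
    return tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)


# A fresh temporary directory stands in for the real dataset location.
tmp_dir = tempfile.mkdtemp()
shards = write_toy_shards(tmp_dir)
records = list(read_shards(shards).as_numpy_iterator())
print(len(records))  # 20
```

The same call accepts gs:// paths, for example a list obtained with tf.io.gfile.glob. Since records from different files are interleaved, their order is not the sequential file order.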

With respect to

https://github.com/keras-team/keras-cv/blob/e607e05e7d73dda4fcc1f11eca11d1f71d83bee4/keras_cv/datasets/imagenet/load.py#L113

enable prefetching with dataset.prefetch(tf.data.AUTOTUNE) so that upcoming batches are prepared in the background and the accelerator doesn’t have to wait for input data.
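A minimal sketch of where prefetch() usually sits in a tf.data pipeline: as the last step, after batching. The toy zero-valued images and the preprocess function are hypothetical stand-ins for decoded ImageNet examples, not the keras-cv preprocessing. With tf.data.AUTOTUNE, the prefetch buffer size is tuned at runtime.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE


def preprocess(image, label):
    """Hypothetical per-example preprocessing: scale uint8 pixels to [0, 1]."""
    return tf.cast(image, tf.float32) / 255.0, label


def build_pipeline(images, labels, batch_size):
    """Map, batch, then prefetch so the next batches are ready in advance."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.map(preprocess, num_parallel_calls=AUTOTUNE)
    ds = ds.batch(batch_size)
    # Prefetch last: while the model consumes one batch, the next are prepared.
    return ds.prefetch(AUTOTUNE)


# Toy stand-in data: 10 RGB images of 32x32 pixels and integer labels.
images = tf.zeros([10, 32, 32, 3], dtype=tf.uint8)
labels = tf.range(10)
batches = list(build_pipeline(images, labels, batch_size=4))
print([int(b[0].shape[0]) for b in batches])  # [4, 4, 2]
```

Placing prefetch() at the end overlaps the whole upstream pipeline (reading, decoding, batching) with the training step, rather than only part of it.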

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 6
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

sayakpaul commented, Aug 31, 2022 (1 reaction)

> Agreed with TFDS approach for simplicity.
> I think it’s also possible to use local path instead of GCS bucket.

Yes, it’s possible. However, keeping things inside a GCS bucket is necessary for TPU-based training runs, so the two options serve different purposes.
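The trade-off in this reply (a local path works, but TPU runs need a GCS bucket) could be encoded as a small guard in a training script. A minimal sketch; check_data_dir is a hypothetical helper, and it only checks the gs:// prefix, not whether the bucket exists or is writable.

```python
def check_data_dir(data_dir, use_tpu):
    """Return data_dir unchanged, or raise if it is unsuitable for a TPU run.

    Local paths are accepted for non-TPU runs; TPU runs require a GCS bucket.
    """
    if use_tpu and not data_dir.startswith("gs://"):
        raise ValueError(f"TPU runs expect a gs:// data_dir, got {data_dir!r}")
    return data_dir


print(check_data_dir("/home/user/tensorflow_datasets", use_tpu=False))
print(check_data_dir("gs://[BUCKET-NAME]/tensorflow_datasets", use_tpu=True))
```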

tanzhenyu commented, Oct 27, 2022 (0 reactions)

tfds still requires you to download the dataset manually. Are you referring to the process of converting from .tar.gz to TFRecords?
