Support returning indices in text_dataset_from_directory

For data logging, I need the index (ID) of each row or batch. The current implementation does not provide it:

tf.keras.utils.text_dataset_from_directory(
    directory
)
>> {"attention_mask" , "input_ids"}

For some tasks the row ID is needed. The proposed API would look like this:

tf.keras.utils.text_dataset_from_directory(
    directory,
    with_indices=True
)
>>  {"attention_mask" , "input_ids", "indices_ids"}

@haifeng-jin, @Haaris-Rahman, @mattdangerw, @guberti, @edumucelli

Issue Analytics

  • State: closed
  • Created 9 months ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
chenmoneygithub commented, Dec 8, 2022

Yeah, I did not realize that the map function has to be stateless. Another way is:

idx_ds = tf.data.experimental.Counter().take(ds.cardinality())
new_ds = tf.data.Dataset.zip((ds, idx_ds))

The new_ds object will then contain the index alongside each element.
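
A related option is the built-in tf.data.Dataset.enumerate(), which pairs every element with a running int64 index without a separate counter dataset. The sketch below is only an illustration: the toy data stands in for the output of text_dataset_from_directory, and the to_dict helper and the "text"/"label" keys are made up for this example ("indices_ids" is the key requested in the issue). Note that the index is the position in iteration order, so it only identifies a file stably if that order is fixed, for example by passing shuffle=False when the dataset is created.

```python
import tensorflow as tf

# Toy stand-in for a (text, label) dataset; the data is invented.
ds = tf.data.Dataset.from_tensor_slices(
    (["first", "second", "third", "fourth"], [0, 1, 0, 1])
)

def to_dict(index, element):
    """Hypothetical helper: put the index next to the element's fields."""
    text, label = element
    return {"indices_ids": index, "text": text, "label": label}

# enumerate() yields (index, element) pairs; map() unpacks them into arguments.
indexed = ds.enumerate().map(to_dict)

# Batching afterwards keeps one index per row inside each batch.
batched = indexed.batch(2)

rows = list(indexed.as_numpy_iterator())
first_batch = next(iter(batched))
```

Attaching the index before batching means each batch carries a vector of row indices, which is usually what is needed for per-row logging.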

0 reactions
franz101 commented, Dec 8, 2022
counter = -1
def fn(*data):
  global counter
  counter += 1
  return data, counter
a_iter = iter(train_ds.unbatch().map(fn))
a = next(a_iter)

Trying this out in Google Colab, the counter does not actually increment.
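
The likely reason is that Dataset.map traces the Python function into a graph instead of calling it once per element, so a Python-level side effect such as incrementing a global only runs during tracing. A minimal sketch of that behaviour, using a toy range dataset and a made-up variable name (trace_calls) rather than the train_ds from the comment:

```python
import tensorflow as tf

trace_calls = 0

def fn(x):
    global trace_calls
    # Python side effect: executed while the function is traced,
    # not once per dataset element.
    trace_calls += 1
    # trace_calls is a plain Python int here, so it is baked into
    # the graph as a constant.
    return x, trace_calls

ds = tf.data.Dataset.range(4).map(fn)

# Every element carries the same "counter" value.
values = [(int(x), int(c)) for x, c in ds.as_numpy_iterator()]
```

All four elements come back with an identical second value, which matches the observation that the counter never advances.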
