
`keras.utils.get_file` does not support gzip as advertised


System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): glinux 5.17.11-1rodete2-amd64
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.9.1
  • Python version: 3.10.0
  • Bazel version (if compiling from source): n/a
  • GPU model and memory: n/a
  • Exact command to reproduce:
from tensorflow import keras

gzip_path = "https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/uncased_L-12_H-768_A-12.tar.gz"
# After this call, `/content/bert_base_uncased` is still a `tar.gz` file
ungzip_file = keras.utils.get_file(
    "/content/bert_base_uncased",
    gzip_path,
    extract=True,
    archive_format="tar",  # bug occurs whether or not this arg is specified
)
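Until this is resolved, one possible workaround, assuming you only need the archive contents on disk, is to download with `get_file` (leaving `extract` at its default) and unpack the file yourself with Python's standard `tarfile` module into an explicit destination. This also sidesteps the question of which directory `get_file` extracts into when `fname` is an absolute path, which I have not verified here. The helper name `extract_tar_gz` is hypothetical; this is a sketch, not the Keras implementation.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Unpack a tar archive (plain or gzip/bz2/xz-compressed) into dest_dir.

    Returns the member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)
    # "r:*" lets tarfile detect the compression (e.g. gzip) from the content.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        # Refuse members whose path would escape dest_dir (e.g. "../evil").
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: also apply the standard-library "data" filter.
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
    return [member.name for member in members]
```

Usage, assuming the `gzip_path` URL from the report: `keras.utils.get_file("bert_base_uncased.tar.gz", gzip_path)` returns the local path of the downloaded file, which can then be passed to `extract_tar_gz(path, "/content/bert_base_uncased")`.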

Describe the problem.

The `get_file` documentation claims gzip support in the `archive_format` argument docstring (see https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file). However, I have tried several tar.gz files, like the one in the example above, and none of them are extracted.

Describe the current behavior. tar.gz files are downloaded but not extracted.
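To confirm what a downloaded file actually is, independent of its file name, its contents can be inspected with the standard library. This is a minimal sketch; the helper name `detect_archive_kind` is hypothetical. It relies on `zipfile.is_zipfile`, on `tarfile.is_tarfile` (which also opens gzip/bz2/xz-compressed tars), and on the gzip magic bytes `1f 8b`.

```python
import tarfile
import zipfile


def detect_archive_kind(path):
    """Classify a file by its contents: 'zip', 'tar', 'gzip' or 'unknown'."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        # Covers plain .tar as well as compressed .tar.gz / .tar.bz2 / .tar.xz.
        return "tar"
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":
        # A gzip stream that does not wrap a tar archive (e.g. *-ubyte.gz).
        return "gzip"
    return "unknown"
```

A `.tar.gz` such as the BERT checkpoint should report `"tar"`, while a single compressed file reports `"gzip"`.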

Describe the expected behavior. tar.gz files are downloaded and extracted. bert_base_uncased should be a folder with the following files:

tmp/temp_dir/raw/
tmp/temp_dir/raw/vocab.txt
tmp/temp_dir/raw/bert_model.ckpt.index
tmp/temp_dir/raw/bert_model.ckpt.data-00000-of-00001
tmp/temp_dir/raw/bert_config.json
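The paths listed above look like member names stored inside the archive, in which case the unpacked files would sit under a `tmp/temp_dir/raw/` prefix inside the destination folder. Listing the members before extracting is one way to check this layout; a minimal sketch with the standard `tarfile` module (`list_archive_members` is a hypothetical helper name):

```python
import tarfile


def list_archive_members(archive_path):
    """Return the member paths of a tar archive without extracting it."""
    # "r:*" auto-detects gzip/bz2/xz compression.
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()
```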

Contributing.

  • Do you want to contribute a PR? (yes/no): No

Standalone code to reproduce the issue. Please see https://colab.research.google.com/drive/1OcIuIcii7CFhNudp9rIvNWNqU-VZg9SI?usp=sharing

Source code / logs. n/a; see the Colab notebook above.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

mattdangerw commented, Sep 8, 2022

I’m pretty sure the plan was to make this open for contributions. Is that right, @jbischof?

jasonbrancazio commented, Aug 31, 2022

I’m experiencing a similar issue with the Fashion MNIST data:

import tensorflow as tf

train_images_url = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz'
train_labels_url = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz'
test_images_url = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz'
test_labels_url = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz'

CACHE_DIR = '/content'
CACHE_SUBDIR = 'fashion_mnist'
for url in [train_images_url, train_labels_url, test_images_url, test_labels_url]:
    tf.keras.utils.get_file(
        url.split('/')[-1], url, extract=True,
        cache_dir=CACHE_DIR, cache_subdir=CACHE_SUBDIR, archive_format='zip')

The code above downloads the .gz files to /content/fashion_mnist on Google Colab but does not extract them.
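The Fashion-MNIST `*-ubyte.gz` files appear to be single gzip-compressed files rather than tar or zip archives. The `get_file` docstring lists `'tar'` (covering tar, tar.gz and tar.bz) and `'zip'` as archive formats, so `extract=True` may have nothing it recognizes here. A minimal sketch of decompressing such a file after download with the standard `gzip` module (`gunzip_file` is a hypothetical helper name):

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path=None):
    """Decompress a single gzip file (not a tar archive); return the output path."""
    if out_path is None:
        # Default: drop the ".gz" suffix, e.g. train-images-idx3-ubyte.gz -> train-images-idx3-ubyte
        out_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".out"
    # Stream the decompressed bytes to disk without loading the whole file.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, inside the loop above, `path = tf.keras.utils.get_file(...)` returns the local path of the downloaded `.gz`, and `gunzip_file(path)` writes the decompressed file beside it.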
