tf.errors.OutOfRangeError when used with Ray Tune
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): RHEL 7
- Ray installed from (source or binary): No
- Ray version: 0.5.2
- Python version: 2.7
- Exact command to reproduce: See below
Describe the problem
I’m trying to optimize my tf.data input pipeline using Ray Tune. TensorFlow throws a tf.errors.OutOfRangeError when the pipeline runs inside Ray’s environment, but the same pipeline works fine when run independently. In the source code below, test_tfrecords_reader works fine when called directly from main, whereas the same code run as a Ray experiment in _data_fetch_for_search fails with tf.errors.OutOfRangeError with zero records read. What am I missing here?
Source code / logs
import multiprocessing
from time import time

import ray
import tensorflow as tf
from hyperopt import hp
from ray.tune import run_experiments
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest import HyperOptSearch

# tfrecords_reader and DATA_FILES are defined elsewhere (not shown in this issue).


def test_tfrecords_reader(records_to_fetch=10000, timeout_s=60 * 30):
    sess = tf.keras.backend.get_session()
    records_fetched = 0
    dataset = tfrecords_reader(DATA_FILES,
                               parallel_file_reads=48,
                               batch_size=512,
                               num_parallel_data_fetch=24,
                               buffer_size=250,
                               prefetch_buffer=250)
    next_batch = dataset.make_one_shot_iterator().get_next()
    start = time()
    while True:
        try:
            records = sess.run(next_batch)  # an array of x's and y's
            records_fetched += records[1].shape[0]
            if records_fetched >= records_to_fetch or time() - start > timeout_s:
                break
        except tf.errors.OutOfRangeError:
            print("Exhausted all data")
            break
    print("Finished fetch in {}s and fetched {} records".format(time() - start, records_fetched))


def _data_fetch_for_search(config, reporter):
    records_to_fetch = 10000
    sess = tf.keras.backend.get_session()
    records_fetched = 0
    dataset = tfrecords_reader(DATA_FILES,
                               parallel_file_reads=48,
                               batch_size=512,
                               num_parallel_data_fetch=24,
                               buffer_size=250,
                               prefetch_buffer=250)
    next_batch = dataset.make_one_shot_iterator().get_next()
    start = time()
    while True:
        try:
            records = sess.run(next_batch)  # an array of x's and y's
            records_fetched += 512  # records[1].shape[0]
            # Compute the elapsed time before reporting; as originally posted,
            # `duration` was only assigned after the loop.
            duration = time() - start
            reporter(samples_freq=records_fetched / duration, duration=duration, samples_fetched=records_fetched)
            if records_fetched >= records_to_fetch:
                break
        except tf.errors.OutOfRangeError:
            print("Exhausted all data after fetching {} records".format(records_fetched))
            break
    duration = time() - start
    reporter(samples_freq=records_fetched / duration, duration=duration, samples_fetched=records_fetched)


def ray_search(total_trails=80, experiment_name="nqs_input_pipeline"):
    num_parallel_trails = 1  # number of GPUs
    max_time_for_trial_s = 60 * 60  # 1 hour
    cores = multiprocessing.cpu_count()
    ray_results_save_dir = "./ray_search_results"
    ahb = AsyncHyperBandScheduler(
        time_attr="time_total_s",
        reward_attr="duration",
        grace_period=1,
        max_t=max_time_for_trial_s)
    space = {
        "batch_size": hp.uniform("batch_size", 128, 1024),  # integer only
        "num_parallel_data_fetch": hp.uniform("num_parallel_data_fetch", cores / 2, cores * 2),  # integer only
        "buffer": hp.uniform("buffer", 250, 2048),  # integer only
        "parallel_file_reads": hp.uniform("parallel_file_reads", cores / 2, cores * 2),  # integer only
        "prefetch_buffer_size": hp.uniform("prefetch_buffer_size", 250, 2048),
    }
    experiment_spec = {
        experiment_name: {
            "run": _data_fetch_for_search,
            "stop": {
                "time_total_s": max_time_for_trial_s
            },
            "num_samples": total_trails,
            "local_dir": ray_results_save_dir,
            "max_failures": 2
        }
    }
    algo = HyperOptSearch(space, max_concurrent=num_parallel_trails, reward_attr="duration")
    ray.init(redirect_output=True)
    run_experiments(experiment_spec, scheduler=ahb, search_alg=algo, verbose=False)
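
The "integer only" comments in the search space point at a mismatch: hp.uniform samples floats, while arguments such as a batch size need integers. Below is a minimal sketch of one way to handle that, assuming the trial function is later changed to read its parameters from config (the code above still hardcodes them). The names to_int_params and INT_KEYS are hypothetical helpers for illustration, not part of Ray or hyperopt.

```python
def to_int_params(config, int_keys):
    """Return a copy of `config` with the listed keys rounded to ints.

    Hypothetical helper: `config` stands in for the dict that a search
    algorithm would hand to the trial function.
    """
    fixed = dict(config)  # copy so the sampled values stay untouched
    for key in int_keys:
        fixed[key] = int(round(fixed[key]))
    return fixed


# Keys taken from the search space in the issue; all of them are counts or sizes.
INT_KEYS = ["batch_size", "num_parallel_data_fetch", "buffer",
            "parallel_file_reads", "prefetch_buffer_size"]

# Example values of the kind hp.uniform could sample (made up for illustration).
sampled = {"batch_size": 511.7, "num_parallel_data_fetch": 23.2, "buffer": 250.0,
           "parallel_file_reads": 47.6, "prefetch_buffer_size": 1024.4}
params = to_int_params(sampled, INT_KEYS)
print(params)
```

Inside the trial function this would be called once at the top, for example params = to_int_params(config, INT_KEYS), before building the dataset.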
Issue Analytics
- State:
- Created 5 years ago
- Comments:12 (4 by maintainers)
Top GitHub Comments
Awesome! Glad it worked out.
BTW, you can do
ray.init(ignore_reinit_error=True)
to achieve the same effect as your try-catch.
Sorry for the late reply. This is so odd; it works on the colab environment for me too. This is a nice notebook BTW!
Do you have a version of the notebook that I can run on a local machine?
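
When reproducing this locally, one thing worth ruling out is that the data files resolve differently inside the Ray worker process than in the driver, for example because of relative paths. This is only a guess and is not confirmed anywhere in this thread. A minimal sketch of such a check, using only the standard library; resolve_data_files is a hypothetical helper, not part of Ray or TensorFlow:

```python
import glob
import os


def resolve_data_files(patterns):
    """Expand glob patterns to absolute paths and fail loudly if nothing matches.

    Hypothetical diagnostic helper: an empty file list would otherwise only
    show up later as a dataset that yields no records.
    """
    matches = []
    for pattern in patterns:
        # abspath anchors a relative pattern to the current working directory
        matches.extend(sorted(glob.glob(os.path.abspath(pattern))))
    if not matches:
        raise IOError("No files match {} (cwd: {})".format(patterns, os.getcwd()))
    return matches
```

Calling it once in the driver and once at the top of _data_fetch_for_search, on the same patterns used for DATA_FILES, would show whether both processes see the same files.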