Predicting with PrettyBigModel `InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)`

See original GitHub issue

Hi, I was interested in testing your PrettyBig model. I’ve downloaded the model and edited the PrettyBig.json to point to the downloaded encoder and model paths. When running:

python3 main.py --model PrettyBig.eval.json --predict_text "Hello there! My name is"

I get the following error:

{'n_head': 16, 'encoder_path': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/encoder', 'n_vocab': 50257, 'embed_dropout': 0.0, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 256, 'attn_dropout': 0.0, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 604800, 'data_path': 'gs://connors-datasets/openwebtext/', 'scale': 0.14433756729740646, 'res_dropout': 0.1, 'predict_batch_size': 1, 'eval_batch_size': 256, 'iterations': 100, 'n_embd': 1024, 'input': 'openwebtext_longbiased', 'model': 'GPT2', 'model_path': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig', 'n_ctx': 1024, 'predict_path': 'logs/predictions_SortaBig.txt', 'n_layer': 25, 'use_tpu': False, 'precision': 'float32'}
Using config: {'_model_dir': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x13fbf8ef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Generating predictions...
From /Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Calling model_fn.
From /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/models/gpt2/sample.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/models/gpt2/sample.py:59: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
Done calling model_fn.
Graph was finalized.
2019-06-08 15:55:47.498527: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
From /Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Restoring parameters from /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig/model.ckpt
Running local_init_op.
Done running local_init_op.
Traceback (most recent call last):
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
         [[{{node sample_sequence/while/model/GatherV2_1}}]]
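For reference, the failing node (sample_sequence/while/model/GatherV2_1) is a table lookup being handed an index one past the end of a 1024-row table, i.e. position 1024 against n_ctx = 1024. A minimal sketch that reproduces the same error class under TF 1.x (that the table is the position embedding is my assumption, not confirmed from the model code):

# Repro sketch of the error class only, not the actual model graph.
# Assumption: the failing GatherV2 indexes a [n_ctx, n_embd]-shaped table,
# e.g. a position embedding with n_ctx = 1024; valid indices are 0..1023.
import tensorflow as tf  # TF 1.x, matching the 1.13.1 in the report

table = tf.random_normal([1024, 8])  # stand-in for the embedding table
idx = tf.constant([[1024]])          # one past the last valid row
lookup = tf.gather(table, idx)       # builds a GatherV2 node

with tf.Session() as sess:
    sess.run(lookup)  # InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)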
python3 --version
Python 3.6.8 :: Anaconda, Inc.
pip3 list | grep tensorflow
mesh-tensorflow                    0.0.5
tensorflow                         1.13.1
tensorflow-datasets                1.0.1
tensorflow-estimator               1.13.0
tensorflow-metadata                0.13.0
tensorflow-probability             0.6.0

Any ideas appreciated. Thanks!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

3 reactions
minimaxir commented, Jun 19, 2019

I encountered the same issue working on gpt-2-simple: https://github.com/minimaxir/gpt-2-simple/issues/38

The solution was to subtract the length of the prefix tokens from the maximum length to prevent out-of-bounds (OOB) indexing.
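A hedged sketch of that fix (encoder.encode is the usual GPT-2 BPE encoder call; generate and length are illustrative names, not this repo's API):

# Sketch of the gpt-2-simple#38 fix, adapted. Assumption: the sampler fills
# positions 0..length-1 after the prefix, so requesting n_ctx new tokens on
# top of a non-empty prefix eventually gathers position n_ctx -> OOB.
n_ctx = 1024
prefix_tokens = encoder.encode("Hello there! My name is")

# length = n_ctx                      # overflows: final position >= n_ctx
length = n_ctx - len(prefix_tokens)   # leaves room for the prefix

output = generate(prefix_tokens, length)  # hypothetical sampling entry point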

3 reactions
ConnorJL commented, Jun 11, 2019

This is a known bug; I haven’t yet had the time to track down the exact cause. Three things you can try: set the precision to float32, use a GPU instead of a CPU, or change the “train_batch_size” and “predict_batch_size” parameters to 1. Some of these seem to fix it sometimes. I will fix this bug when I have the time to actually track down its source.

The bug also shouldn’t happen if you predict with a single word.
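If you want to try those knobs, a one-off patch of the eval config (assuming PrettyBig.eval.json carries the same keys as the params dict printed above; note that dict already shows 'precision': 'float32' and 'predict_batch_size': 1, so 'train_batch_size' is the remaining one to change):

# Apply the suggested workarounds to the eval config.
# Assumption: PrettyBig.eval.json uses the keys visible in the printed
# params dict ('precision', 'train_batch_size', 'predict_batch_size').
import json

with open("PrettyBig.eval.json") as f:
    params = json.load(f)

params["precision"] = "float32"   # workaround 1: force float32
params["train_batch_size"] = 1    # workaround 3: batch sizes of 1
params["predict_batch_size"] = 1

with open("PrettyBig.eval.json", "w") as f:
    json.dump(params, f, indent=2)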
