Can I change the vocabulary size? Getting this error in AutoKeras (TextClassifier):
Bug Description
Getting the following error using multi-label classification. Is there a way to increase the vocabulary size?
```
Epoch 1/1000
2020-03-17 17:56:31.128359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-17 17:56:31.665519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-17 17:56:33.256128: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
1/606 […] - ETA: 31:35 - loss: 0.6975 - accuracy: 0.4538
2/606 […] - ETA: 16:35 - loss: 0.6923 - accuracy: 0.5070
3/606 […] - ETA: 11:29 - loss: 0.6879 - accuracy: 0.5554
4/606 […] - ETA: 8:56 - loss: 0.6833 - accuracy: 0.5962
5/606 […] - ETA: 7:24 - loss: 0.6788 - accuracy: 0.6288
6/606 […] - ETA: 6:24 - loss: 0.6741 - accuracy: 0.6549
7/606 […] - ETA: 5:40 - loss: 0.6685 - accuracy: 0.6780
2020-03-17 17:56:34.376333: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[{{node model/embedding/embedding_lookup}}]]
	 [[VariableShape/_50]]
2020-03-17 17:56:34.376719: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[{{node model/embedding/embedding_lookup}}]]
8/606 […] - ETA: 5:11 - loss: 0.6685 - accuracy: 0.6780
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are: loss,accuracy
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
Traceback (most recent call last):
  File "C:\Development\Python\Python376\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 2803, in variable_creator_scope
    yield
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 599, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[node model/embedding/embedding_lookup (defined at \Development\Python\Python376\lib\site-packages\autokeras\engine\tuner.py:71) ]]
	 [[VariableShape/_50]]
  (1) Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[node model/embedding/embedding_lookup (defined at \Development\Python\Python376\lib\site-packages\autokeras\engine\tuner.py:71) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_distributed_function_3354]

Errors may have originated from an input operation.
Input Source operations connected to node model/embedding/embedding_lookup:
 model/embedding/embedding_lookup/2981 (defined at \Development\Python\Python376\lib\contextlib.py:112)

Input Source operations connected to node model/embedding/embedding_lookup:
 model/embedding/embedding_lookup/2981 (defined at \Development\Python\Python376\lib\contextlib.py:112)

Function call stack:
distributed_function -> distributed_function
```
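The key line is `indices[28,106] = 20000 is not in [0, 20000)`: the embedding table was built with 20000 rows, so valid token ids are 0 through 19999, and some input sequence contains a token id equal to the vocabulary size itself. The bounds check can be illustrated in plain Python (the `lookup_ok` helper below is hypothetical, just mimicking the range test `embedding_lookup` performs):

```python
# An embedding table with `max_tokens` rows accepts ids in the
# half-open range [0, max_tokens), i.e. 0 .. max_tokens - 1.
max_tokens = 20000

def lookup_ok(token_id, vocab_size=max_tokens):
    """Mimic the bounds check that embedding_lookup performs."""
    return 0 <= token_id < vocab_size

print(lookup_ok(19999))  # True: last valid row
print(lookup_ok(20000))  # False: exactly the failing index in the log above
```

So the fix is either to raise the vocabulary size of the pipeline or to ensure the tokenizer never emits an id equal to `max_tokens`.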
Bug Reproduction
Code for reproducing the bug:
Data used by the code:
Expected Behavior
Setup Details
Include the details about the versions of:
- OS type and version:
- Python:
- autokeras:
- keras-tuner:
- scikit-learn:
- numpy:
- pandas:
- tensorflow:
Additional context
Issue Analytics
- State:
- Created 4 years ago
- Comments: 5 (1 by maintainers)
Top GitHub Comments
You have to use the `TextToIntSequence` preprocessor (https://autokeras.com/preprocessor/#texttointsequence-class); its max tokens setting is the vocabulary size.
@haifeng-jin This solved my problem, thank you. My problem was mainly that I was not using `AutoModel`. I had used `TextClassifier` before, but that didn't work. The following code worked: