
Can I change the vocabulary size? Getting this error in AutoKeras (TextClassifier):

See original GitHub issue

Bug Description

I'm getting the following error when using multi-label classification. Is there a way to increase the vocabulary size?

```
Epoch 1/1000
2020-03-17 17:56:31.128359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-03-17 17:56:31.665519: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-17 17:56:33.256128: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.

1/606 [...] - ETA: 31:35 - loss: 0.6975 - accuracy: 0.4538
2/606 [...] - ETA: 16:35 - loss: 0.6923 - accuracy: 0.5070
3/606 [...] - ETA: 11:29 - loss: 0.6879 - accuracy: 0.5554
4/606 [...] - ETA: 8:56 - loss: 0.6833 - accuracy: 0.5962
5/606 [...] - ETA: 7:24 - loss: 0.6788 - accuracy: 0.6288
6/606 [...] - ETA: 6:24 - loss: 0.6741 - accuracy: 0.6549
7/606 [...] - ETA: 5:40 - loss: 0.6685 - accuracy: 0.6780
2020-03-17 17:56:34.376333: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[{{node model/embedding/embedding_lookup}}]]
	 [[VariableShape/_50]]
2020-03-17 17:56:34.376719: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[{{node model/embedding/embedding_lookup}}]]
8/606 [...] - ETA: 5:11 - loss: 0.6685 - accuracy: 0.6780
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are: loss,accuracy
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
Traceback (most recent call last):
  File "C:\Development\Python\Python376\lib\contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\ops\variable_scope.py", line 2803, in variable_creator_scope
    yield
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 599, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Development\Python\Python376\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[node model/embedding/embedding_lookup (defined at \Development\Python\Python376\lib\site-packages\autokeras\engine\tuner.py:71) ]]
	 [[VariableShape/_50]]
  (1) Invalid argument: indices[28,106] = 20000 is not in [0, 20000)
	 [[node model/embedding/embedding_lookup (defined at \Development\Python\Python376\lib\site-packages\autokeras\engine\tuner.py:71) ]]
0 successful operations. 0 derived errors ignored. [Op:__inference_distributed_function_3354]

Errors may have originated from an input operation.
Input Source operations connected to node model/embedding/embedding_lookup:
 model/embedding/embedding_lookup/2981 (defined at \Development\Python\Python376\lib\contextlib.py:112)

Function call stack:
distributed_function -> distributed_function
```
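The core of the error, `indices[28,106] = 20000 is not in [0, 20000)`, means the embedding layer was handed a token id equal to its vocabulary size: a table with 20000 rows accepts indices 0 through 19999 only. The same off-by-one failure can be illustrated without TensorFlow using a plain NumPy array as a stand-in for the embedding table (a sketch, not AutoKeras's actual lookup code):

```python
import numpy as np

# A toy "embedding table" with the same vocabulary size as the failing
# model above: 20000 rows, so valid lookup indices are 0 .. 19999.
vocab_size, embed_dim = 20000, 8
table = np.zeros((vocab_size, embed_dim))

row = table[vocab_size - 1]  # index 19999: the largest valid lookup

try:
    table[vocab_size]        # index 20000 is not in [0, 20000)
except IndexError as err:
    print("lookup failed:", err)
```

In the issue's model, a token produced by the text preprocessor got the id 20000 while the embedding was sized for exactly 20000 entries, hence the abort mid-epoch.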

Bug Reproduction

Code for reproducing the bug:

Data used by the code:

Expected Behavior

Setup Details

Include the details about the versions of:

  • OS type and version:
  • Python:
  • autokeras:
  • keras-tuner:
  • scikit-learn:
  • numpy:
  • pandas:
  • tensorflow:

Additional context

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
haifeng-jin commented, Mar 23, 2020

You have to use the TextToIntSequence preprocessor (https://autokeras.com/preprocessor/#texttointsequence-class). Its max_tokens argument is the vocabulary size.
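The reason this fixes the crash: a preprocessor configured with max_tokens=N guarantees that every emitted token id is below N, so a downstream embedding sized for N rows can never receive an out-of-range index. A simplified pure-Python sketch of that capping behavior (this is an illustration, not AutoKeras's actual implementation; the `build_vocab`/`encode` helpers are hypothetical):

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    """Map the most frequent words to ids in [2, max_tokens).
    Id 0 is reserved for padding, id 1 for out-of-vocabulary tokens."""
    counts = Counter(w for t in texts for w in t.lower().split())
    words = [w for w, _ in counts.most_common(max_tokens - 2)]
    return {w: i + 2 for i, w in enumerate(words)}

def encode(text, vocab):
    # Unknown words fall into the OOV bucket (id 1) instead of
    # producing an id outside the embedding table's range.
    return [vocab.get(w, 1) for w in text.lower().split()]

texts = ["the cat sat", "the dog sat", "a bird flew"]
vocab = build_vocab(texts, max_tokens=5)
ids = encode("the cat jumped", vocab)

# Every id is < max_tokens, so an embedding with 5 rows is always safe.
assert all(i < 5 for i in ids)
```

Because out-of-vocabulary words are bucketed rather than assigned fresh ids, the embedding lookup stays within `[0, max_tokens)` no matter what text is fed in.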

1 reaction
realjanpaulus commented, Mar 24, 2020

@haifeng-jin This solved my problem, thank you. My mistake was mainly not using AutoModel. I had used TextClassifier before, but that didn't work. The following code worked:

```python
import autokeras as ak

max_features = 10000  # vocabulary size for the text preprocessor

input_node = ak.TextInput()
output_node = ak.TextToIntSequence(max_tokens=max_features)(input_node)
output_node = ak.ClassificationHead()(output_node)
clf = ak.AutoModel(inputs=input_node, outputs=output_node, max_trials=1)
```

