Predicting a conv1d model in keras that takes index of words in a sentence as input


I have prepared a sentiment analysis model and I am trying to run predictions on new input, but I get an error:

data = getFile('Cleaned Data.xlsx')

data['Description'] = data['Description'].apply(lambda x: x.lower())
data['Description'] = data['Description'].apply((lambda x: re.sub('[^a-zA-z0-9\s]','',x)))

print(data[ data['Classification'] == 1].size)
print(data[ data['Classification'] == 0].size)

for idx,row in data.iterrows():
    row[1] = row[1].replace('rt',' ')

tokenizer = Tokenizer(split=' ')
tokenizer.fit_on_texts(data['Description'].values)

#vectorizer = CountVectorizer()
#X = vectorizer.fit_transform(jobSpec['Description']).toarray()

X = tokenizer.texts_to_sequences(data['Description'].values)
X = pad_sequences(X, maxlen=100, value=0.)

display(X.shape)

vocab_size = len(tokenizer.word_index) + 1
max_length = max([len(s.split()) for s in data['Description']])

Y = pd.get_dummies(data['Classification']).values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)


tokens_docs = [doc.split(" ") for doc in data['Description'].values]
all_tokens = itertools.chain.from_iterable(tokens_docs)
my_dict = {token: token if token.isdigit() else idx for idx, token in enumerate(set(all_tokens))}

print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

raw_embedding = load_embedding('glove.6B.100d.txt')

embedding_vectors= get_weight_matrix(raw_embedding, tokenizer.word_index)
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_vectors], input_length = X.shape[1], trainable=False, name="Embeddings")

display(embedding_vectors.shape)

# define model
sequence_input = Input(shape=(100,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(18)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(2, activation='softmax')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])


# input for which we need the embedding
input_str = "the company our client is a renowned civil actor who have consistently and safely delivered major civil infrastructure  ts across a"

# build index based on our `vocabulary`
word_to_idx = OrderedDict({w:all_tokens.index(w) for w in input_str.split() if w in all_tokens})

ynew = model.predict([1],[3],[5],[7])
display(ynew)

When I try to predict with this model on new input:

ynew = model.predict([1],[3],[5],[7])
display(ynew)

it gives me this error message:

ValueError: Error when checking input: expected input_29 to have shape (100,) but got array with shape (1,)

I have tried changing the model's input shape to None and to 1, but that gives me other errors. I am quite new to machine learning, so I am really not sure how to fix this one.

Any help will be appreciated.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
bileschi commented, Sep 17, 2018

Your shape error occurs because your model expects each input to have 100 elements. You should pass in a tensor of shape [batch_size, 100].
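As a rough illustration of that advice, the sketch below left-pads a short list of word indices to length 100 so that the batch has shape [1, 100]. It is written in plain Python and only mimics the default behaviour of Keras' pad_sequences (pad and truncate at the front); pad_batch is a hypothetical helper, not part of Keras. Note also that model.predict([1],[3],[5],[7]) hands only [1] to the model as input, since the remaining lists are taken as further positional arguments.

```python
MAXLEN = 100  # must match Input(shape=(100,)) in the model from the question


def pad_batch(sequences, maxlen=MAXLEN, value=0):
    """Left-pad (or left-truncate) every sequence to exactly `maxlen` items.

    Hypothetical helper that imitates pad_sequences with its default
    padding='pre' and truncating='pre' settings.
    """
    batch = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep at most the last `maxlen` indices
        batch.append([value] * (maxlen - len(seq)) + seq)
    return batch


# One sentence encoded as four word indices (the values used in the question).
batch = pad_batch([[1, 3, 5, 7]])
print(len(batch), len(batch[0]))  # 1 100

# With the real model, the padded batch would then be passed as one array,
# for example: ynew = model.predict(np.array(batch))
```

In the original pipeline the same effect should be achievable by running the new text through tokenizer.texts_to_sequences and pad_sequences(..., maxlen=100), exactly as was done for the training data.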

0 reactions
sushiboo commented, Sep 21, 2018

The typical solution is to use an “Out of Vocabulary” (OOV) integer. Sometimes developers will use exactly one integer for their OOV tokens. Other times developers will use a hash function to map words to one of several “OOV buckets”.

Okay, for now I will use 0 for all of my out-of-vocabulary tokens. Thank you for your assistance.
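Both OOV strategies mentioned in this comment can be sketched in a few lines. This is a minimal illustration, not the maintainers' code: word_index is a toy dictionary standing in for tokenizer.word_index, and encode, encode_with_buckets and num_buckets are hypothetical names chosen for the example.

```python
import zlib

# Toy vocabulary standing in for tokenizer.word_index (indices start at 1).
word_index = {"the": 1, "company": 2, "client": 3}

OOV_INDEX = 0  # the asker's choice; 0 is also the padding value in the question


def encode(text, word_index, oov_index=OOV_INDEX):
    """Map each word to its index, using one shared integer for unknown words."""
    return [word_index.get(word, oov_index) for word in text.lower().split()]


def encode_with_buckets(text, word_index, num_buckets=3):
    """Map unknown words to one of `num_buckets` ids placed after the vocabulary."""
    first_bucket = len(word_index) + 1
    encoded = []
    for word in text.lower().split():
        if word in word_index:
            encoded.append(word_index[word])
        else:
            # crc32 is deterministic across runs, unlike Python's built-in hash()
            bucket = zlib.crc32(word.encode("utf-8")) % num_buckets
            encoded.append(first_bucket + bucket)
    return encoded


single = encode("The renowned company", word_index)
bucketed = encode_with_buckets("The renowned company", word_index)
print(single)    # [1, 0, 2]
print(bucketed)  # the middle id falls in the range 4..6
```

Reusing 0 for OOV words makes them indistinguishable from padding, which may be acceptable for a first attempt. With buckets, the Embedding layer would need vocab_size + num_buckets rows. The Keras Tokenizer also accepts an oov_token argument, which may be a simpler route if retraining is an option.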


