Predicting a conv1d model in keras that takes index of words in a sentence as input

I have built a sentiment analysis model and am trying to run predictions on new input, but I get an error:

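import re
import itertools
from collections import OrderedDict

import pandas as pd
from IPython.display import display
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Input, Embedding, Conv1D, MaxPooling1D, Flatten, Dense
from keras.models import Model

# getFile, load_embedding and get_weight_matrix are the author's own helpers (not shown)
data = getFile('Cleaned Data.xlsx')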

data['Description'] = data['Description'].apply(lambda x: x.lower())
# keep only letters, digits and whitespace (the original range A-z also matched [ \ ] ^ _ and `)
data['Description'] = data['Description'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))

print(data[ data['Classification'] == 1].size)
print(data[ data['Classification'] == 0].size)

for idx,row in data.iterrows():
    row[1] = row[1].replace('rt',' ')

tokenizer = Tokenizer(split=' ')
tokenizer.fit_on_texts(data['Description'].values)

#vectorizer = CountVectorizer()
#X = vectorizer.fit_transform(jobSpec['Description']).toarray()

X = tokenizer.texts_to_sequences(data['Description'].values)
X = pad_sequences(X, maxlen=100, value=0.)

display(X.shape)

vocab_size = len(tokenizer.word_index) + 1
max_length = max([len(s.split()) for s in data['Description']])

Y = pd.get_dummies(data['Classification']).values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)


tokens_docs = [doc.split(" ") for doc in data['Description'].values]
all_tokens = itertools.chain.from_iterable(tokens_docs)
my_dict = {token: token if token.isdigit() else idx for idx, token in enumerate(set(all_tokens))}

print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

raw_embedding = load_embedding('glove.6B.100d.txt')

embedding_vectors= get_weight_matrix(raw_embedding, tokenizer.word_index)
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_vectors], input_length = X.shape[1], trainable=False, name="Embeddings")

display(embedding_vectors.shape)

# define model
sequence_input = Input(shape=(100,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(18)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(2, activation='softmax')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])


# input for which we need the embedding
input_str = "the company our client is a renowned civil actor who have consistently and safely delivered major civil infrastructure  ts across a"

# build index based on the tokenizer vocabulary (all_tokens is a consumed iterator and has no .index)
word_to_idx = OrderedDict((w, tokenizer.word_index[w]) for w in input_str.split() if w in tokenizer.word_index)

ynew = model.predict([1],[3],[5],[7])
display(ynew)

When I try to run a prediction with new input:

ynew = model.predict([1],[3],[5],[7])
display(ynew)

I get this error message:

ValueError: Error when checking input: expected input_29 to have shape (100,) but got array with shape (1,)

I have tried changing the input shape of the model to None and to 1, but that only produces other errors. I am quite new to machine learning, so I am not sure how to fix this.

Any help will be appreciated
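For context, the length of 100 is tied to the layer stack itself, which may be why changing the input shape to None or 1 produced other errors. The sketch below is plain Python, not Keras: it only traces the sequence length through the Conv1D and MaxPooling1D layers listed above, assuming the Keras defaults of padding='valid', a convolution stride of 1, and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv1d_len(n, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return n - kernel_size + 1


def maxpool1d_len(n, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (n - pool_size) // pool_size + 1


def trace_lengths(input_len):
    """Trace the sequence length through the conv/pool stack from the question."""
    layers = [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2), ("conv", 5), ("pool", 18)]
    lengths = [input_len]
    n = input_len
    for kind, size in layers:
        n = conv1d_len(n, size) if kind == "conv" else maxpool1d_len(n, size)
        if n < 1:
            raise ValueError(f"sequence too short: {kind}({size}) would give length {n}")
        lengths.append(n)
    return lengths


print(trace_lengths(100))  # [100, 96, 48, 44, 22, 18, 1]

try:
    trace_lengths(1)
except ValueError as err:
    print(err)  # a length-1 input cannot pass the first Conv1D with kernel size 5
```

With an input length of 100 the final pooling layer reduces the sequence to length 1, so the architecture as written fits inputs of exactly that length. The input has to be padded to 100 rather than the model reshaped to fit a shorter input.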

Issue Analytics

State:
Created 5 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction

bileschi commented, Sep 17, 2018

Your shape error occurs because the model expects each input sample to have length 100. You should pass in a tensor of shape [batch_size, 100].
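As an illustration of what a [batch_size, 100] input looks like, here is a minimal pure-Python sketch that turns raw text into one padded row of word indices. It imitates the default behaviour of Keras `pad_sequences` (pad and truncate at the front) and of `texts_to_sequences` without an OOV token (unknown words are dropped). The function name and the tiny `word_index` are hypothetical and stand in for `tokenizer.word_index` from the question.

```python
def texts_to_padded_batch(texts, word_index, maxlen=100, pad_value=0):
    """Encode each text as word indices, then left-pad or left-truncate to maxlen."""
    batch = []
    for text in texts:
        # Words missing from the vocabulary are skipped here (an assumption that
        # mirrors a tokenizer with no OOV token configured).
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        ids = ids[-maxlen:]  # keep the last maxlen tokens if the text is too long
        batch.append([pad_value] * (maxlen - len(ids)) + ids)
    return batch


# Hypothetical vocabulary, standing in for tokenizer.word_index.
word_index = {"the": 1, "company": 2, "client": 3}

batch = texts_to_padded_batch(["the company our client"], word_index)
print(len(batch), len(batch[0]))  # 1 100
print(batch[0][-3:])              # [1, 2, 3]
```

In the question's setup the same result would presumably come from `pad_sequences(tokenizer.texts_to_sequences([input_str]), maxlen=100)`, passed to `model.predict` as a single array. The original call `model.predict([1],[3],[5],[7])` supplies one sample of shape (1,) as the input, which matches the shape reported in the error.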

0 reactions

sushiboo commented, Sep 21, 2018

The typical solution is to use an “Out of Vocabulary” (OOV) integer. Sometimes developers will use exactly one integer for their OOV tokens. Other times developers will use a hash function to map words to one of several “OOV buckets”.

Okay, for now I will use 0 for all of my out-of-vocabulary tokens. Thank you for your assistance.
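The two OOV strategies mentioned above can be sketched as follows. This is an illustrative pure-Python version, not code from the thread: `token_to_id` is a hypothetical helper, and it assumes the known vocabulary uses indices 1..len(word_index) (as the Keras `Tokenizer` does), with 0 left for padding. Setting `num_oov_buckets=1` gives the single-OOV-integer variant; larger values spread unknown words over several hash buckets.

```python
import hashlib


def token_to_id(token, word_index, num_oov_buckets=1):
    """Map a token to its vocabulary index, or to a hashed OOV bucket if unknown."""
    if token in word_index:
        return word_index[token]
    # hashlib gives a hash that is stable across runs, unlike the built-in hash().
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_oov_buckets
    # OOV ids start right after the last known index.
    return len(word_index) + 1 + bucket


# Hypothetical vocabulary, standing in for tokenizer.word_index.
word_index = {"the": 1, "company": 2, "client": 3}

ids = [token_to_id(w, word_index, num_oov_buckets=4) for w in "the renowned client".split()]
print(ids[0], ids[2])   # 1 3
print(4 <= ids[1] <= 7) # True: the unknown word lands in one of the 4 OOV buckets
```

Two caveats under these assumptions: the embedding matrix would need `len(word_index) + 1 + num_oov_buckets` rows to cover the extra ids, and reusing 0 for OOV tokens makes unknown words indistinguishable from the padding value used by `pad_sequences` in the question. The Keras `Tokenizer` also accepts an `oov_token` argument, which reserves a dedicated index for unknown words.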

Top Results From Across the Web

Predicting a conv1d model in keras that takes index of words ...

I have fixed it by changing the input to the following: ... value=0) print(text) sentiment = model.predict(text)[0] display(sentiment).
