Predicting a conv1d model in keras that takes index of words in a sentence as input
I have trained a sentiment analysis model and I am trying to run predictions on new input, but I get an error:
data = getFile('Cleaned Data.xlsx')
data['Description'] = data['Description'].apply(lambda x: x.lower())
data['Description'] = data['Description'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))
print(data[ data['Classification'] == 1].size)
print(data[ data['Classification'] == 0].size)
for idx,row in data.iterrows():
    row[1] = row[1].replace('rt',' ')
tokenizer = Tokenizer(split=' ')
tokenizer.fit_on_texts(data['Description'].values)
#vectorizer = CountVectorizer()
#X = vectorizer.fit_transform(jobSpec['Description']).toarray()
X = tokenizer.texts_to_sequences(data['Description'].values)
X = pad_sequences(X, maxlen=100, value=0.)
display(X.shape)
vocab_size = len(tokenizer.word_index) + 1
max_length = max([len(s.split()) for s in data['Description']])
Y = pd.get_dummies(data['Classification']).values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)
tokens_docs = [doc.split(" ") for doc in data['Description'].values]
all_tokens = itertools.chain.from_iterable(tokens_docs)
my_dict = {token: token if token.isdigit() else idx for idx, token in enumerate(set(all_tokens))}
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
raw_embedding = load_embedding('glove.6B.100d.txt')
embedding_vectors= get_weight_matrix(raw_embedding, tokenizer.word_index)
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_vectors], input_length = X.shape[1], trainable=False, name="Embeddings")
display(embedding_vectors.shape)
# define model
sequence_input = Input(shape=(100,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(2)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(18)(x)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
preds = Dense(2, activation='softmax')(x)
model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['acc'])
# input for which we need the embedding
input_str = "the company our client is a renowned civil actor who have consistently and safely delivered major civil infrastructure ts across a"
# build index based on our `vocabulary`
word_to_idx = OrderedDict({w:all_tokens.index(w) for w in input_str.split() if w in all_tokens})
ynew = model.predict([1],[3],[5],[7])
display(ynew)
When I try to run a prediction with this model on new input:
ynew = model.predict([1],[3],[5],[7])
display(ynew)
I get this error message:
ValueError: Error when checking input: expected input_29 to have shape (100,) but got array with shape (1,)
I have tried changing the input shape of the model to None and to 1, but that only produces other errors. I am quite new to machine learning, so I am not sure how to fix this.
Any help will be appreciated.
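Your shape error occurs because the model expects every sample to be a sequence of 100 word indices, matching Input(shape=(100,)). In model.predict([1],[3],[5],[7]) only the first list, [1], is treated as the input data, which is why Keras reports an array of shape (1,). You should pass in a single tensor of shape [batch_size, 100]: run the new text through the same tokenizer and the same padding (maxlen=100) that you used for training, then call model.predict on the result.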
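To make the expected shape concrete, here is a minimal sketch in plain Python. It does not call Keras. The helpers pad_pre and encode_batch and the toy vocabulary are hypothetical stand-ins for pad_sequences, tokenizer.texts_to_sequences and tokenizer.word_index, written only so the shapes can be checked without any dependencies.

```python
# Sketch only: pure-Python stand-ins for what the Keras Tokenizer and
# pad_sequences do, so the resulting shape can be checked without Keras.

MAXLEN = 100  # must match Input(shape=(100,)) in the model


def pad_pre(seq, maxlen=MAXLEN, value=0):
    """Left-pad (or left-truncate) one sequence to exactly `maxlen` items.

    Mirrors the default behaviour of Keras pad_sequences
    (padding='pre', truncating='pre').
    """
    seq = list(seq)[-maxlen:]  # keep at most the last `maxlen` indices
    return [value] * (maxlen - len(seq)) + seq


def encode_batch(texts, word_index, maxlen=MAXLEN):
    """Turn raw sentences into a list of rows, each of length `maxlen`."""
    batch = []
    for text in texts:
        # Words missing from the vocabulary are skipped here, as the Keras
        # Tokenizer does when no oov_token is configured.
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        batch.append(pad_pre(ids, maxlen))
    return batch


# Hypothetical toy vocabulary; in the question this is tokenizer.word_index.
toy_word_index = {"the": 1, "company": 2, "our": 3, "client": 4}

batch = encode_batch(["The company our client is renowned"], toy_word_index)
print(len(batch), len(batch[0]))  # 1 100
```

With the objects from the question, the equivalent would presumably be to call tokenizer.texts_to_sequences([input_str]), pad the result with pad_sequences using maxlen=100, and pass that array to model.predict. It then has shape (1, 100), which is what the input layer expects.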
Okay, for now I will use 0 for all of my out-of-vocabulary tokens. Thank you for your assistance.
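The approach of mapping unknown words to 0 could look roughly like the sketch below. The function words_to_indices and the toy vocabulary are hypothetical names used for illustration. One caveat: 0 is also the padding value in the question's pad_sequences call, so out-of-vocabulary words become indistinguishable from padding.

```python
# Sketch only: map each word to its vocabulary index, using 0 for words
# that were never seen during training (out-of-vocabulary).

OOV_INDEX = 0  # assumption: shared with the padding value used in the question


def words_to_indices(sentence, word_index, oov_index=OOV_INDEX):
    """Return one index per word; unknown words get `oov_index`."""
    return [word_index.get(word, oov_index) for word in sentence.lower().split()]


# Hypothetical toy vocabulary standing in for tokenizer.word_index.
toy_word_index = {"civil": 1, "infrastructure": 2, "client": 3}

indices = words_to_indices("Our client delivered civil infrastructure", toy_word_index)
print(indices)  # [0, 3, 0, 1, 2]
```

If unknown words should stay distinct from padding, the Keras Tokenizer also accepts an oov_token argument at construction time. That would require refitting the tokenizer and retraining the model.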