[BUG] Different behavior in 0.1.8 and 0.2.1
See original GitHub issue
Environment
- Colab.research.google.com
- Kashgari 0.1.8 / 0.2.1
Issue Description
Different behavior in 0.1.8 and 0.2.1. In Kashgari 0.1.8, BLSTMModel converges during training: I see val_acc 0.98 and train acc 0.9594. In Kashgari 0.2.1, BLSTMModel overfits: I see val_acc ~0.5 and train acc ~0.96. There is no difference in my code, only the version of the library changed.
Reproduce
code:
from sklearn.model_selection import train_test_split
import pandas as pd
import nltk
from kashgari.tasks.classification import BLSTMModel
# get and process data
!wget https://www.dropbox.com/s/265kphxkijj1134/fontanka.zip
df1 = pd.read_csv('fontanka.zip')
df1.fillna(' ', inplace=True)
nltk.download('punkt')
# split on train/test
X_train, X_test, y_train, y_test = train_test_split(df1.full_text[:3570].values, df1.textrubric[:3570].values, test_size=0.2, random_state=42)
X_train = [nltk.word_tokenize(sentence) for sentence in X_train]
X_test = [nltk.word_tokenize(sentence) for sentence in X_test]
y_train = y_train.tolist()
y_test = y_test.tolist()
# train model
model = BLSTMModel()
model.fit(X_train, y_train, x_validate=X_test, y_validate=y_test, epochs=10)
code in colab: https://colab.research.google.com/drive/1yTBMeiBl2y7-Yw0DS_vTn2A4y_Vj3N-8
Result
Last epoch:
Kashgari 0.1.8
Epoch 10/10 55/55 [==============================] - 90s 2s/step - loss: 0.1378 - acc: 0.9615 - val_loss: 0.0921 - val_acc: 0.9769
Kashgari 0.2.1
Epoch 10/10 44/44 [==============================] - 76s 2s/step - loss: 0.0990 - acc: 0.9751 - val_loss: 2.3739 - val_acc: 0.5323
Other Comment
In 0.2.1 the models are now in separate files and the lr hyperparameter is given explicitly (1e-3). In 0.1.8 the lr hyperparameter was omitted; I suppose it used the Keras default, which is the same (1e-3).
Also, in 0.1.8 the classifier's dense layer had size = number of classes + 1 (https://github.com/BrikerMan/Kashgari/issues/21), and this was dropped in 0.2.1. I don't see how this could affect the training process.
I couldn't find any other differences between the versions. Could you help with this: why did the model begin to overfit in the new version of the library?
Issue Analytics
- State:
- Created 5 years ago
- Comments: 17 (5 by maintainers)
Top GitHub Comments
I have reproduced the problem; 0.2.1 overfits on my dataset too. Comparing 0.1.8 with 0.2.1, we changed BLSTMModel's activation function from sigmoid to softmax. Please try this with 0.2.1.

Really sorry to hear that. I have tested 0.2.4 and the tf.keras version on two datasets, and both work just fine. Maybe it is a bug in 0.1.8.
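For context (this sketch is not from the original thread), the activation change the maintainer mentions matters because softmax couples the output scores into a single probability distribution over classes, while element-wise sigmoid scores each class independently. A minimal NumPy illustration, with hypothetical logit values:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalise so the
    # outputs form a probability distribution (they sum to 1).
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    # Element-wise: each class score is squashed to (0, 1) independently,
    # so the outputs need not sum to 1.
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, 0.1])  # made-up scores for 3 classes

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax, p_softmax.sum())  # distribution over the 3 classes; sums to 1
print(p_sigmoid, p_sigmoid.sum())  # independent per-class scores; sum exceeds 1 here
```

For single-label multi-class text classification (as in the fontanka rubric task above), softmax with categorical cross-entropy is the usual pairing; a sigmoid output is the usual choice for multi-label problems, where classes are not mutually exclusive.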