Finding the precision and AUC scores.
I am building a recommendation model for a user-article dataset where each interaction is represented by 1.
model = LightFM(loss='warp', item_alpha=ITEM_ALPHA, user_alpha=USER_ALPHA, no_components=NUM_COMPONENTS, learning_rate=LEARNING_RATE, learning_schedule=LEARNING_SCHEDULE)
model = model.fit(train, item_features=itemf, user_features=uf, epochs=NUM_EPOCHS, num_threads=NUM_THREADS)
print("train shape: ",train.shape) print("test shape: ",test.shape)
train shape: (25900, 790)
test shape: (25900, 790)
My predict call looks like this:
predictions = model.predict( user_id, pid_array, user_features=uf, item_features=itemf, num_threads=4)
where pid_array contains the indices of the items to be scored.
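For concreteness, a minimal sketch of how such a pid_array can be built when scoring every item for one user (assuming the item count is taken from the interaction matrix above):

import numpy as np

# One index per item column in the interactions matrix (790 items here).
pid_array = np.arange(train.shape[1], dtype=np.int32)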
train_precision = precision_at_k(model, train, k=10).mean()
I am trying to compute the precision, and subsequently I want the AUC score as well, but I get this error.
Traceback (most recent call last):
  File "new_light_fm.py", line 366, in <module>
    train_precision = precision_at_k(model, train, k=10).mean()
  File "/home/nt/anaconda3/lib/python3.6/site-packages/lightfm/evaluation.py", line 69, in precision_at_k
    check_intersections=check_intersections,
  File "/home/nt/anaconda3/lib/python3.6/site-packages/lightfm/lightfm.py", line 807, in predict_rank
    raise ValueError('Incorrect number of features in item_features')
ValueError: Incorrect number of features in item_features
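Since the model was fitted with user_features and item_features, the evaluation functions most likely need the same matrices passed in; otherwise LightFM falls back to identity feature matrices whose dimensions do not match the fitted model. A minimal sketch of the evaluation calls under that assumption, reusing the uf and itemf matrices from the fitting step:

from lightfm.evaluation import precision_at_k, auc_score

# Pass the same feature matrices that were used when fitting the model,
# otherwise predict_rank builds identity features of the wrong size.
train_precision = precision_at_k(model, train, k=10,
                                 user_features=uf,
                                 item_features=itemf,
                                 num_threads=NUM_THREADS).mean()
test_precision = precision_at_k(model, test, k=10,
                                user_features=uf,
                                item_features=itemf,
                                num_threads=NUM_THREADS).mean()

train_auc = auc_score(model, train,
                      user_features=uf,
                      item_features=itemf,
                      num_threads=NUM_THREADS).mean()
test_auc = auc_score(model, test,
                     user_features=uf,
                     item_features=itemf,
                     num_threads=NUM_THREADS).mean()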
Top GitHub Comments
There is no problem to do so. You build one dataset to which you fit all your users, user features, items and item features by calling fit and fit_partial. Similarly, you call build_user_features and build_item_features to build the feature matrices for all users and items. Next you call build_interactions twice: once with the interactions of the first user group (months 1-11) to get the train interaction matrix, and a second time with the interactions of the second user group (the 12th month) to get the test matrix.
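A minimal sketch of that workflow with lightfm.data.Dataset; the variable names (all_users, all_items, all_user_feature_names, all_item_feature_names, user_feature_rows, item_feature_rows, interactions_months_1_to_11, interactions_month_12) are illustrative placeholders:

from lightfm.data import Dataset

dataset = Dataset()

# Fit ids and feature names for ALL users and items up front,
# so both interaction matrices share the same shape and mappings.
dataset.fit(all_users, all_items,
            user_features=all_user_feature_names,
            item_features=all_item_feature_names)

# Feature matrices cover every user/item, regardless of the split.
uf = dataset.build_user_features(user_feature_rows)      # iterable of (user_id, [feature names])
itemf = dataset.build_item_features(item_feature_rows)   # iterable of (item_id, [feature names])

# Interactions are built twice from the same fitted dataset:
# months 1-11 become the train matrix, month 12 the test matrix.
train, _ = dataset.build_interactions(interactions_months_1_to_11)  # iterable of (user_id, item_id)
test, _ = dataset.build_interactions(interactions_month_12)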
to build the feature matrices for all users and items. Next you callbuild_interactions
twice, once with the interactions of the first user group (1-11 months) to get the test interaction matrix, and the second time with the interactions of the second user group (12th month) to get the test matrix.I have a sparse matrix(train/test data) of shape (1407580, 235061), which means there are around 330Bn combinations of user_id and item_id. This is causing precision_at_k and others to take way too much time to calculate. I am thinking about calculating the precision at k only for a small set of data by writing code myself. Will this be good enough for model validation?