Duplicates found in y_true list
Current version: 0.4.0
In the function print_metrics_ranking, I added the following lines:
rec_list = pd.DataFrame.from_dict(y_reco_list, orient="index").reset_index()
# derive column names from the actual width; a fixed list fails when the longest list changes
rec_list.columns = ['user'] + [f'rec_{i}' for i in range(1, rec_list.shape[1])]
true_list = pd.DataFrame.from_dict(y_true_list, orient="index").reset_index()
true_list.columns = ['user'] + [f'true_{i}' for i in range(1, true_list.shape[1])]
print(true_list)
print(rec_list)
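`DataFrame.from_dict(..., orient="index")` pads shorter rows with NaN up to the longest row. The number of `true_*` columns therefore depends on the longest true list in the batch: 4 in one run and 11 in the full version. The sketch below names the columns from the frame's actual width. The helper `to_wide_frame` and the sample dict are hypothetical. The data only mimics the shape of the `y_true_list` output shown in this issue.

```python
from array import array

import pandas as pd


def to_wide_frame(user_items, prefix):
    """Hypothetical helper: turn {user: sequence_of_items} into one row per user.

    pandas pads shorter rows with NaN, so the column count is
    1 (user) + the length of the longest sequence.
    """
    frame = pd.DataFrame.from_dict(user_items, orient="index").reset_index()
    frame.columns = ["user"] + [f"{prefix}_{i}" for i in range(1, frame.shape[1])]
    return frame


# hypothetical sample data in the same shape as the printed y_true_list
y_true_list = {9089: array("I", [7, 7, 7]), 9023: array("I", [17, 14])}
true_list = to_wide_frame(y_true_list, "true")
print(true_list)
```

Here the longest list has three items, so the frame gets `user`, `true_1`, `true_2` and `true_3`. User 9023's `true_3` is NaN.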
I noticed the following in the printed DataFrame:
- In y_true_list, a user can have more than one item, and in one example the same item ID is repeated. Example output: 9089: array('I', [7, 7, 7]) and 9023: array('I', [17, 14]). Is it expected for a user to have more than one item, and for the same item to appear more than once for a given user?
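The sketch below checks how widespread the repeats are. It lists every user whose true-item list contains an item more than once. It assumes `y_true_list` maps user IDs to sequences of item IDs, as in the output above. The function name and sample dict are hypothetical.

```python
from array import array
from collections import Counter


def users_with_repeated_items(user_items):
    """Hypothetical helper: return {user: {item: count}} for items a user has more than once."""
    repeated = {}
    for user, items in user_items.items():
        counts = Counter(items)
        dupes = {item: n for item, n in counts.items() if n > 1}
        if dupes:
            repeated[user] = dupes
    return repeated


# hypothetical sample data mirroring the output above
y_true_list = {9089: array("I", [7, 7, 7]), 9023: array("I", [17, 14])}
print(users_with_repeated_items(y_true_list))  # {9089: {7: 3}}
```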
The full updated function is below:
def print_metrics_ranking(self, df, metrics, y_prob=None, y_true=None, y_reco_list=None,
                          y_true_list=None, users=None, k=10, train=True):
    # global eval_df
    # global v
    eval_df = pd.DataFrame()
    if train:
        for m in metrics:
            if m in ["log_loss", "loss"]:
                log_loss_ = log_loss(y_true, y_prob, eps=1e-7)
                print(f"\t train log_loss: {log_loss_:.4f}")
                eval_df['log_loss'] = [log_loss_]
    else:
        for m in metrics:
            if m in ["log_loss", "loss"]:
                log_loss_ = log_loss(y_true, y_prob, eps=1e-7)
                print(f"\t eval log_loss: {log_loss_:.4f}")
                eval_df['log_loss_'] = [log_loss_]
            elif m == "balanced_accuracy":
                y_pred = np.round(y_prob)
                accuracy = balanced_accuracy_score(y_true, y_pred)
                print(f"\t eval balanced accuracy: {accuracy:.4f}")
                eval_df['accuracy'] = [round(accuracy, 4)]
            elif m == "roc_auc":
                roc_auc = roc_auc_score(y_true, y_prob)
                print(f"\t eval roc_auc: {roc_auc:.4f}")
                eval_df['roc_auc'] = [round(roc_auc, 4)]
            elif m == "pr_auc":
                precision, recall, _ = precision_recall_curve(y_true, y_prob)
                pr_auc = auc(recall, precision)
                eval_df['pr_auc'] = [round(pr_auc, 4)]
            elif m == "precision":
                precision_all = precision_at_k(y_true_list, y_reco_list,
                                               users, k)
                print(f"\t eval precision@{k}: {precision_all:.4f}")
                eval_df['precision_all'] = [round(precision_all, 4)]
            elif m == "recall":
                recall_all = recall_at_k(y_true_list, y_reco_list, users, k)
                print(f"\t eval recall@{k}: {recall_all:.4f}")
                eval_df['recall_all'] = [round(recall_all, 4)]
            elif m == "map":
                map_all = map_at_k(y_true_list, y_reco_list, users, k)
                print(f"\t eval map@{k}: {map_all:.4f}")
                eval_df['map_all'] = [round(map_all, 4)]
            elif m == "ndcg":
                ndcg_all = ndcg_at_k(y_true_list, y_reco_list, users, k)
                print(f"\t eval ndcg@{k}: {ndcg_all:.4f}")
                eval_df['ndcg_all'] = [round(ndcg_all, 4)]
    # print(y_true_list)
    df = pd.concat([df, eval_df], ignore_index=True)
    # print(test_results_df)
    # One row per user. pandas pads shorter lists with NaN, so derive the
    # column names from the actual width instead of hardcoding 10 / 11 names.
    rec_list = pd.DataFrame.from_dict(y_reco_list, orient="index").reset_index()
    rec_list.columns = ['user'] + [f'rec_{i}' for i in range(1, rec_list.shape[1])]
    true_list = pd.DataFrame.from_dict(y_true_list, orient="index").reset_index()
    true_list.columns = ['user'] + [f'true_{i}' for i in range(1, true_list.shape[1])]
    self.rec_results = pd.concat([self.rec_results, rec_list], ignore_index=True)
    self.true_results = pd.concat([self.true_results, true_list], ignore_index=True)
    return df, self.true_results, self.rec_results
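Whether the repeats change the scores depends on how a metric counts hits and what it divides by. The two functions below are hypothetical illustrations, not LibRecommender's `recall_at_k`. One divides by the raw length of the true list; the other divides by the number of distinct true items. Take user 9089 from the output above and a hypothetical top-3 list that contains item 7. The two definitions disagree:

```python
from array import array


def recall_raw(true_items, reco_items, k):
    """Hypothetical: hits in the top-k divided by the raw length of true_items (repeats count)."""
    hits = len(set(reco_items[:k]) & set(true_items))
    return hits / len(true_items)


def recall_distinct(true_items, reco_items, k):
    """Hypothetical: hits in the top-k divided by the number of distinct true items."""
    truth = set(true_items)
    hits = len(set(reco_items[:k]) & truth)
    return hits / len(truth)


true_items = array("I", [7, 7, 7])  # user 9089 from the issue output
reco_items = [7, 3, 5]              # hypothetical top-3 recommendations
print(recall_raw(true_items, reco_items, k=3))       # 0.3333333333333333
print(recall_distinct(true_items, reco_items, k=3))  # 1.0
```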
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
I've released version 0.6.0. See the updated User Guide.

y_true_list is just all of the items that a user consumed in the training data, so it is natural for it to contain duplicate items.
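The maintainer's explanation can be illustrated with a toy interaction log. Suppose per-user consumed items are collected by appending every interaction. Then a user who interacts with the same item several times ends up with that item repeated. This is a minimal sketch under that assumption. The log is hypothetical, and this is not LibRecommender's internal code. If a particular metric should ignore repeats, `dict.fromkeys` drops them while keeping first-seen order:

```python
from array import array
from collections import defaultdict

# hypothetical (user, item) interaction log; user 9089 consumed item 7 three times
interactions = [(9089, 7), (9023, 17), (9089, 7), (9023, 14), (9089, 7)]

# collect every consumed item per user, repeats included
user_consumed = defaultdict(lambda: array("I"))
for user, item in interactions:
    user_consumed[user].append(item)

print(dict(user_consumed))
# {9089: array('I', [7, 7, 7]), 9023: array('I', [17, 14])}

# optional: drop repeats but keep the order in which items were first seen
deduped = {u: array("I", list(dict.fromkeys(items))) for u, items in user_consumed.items()}
print(deduped)
# {9089: array('I', [7]), 9023: array('I', [17, 14])}
```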