Confusion Matrix Representation / Return Value

Describe the workflow you want to enable

An enhancement to the output of the confusion_matrix function that better represents the true and predicted values when there are multiple class levels.

Current representation, with code:

    from sklearn.metrics import confusion_matrix
    y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

Output:

array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

Describe your proposed solution

When there are multiple levels, the ndarray is hard to read because nothing in it associates each row and column with its true or predicted level.

Proposed solution should look similar to the table below, providing better readability of the confusion matrix.

		*Predicted*	*Value*
	*Levels*	ant	bird	cat
*True*	ant	2	0	0
*Value*	bird	0	0	1
	cat	1	0	2
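
A minimal sketch of how such a labeled table could be produced today, outside scikit-learn. The helper name format_confusion_matrix and the "true:" / "pred:" prefixes are assumptions made for illustration; they are not part of the library or of the proposal itself. The matrix values are copied from the example output above.

```python
def format_confusion_matrix(matrix, labels):
    """Return a text table: true labels as rows, predicted labels as columns.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    row_names = [f"true:{label}" for label in labels]
    col_names = [f"pred:{label}" for label in labels]
    first_width = max(len(name) for name in row_names)
    # Each column is as wide as its header or its widest count.
    widths = [
        max(len(name), *(len(str(row[j])) for row in matrix))
        for j, name in enumerate(col_names)
    ]
    header = " " * first_width + "  " + "  ".join(
        name.rjust(width) for name, width in zip(col_names, widths)
    )
    lines = [header]
    for name, row in zip(row_names, matrix):
        cells = "  ".join(str(v).rjust(width) for v, width in zip(row, widths))
        lines.append(name.ljust(first_width) + "  " + cells)
    return "\n".join(lines)


# Values taken from the confusion_matrix output shown in the issue.
matrix = [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
labels = ["ant", "bird", "cat"]
table = format_confusion_matrix(matrix, labels)
print(table)
```

With a NumPy result, cm.tolist() gives the nested list this sketch expects.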

Possible Solutions:

Provide a parameter to pretty-print the matrix: printMatrix [type: bool]

Include another parameter to return the ndarray, with the index as true_values and the columns as predicted_values. For example:

    cm = confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
    index = ["true:ant", "true:bird", "true:cat"]
    columns = ["pred:ant", "pred:bird", "pred:cat"]
    return cm, index, columns

This can easily be converted into a DataFrame for further use.
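
As a rough illustration of that last step, the three proposed return values could be passed straight to the pandas DataFrame constructor. This is a sketch under assumptions: pandas is not mentioned by name in the proposal beyond "dataframe", and the values below are hard-coded from the example rather than returned by any existing scikit-learn call.

```python
import pandas as pd

# Hard-coded from the issue's example; the proposed (not existing) return
# signature would hand back these three objects.
cm = [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
index = ["true:ant", "true:bird", "true:cat"]
columns = ["pred:ant", "pred:bird", "pred:cat"]

# Rows are true labels, columns are predicted labels.
df = pd.DataFrame(cm, index=index, columns=columns)
print(df)
```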

Issue Analytics

State:
Created 3 years ago
Reactions: 1
Comments: 18 (10 by maintainers)

Top GitHub Comments

2 reactions

shubhamdo commented, Jan 3, 2021

Using tuples as keys within the nested dict works well when casting to a DataFrame. The dict structure below shows what that looks like.

{('pred', 'ant'): {('true', 'ant'): 2, ('true', 'bird'): 0, ('true', 'cat'): 1},
 ('pred', 'bird'): {('true', 'ant'): 0, ('true', 'bird'): 0, ('true', 'cat'): 0},
 ('pred', 'cat'): {('true', 'ant'): 0, ('true', 'bird'): 1, ('true', 'cat'): 2}}

Screenshot: nested_dict_w_tupl_key
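
The one-step conversion mentioned above can be sketched as follows, assuming pandas is installed. The pandas DataFrame constructor builds a MultiIndex on both axes when the outer and inner keys are tuples; the variable names are illustrative only.

```python
import pandas as pd

# Nested dict with tuple keys, copied from the comment above.
nested = {
    ("pred", "ant"): {("true", "ant"): 2, ("true", "bird"): 0, ("true", "cat"): 1},
    ("pred", "bird"): {("true", "ant"): 0, ("true", "bird"): 0, ("true", "cat"): 0},
    ("pred", "cat"): {("true", "ant"): 0, ("true", "bird"): 1, ("true", "cat"): 2},
}

# Outer keys become the columns, inner keys become the index.
df = pd.DataFrame(nested)
print(df)

# Look up one cell: true "cat" predicted as "ant".
print(df.loc[("true", "cat"), ("pred", "ant")])
```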

Another Option: Flat Dict

The problem with the flat dict option is that casting it to a DataFrame needs extra steps. If a one-step conversion is required, the nested dict with tuples as keys, shown above, is the better choice.

Flat Dict:

{('pred_ant', 'true_ant'): 2, ('pred_ant', 'true_bird'): 0, ('pred_ant', 'true_cat'): 1,
 ('pred_bird', 'true_ant'): 0, ('pred_bird', 'true_bird'): 0, ('pred_bird', 'true_cat'): 0,
 ('pred_cat', 'true_ant'): 0, ('pred_cat', 'true_bird'): 1, ('pred_cat', 'true_cat'): 2}

Screenshot: flat_dict
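
To make the "extra steps" concrete, here is a small sketch in plain Python that regroups the flat dict into the nested pred -> true layout before any DataFrame conversion. The variable names are illustrative only.

```python
# Flat dict copied from the comment above: keys are (predicted, true) pairs.
flat = {
    ("pred_ant", "true_ant"): 2, ("pred_ant", "true_bird"): 0, ("pred_ant", "true_cat"): 1,
    ("pred_bird", "true_ant"): 0, ("pred_bird", "true_bird"): 0, ("pred_bird", "true_cat"): 0,
    ("pred_cat", "true_ant"): 0, ("pred_cat", "true_bird"): 1, ("pred_cat", "true_cat"): 2,
}

# Extra step: group the counts by predicted label.
nested = {}
for (pred, true), count in flat.items():
    nested.setdefault(pred, {})[true] = count

print(nested)
```

The regrouped dict has the shape that a DataFrame constructor can consume directly, which is the step the nested layout avoids.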

I think the nested dict with tuples as keys represents the data very well: it preserves the class names and converts easily to a DataFrame. Please let me know your thoughts and I'll make the changes accordingly. Thanks!

2 reactions

shubhamdo commented, Dec 30, 2020

@jnothman @glemaitre taking your comments into consideration, I've added another change so that the function returns its output in a built-in data type, i.e. dict(). This eliminates the need for hard dependencies and makes the output easier to use with other third-party libraries.

Please refer to the example code and screenshots. Thanks!

Example:

    from sklearn.metrics._classification import confusion_matrix
    y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    cm = confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"], pprint=True)

Code: Output as a dict()

    if pprint:
        label_list = labels.tolist()
        cm_lol = cm.tolist()  # matrix as a list of lists
        # Outer keys are predicted labels, inner keys are true labels.
        cm_dict = {
            str(label_list[j]): {
                str(label_list[i]): cm_lol[i][j]
                for i in range(len(label_list))
            }
            for j in range(len(label_list))
        }
        return cm_dict

Output without pprint (pprint=False):

array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]], dtype=int64)

Output with pprint (pprint=True):

{'ant': {'ant': 2, 'bird': 0, 'cat': 1},
 'bird': {'ant': 0, 'bird': 0, 'cat': 0},
 'cat': {'ant': 0, 'bird': 1, 'cat': 2}}
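
The same mapping can be tried without patching scikit-learn. The following is a minimal standalone sketch; the function name confusion_matrix_to_dict is hypothetical, and it takes a plain nested list (for example the result of cm.tolist()) rather than the proposed pprint parameter.

```python
def confusion_matrix_to_dict(matrix, labels):
    """Map predicted label -> true label -> count.

    Hypothetical helper mirroring the pprint logic above;
    matrix[i][j] is the count for true label i and predicted label j.
    """
    names = [str(label) for label in labels]
    return {
        names[j]: {names[i]: matrix[i][j] for i in range(len(names))}
        for j in range(len(names))
    }


# Matrix copied from the example output (rows: true, columns: predicted).
matrix = [[2, 0, 0], [0, 0, 1], [1, 0, 2]]
result = confusion_matrix_to_dict(matrix, ["ant", "bird", "cat"])
print(result)
```

Note that the outer keys are the predicted labels, so the dict reads column by column relative to the ndarray.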

Changes required for this solution:

def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None,
                     normalize=None, pprint=False):
    ...
    if pprint:  # logic shown above
        return cm_dict  # the suggested output
    ...
    return cm

Option 2: Prefix the keys so the true and predicted values are easier to tell apart.

    if pprint:
        label_list = labels.tolist()
        cm_lol = cm.tolist()  # matrix as a list of lists
        # Outer keys are predicted labels, inner keys are true labels.
        cm_dict = {
            "pred_" + str(label_list[j]): {
                "true_" + str(label_list[i]): cm_lol[i][j]
                for i in range(len(label_list))
            }
            for j in range(len(label_list))
        }
        return cm_dict

Output:

{'pred_ant': {'true_ant': 2, 'true_bird': 0, 'true_cat': 1},
 'pred_bird': {'true_ant': 0, 'true_bird': 0, 'true_cat': 0},
 'pred_cat': {'true_ant': 0, 'true_bird': 1, 'true_cat': 2}}
