What-If Tool: thresholded inference problem (confusion matrix/ROC)
Version info:
- TensorBoard 1.12.0a0
- TensorFlow 1.8.0
- MacOS 10.13.6
- Python 2.7
Description: Running 2-class classification with a custom estimator results in incorrect confusion matrix/ROC curve values. When dragging the threshold slider, the “actual yes/no” percentages change (see screenshots). In addition, when using a vocab file to specify the labels (“False”, “True”), the legend shows “False” and “undefined”. The inference scores themselves appear to be returned correctly.
I would assume that, even if my model were incorrect, the “actual” sample counts would be independent of the chosen threshold.
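For reference, only the predicted side of the confusion matrix should depend on the threshold; the actual class totals are fixed by the labels. A minimal NumPy sketch (with made-up `scores` and `labels`) of the behavior I would expect:

```python
import numpy as np

# Hypothetical positive-class scores and integer ground-truth labels (0/1).
scores = np.array([0.10, 0.80, 0.40, 0.95, 0.02])
labels = np.array([0, 1, 0, 1, 0])

def confusion_counts(scores, labels, threshold):
    predicted = (scores >= threshold).astype(int)
    tp = int(np.sum((predicted == 1) & (labels == 1)))
    fp = int(np.sum((predicted == 1) & (labels == 0)))
    fn = int(np.sum((predicted == 0) & (labels == 1)))
    tn = int(np.sum((predicted == 0) & (labels == 0)))
    return tp, fp, fn, tn

for t in (0.25, 0.5, 0.75):
    tp, fp, fn, tn = confusion_counts(scores, labels, t)
    # "Actual yes" (tp + fn) and "actual no" (fp + tn) should not change with t.
    print(t, (tp, fp, fn, tn), "actual yes:", tp + fn, "actual no:", fp + tn)
```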
Context: The classification API is used as
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: tf.estimator.export.ClassificationOutput(scores=softmax, classes=None)
with softmax a (?, 2)-shaped Tensor. This leads to the following signature:
The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: Placeholder:0
The given SavedModel SignatureDef contains the following output(s):
outputs['scores'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: softmax/Reshape_1:0
Method name is: tensorflow/serving/classify
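For context, a minimal sketch of how such a classification signature can be exported from a custom estimator's model_fn (the layer producing `logits` and the feature name "x" are placeholders, not the actual model code):

```python
import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Placeholder 2-class head; the real model's layers are not shown here.
    logits = tf.layers.dense(features["x"], 2)
    softmax = tf.nn.softmax(logits)  # shape (?, 2), matching the signature above

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode,
            predictions={"scores": softmax},
            export_outputs={
                tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                    tf.estimator.export.ClassificationOutput(
                        scores=softmax, classes=None)
            })

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```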
The ground truth is specified as an integer value in {0, 1} (about 97% zeros and 3% ones).
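A sketch of how such example records could be written (the feature names "x" and "label" are placeholders; only the integer label in {0, 1} carries the ground truth):

```python
import tensorflow as tf

def make_example(feature_values, label):
    # `label` is the integer ground truth (0 or 1); the feature names are illustrative.
    return tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=feature_values)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

with tf.python_io.TFRecordWriter("examples.tfrecord") as writer:
    writer.write(make_example([0.1, 0.2, 0.3], 0).SerializeToString())
    writer.write(make_example([0.9, 0.8, 0.7], 1).SerializeToString())
```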
The inference result as shown in the datapoint editor is in the attached screenshot.
Top GitHub Comments
Thanks for the doc reference. I created #1471 to have the what-if tool handle empty labels by using indices for the class labels, which seems to do the right thing for your case.
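A rough illustration (not the actual What-If Tool code) of the index fallback described above, where a class without a label vocab entry falls back to its index as the display label:

```python
def class_display_label(index, label_vocab):
    # Hypothetical helper: use the vocab entry when available, otherwise the class
    # index itself, so a 2-class model with missing labels shows "0"/"1"
    # instead of "undefined".
    if label_vocab and index < len(label_vocab):
        return label_vocab[index]
    return str(index)

print(class_display_label(1, ["False"]))          # -> "1" rather than "undefined"
print(class_display_label(1, ["False", "True"]))  # -> "True"
```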
@reinhouthooft Thanks for the bug report. Would you be willing to provide the saved model and a tf record file of examples for me to reproduce the problem with? Or is the model and/or data not for sharing?