Confusion matrix (type I and type II errors)

Disclaimer: these terms apply not only to machine learning.

Let's talk about documents that we want to classify as relevant (marked 1) or irrelevant (marked 0). The classification method does not matter here: it could be a complex machine learning algorithm or just a label attached by hand. If we know the actual label, we can compare our marks (the predicted labels) with the actual ones.

Given multiple documents, we can score (evaluate the quality of) our prediction mechanism with quality functions, also called metrics.

                 | Predicted positive                 | Predicted negative
Actual positive  | True Positive (TP)                 | False Negative (FN): type II error, "miss"
Actual negative  | False Positive (FP): type I error, | True Negative (TN): correct rejection
                 | "false alarm"                      |

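From TP, FN, FP and TN we can compute precision (the share of documents marked relevant that really are relevant), recall (the share of actually relevant documents that we found) and accuracy (the share of all documents marked correctly):

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + FN + FP + TN)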
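A minimal Python sketch of these formulas, assuming labels are plain 0/1 integers (1 = relevant). The function names and the sample labels are hypothetical, chosen only for illustration; they are not taken from any library.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for binary labels (1 = relevant, 0 = irrelevant)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # relevant document marked relevant
        elif a == 1 and p == 0:
            fn += 1  # type II error: relevant document missed
        elif a == 0 and p == 1:
            fp += 1  # type I error: false alarm
        else:
            tn += 1  # irrelevant document correctly rejected
    return tp, fn, fp, tn


def precision(tp, fp):
    # Share of documents marked relevant that really are relevant.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Share of actually relevant documents that were found.
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp, fn, fp, tn):
    # Share of all documents marked correctly.
    return (tp + tn) / (tp + fn + fp + tn)


actual = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 1, 1, 0, 0]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # 3 1 2 2
print(precision(tp, fp), recall(tp, fn), accuracy(tp, fn, fp, tn))  # 0.6 0.75 0.625
```

Returning 0.0 when a denominator is zero is one common convention; other tools may warn or return NaN instead.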

Why is accuracy not all you need? With imbalanced classes, for example many positives and only a few negatives, we can reach high accuracy simply by predicting the positive class all the time, even though such a "classifier" never recognizes a single negative.
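A small Python sketch of this effect on made-up data (95 positives, 5 negatives). Specificity, the share of actual negatives that were recognized, is added here as an extra metric to expose what accuracy hides.

```python
# Hypothetical imbalanced data: 95 actual positives, 5 actual negatives.
actual = [1] * 95 + [0] * 5
# A trivial "classifier" that ignores its input and always predicts positive.
predicted = [1] * len(actual)

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)

accuracy = (tp + tn) / len(actual)
# Specificity (true negative rate): share of actual negatives recognized.
specificity = tn / (tn + fp)

print(accuracy, specificity)  # 0.95 0.0
```

Accuracy looks excellent, yet every negative is misclassified; the confusion matrix (TN = 0, FP = 5) shows this immediately.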

Reducing type I vs. type II errors

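We always have to trade off type I errors ("false alarms") against type II errors ("misses"): a more sensitive detector catches more real cases but raises more false alarms, and a stricter one raises fewer false alarms but misses more.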
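One common way to see this trade-off is a classifier that outputs a score and marks a document positive when the score reaches a threshold. The scores and labels below are invented for illustration, and `errors_at` is a hypothetical helper, not a library function.

```python
# Hypothetical scores (higher = "more likely positive") and actual labels.
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]


def errors_at(threshold):
    """Return (false alarms, misses) when predicting positive for score >= threshold."""
    fp = fn = 0
    for score, a in zip(scores, actual):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and a == 0:
            fp += 1  # type I error
        elif predicted == 0 and a == 1:
            fn += 1  # type II error
    return fp, fn


for t in (0.25, 0.5, 0.75, 0.9):
    fp, fn = errors_at(t)
    print(f"threshold={t}: false alarms={fp}, misses={fn}")
```

With these made-up numbers, raising the threshold from 0.25 to 0.9 lowers false alarms from 4 to 0 while misses grow from 1 to 4; no threshold removes both kinds of error at once.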


Static code analysis. We want to detect bugs in code, so the tool can produce a "false alarm" (the code is correct but the tool reports a bug), a "hit" (a real bug is found) or a "miss" (a real bug goes unreported).

Fraud detection. We want to detect fraudulent operations based on their patterns.

Computer virus detection. We want to detect viruses based on program activity.

Text search. We want to detect relevant documents, but in this case we actually want to find the most relevant ones. That means we should use ranking quality functions computed over the top K results ("@K"), such as precision@K and accuracy@K.
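A minimal Python sketch of precision@K, assuming we already have 0/1 relevance labels listed in the order the search engine ranked the documents. The function name and the sample ranking are hypothetical.

```python
def precision_at_k(ranked_relevance, k):
    """Share of relevant documents among the top k results.

    ranked_relevance: 0/1 relevance labels, ordered by the search ranking.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_relevance[:k]
    # Divide by k (not len(top)), the usual convention when fewer than k results exist.
    return sum(top) / k


# Hypothetical ranking: 1 = relevant, 0 = irrelevant, best-ranked first.
ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(ranking, 1))  # 1.0
print(precision_at_k(ranking, 5))  # 0.6
```

Only the order of the first K results matters here, which matches the goal of finding the most relevant documents rather than classifying every document.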