Let's talk about documents that we want to classify as
relevant (labeled 1) or
irrelevant (labeled 0).
In this context the classification method does not matter: it could be a complex machine-learning
algorithm or just a manually attached label. If we know the
actual labels, then we can compare our
marks (known as
predictions) against them.
Given multiple documents, we can score (evaluate the quality of) our prediction mechanism with some sort of quality function.
|                 | Predicted positive            | Predicted negative       |
|-----------------|-------------------------------|--------------------------|
| Actual positive | TP (hit)                      | FN (type II error, miss) |
| Actual negative | FP (type I error, false alarm)| TN (correct rejection)   |
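The four cells of the table can be counted directly from two 0/1 label lists. This is a minimal sketch; the names `actual` and `predicted` are illustrative, not from any particular library:

```python
# Count confusion-matrix cells from 0/1 label lists.
def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn

print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # (2, 1, 1, 1)
```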
Based on TP, FN, FP, and TN we can evaluate quality metrics such as accuracy = (TP + TN) / (TP + FN + FP + TN), precision = TP / (TP + FP), and recall = TP / (TP + FN).
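As a sketch, these metrics are simple ratios over the four counts (function names here are illustrative):

```python
# Basic quality metrics computed from confusion-matrix counts.
def accuracy(tp, fn, fp, tn):
    # Fraction of all predictions that were correct.
    return (tp + tn) / (tp + fn + fp + tn)

def precision(tp, fp):
    # Of everything we marked positive, how much really was positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all actual positives, how many we managed to find.
    return tp / (tp + fn)

print(accuracy(2, 1, 1, 1))  # 0.6
print(precision(2, 1))       # 0.666...
print(recall(2, 1))          # 0.666...
```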
Why is accuracy not all you need? With imbalanced classes, e.g. many positives and few negatives, we can achieve high accuracy just by predicting the positive class all the time.
We always have to choose a trade-off between type I errors ("false alarms") and type II errors ("misses").
Static code analysis. We want to detect bugs in code expressions, so we can get a "false alarm" (the code is correct but the tool raises an alarm), a "hit" (a real bug is found), or a "miss" (a bug is overlooked).
Fraud detection. We want to detect fraudulent operations based on operation patterns.
Computer virus detection. We want to detect viruses based on program activity.
Text search. We want to detect relevant documents. But in this case we actually want to find the most relevant documents first. That means we want to use ranking quality functions with an @K ("top K") cutoff, such as precision@K and recall@K.
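A minimal sketch of precision@K, assuming the actual 0/1 relevance labels are already arranged in ranked order (the name `ranked_relevance` is illustrative): among the top K returned documents, what fraction is actually relevant.

```python
# precision@K: fraction of the top-K ranked documents that are relevant.
def precision_at_k(ranked_relevance, k):
    top = ranked_relevance[:k]   # keep only the first K results
    return sum(top) / k

ranked_relevance = [1, 1, 0, 1, 0, 0]  # relevance labels, best-ranked first
print(precision_at_k(ranked_relevance, 3))  # 0.666...
```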