Would you believe an analyst who claims to have built a model, entirely in their head, that identifies possible defaulters among a bank's credit card holders with an accuracy as high as 99%? Here is how it can be done: simply label every card holder as a non-defaulter. Going with the standard default rate for Indian banks of 1%, we easily achieve an accuracy of 99%, because our prediction is wrong only for the 1% of customers who actually default.
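To make the arithmetic concrete, here is a minimal sketch. The portfolio of 10,000 card holders is a made-up number for illustration; only the 1% default rate comes from the example above.

```python
# Hypothetical portfolio: 10,000 card holders, 1% of whom default.
# Labels: 1 = defaulter, 0 = non-defaulter.
actual = [1] * 100 + [0] * 9900

# The "model in the head": label every card holder as a non-defaulter.
predicted = [0] * len(actual)

# Accuracy = correct predictions / all predictions.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Number of actual defaulters that the model flagged.
defaulters_caught = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))

print(accuracy)           # 0.99
print(defaulters_caught)  # 0
```

The accuracy is 99%, yet not a single defaulter is caught.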
This might sound impressive, but Indian banks would still like a model that can actually identify possible defaulters. The example shows that the accuracy of a model may not tell us how useful it is. Data science has any number of examples where, for imbalanced data (data in which one of the two possible categories makes up a very low percentage), accuracy on its own cannot be considered a good measure of the performance of a classification model.
As we saw, labeling each credit card holder as a non-defaulter will not help the bank; instead we should build a model which is able to identify possible defaulters. The metric to concentrate on for this purpose is the True Positive Rate, or Recall. True Positive Rate is defined as TRUE POSITIVE divided by the sum of TRUE POSITIVE and FALSE NEGATIVE.
True Positives – True Positives are the data points where our model predicts defaulter and the customer actually defaults.
False Negatives – False Negatives are the data points where our model predicts non-defaulter but the customer actually defaults.
True Positive Rate (also known as Recall or Sensitivity) can be thought of as the model's ability to identify all the data points of interest, which in our case are the possible defaulters.
Now another interesting point about the above example: if we label all the customers as defaulters, our Recall or True Positive Rate equals 1, because we identify every defaulter without a single miss, which means False Negative is zero.
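The definition of recall and the two extreme labelings can be checked with a short sketch. The function name `recall` and the 10,000-customer portfolio are assumptions made for illustration only.

```python
def recall(actual, predicted):
    """True Positive Rate: TP / (TP + FN). Labels: 1 = defaulter, 0 = non-defaulter."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    # Return 0.0 when there are no actual defaulters, to avoid dividing by zero.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical portfolio with a 1% default rate.
actual = [1] * 100 + [0] * 9900

all_non_defaulters = [0] * len(actual)  # label everyone as non-defaulter
all_defaulters = [1] * len(actual)      # label everyone as defaulter

print(recall(actual, all_non_defaulters))  # 0.0
print(recall(actual, all_defaulters))      # 1.0
```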
But labeling all the card holders as defaulters will not help the bank either. This approach suffers from low precision, where precision is the model's ability to identify only the relevant data points, in our example the defaulters.
Precision: Precision is defined as the number of TRUE POSITIVE divided by the sum of TRUE POSITIVE and FALSE POSITIVE.
False Positive: False Positives are the data points which are identified as positive but are actually negative. In our example, False Positives are the customers our model identified as defaulters who are actually non-defaulters, or good customers.
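As a rough sketch of the definition above, the following computes precision for the "everyone is a defaulter" labeling. The function name `precision` and the portfolio size are illustrative assumptions.

```python
def precision(actual, predicted):
    """Precision: TP / (TP + FP). Labels: 1 = defaulter, 0 = non-defaulter."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    # With no predicted defaulters the ratio is 0/0; this sketch reports 0.0 by convention.
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical portfolio with a 1% default rate.
actual = [1] * 100 + [0] * 9900
all_defaulters = [1] * len(actual)  # label everyone as defaulter

print(precision(actual, all_defaulters))  # 0.01
```

Only 1 in 100 of the flagged customers is a real defaulter, so the bank would be chasing 9,900 good customers to find 100 bad ones.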
So far we have seen that our first model, labeling all the customers as non-defaulters, was not useful even though it had high accuracy: it predicted no defaulters at all, so True Positives were zero. Its recall was therefore zero, and its precision was zero divided by zero, which is undefined and conventionally reported as zero.
Now suppose our model flags just one customer as a defaulter and that customer actually defaults; then our precision equals 1, as False Positive is zero, but Recall (True Positive Rate) is very low, as False Negative is high. At the other extreme, labeling each customer as a defaulter gives a Recall of 1, as False Negative is zero, but a very low Precision because of the high number of False Positives.
This means that trying to increase Recall will generally decrease Precision, and vice versa.
Adjusting Precision and Recall: In a few cases we need either high recall or high precision, but in most cases we have to find an optimal combination of recall and precision. The F1 score helps here by combining precision and recall into a single number that is high only when both are high.
F1 score is the harmonic mean of precision and recall, so it takes both metrics into account: F1 = 2 × (Precision × Recall) / (Precision + Recall).
We use the harmonic mean instead of the simple average because the harmonic mean punishes extreme cases. Take a Recall of 1 with a Precision close to zero: the simple average would still give a score of about 0.5, but the harmonic mean gives a score close to 0.
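The difference between the two averages can be seen in a small sketch. The function name `f1_score` is chosen here for illustration, and the precision of 0.01 with recall of 1.0 is an assumed example corresponding to labeling every customer as a defaulter at a 1% default rate.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)

# Assumed extreme case: every customer labeled as defaulter.
p, r = 0.01, 1.0

simple_average = (p + r) / 2
f1 = f1_score(p, r)

print(round(simple_average, 3))  # 0.505
print(round(f1, 3))              # 0.02
```

The simple average hides the poor precision, while the F1 score stays close to the weaker of the two metrics.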
Visualizing Precision and Recall: We can examine precision and recall with the help of a confusion matrix, and the related trade-off between True Positive Rate and False Positive Rate with the help of an ROC curve.
Confusion Matrix: A confusion matrix makes it easy to calculate the precision and recall ratios. For binary classification it counts four different outcomes: True Positive, True Negative, False Positive and False Negative. True Negatives are the data points where our model predicts non-defaulter and the customer does not default.
Recall and precision can then be calculated directly from these four counts.
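A minimal sketch of this calculation follows. The ten customers and their predictions are made-up data, and `confusion_counts` is a hypothetical helper name, not a library function.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes. Labels: 1 = defaulter, 0 = non-defaulter."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # predicted defaulter, actually defaults
        elif a == 0 and p == 1:
            fp += 1  # predicted defaulter, actually a good customer
        elif a == 1 and p == 0:
            fn += 1  # predicted non-defaulter, actually defaults
        else:
            tn += 1  # predicted non-defaulter, actually a good customer
    return tp, fp, fn, tn

# Made-up data for ten customers.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)

print("                      Actual defaulter  Actual non-defaulter")
print(f"Predicted defaulter     {tp:>14}  {fp:>20}")
print(f"Predicted non-defaulter {fn:>14}  {tn:>20}")

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(round(precision, 2))  # 0.67
print(recall)               # 0.5
```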
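Receiver Operating Characteristic (ROC) curve: The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR), with TPR on the Y-axis and FPR on the X-axis. False Positive Rate is FALSE POSITIVE divided by the sum of FALSE POSITIVE and TRUE NEGATIVE.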
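The points on such a curve can be sketched as follows, under the assumption that the model outputs a default score for each customer and that a customer is labeled a defaulter when the score is at or above a threshold. The scores, labels and the `roc_points` helper are all hypothetical.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) pairs, one per threshold, starting from (0, 0)."""
    positives = sum(actual)              # actual defaulters
    negatives = len(actual) - positives  # actual non-defaulters
    points = [(0.0, 0.0)]                # threshold above every score: nobody flagged
    # Lower the threshold step by step so that more customers get flagged.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(a == 1 and s >= threshold for a, s in zip(actual, scores))
        fp = sum(a == 0 and s >= threshold for a, s in zip(actual, scores))
        points.append((fp / negatives, tp / positives))
    return points

# Made-up data: 1 = defaulter, with hypothetical model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(actual, scores)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

Each lower threshold flags more customers, so both TPR and FPR can only stay the same or rise until the curve reaches the point (1, 1), where every customer is labeled a defaulter.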
A typical ROC curve is shown below.
Summary: While building a classification model, accuracy should not be the only metric we look at; we should also look at the precision and recall ratios to build a good model.