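Weight of Evidence (WOE): tells us how well a variable can differentiate between the good or non-event (Y=0) and the bad or event (Y=1). For each group of the variable, it is calculated as the natural log of the ratio of the share of all goods that fall in the group to the share of all bads that fall in the group.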
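Written out, under the common credit-scoring convention with goods in the numerator (some texts flip the ratio, which only changes the sign), the WOE of group i is roughly:

```latex
\mathrm{WOE}_i = \ln\!\left(
  \frac{\mathrm{Good}_i / \mathrm{Good}_{\mathrm{total}}}
       {\mathrm{Bad}_i / \mathrm{Bad}_{\mathrm{total}}}
\right)
```

Here Good_i and Bad_i are the counts of goods and bads in group i, and the totals are taken over all groups. A positive WOE means the group holds a larger share of goods than of bads.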
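Information Value (IV): tells us how much information, or predictive importance, the independent variable holds. It is calculated by summing, over all groups, the difference between the share of goods and the share of bads in the group, multiplied by the group's WOE.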
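In symbols, using the same goods-over-bads convention as for WOE (an assumption, though the IV value itself does not depend on which way the ratio is written), the sum over n groups looks like:

```latex
\mathrm{IV} = \sum_{i=1}^{n}
  \left(
    \frac{\mathrm{Good}_i}{\mathrm{Good}_{\mathrm{total}}}
    - \frac{\mathrm{Bad}_i}{\mathrm{Bad}_{\mathrm{total}}}
  \right)
  \ln\!\left(
    \frac{\mathrm{Good}_i / \mathrm{Good}_{\mathrm{total}}}
         {\mathrm{Bad}_i / \mathrm{Bad}_{\mathrm{total}}}
  \right)
```

Each term is non-negative, because the difference and the log always share the same sign, so IV grows as the good and bad distributions move apart.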
Since Information Value measures the importance of a variable, it is a useful tool for variable reduction, i.e. we can remove from our analysis those variables that have a very low IV score. For example, variables where IV < 0.02 can be removed.
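As a small illustration of this screening step, here is a Python sketch that keeps only variables at or above the 0.02 cutoff. The variable names and IV scores are hypothetical, chosen only to show the filter.

```python
# Hypothetical IV scores per candidate variable (illustrative values only).
iv_scores = {"age": 0.31, "tenure": 0.015, "income": 0.12, "region": 0.004}

IV_THRESHOLD = 0.02  # cutoff mentioned in the text

# Keep variables whose IV is at or above the cutoff.
kept = [name for name, iv in iv_scores.items() if iv >= IV_THRESHOLD]
dropped = [name for name, iv in iv_scores.items() if iv < IV_THRESHOLD]
```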
Calculating WOE and IV:
To calculate WOE and IV, you first need to divide the data into groups/ranks (in this case, we divide the data into 10 groups). In SAS this can be done, for example, with PROC RANK and the GROUPS=10 option.
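Outside SAS, the same split can be sketched in plain Python. This is a minimal illustration, not the SAS procedure itself: the function name `assign_deciles` and the sample scores are made up, and ties are split by position rather than kept in one group.

```python
def assign_deciles(values, n_groups=10):
    """Return a group index (0 = lowest values) for each element of `values`."""
    # Order record positions by value; this is the "sort before ranking" step.
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        # Map the rank position onto n_groups roughly equal-sized groups.
        groups[idx] = rank * n_groups // len(values)
    return groups


# Illustrative data: 20 made-up scores, so each decile receives 2 records.
scores = [23, 5, 87, 41, 66, 12, 94, 38, 71, 50,
          8, 59, 30, 79, 17, 45, 62, 99, 3, 84]
deciles = assign_deciles(scores)
```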
Fine classing: Once the data is divided into deciles (ranked), count the goods and bads in each decile. We need to summarize the data in the following format so that we retain as much information as possible. Also, note that the data needs to be sorted before ranking.
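A per-group summary of this kind can be sketched in Python as follows. The function name `woe_iv_table` and the toy data are hypothetical. The sketch assumes both goods and bads exist in the data overall, and it substitutes 0.5 for an empty cell so the log stays defined, which is one common workaround rather than the only one.

```python
import math


def woe_iv_table(groups, target):
    """Per-group good/bad counts, WOE and IV contribution.

    groups: group label per record; target: 0 = good (non-event), 1 = bad (event).
    Returns (rows, total_iv).
    """
    total_good = sum(1 for y in target if y == 0)
    total_bad = sum(1 for y in target if y == 1)
    rows = []
    for g in sorted(set(groups)):
        good = sum(1 for gi, y in zip(groups, target) if gi == g and y == 0)
        bad = sum(1 for gi, y in zip(groups, target) if gi == g and y == 1)
        # Assumption: replace an empty cell with 0.5 to avoid log(0) or division by zero.
        dist_good = (good if good > 0 else 0.5) / total_good
        dist_bad = (bad if bad > 0 else 0.5) / total_bad
        woe = math.log(dist_good / dist_bad)
        rows.append({"group": g, "good": good, "bad": bad,
                     "woe": woe, "iv": (dist_good - dist_bad) * woe})
    return rows, sum(r["iv"] for r in rows)


# Toy data: two groups of four records each (illustrative only).
toy_groups = [0, 0, 0, 0, 1, 1, 1, 1]
toy_target = [0, 0, 0, 1, 0, 1, 1, 1]
rows, total_iv = woe_iv_table(toy_groups, toy_target)
```

In the toy data, group 0 holds three of the four goods and one of the four bads, so its WOE is ln(3); group 1 is the mirror image with WOE ln(1/3).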
Coarse classing: combine adjacent deciles that have similar WOE scores into a single group. Continuing with the example,
we can combine them as follows:
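The merging rule itself can be illustrated with a short Python sketch. The WOE values below are made up (they are not the example data), and the tolerance of 0.1 and the name `coarse_class` are assumptions. Comparing each decile with the one before it is only one simple heuristic; in practice the grouping is often adjusted by judgment.

```python
def coarse_class(woes, tol=0.1):
    """Group adjacent fine bins whose WOE is within `tol` of the previous bin.

    Returns a list of lists of fine-bin indices, one inner list per coarse group.
    """
    if not woes:
        return []
    merged = [[0]]
    for i in range(1, len(woes)):
        if abs(woes[i] - woes[i - 1]) < tol:
            merged[-1].append(i)   # similar WOE: extend the current coarse group
        else:
            merged.append([i])     # clear jump in WOE: start a new coarse group
    return merged


# Made-up decile WOE values, for illustration only.
fine_woe = [-0.82, -0.79, -0.35, -0.31, -0.30, 0.02, 0.05, 0.41, 0.44, 0.90]
coarse = coarse_class(fine_woe)
```

After merging, the WOE of each coarse group should be recomputed from the pooled good and bad counts of its deciles, not averaged from the decile WOE values.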
Once coarse classing is done, the variable's values are replaced with the WOE values of their groups, and these WOE values are then used as inputs to logistic regression, which is covered in a different module.
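The replacement step amounts to a lookup from each record's coarse group to that group's WOE. A minimal Python sketch, with hypothetical group labels and WOE values:

```python
def woe_encode(groups, woe_by_group):
    """Replace each record's group label with that group's WOE value."""
    return [woe_by_group[g] for g in groups]


# Hypothetical coarse groups and their WOE values (illustrative only).
woe_by_group = {"low": -0.80, "mid": 0.03, "high": 0.55}
record_groups = ["low", "mid", "high", "low"]
encoded = woe_encode(record_groups, woe_by_group)
```

The resulting list is what would be passed to the regression in place of the raw variable.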