Accuracy and stability of the model parameters
The purpose of the model is to reproduce, on new data, the results we obtained during development, i.e. to accurately predict the target variable for which it was built in the first place.
Therefore, we need to check the model on another sample of data. This can be done on both out-of-time sample data and in-time sample data.
In both cases the objective remains the same, namely to check:
- If the same variables are picked
- If the values of the parameter estimates are in the same range.
If the model fitted on the in-time/out-of-time data picks the same final variables, we say that our model is accurate and its variables are significant. If it does not pick those variables, we need to go back to the development data, drop the variables that were not picked and select significant ones.
If the selected variables have parameter estimates in the same range, we say our variables are stable. Unstable variables will not give the same results as in the development phase, which makes them redundant in the final model. We need to drop such variables and pick ones that are stable.
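The two checks above can be sketched in a few lines of Python. This is only an illustration: the function name check_stability, the variable names x1 to x3 and the 25% relative tolerance used for "same range" are assumptions made for the example, since no cut-off is prescribed here.

```python
def check_stability(dev_coefs, val_coefs, rel_tol=0.25):
    """Label each development variable by its behaviour on the validation sample.

    dev_coefs / val_coefs map variable name -> parameter estimate
    (intercept excluded). rel_tol is an assumed cut-off for "same range".
    """
    report = {}
    for name, b_dev in dev_coefs.items():
        # Check 1: was the variable picked again on the validation sample?
        if name not in val_coefs:
            report[name] = "not selected"
            continue
        b_val = val_coefs[name]
        # Check 2: is the estimate in the same range (same sign, small shift)?
        same_sign = (b_dev >= 0) == (b_val >= 0)
        if b_dev == 0:
            rel_shift = 0.0 if b_val == 0 else float("inf")
        else:
            rel_shift = abs(b_val - b_dev) / abs(b_dev)
        stable = same_sign and rel_shift <= rel_tol
        report[name] = "stable" if stable else "unstable"
    return report


# Hypothetical estimates, for illustration only
dev = {"x1": 2.0, "x2": -0.5, "x3": 1.0}
val = {"x1": 2.2, "x2": -1.5}
result = check_stability(dev, val)
print(result)
```

Variables labelled "not selected" or "unstable" are the candidates to drop when going back to the development data. The tolerance should be chosen to suit the model and the data at hand.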
Now suppose that, while developing the model, we get an equation like the one below (assuming our target variable is the probability of default of a loan):
Target_var = 1.25 - 1.25*income + 0.65*no_of_dependants + 1.95*Debt_service_ratio + 1.2*gender
And on validating the model on out-of-sample data we get:
Target_var = 1.25 - 1.43*income + 4.68*no_of_dependants + 1.75*Debt_service_ratio
The important points to note are:
- Gender is not a significant variable, as it did not get selected with the out-of-sample data.
- Income and Debt service ratio are significant and stable variables.
- No of dependants is not a stable variable, given that its parameter estimate jumped from 0.65 to 4.68 with the change in data.
Therefore, we must go back to the development phase, drop these 2 variables (gender and no_of_dependants) and select significant variables in their place, so that our model keeps picking significant and stable variables as the data changes over time.
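To make the comparison concrete, the size of each shift in the example can be computed directly from the two equations. The snippet below is a small sketch using the parameter estimates given above (intercept left out). The dictionary names dev, val and shifts are chosen only for this illustration.

```python
# Parameter estimates from the development equation (intercept excluded)
dev = {"income": -1.25, "no_of_dependants": 0.65,
       "Debt_service_ratio": 1.95, "gender": 1.2}
# Parameter estimates from the out-of-sample equation
val = {"income": -1.43, "no_of_dependants": 4.68,
       "Debt_service_ratio": 1.75}

shifts = {}
for name, b_dev in dev.items():
    b_val = val.get(name)
    if b_val is None:
        # Variable was not picked on the validation sample
        shifts[name] = None
        print(f"{name}: not selected on the out-of-sample data")
    else:
        # Relative change of the estimate with respect to development
        shifts[name] = abs(b_val - b_dev) / abs(b_dev)
        print(f"{name}: {b_dev} -> {b_val} ({shifts[name]:.0%} shift)")
```

The estimates for income and Debt_service_ratio move by roughly 14% and 10%, while the estimate for no_of_dependants moves by about 620%, which is why it is treated as unstable. Gender has no estimate to compare at all.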