Credit scoring models have offered benefits to lenders and borrowers for many years. In practice, however, these models are normally built on a sample of accepted applicants and do not consider the rejected applicants. This can introduce sample bias, an important statistical issue, especially in online lending, where a large proportion of requests are rejected.

Reject inference is a method for inferring how rejected applicants would have behaved if they had been granted credit, and for incorporating this information into rebuilding a more accurate credit scoring system.

**Introduction**: Reject inference is a process whereby the performance of previously rejected applications is estimated. It is most useful when the approval rate is very low and the bank is keen to relax its credit-issuing criteria. By inferring the performance of rejected applications, the impact of the decision to reject them can be studied. If the approval rate is already high, reject inference is of little use to the bank.

**Bias in Data**: One assumption made when developing statistical models is that the development sample has the same characteristics as the population on which the model will be applied. This can hold for collection models, which are built on the approved population and implemented on a population with the same characteristics. Acquisition models, by contrast, are built on approved applications but implemented on the entire applicant population. They therefore suffer from bias, because the development sample is not truly representative of that population.

**Methods of Reject inference**: Several reject inference methods are available to include the performance of rejected applications and make acquisition models applicable to all through-the-door applications.

- Hard Cut Off
- Single Weighted Approach
- Double Weighted Approach
- Augmentation

**Hard Cut Off**: One of the simplest approaches is the hard cutoff, in which a score cutoff is set: any rejected applicant with a score below this cutoff is classified as ‘bad’, and any applicant above the cutoff is classified as ‘good’. The cutoff score is not arbitrary. A suggested starting point is around the current accept/reject cutoff, with a safety factor, for the portfolio whose risk is being modeled. The safety factor is applied because business considerations dictate that the rejected population cannot be treated as carrying the same risk as the accepted population.
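The hard cutoff rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `hard_cutoff_labels`, the assumption that higher scores mean lower risk, the choice to add the safety factor to the accept/reject cutoff, and the treatment of a score exactly at the cutoff as ‘good’ are all assumptions made here.

```python
def hard_cutoff_labels(reject_scores, accept_cutoff, safety_factor=0.0):
    """Label rejected applicants as 'good' or 'bad' with a hard score cutoff.

    Assumptions (not from the source): higher scores mean lower risk, and
    the safety factor raises the cutoff so that rejects are judged more
    strictly than accepts. A score equal to the cutoff is labelled 'good'.
    """
    cutoff = accept_cutoff + safety_factor
    return ["bad" if score < cutoff else "good" for score in reject_scores]


# Hypothetical scores for four rejected applicants.
labels = hard_cutoff_labels([580, 610, 640, 700], accept_cutoff=620, safety_factor=20)
print(labels)  # ['bad', 'bad', 'good', 'good']
```

The labelled rejects would then be appended to the approved applications before the scoring model is rebuilt.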

**Single Weighted Approach**: The single weighted approach consists of two steps. Instead of classifying each reject outright as good or bad, this approach classifies rejected applications as partially good or partially bad, with a weight assigned.

**Double Weighted Approach**: The double weighted approach is a three-step process.

- A logistic regression model is first built on all the approved applications to estimate the probability of default. All the rejected applications are then scored with this model to obtain their probability of default.
- A second accept/reject model is built on the combined population of rejected and accepted applications. This gives the probability of approval, or approval rate.
- Weights are assigned to all the applications. Every approved application is given a weight of 1, whereas each rejected application appears twice in the model dataset: once as bad with weight wt_1 and once as good with weight wt_2.
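The third step, building the weighted model dataset, might look like the sketch below. The text does not say how wt_1 and wt_2 are computed, so this sketch simply assumes wt_1 is the reject's estimated probability of default from the first model and wt_2 is its complement; it does not use the approval probability from the second model. The function name `build_double_weighted_dataset` and the tuple layout are hypothetical.

```python
def build_double_weighted_dataset(approved, rejected):
    """Build (application_id, label, weight) rows for the final model.

    approved: list of (application_id, observed_label) pairs.
    rejected: list of (application_id, p_bad) pairs, where p_bad is the
        probability of default scored by the model built on approved
        applications.

    Assumption (not from the source): wt_1 = p_bad and wt_2 = 1 - p_bad.
    """
    rows = []
    # Approved applications keep their observed outcome with weight 1.
    for app_id, label in approved:
        rows.append((app_id, label, 1.0))
    # Each rejected application appears twice: once bad, once good.
    for app_id, p_bad in rejected:
        rows.append((app_id, "bad", p_bad))          # wt_1
        rows.append((app_id, "good", 1.0 - p_bad))   # wt_2
    return rows


rows = build_double_weighted_dataset(
    approved=[("A1", "good"), ("A2", "bad")],
    rejected=[("R1", 0.25)],
)
for row in rows:
    print(row)
```

The resulting weights could then be passed as observation weights to whichever logistic regression routine is used for the final model.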

**Augmentation**: In this method rejects are not tagged as good or bad; instead, approved applications are re-weighted to give rejected applications a similar chance of getting approved.

In the augmentation methodology, a simple accept/reject model is built to calculate the approval rate. The scored dataset from this model is sorted by probability of approval and divided into 10 equal bands.

Approved applications in each band are assigned a weight inversely proportional to the average approval rate of that band. Score bands with a low approval rate are assigned higher weights, whereas bands with a high approval rate are assigned lower weights. A final logistic model, the known good/bad (KGB) model, is built on this augmented dataset using the weight assigned to each application.
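The banding and weighting step can be sketched as follows. This is a rough illustration under stated assumptions: the band's approval rate is taken as the observed share of approved applications in the band (the text could also be read as the mean predicted probability), the proportionality constant is 1, and the function name `augmentation_weights` and input layout are hypothetical.

```python
def augmentation_weights(applications, n_bands=10):
    """Return one weight per application, aligned with the input order.

    applications: list of (p_approve, is_approved) pairs, where p_approve
        comes from the accept/reject model.
    Approved applications get 1 / (approval rate of their band); rejected
    applications get None because they are not used in the KGB model.

    Assumption (not from the source): the band approval rate is the
    observed fraction of approved applications in the band.
    """
    n = len(applications)
    # Indices sorted by predicted probability of approval.
    order = sorted(range(n), key=lambda i: applications[i][0])
    weights = [None] * n
    for band in range(n_bands):
        # Split the sorted indices into n_bands roughly equal slices.
        members = order[band * n // n_bands:(band + 1) * n // n_bands]
        approved = [i for i in members if applications[i][1]]
        if not approved:
            continue  # empty band, or nothing approved to re-weight
        approval_rate = len(approved) / len(members)
        for i in approved:
            weights[i] = 1.0 / approval_rate
    return weights


# Four hypothetical applications split into two bands for brevity.
apps = [(0.2, False), (0.3, True), (0.8, True), (0.9, True)]
print(augmentation_weights(apps, n_bands=2))  # [None, 2.0, 1.0, 1.0]
```

In the low-approval band, the single approved application stands in for two applications, which is how the approved sample is stretched to resemble the full applicant population.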