Logistic regression is used when the dependent variable is binary. Examples: default/non-default, like/dislike, yes/no, 1/0.
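In the logistic model our main goal is to model the probability p that one of the two outcomes occurs. With two independent factors X1 and X2, the model can be written as:
ln[p/(1-p)] = b0 + b1X1 + b2X2
where b0 is the intercept and b1, b2 are the coefficients of the factors.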
Since the probability p varies from 0 to 1, the value of ln[p/(1-p)] varies from -∞ to +∞.
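To make the range of the log-odds concrete, here is a minimal Python sketch. The function name logit is our own choice for illustration, and the probabilities in the loop are arbitrary sample values.

```python
import math


def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# The log-odds are 0 at p = 0.5, negative below it and positive above it,
# and they grow without bound as p approaches 0 or 1.
for p in (0.001, 0.25, 0.5, 0.75, 0.999):
    print(f"p = {p:<5}  ln[p/(1-p)] = {logit(p):+.3f}")
```

Because the log-odds can take any real value, they can be set equal to an unrestricted linear expression in the factors, which a probability confined to the range 0 to 1 cannot.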
The logit is the natural log of the odds of Y, and the model sets it equal to a linear combination of the independent factors (in this case X1 and X2). In layman's terms, logistic regression is a function used to calculate the probability of an outcome (in our case default/non-default, like/dislike, yes/no, 1/0) based on the independent factors.
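Going from the factors back to a probability means inverting the log-odds: p = 1 / (1 + e^(-z)), where z is the linear combination of the factors. The sketch below shows that step. The function name and the coefficient values are hypothetical, chosen only for illustration, and are not estimates from any real data.

```python
import math


def predict_probability(x1: float, x2: float,
                        b0: float, b1: float, b2: float) -> float:
    """Turn the linear score z = b0 + b1*x1 + b2*x2 into a probability."""
    z = b0 + b1 * x1 + b2 * x2
    # Two algebraically equal forms; picking by sign avoids overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical intercept and coefficients, for illustration only.
B0, B1, B2 = -2.0, 0.8, 1.5

for x1, x2 in [(0, 0.1), (2, 0.3), (4, 0.5)]:
    p = predict_probability(x1, x2, B0, B1, B2)
    print(f"X1 = {x1}, X2 = {x2}: P(Y = 1) = {p:.3f}")
```

Whatever the values of the factors, the result always lies between 0 and 1, which is what makes it usable as a probability.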
Now, let us understand this with the help of an example from the banking industry and see how we can use it in our business:
Business Problem: We have a population of young individuals (age < 36) who are current customers of the bank (i.e. they have already taken a loan from the bank) and have applied for an additional loan. Given the information available to the bank, should it grant the additional loan or not?
Available data: The bank has 2 kinds of data available for its existing customers.
- Application data: gender, marital status, income, time spent with the employer, age of the applicant, debt-to-service ratio, other loans etc.
- Behavioral data: the recent performance of the applicant, i.e. existing loan amount, how many times the person has missed a payment in the last 18 months, whether the applicant has ever defaulted etc.
Once the data is available, we can start building the model. The event being modeled in this case is default (Y = 1), so the model estimates the probability of default.
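As a rough illustration of what building the model involves, the sketch below fits an intercept and two coefficients by plain gradient ascent on the log-likelihood. It is only one simple way to estimate the coefficients, not necessarily the procedure a bank or a statistics package would use. The records are made up for illustration: each pairs a number of missed payments in the last 18 months with a debt-to-service ratio, and the label is 1 for default. All function names are our own.

```python
import math


def sigmoid(z: float) -> float:
    """Map a linear score to a probability, written to avoid exp() overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """P(Y = 1) for one record x; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Estimate the weights by batch gradient ascent on the mean log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)  # observed outcome minus predicted probability
            grad[0] += err
            for j, v in enumerate(x, start=1):
                grad[j] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Made-up records: (missed payments in last 18 months, debt-to-service ratio).
rows = [(0, 0.10), (0, 0.35), (1, 0.20), (1, 0.40), (1, 0.45), (2, 0.30),
        (2, 0.30), (3, 0.25), (3, 0.50), (4, 0.15), (4, 0.40)]
# Y = 1 means the customer defaulted.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1]

weights = fit_logistic(rows, labels)
low_risk = predict(weights, (0, 0.15))
high_risk = predict(weights, (4, 0.45))
print(f"weights (intercept first): {[round(w, 3) for w in weights]}")
print(f"P(default), 0 missed payments, ratio 0.15: {low_risk:.3f}")
print(f"P(default), 4 missed payments, ratio 0.45: {high_risk:.3f}")
```

The bank could then compare each applicant's predicted probability of default against a cut-off of its choosing to decide whether to grant the additional loan.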
The steps to build a logistic regression model are covered in a different module.