Exact logistic regression is used to model a binary outcome, with the log-odds of the outcome modeled as a linear combination of the predictor variables. It is useful when the sample size is too small for regular logistic regression: maximum likelihood estimation of the logistic model is well known to suffer from small-sample bias, and the degree of bias depends strongly on the number of cases in the less frequent of the two categories (events/non-events).
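For reference, the log-odds model described above can be written out as follows. The symbols (p for the probability that the outcome equals 1, x_1 through x_k for the predictors, and the beta coefficients) are notation introduced here for illustration, not taken from the text.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad p = \Pr(Y = 1 \mid x_1, \ldots, x_k)
```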
Suppose that we are interested in the factors that influence whether or not a high school senior is admitted into a very competitive engineering school. The outcome variable is binary (0/1): admit or not admit. The predictor variables of interest are student gender and whether or not the student took Advanced Placement calculus in high school. Because the response variable is binary, we need a model that handles 0/1 outcome variables correctly. And because the number of students involved is small, we need a procedure that can perform the estimation with a small sample size.
The data for this exact logistic data analysis include the number of students admitted, the total number of applicants broken down by gender (the variable female), and whether or not they had taken AP calculus (the variable apcalc). Since the dataset is so small, we will read it in directly.
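A minimal sketch of reading such data in directly with a data step is given here. The dataset name exlogit and the outcome variable name admit are assumptions made for illustration, and num is taken to hold the number of applicants in each cell. The cell counts are also an assumption: the text does not list them, so they are one breakdown chosen to add up to 30 applicants, with 15 admitted, 18 male and 16 who took AP calculus.

```sas
/* Hypothetical dataset name (exlogit) and illustrative cell counts.   */
/* Each row is one gender x AP calculus x admission cell;              */
/* num is the number of applicants falling in that cell.               */
data exlogit;
  input female apcalc admit num;
  datalines;
0 0 0 7
0 0 1 1
0 1 0 3
0 1 1 7
1 0 0 5
1 0 1 1
1 1 0 0
1 1 1 6
;
run;
```

Storing one row per cell with a count variable keeps the input short. The count is later passed to the analysis procedures as a frequency or weight variable.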
A closer look at the data gives the following picture. There were 30 applicants, of whom 15 were admitted and 15 were denied admission. There were 18 male and 12 female applicants. Sixteen of the applicants had taken AP calculus and 14 had not. Note that all of the females who took AP calculus were admitted, versus only about half the males.
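Counts like these can be obtained with proc freq. The sketch below assumes the data are stored one row per cell in a dataset called exlogit (a hypothetical name) with variables female, apcalc, admit (assumed name for the outcome) and a cell-count variable num.

```sas
/* Assumes a cell-level dataset exlogit with count variable num.       */
proc freq data=exlogit;
  weight num;                         /* treat num as the cell count   */
  tables admit female apcalc;         /* one-way totals                */
  tables female*apcalc*admit / nopercent nocol;  /* admissions by AP   */
                                      /* calculus, within each gender  */
run;
```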
Analysis methods you might consider
Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable, while others have either fallen out of favor or have limitations.
- Exact logistic regression – This technique is appropriate because the outcome variable is binary, the sample size is small, and some cells are empty.
- Regular logistic regression – Due to the small sample size and the presence of cells with no subjects, regular logistic regression is not advisable, and it might not even be estimable.
- Two-way contingency tables – You may need to use the fisher or exact option in proc freq to get Fisher’s exact test, because of the small expected cell counts.
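As a rough sketch of the contingency-table approach, Fisher's exact test for admission by AP calculus could be requested as follows. The dataset name exlogit and the variable name admit are assumptions. The variables apcalc and num are taken from the text, with num used as the cell count.

```sas
/* Assumes a cell-level dataset exlogit with count variable num.       */
proc freq data=exlogit;
  weight num;                         /* cell counts                   */
  tables apcalc*admit / fisher;       /* Fisher's exact test           */
run;
```

A two-way table looks at one predictor at a time, so it cannot adjust the effect of AP calculus for gender the way a regression model can.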
Using the exact logistic model
Let’s run the exact logistic analysis using proc logistic with the exact statement. We will include the option estimate = both on the exact statement so that we obtain both the point estimates and the odds ratios in the output. We will also need to use the freq statement, for which we will specify the frequency weight variable num.
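The call described above might look like the following sketch. The dataset name exlogit and the outcome variable name admit are assumptions. The exact statement, the estimate = both option and the freq statement with num follow the description in the text.

```sas
/* Assumes a cell-level dataset exlogit with count variable num.       */
proc logistic data=exlogit descending;  /* model the probability that  */
                                        /* admit = 1                   */
  freq num;                             /* frequency weight variable   */
  model admit = female apcalc;          /* regular (ML) logistic fit   */
  exact female apcalc / estimate=both;  /* exact parameter estimates   */
                                        /* and exact odds ratios       */
run;
```

The descending option is included so that the higher outcome value (1, assumed here to mean admitted) is the modeled event. Without it, proc logistic models the lower-ordered value.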
The output contains estimates from both the regular (maximum likelihood) logistic regression and the exact logistic regression.
Regular Logistic regression
Estimates for Exact Option:
Things to consider
- Exact logistic regression is a very memory-intensive procedure, and it is relatively easy to exceed the memory capacity of a given computer.
- Firth logit may be helpful if you have separation in your data. You can use the firth option on the model statement to run a Firth logit. This option was added in SAS version 9.2.
- Exact logistic regression is an alternative to conditional logistic regression if you have stratification, since both condition on the number of positive outcomes within each stratum. The estimates from the two analyses will differ because conditional logit conditions only on the intercept term, while exact logistic regression conditions on the sufficient statistics of the other regression parameters as well as the intercept term.
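The Firth logit mentioned in the list above can be sketched as follows, using the firth option on the model statement. The dataset name exlogit and the outcome variable name admit are assumptions made for illustration, and num is the frequency variable.

```sas
/* Assumes a cell-level dataset exlogit with count variable num.       */
proc logistic data=exlogit descending;
  freq num;
  model admit = female apcalc / firth;  /* Firth penalized likelihood  */
run;
```

Because the Firth fit is a penalized maximum likelihood estimate rather than an exact conditional one, it is usually much less demanding on memory than the exact statement.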