Boosting and bagging are ensemble learning techniques, in which multiple learners are trained to solve the same statistical problem; combining them usually gives better performance than using a single learner.
What is Ensemble Learning? Ensemble learning is a machine learning approach in which multiple models, often built with the same base algorithm, are trained and their predictions combined. The basic principle behind ensemble learning is that a group of weak learners can be combined to form a strong learner.
For example, many marketing strategies use a decision tree to segment the data into different categories and identify the final target population. A single decision tree, however, tends to suffer from either high variance (a deep tree overreacts to small changes in the data) or high bias (a shallow tree is too simple), depending on how far it is grown.
But suppose we run the decision tree algorithm multiple times, each time on a random sample of the data drawn with replacement. We would get several outputs, which can be combined using model averaging techniques such as a weighted average, a majority vote, or a simple average. The combined result tends to have lower variance than any single tree.
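The three combination rules mentioned above are simple enough to sketch directly. The function names below are hypothetical, and each function takes the predictions that the individual models have already made for one observation (class labels for voting, numbers for averaging).

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common class label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def simple_average(predictions):
    """Unweighted mean of numeric predictions."""
    return sum(predictions) / len(predictions)

def weighted_average(predictions, weights):
    """Mean of numeric predictions, each scaled by its model's weight."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total

# Predictions from three (or two) hypothetical models for a single observation.
print(majority_vote(["yes", "no", "yes"]))        # yes
print(simple_average([2.0, 3.0, 4.0]))            # 3.0
print(weighted_average([2.0, 4.0], [1.0, 3.0]))   # 3.5
```

Majority vote suits classification, while the two averages suit regression or predicted probabilities; the weights in a weighted average would typically reflect how much each model is trusted.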
Noise, bias, and variance are the main sources of error in any statistical model. Noise cannot be removed by modeling, but ensemble learning techniques can reduce bias, variance, or both, which improves the stability of machine learning algorithms.
Bootstrapping: Before we learn more about ensemble learning, it is good to know a little bit about bootstrapping. In machine learning, bootstrapping refers to random sampling with replacement: a sample, typically the same size as the dataset, is drawn so that some observations appear more than once and others not at all. Because each bootstrap sample differs slightly from the full dataset, models trained on different samples pick up different characteristics of the data, revealing more about its biases, variances, and features than running a single model on the complete dataset.
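A bootstrap sample can be sketched in a few lines with Python's standard `random` module. The function name `bootstrap_sample`, the seed, and the toy dataset of 1,000 integers are assumptions for illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random from data, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(42)  # fixed seed so the run is reproducible
data = list(range(1000))
sample = bootstrap_sample(data, rng)

# Some observations appear several times, others not at all.
unique_fraction = len(set(sample)) / len(data)
print(f"sample size: {len(sample)}, distinct observations: {unique_fraction:.1%}")
```

For large datasets the fraction of distinct observations in a bootstrap sample is close to 1 - 1/e, about 63.2%, so roughly a third of the data is left out of each sample. This is what makes models trained on different samples differ from one another.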
Ensemble learning algorithms can further be divided into two categories:
Bagging: Bagging (short for bootstrap aggregating) is a simple ensemble learning technique in which several independently trained models are combined using a model averaging technique such as a weighted average, a majority vote, or a simple average.
Because each model is trained on its own random sample drawn with replacement, each one learns slightly different characteristics of the data, and their results differ accordingly. Since the final model combines many learners whose errors are only weakly correlated, bagging reduces error mainly by reducing variance. Random Forest is a well-known example of a bagging ensemble; in addition to bootstrap sampling, it considers only a random subset of features at each split.
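The bagging procedure described above can be sketched end to end without any libraries. This is a minimal illustration under several assumptions: the base learner is a one-split decision stump on 1-D data, the toy dataset labels points as 1 when x > 0.5 with 10% of labels flipped as noise, and the models are combined by majority vote. All function names are hypothetical; in practice a library class such as scikit-learn's `BaggingClassifier` would be used instead.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classification stump on 1-D data by minimizing training errors.
    Returns (threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def bagging_fit(xs, ys, n_models, rng):
    """Train n_models stumps, each on its own bootstrap sample (sampling with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the independently trained models by majority vote."""
    votes = [predict_stump(m, x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy data (an assumption): label is 1 when x > 0.5, with 10% of labels flipped as noise.
rng = random.Random(0)
xs = [i / 100 for i in range(100)]
ys = [(1 if x > 0.5 else 0) if rng.random() > 0.1 else (0 if x > 0.5 else 1) for x in xs]

models = bagging_fit(xs, ys, n_models=25, rng=rng)
thresholds = sorted(m[0] for m in models)
print("stump thresholds range from", thresholds[0], "to", thresholds[-1])
print("ensemble predictions at 0.2 and 0.8:", bagging_predict(models, 0.2), bagging_predict(models, 0.8))
```

Because each stump sees a different bootstrap sample, the learned thresholds vary from model to model; the majority vote smooths out that variation, which is the variance reduction the paragraph above describes.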
To summarize, in bagging several independent predictors are combined to produce a more robust model.
Boosting: Boosting is an ensemble learning technique in which models are trained sequentially rather than independently or simultaneously. So, in boosting, the output of the first model affects the algorithm or the data selected for the following model.
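AdaBoost, a classic boosting algorithm not named in the text, makes this dependence explicit: after each round, observations the current model got wrong are given more weight for the next model. Below is a minimal sketch of one reweighting step for binary labels coded as +1/-1; the function name and the toy predictions are assumptions for illustration.

```python
import math

def adaboost_reweight(weights, labels, predictions):
    """One discrete AdaBoost reweighting step.
    labels and predictions are +1/-1; returns (alpha, normalized_new_weights)."""
    total = sum(weights)
    err = sum(w for w, y, p in zip(weights, labels, predictions) if y != p) / total
    # Assumes 0 < err < 0.5: the weak learner beats random guessing but is not perfect.
    alpha = 0.5 * math.log((1 - err) / err)  # this model's vote weight in the ensemble
    # Misclassified points (y * p = -1) are scaled up, correct ones scaled down.
    new = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, predictions)]
    norm = sum(new)
    return alpha, [w / norm for w in new]

labels      = [1, 1, -1, -1, 1]
predictions = [1, -1, -1, -1, 1]   # the weak learner got the second observation wrong
weights = [0.2] * 5                # start with uniform weights
alpha, new_weights = adaboost_reweight(weights, labels, predictions)
print(alpha, new_weights)
```

Here the weighted error is 0.2, so the model's vote weight is alpha = ln 2; after normalization the one misclassified observation carries weight 0.5 and each of the other four 0.125. If the next model is trained on a resample rather than with weights, these weights act as the selection probabilities.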
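In boosting, subsequent models learn from the errors of previous models. In AdaBoost, for example, observations have unequal probabilities (or weights) of being selected for the next model, and observations with the highest error receive the greatest weight. In gradient boosting, each new model is instead fit directly to the remaining errors of the current ensemble (more precisely, to the negative gradient of the loss, which for squared error is simply the residual). Because each new model concentrates on what the ensemble still gets wrong, boosting can reach high accuracy with simple base learners such as shallow trees; however, the models must be trained one after another, so training cannot be parallelized across models as it can be in bagging. One of the biggest problems with boosting is overfitting, so the stopping criterion, such as the number of models added, should be chosen carefully, for example by monitoring error on a validation set.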
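Gradient boosting with squared-error loss can be sketched as repeatedly fitting a small model to the current residuals. This is a minimal, dependency-free sketch under several assumptions: one-split regression stumps as base learners, a fixed learning rate of 0.1, a toy noisy sine dataset, and a stopping point chosen as the round with the lowest validation error. All names are hypothetical; libraries such as scikit-learn and XGBoost provide tuned implementations.

```python
import math
import random

LEARNING_RATE = 0.1  # shrinkage applied to every stump's contribution (an assumption)

def fit_regression_stump(xs, targets):
    """Least-squares one-split stump on 1-D inputs: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Each round fits a stump to the residuals, the negative gradient of squared loss."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps

def staged_predictions(base, stumps, x, learning_rate):
    """Prediction for x after 0, 1, ..., len(stumps) boosting rounds."""
    out, pred = [base], base
    for stump in stumps:
        pred += learning_rate * stump_predict(stump, x)
        out.append(pred)
    return out

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Toy data (an assumption): noisy sine curve with a separate validation set.
rng = random.Random(1)
x_train = [i / 50 for i in range(50)]
y_train = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in x_train]
x_val = [i / 50 + 0.01 for i in range(50)]
y_val = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in x_val]

base, stumps = gradient_boost(x_train, y_train, n_rounds=200, learning_rate=LEARNING_RATE)

# Stopping criterion: keep the number of rounds with the lowest validation error.
val_stages = [staged_predictions(base, stumps, x, LEARNING_RATE) for x in x_val]
val_errors = [mse([row[k] for row in val_stages], y_val) for k in range(len(stumps) + 1)]
best_rounds = min(range(len(val_errors)), key=val_errors.__getitem__)
print(f"best number of rounds: {best_rounds}, validation MSE: {val_errors[best_rounds]:.3f}")
```

Training error keeps falling as rounds are added, but validation error eventually stops improving; choosing the round count from held-out data is one simple way to apply the careful stopping criterion mentioned above.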
Criteria to use Bagging or Boosting: Both ensemble learning techniques have pros and cons, and neither is an outright winner. Both bagging and boosting can decrease the variance of a single estimate, because they combine several estimates from different models, so the result may be a model with higher stability.
If a single model has very low accuracy (that is, high bias), bagging is unlikely to reduce the bias to a satisfactory level, because combining several poorly performing models yields only a slight improvement.
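This limitation can be made precise with a standard result, under the simplifying assumption that the B combined models are identically distributed, each with variance σ² and pairwise correlation ρ. Averaging leaves the expected prediction, and therefore the bias, unchanged, while the variance shrinks only toward ρσ²:

```latex
\bar f(x) = \frac{1}{B}\sum_{b=1}^{B} \hat f_b(x), \qquad
\mathbb{E}\big[\bar f(x)\big] = \mathbb{E}\big[\hat f_b(x)\big]

\operatorname{Var}\big[\bar f(x)\big]
  = \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\rho\sigma^2\Big)
  = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

As B grows, the second term vanishes, so bagging helps most when the models are weakly correlated (small ρ), and it does nothing for the bias term.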
In this case boosting usually performs better, because each subsequent model learns from the errors of the previous ones, which improves the performance of the final output.
But if the problem with the algorithm is overfitting, bagging tends to give better results, because boosting itself tends to overfit the data, whereas bagging is much less prone to doing so.