Naïve Bayes is one of the most efficient machine learning algorithms, based on Bayes' theorem. Naïve Bayes classifiers belong to the family of simple probabilistic classifiers that assume each input variable is independent of the others.

**What is Bayes' Theorem?** Bayes' theorem, named after the English mathematician Thomas Bayes, helps determine the conditional probability of an event. The word "conditional" is important: we are trying to ascertain the probability of an event A occurring, P(A), given that an event B has already occurred.

**P(A|B) = P(A) * P(B|A) / P(B)**

**Naïve Bayes Algorithm.** Naïve Bayes is a conditional probability model: it describes how the probability of an event is affected by prior knowledge of conditions that can influence the occurrence of that event.

For example, say a single card is drawn from a complete deck of 52 cards. The probability that the card is a king is 4 divided by 52, which equals 1/13 or approximately 7.69%. Remember that there are 4 kings in the deck. Now, suppose it is revealed that the selected card is a face card. The probability that the selected card is a king, given it is a face card, is 4 divided by 12, or approximately 33.3%. The new information that the selected card is a face card has increased the probability of its being a king from 7.69% to 33.3%.
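The card example above can be checked directly with Bayes' theorem; a minimal sketch using exact fractions:

```python
from fractions import Fraction

# A = "card is a king", B = "card is a face card".
p_king = Fraction(4, 52)         # P(A): 4 kings in 52 cards
p_face = Fraction(12, 52)        # P(B): 12 face cards (J, Q, K in 4 suits)
p_face_given_king = Fraction(1)  # P(B|A): every king is a face card

# P(A|B) = P(A) * P(B|A) / P(B)
p_king_given_face = p_king * p_face_given_king / p_face
print(p_king_given_face)         # 1/3, i.e. about 33.3%
```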

Naïve Bayes Formula:

The equation used is:

**P(A|B) = P(A) * P(B|A) / P(B)**

Where:

⦁ P(A) is the **prior probability** or **marginal probability** of A. It is "prior" in the sense that it does not take into account any information about B.

⦁ P(A|B) is the **conditional probability** of A, given B. It is also called the **posterior probability** because it is derived from, or depends upon, the specified value of B.

⦁ P(B|A) is the conditional probability of B given A. It is also called the **likelihood**.

⦁ P(B) is the prior or marginal probability of B, and acts as a **normalizing constant**.
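The four terms above plug straight into the formula. A minimal sketch (the function name `bayes_posterior` is illustrative, not a standard API):

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior * likelihood / evidence

# Card example: P(king | face card) = (4/52) * 1.0 / (12/52)
print(bayes_posterior(prior=4/52, likelihood=1.0, evidence=12/52))  # ≈ 0.333
```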

How does the Naïve Bayes algorithm work?

Let us understand the algorithm using an example. We have a training dataset of weather conditions and a corresponding target variable 'Play' (indicating whether play took place). We need to classify whether players will play based on the weather conditions, using historical information about the probability of play under each weather condition.

In the following figure we have three tables. The first table contains the raw data for Weather and Play (Yes/No), the frequency table tells how often play occurred in each weather condition, and the likelihood table gives the corresponding probabilities.

Problem: What is the probability that the players will play if the weather is sunny? We can answer this using the method of conditional probability discussed above.

Here we have P(Sunny|Yes) = 3/9 ≈ 0.33, P(Sunny) = 5/14 ≈ 0.36, and P(Yes) = 9/14 ≈ 0.64.

Now, P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny) = 0.33 * 0.64 / 0.36 ≈ 0.60. Since this is greater than 0.5, we predict that play will take place.
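The weather calculation can be reproduced from the counts the example implies (9 Yes and 5 No rows out of 14, with the 5 sunny days splitting 3 Yes / 2 No; the table layout itself is assumed, since the figure is not reproduced here):

```python
n_total = 14
n_yes, n_no = 9, 5
sunny_yes, sunny_no = 3, 2                    # sunny days among Yes / No rows

p_sunny = (sunny_yes + sunny_no) / n_total    # P(Sunny) = 5/14
p_yes = n_yes / n_total                       # P(Yes)   = 9/14
p_no = n_no / n_total                         # P(No)    = 5/14

# P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny), and likewise for No
p_yes_given_sunny = (sunny_yes / n_yes) * p_yes / p_sunny
p_no_given_sunny = (sunny_no / n_no) * p_no / p_sunny

print(round(p_yes_given_sunny, 2))  # 0.6
print(round(p_no_given_sunny, 2))   # 0.4
```

Note that the two posteriors sum to 1, since Sunny days are either Yes or No.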

What are the Pros and Cons of Naive Bayes?

**Pros:**

⦁ It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.

⦁ When the assumption of independence holds, a Naive Bayes classifier performs better than other models such as logistic regression, and it needs less training data.

⦁ It performs well with categorical input variables compared to numerical variables. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption).
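The normal-distribution assumption for numerical variables means each numeric feature's likelihood is evaluated with a Gaussian density. A minimal sketch (the feature values below are made up for illustration):

```python
import math

def gaussian_likelihood(x, mean, std):
    """Normal-density likelihood P(x | class) for a numeric feature."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# Hypothetical numbers: a temperature reading, scored against the
# mean and standard deviation of temperatures in the "Yes" class.
print(gaussian_likelihood(x=72.0, mean=73.0, std=6.2))
```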

**Cons:**

⦁ If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the "zero-frequency" problem. To solve it, we can use a smoothing technique; one of the simplest is Laplace smoothing.
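Laplace smoothing adds a small pseudo-count to every category so no probability is exactly zero. A minimal sketch (function name and counts are illustrative; the 9 "Yes" rows come from the weather example above):

```python
def smoothed_probability(count, class_total, n_categories, alpha=1.0):
    """Laplace (add-alpha) smoothed estimate of P(category | class).

    A category unseen in training still gets a small non-zero probability,
    avoiding the zero-frequency problem.
    """
    return (count + alpha) / (class_total + alpha * n_categories)

# An unseen weather value (say "Foggy") would get probability 0 unsmoothed.
# Smoothed, assuming 4 weather categories in total:
print(smoothed_probability(count=0, class_total=9, n_categories=4))  # 1/13 ≈ 0.077
```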

⦁ Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.

**Applications of Naive Bayes Algorithms**

⦁ Real-time prediction: Naive Bayes is fast to train and to evaluate, so it suits predictions that must be made on the fly.

⦁ Multi-class prediction: it naturally handles targets with more than two classes.

⦁ Text classification / spam filtering / sentiment analysis: a classic application, since word occurrences work well as (approximately) independent features.

⦁ Recommendation systems: often used together with techniques such as collaborative filtering to predict whether a user would like a given item.
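The text-classification use case can be sketched end to end with a tiny from-scratch multinomial Naïve Bayes spam filter (the training messages and labels below are made up for illustration; real systems use far larger corpora and library implementations):

```python
import math
from collections import Counter

# Toy training data: (message, label) pairs.
train = [("win a free prize now", "spam"),
         ("free money claim now", "spam"),
         ("meeting at noon tomorrow", "ham"),
         ("lunch with the team today", "ham")]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Laplace-smoothed word likelihood to avoid zero frequencies
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("claim your free prize"))   # spam
print(predict("team meeting tomorrow"))   # ham
```

Working in log space avoids numerical underflow from multiplying many small probabilities, which matters once documents contain more than a handful of words.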