Support vector machines (SVMs) are among the most popular machine learning algorithms. SVMs fall under the category of supervised learning, and they are relatively easy to understand, requiring little or no prior knowledge of statistics or linear algebra.

A Support Vector Machine classifier is formally defined by a hyperplane that separates the different groups/classes in the data. In a two-dimensional space, the hyperplane is a line dividing the plane into two parts, with each class lying on either side.

**Support Vector Machine Logic**:

Support Vector Machines are based on the idea of hyperplanes, also called decision planes, which act as boundaries between classes. A decision plane separates sets of objects having different class memberships. The example below illustrates this logic in a two-dimensional space.

In this example, the objects belong to either class GREEN or class RED. The separating line defines a boundary: all objects to its right are GREEN, and all objects to its left are RED. Any new object (white circle) falling to the right is labeled, i.e., classified, as GREEN (or classified as RED should it fall to the left of the separating line).
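As a minimal sketch of this decision rule, classifying a new point amounts to checking the sign of a linear decision function w·x + b; the line coefficients below are made up for illustration, not learned from any data:

```python
# Classify points by which side of a separating line w.x + b = 0 they fall on.
# The weights and bias here are illustrative, not learned from data.
w = (1.0, 0.0)   # normal vector of the separating line (a vertical line x = 2)
b = -2.0         # offset: the decision boundary is x - 2 = 0

def classify(point):
    """Return 'GREEN' if the point lies on the positive side of the line, 'RED' otherwise."""
    x, y = point
    score = w[0] * x + w[1] * y + b
    return "GREEN" if score > 0 else "RED"

print(classify((3.0, 1.0)))  # right of the line -> GREEN
print(classify((0.5, 4.0)))  # left of the line  -> RED
```

A real SVM learns `w` and `b` from training data; the decision step itself is exactly this sign check.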

The SVM shown above is a linear classifier. In real-world problems, however, we often face more complex data. This situation is depicted in the illustration below: compared to the previous schematic, it is clear that a full separation of the GREEN and RED objects would require a curve (which is more complex than a line). Classification tasks that rely on drawing separating lines to distinguish between objects of different class memberships are known as hyperplane classifiers, and Support Vector Machines are particularly well suited to such tasks.

**How to identify the correct hyperplane**

**Scenario-1:**

In the above diagram we have three hyperplanes. An important rule to remember: always select the hyperplane that segregates the classes best. In this diagram we can clearly see that hyperplane B does the best job of separating the different objects.

**Scenario-2:**

In the above diagram we have three hyperplanes, namely A, B, and C. All three segregate the objects correctly, but in the case of A and B a slight change in the data could result in misclassification, since those planes lie very close to the objects; in other words, their margin is very small. So choose the hyperplane with the maximum distance from the nearest data point. This distance is called the margin.

In the above diagram we can see that hyperplane C has the maximum margin compared to planes A and B, so plane C is the right hyperplane.
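The margin of a candidate line can be computed directly as the distance from the nearest data point to the line, using the standard point-to-line distance formula. A small sketch, with invented points and candidate lines (not the ones from the diagram):

```python
import math

# Illustrative 2-D data points (coordinates are made up).
points = [(1.0, 1.0), (2.0, 3.0), (6.0, 1.0), (7.0, 3.0)]

def margin(line, pts):
    """Distance from the nearest point to the line a*x + b*y + c = 0."""
    a, b, c = line
    norm = math.hypot(a, b)
    return min(abs(a * x + b * y + c) / norm for x, y in pts)

# Candidate separating lines, each as coefficients (a, b, c) of a*x + b*y + c = 0.
planes = {"A": (1.0, 0.0, -2.5), "B": (1.0, 0.0, -3.0), "C": (1.0, 0.0, -4.0)}

best = max(planes, key=lambda name: margin(planes[name], points))
print(best, margin(planes[best], points))  # -> C 2.0
```

Here the vertical line x = 4 (plane C) sits midway between the two clusters, so it has the largest minimum distance to any point.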

**Scenario 3:**

In the above diagram we might choose plane B, as it has the maximum margin, but plane B classifies one of the objects incorrectly. So plane A is the right option. SVM selects the hyperplanes that correctly classify the objects before trying to maximize the margin.
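This selection rule can be sketched in two steps: first keep only the candidate hyperplanes that classify every point correctly, then pick the one with the largest margin. The points, labels, and candidate lines below are invented for illustration:

```python
import math

# Labeled points: class +1 and class -1 (illustrative data).
data = [((1.0, 1.0), -1), ((2.0, 2.0), -1), ((6.0, 1.0), +1), ((7.0, 2.0), +1)]

def classifies_all(line, data):
    """True if every point falls on the side of the line matching its label."""
    a, b, c = line
    return all((a * x + b * y + c) * label > 0 for (x, y), label in data)

def margin(line, data):
    """Distance from the nearest point to the line a*x + b*y + c = 0."""
    a, b, c = line
    norm = math.hypot(a, b)
    return min(abs(a * x + b * y + c) / norm for (x, y), _ in data)

# Candidate separating lines a*x + b*y + c = 0.
candidates = {"A": (1.0, 0.0, -4.0), "B": (0.0, 1.0, -1.5)}

# Step 1: discard any line that misclassifies a point; step 2: maximize margin.
valid = {name: line for name, line in candidates.items() if classifies_all(line, data)}
best = max(valid, key=lambda name: margin(valid[name], data))
print(best)  # -> A (B is dropped because it misclassifies a point)
```

Plane B here is filtered out in step 1 regardless of its margin, mirroring the scenario above.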

**Scenario 4:**


In the above diagram, we can see that a linear classifier would not be able to segregate all the objects, as one of the stars lies in the territory of the circles. SVM treats such cases as outliers and tries to ignore them.

**Scenario 5:**

In the above scenario a linear classifier cannot segregate the classes. SVMs use the kernel trick, which maps the two-dimensional space into a higher-dimensional space, i.e., it converts a non-separable problem into a separable one; the functions that perform this mapping are called kernels. The trick is mostly useful in non-linear separation problems. Simply put, it applies some fairly complex data transformations and then finds a way to separate the transformed data based on the labels or outputs you have defined.

In the above diagram we have a two-dimensional space. SVMs can map it into a higher-dimensional space; for example, we can add a third dimension z = x^2 + y^2. Now we can plot the data points with z as one of the axes.

In the above plot we can easily notice two key points:

- All values of z will be positive, because z is the sum of the squares of x and y
- In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z
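The mapping described above can be written out directly. The sample coordinates below are invented to mirror the picture (circles near the origin, stars farther out):

```python
# Map 2-D points into 3-D by adding z = x**2 + y**2, the transform from the text.
# The sample coordinates are illustrative, not taken from the diagram.
circles = [(0.5, 0.5), (-0.5, 0.3), (0.2, -0.6)]   # near the origin
stars = [(2.0, 2.0), (-2.5, 1.5), (1.8, -2.2)]     # farther from the origin

def lift(point):
    """Lift a 2-D point (x, y) to 3-D as (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x**2 + y**2)

# z is always non-negative, and along the z axis the two groups separate:
max_circle_z = max(lift(p)[2] for p in circles)
min_star_z = min(lift(p)[2] for p in stars)
print(max_circle_z < min_star_z)  # -> True: a flat plane in z now splits the classes
```

In the lifted space a simple horizontal plane (a threshold on z) separates the two classes, which is exactly what the kernel trick buys us.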