# Support Vector Machine: Regression

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for either regression or classification. As we know from the previous topic, SVMs are based on the concept of decision planes that define decision boundaries separating objects into different classes.

Support Vector Regression (SVR) is a generalization of SVM to regression problems. SVRs are supervised learning algorithms that require a training dataset, including the target variable, in order to build a model. SVRs differ from SVM classifiers in the type of response variable: in a classification problem the response variable is categorical, whereas in regression it is continuous. In SVR we make a numeric prediction rather than classifying objects.

How to Build an SVR:

1. Collect the training dataset
2. Choose a kernel and its parameters, as well as any regularization needed
3. Form the correlation (kernel) matrix
4. Train the machine learning algorithm to obtain the coefficients
5. Use the coefficients to create the estimator
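The steps above can be sketched with scikit-learn's `SVR`, which handles steps 3–5 internally when you call `fit`. This is a minimal illustration on synthetic data; the kernel choice and the values of `C`, `gamma`, and `epsilon` here are illustrative, not recommendations.

```python
import numpy as np
from sklearn.svm import SVR

# 1. Collect a training dataset (here: a noisy sine curve)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# 2. Choose a kernel, its parameters, and the regularization strength C
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.05)

# 3-4. Fitting forms the kernel (correlation) matrix internally
#      and solves for the coefficients (available as model.dual_coef_)
model.fit(X, y)

# 5. The fitted estimator makes numeric predictions for new inputs
y_pred = model.predict(np.array([[2.5]]))
print(y_pred)
```

Unlike a classifier, `predict` here returns a continuous value, and `epsilon` controls how large an error is tolerated before a training point influences the fit.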

A brief note about kernels: the term is derived from a word that can be traced back to c. 1000 and originally meant a seed (contained within a fruit). It was first used in mathematics when it was defined for integral equations in which the kernel is known and the other function(s) unknown; it now has several meanings in mathematics. The machine learning term "kernel trick" was first used in 1998.

A kernel is a similarity function. It is a function that we, as the domain experts, provide to a machine learning algorithm. It takes two inputs and spits out how similar they are. Suppose your task is to learn to classify images, and you have (image, label) pairs as training data. Consider the typical machine learning pipeline: you take your images, you compute features, you string the features for each image into a vector, and you feed these "feature vectors" and labels into a learning algorithm.

Data → Features → Learning algorithm

Kernels offer an alternative. Instead of defining a slew of features, you define a single kernel function that computes the similarity between two images. You provide this kernel, together with the images and labels, to the learning algorithm, and out comes a classifier.