In data analytics, feature scaling is a method used to standardize the range of independent variables. It removes the dominating influence that independent variables with a large numeric range would otherwise have over those with a smaller numeric range.
Since the range of values of raw variables can vary widely, some machine learning algorithms might not work efficiently if the independent variables have very different ranges. For example, when classifying data into categories we often use the Euclidean distance between data points. If one feature has a much broader range than the rest, the distance will be dominated by that feature.
Another reason for feature scaling is that, used properly, it helps gradient descent converge faster and can reduce the risk of it getting stuck in a poor local minimum.
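The effect on gradient descent can be illustrated with a small sketch. It assumes a least-squares model without an intercept, a made-up four-row data set, a fixed step size of 1/trace(H), and scaling by dividing each feature by its standard deviation. The function name `gd_iterations`, the data, and the iteration cap are all hypothetical choices for illustration, not part of the original article.

```python
from statistics import pstdev

def gd_iterations(X, y, tol=1e-8, max_iters=5000):
    """Batch gradient descent on least squares (no intercept).

    Returns the number of updates needed for the loss to drop below tol,
    or max_iters if that never happens within the cap.
    """
    n, d = len(X), len(X[0])
    # H = X^T X / n is the Hessian of the loss. Its trace is at least as
    # large as its largest eigenvalue, so a step of 1/trace cannot diverge.
    trace = sum(sum(row[j] ** 2 for row in X) / n for j in range(d))
    lr = 1.0 / trace
    w = [0.0] * d
    for it in range(max_iters):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - t
                     for row, t in zip(X, y)]
        loss = sum(r * r for r in residuals) / (2 * n)
        if loss < tol:
            return it
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return max_iters

# Hypothetical data: the second feature is about 1000 times larger than the first.
X_raw = [[1, 4000], [2, 1000], [3, 3000], [4, 2000]]
y = [2 * a + 0.003 * b for a, b in X_raw]  # exactly linear target

# Scale each feature by its (population) standard deviation.
sigmas = [pstdev(col) for col in zip(*X_raw)]
X_scaled = [[v / s for v, s in zip(row, sigmas)] for row in X_raw]

raw_iters = gd_iterations(X_raw, y)        # hits the iteration cap
scaled_iters = gd_iterations(X_scaled, y)  # converges well before the cap
print(raw_iters, scaled_iters)
```

With the raw features the error surface is a very elongated valley, so the safe step size is tiny and the run stops at the cap without converging. After scaling, the same procedure reaches the tolerance comfortably within the cap.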
Advantages of Feature scaling:
- It helps machine learning algorithms train faster
- It helps prevent a machine learning algorithm from getting stuck in local minima
- It gives a better-shaped error surface
- Weight decay and Bayesian optimization can be carried out more efficiently
A few algorithms are only minimally affected by differences in the ranges of the features, e.g. logistic regression and decision trees.
Let us understand feature scaling with an example. Suppose we have a data set with information about the employees of a company. Age can vary from 21 to 70, the size of the house they live in can vary from 500 to 5000 square feet, and monthly salary can vary from $30,000 to $80,000. In this situation, if we use a simple Euclidean distance, the age feature might have almost no impact on the result, even though age can be a very important contributor. If we instead normalize all the features to the range [0,1], the three features contribute comparably when computing the distance. At times, however, normalization or feature scaling can result in a loss of information, so we have to be careful while scaling or normalizing the data set.
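The employee example can be sketched in a few lines of Python. The three employee records below are invented for illustration; only the feature ranges come from the text. The helper names `euclidean` and `rescale` are likewise assumptions.

```python
import math

# Hypothetical employees: (age in years, house size in sq ft, monthly salary in $).
alice = (25, 1200, 40000)
bob = (60, 1250, 41000)    # much older than alice, similar salary
carol = (26, 1300, 60000)  # similar age to alice, higher salary

# Feature ranges quoted in the text.
ranges = [(21, 70), (500, 5000), (30000, 80000)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rescale(row):
    """Map each feature to [0, 1] using the known range of that feature."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(row, ranges))

raw_bob = euclidean(alice, bob)
raw_carol = euclidean(alice, carol)
scaled_bob = euclidean(rescale(alice), rescale(bob))
scaled_carol = euclidean(rescale(alice), rescale(carol))

print("raw:    bob closer than carol?", raw_bob < raw_carol)
print("scaled: bob closer than carol?", scaled_bob < scaled_carol)
```

On the raw data the salary difference swamps everything, so the 35-year age gap to bob barely registers and bob looks like the nearer neighbour. After rescaling, the age gap counts and carol becomes the nearer one.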
Methods of Scaling:
Rescaling: One of the easiest ways to scale data is to rescale each feature to the range [0,1] or [-1,1]. The general formula for rescaling a feature to [0,1] is
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where x is the original value and x' is the rescaled value. This equation rescales every feature of a dataset to the range [0,1].
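A minimal sketch of this rescaling follows. The function name `min_max_scale`, its `new_min`/`new_max` parameters, and the sample ages are assumptions for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range, so the formula would divide by zero.
        raise ValueError("cannot rescale a constant feature")
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

ages = [21, 30, 45, 70]  # hypothetical ages
unit_ages = min_max_scale(ages)                  # range [0, 1]
symmetric_ages = min_max_scale(ages, -1.0, 1.0)  # range [-1, 1]
print(unit_ages)
print(symmetric_ages)
```

The smallest value always maps to the lower bound and the largest to the upper bound, so a single extreme outlier can squeeze all other values into a narrow band.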
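Mean normalization: Another way to achieve a similar result is mean normalization, which subtracts the mean of the feature instead of its minimum:
$$x' = \frac{x - \mu}{\max(x) - \min(x)}$$
where \mu is the mean of the feature. The rescaled values are centered at 0 and lie within [-1,1].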
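Mean normalization can be sketched in the same way. The function name `mean_normalize` and the sample ages are hypothetical.

```python
from statistics import fmean

def mean_normalize(values):
    """Subtract the mean, then divide by the range (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    mu = fmean(values)
    return [(v - mu) / (hi - lo) for v in values]

ages = [21, 30, 45, 70]  # hypothetical ages
mean_normalized_ages = mean_normalize(ages)
print(mean_normalized_ages)
```

The output sums to zero (up to floating-point error) and its maximum and minimum differ by exactly one range unit.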
Standardization: Another efficient way to scale data is to divide each data point of a feature by the standard deviation of that feature:
$$x' = \frac{x}{\sigma}$$
where \sigma is the standard deviation of the feature. The resulting data points have a standard deviation of 1.
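As a rough sketch of this step, the snippet below divides a feature by its standard deviation. The function name `scale_by_std` and the sample ages are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample one is an assumption, since the text does not say which is meant.

```python
from statistics import fmean, pstdev

def scale_by_std(values):
    """Divide every value by the population standard deviation of the list."""
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot scale a constant feature")
    return [v / sigma for v in values]

ages = [21, 30, 45, 70]  # hypothetical ages
std_scaled_ages = scale_by_std(ages)
print(pstdev(std_scaled_ages))  # standard deviation of the scaled data
print(fmean(std_scaled_ages))   # the mean is rescaled too, but not moved to zero
```

Note that this step alone changes the spread but does not center the data: the mean is simply divided by the same factor.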
For example, below is the data for the ages of the employees of a company.
Normalization: The process of first centering and then scaling is called normalization.
Centering: Centering is the process of subtracting the mean of a series from each data point of the series, which results in a series with mean equal to zero.
Equation for normalization is:
$$x' = \frac{x - \mu}{\sigma}$$
where \mu is the mean and \sigma is the standard deviation of the series.
Normalized data will be centered at 0 with standard deviation equal to 1.
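The two steps, centering and then scaling, can be sketched as follows. The function names `center` and `normalize` and the sample ages are hypothetical, and the population standard deviation is again an assumption.

```python
from statistics import fmean, pstdev

def center(values):
    """Subtract the mean so that the series has mean zero."""
    mu = fmean(values)
    return [v - mu for v in values]

def normalize(values):
    """Center the series, then divide by its standard deviation."""
    centered = center(values)
    sigma = pstdev(centered)  # centering does not change the standard deviation
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [v / sigma for v in centered]

ages = [21, 30, 45, 70]  # hypothetical ages
centered_ages = center(ages)
normalized_ages = normalize(ages)
print(fmean(normalized_ages), pstdev(normalized_ages))
```

Up to floating-point error, the printed mean is 0 and the printed standard deviation is 1.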