Welcome to the Linear Regression section of Machine Learning with Python.
This article is intended for readers who have a basic understanding of Linear Regression and have probably used another tool, such as SAS or R, for Linear Regression analysis. To learn more about Linear Regression, please follow the given link:
Linear Regression analysis requires the libraries listed below to be installed on the system.
If they are not installed, use the following commands to install them:
pip install numpy
pip install scikit-learn
pip install matplotlib
pip install pandas
We also need another very useful library, "quandl", which provides free-to-use financial and economic datasets for analysis.
pip install quandl
In Linear Regression, our goal is to find the equation of the best-fit line for the data, so that we can predict the value of the dependent variable from the values of the independent variables.
Linear Regression is a supervised machine learning algorithm: it learns an equation, or statistical model, from labelled examples, and that model can then be reused to make predictions on new data. How accurate those predictions are depends on how well a straight-line relationship describes the data.
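To make the idea of a best-fit line concrete, here is a minimal sketch on made-up data. The numbers are invented for illustration and are not part of the stock data used later; np.polyfit is used here only as a simple way to get a least-squares line.

```python
import numpy as np

# Toy data (made up for illustration): y is roughly 2*x + 1 plus small noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Use the fitted equation to predict y for a new, unseen x.
x_new = 5.0
y_pred = slope * x_new + intercept
print(f"y = {slope:.2f} * x + {intercept:.2f}; prediction at x = 5: {y_pred:.2f}")
```

The fitted slope and intercept are the "equation" that the rest of this article asks scikit-learn to find for several independent variables at once.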
Linear Regression is often used as a first model for stock price data, so we will start with a financial example, using sample data available through the "quandl" library.
Let us first import the libraries (we are using Spyder for this analysis, but Jupyter, PyCharm, or any other environment works equally well):
import pandas as pd
import quandl
df = quandl.get("WIKI/GOOGL")
If the import fails, try "Quandl" with a capital "Q" (import Quandl), since some older releases of the package used that name.
The data we are about to model should look like this:
Our data set has 12 variables in total, but we do not need all of them. Looking closely, there are two types of variables: the regular (basic) ones, and those with the prefix "Adj.". We only need the "Adj." variables: the adjusted columns are derived from the basic columns, so keeping both is redundant.
So, let us select the variables we need for our analysis:
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
Now we have only the adjusted columns, 5 in total. To make the data more suitable for linear regression, we derive two new features: HL_PCT, the daily high-low range as a percentage of the close, and PCT_change, the percentage change from open to close.
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
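As a quick check of what these two formulas compute, here is a small sketch on two made-up trading days. The prices are invented for illustration; only the column names match the data set.

```python
import pandas as pd

# Two made-up trading days, using the same column names as the WIKI/GOOGL data.
toy = pd.DataFrame({
    "Adj. Open": [100.0, 50.0],
    "Adj. High": [110.0, 52.0],
    "Adj. Low": [95.0, 48.0],
    "Adj. Close": [105.0, 49.0],
})

# Daily high-low range as a percentage of the close.
toy["HL_PCT"] = (toy["Adj. High"] - toy["Adj. Low"]) / toy["Adj. Close"] * 100.0
# Daily move from open to close as a percentage of the open.
toy["PCT_change"] = (toy["Adj. Close"] - toy["Adj. Open"]) / toy["Adj. Open"] * 100.0

print(toy[["HL_PCT", "PCT_change"]].round(2))
```

On the first day the stock rose from 100 to 105, so PCT_change is 5.0; on the second it fell from 50 to 49, so PCT_change is -2.0.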
We now keep only the columns needed for the model, which gives us a new data frame:
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
Next, import a few more libraries that we need for the analysis:
import quandl, math
import numpy as np
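from sklearn import preprocessing, svm
# cross_validation was renamed model_selection in scikit-learn 0.18 and removed in 0.20
from sklearn import model_selection as cross_validation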
from sklearn.linear_model import LinearRegression
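We need numpy to convert the data into numpy arrays, which is the format scikit-learn expects. One more data adjustment remains: creating the label column, which holds the 'Adj. Close' value forecast_out rows (1% of the length of the data set) into the future:
forecast_col = 'Adj. Close'
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)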
We then drop any rows that still contain NaN values from the data frame:
df.dropna(inplace=True)
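The effect of the negative shift, and why NaN rows appear, is easiest to see on a tiny example. The sketch below uses ten made-up closing prices and a 20% horizon (instead of the 1% used above) so that the result is easy to read; the name toy is hypothetical.

```python
import math
import pandas as pd

# Ten made-up closing prices standing in for 'Adj. Close'.
toy = pd.DataFrame({"Adj. Close": [10.0, 11.0, 12.0, 13.0, 14.0,
                                   15.0, 16.0, 17.0, 18.0, 19.0]})

# Forecast horizon: 20% of the rows here, which is 2 rows.
forecast_out = int(math.ceil(0.2 * len(toy)))

# Each row's label is the close forecast_out rows later; the last rows get NaN.
toy["label"] = toy["Adj. Close"].shift(-forecast_out)
rows_before = len(toy)

# Rows without a known future value cannot be used for training, so drop them.
toy = toy.dropna()
print(forecast_out, rows_before, len(toy))
```

The first row pairs a close of 10.0 with a label of 12.0 (the close two rows later), and the last two rows are dropped because their future value is unknown.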
Our data is finally ready for building the regression model. Let us separate the independent variables (x) from the dependent variable (y):
x = np.array(df.drop(columns=['label']))
y = np.array(df['label'])
Standardize the independent variables:
x = preprocessing.scale(x)
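To see what standardizing does, here is a small sketch with two made-up features on very different scales, similar in spirit to a price column and a volume column.

```python
import numpy as np
from sklearn import preprocessing

# Two made-up features on very different scales (e.g. a price and a volume).
x_demo = np.array([[100.0, 1_000_000.0],
                   [102.0, 3_000_000.0],
                   [104.0, 2_000_000.0]])

# Standardize each column to zero mean and unit variance.
x_scaled = preprocessing.scale(x_demo)

print(x_scaled.mean(axis=0))  # approximately 0 for each column
print(x_scaled.std(axis=0))   # approximately 1 for each column
```

Note that preprocessing.scale computes the mean and standard deviation over all rows it is given. One common alternative, not used in this article, is to fit a StandardScaler on the training rows only and apply it to the test rows.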
Now divide the data into training and test datasets:
x_train, x_test, y_train, y_test = cross_validation.train_test_split(x, y, test_size=0.2)
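In scikit-learn 0.20 and later, the cross_validation module no longer exists and train_test_split lives in sklearn.model_selection. The sketch below shows the split on made-up arrays; random_state is fixed only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, and 10 labels.
x_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

# Hold out 20% of the rows for testing.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

print(x_tr.shape, x_te.shape)  # 8 training rows and 2 test rows
```

By default train_test_split shuffles the rows before splitting. For time-ordered data such as stock prices, passing shuffle=False keeps the test set strictly after the training set, which may be a fairer test; this article keeps the default.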
Several regression algorithms are available in sklearn, including the LinearRegression class imported above; for this analysis we use Support Vector Regression:
clf = svm.SVR()
Now that we have the model, let us train it on the training data:
clf.fit(x_train, y_train)
Check how well the trained model performs on the test data. For regressors, score returns the coefficient of determination (R^2), not a classification accuracy:
confidence = clf.score(x_test, y_test)
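The following sketch runs the same fit-and-score steps on synthetic data, for both SVR and LinearRegression, and compares the value returned by score with r2_score. The data, the coefficients, and the names rng and scores are made up for illustration, so the numbers say nothing about how well either model predicts stock prices.

```python
import numpy as np
from sklearn import svm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear signal in three features plus a little noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 3))
y_demo = x_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

scores = {}
for name, model in [("SVR", svm.SVR()), ("LinearRegression", LinearRegression())]:
    model.fit(x_tr, y_tr)                        # train on the training rows
    scores[name] = model.score(x_te, y_te)       # R^2 on the held-out rows
    # score() for regressors is the same quantity as r2_score on the predictions.
    assert np.isclose(scores[name], r2_score(y_te, model.predict(x_te)))

print(scores)
```

Both estimators share the same fit, predict, and score interface, so replacing svm.SVR() with LinearRegression() in the article's code requires no other changes.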
We have uploaded all the commands used in this analysis separately. Please use the link below to download them:
If you need any help or have any questions, please reach us at
firstname.lastname@example.org or leave a comment.