Welcome to the introduction to the Linear Regression section of Machine Learning with Python.
This article is intended for readers who have a basic understanding of Linear Regression and have probably used another tool, such as SAS or R, for Linear Regression analysis. To learn more about Linear Regression, please follow this link:
https://www.fromthegenesis.com/category/statisticalmodelingproject/linearregression/
Python Libraries:
For Linear Regression analysis, the following libraries must be installed on the system:
numpy
scikit-learn
matplotlib
pandas
If they are not installed, use the commands below to install them:
pip install numpy
pip install scikit-learn
pip install matplotlib
pip install pandas
We also need another very useful library, “quandl”, which provides free-to-use financial and economic datasets for analysis.
pip install quandl
In Linear Regression, our goal is to find the equation of the line that best fits the data, so that we can predict the value of the dependent variable from the values of the independent variables.
Linear Regression is a supervised machine learning algorithm: it learns an equation (a statistical model) from labeled examples, and that model can then be reused to predict new values.
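To make the idea of a best-fit line concrete, here is a minimal sketch that computes the least-squares slope and intercept for a single independent variable. The function name fit_line and the sample points are made up for illustration; they are not part of the analysis that follows.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance of x and y divided by the variance of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The best-fit line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up points that lie exactly on the line y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Because the sample points lie exactly on a line, the fit recovers that line; with real, noisy data the same formulas give the line that minimizes the sum of squared errors.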
Linear Regression is widely used for modeling stock price data, so we start with an example that models financial data. We use the sample financial data available through the “quandl” library.
Let us first import the libraries (we are using Spyder for the analysis, but you could also use Jupyter, PyCharm, or any other interface):
import pandas as pd
import quandl
df = quandl.get("WIKI/GOOGL")
print(df.head())
In case you face any issue with quandl, try writing the module name with a capital “Q”; that should solve the problem.
The output shows the first rows of the data we are about to model.
Our data set has 12 variables in total, but we do not need all of them. If we look closely at the data set, we find two types of variables: the regular (basic) variables and those with the prefix “Adj.”. We need only the variables with the “Adj.” prefix: because the adjusted columns are derived from the basic columns, keeping both is redundant.
So, let us select the variables we need for our analysis:
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
Now we have only the adjusted columns, 5 in total. To understand linear regression better, we manipulate the data a little to make it more suitable for analysis:
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
We then keep only the columns we need and print the new data frame:
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(df.head())
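To see what the two derived columns measure, the sketch below applies the same formulas to a tiny hand-made data frame. The prices are hypothetical, not real GOOGL data: HL_PCT is the day's high-low range as a percentage of the close, and PCT_change is the move from open to close as a percentage of the open.

```python
import pandas as pd

# Hypothetical prices for two trading days (not real market data)
toy = pd.DataFrame({
    'Adj. Open':  [100.0, 50.0],
    'Adj. High':  [110.0, 55.0],
    'Adj. Low':   [90.0, 45.0],
    'Adj. Close': [105.0, 48.0],
})

# Daily high-low range, as a percentage of the closing price
toy['HL_PCT'] = (toy['Adj. High'] - toy['Adj. Low']) / toy['Adj. Close'] * 100.0
# Daily open-to-close move, as a percentage of the opening price
toy['PCT_change'] = (toy['Adj. Close'] - toy['Adj. Open']) / toy['Adj. Open'] * 100.0

print(toy[['HL_PCT', 'PCT_change']])
```

On the first day the price rose from 100 to 105, so PCT_change is 5.0; on the second it fell from 50 to 48, so PCT_change is -4.0.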
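Next, import a few more libraries that we need for the analysis:
import math
import numpy as np
from sklearn import preprocessing, svm
from sklearn import model_selection as cross_validation  # sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides the same train_test_split
from sklearn.linear_model import LinearRegression
We need numpy to convert the data into numpy arrays, which scikit-learn can read. We then make one more adjustment to the data: fill in missing values and create the label column that we want to forecast: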
forecast_col = 'Adj. Close'
df.fillna(value=99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)  # negative shift: each row's label is the price forecast_out rows ahead
We then drop any rows that still contain NaN values from the data frame:
df.dropna(inplace=True)
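The labelling step is easier to follow on a few rows. This is a minimal sketch, assuming the label is meant to hold the closing price a fixed number of rows in the future, which calls for a negative shift. The prices and the horizon of 2 rows are made up for illustration.

```python
import pandas as pd

# Made-up closing prices for five consecutive days
toy = pd.DataFrame({'Adj. Close': [10.0, 11.0, 12.0, 13.0, 14.0]})

forecast_out = 2  # hypothetical horizon; the article uses 1% of len(df)

# shift(-2) moves every value up two rows, so each row's label is the
# closing price two rows later; the last two rows get NaN
toy['label'] = toy['Adj. Close'].shift(-forecast_out)
print(toy)

# Rows without a future price cannot be used for training, so drop them
toy.dropna(inplace=True)
print(toy)
```

After dropna, only the first three rows remain, each paired with the price two days ahead.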
Our data is finally ready for building the model. Let us tag the independent and dependent variables:
x = np.array(df.drop(columns=['label']))
y = np.array(df['label'])
Standardize the independent variables (the features):
x = preprocessing.scale(x)
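As a quick illustration of what preprocessing.scale does, the sketch below standardizes a small made-up feature matrix. Each column is rescaled to zero mean and unit standard deviation, so features measured on very different scales (such as price and volume) become comparable.

```python
import numpy as np
from sklearn import preprocessing

# Made-up feature matrix: two columns on very different scales
features = np.array([[1.0, 100.0],
                     [2.0, 200.0],
                     [3.0, 300.0]])

scaled = preprocessing.scale(features)

print(scaled.mean(axis=0))  # column means, approximately 0
print(scaled.std(axis=0))   # column standard deviations, approximately 1
```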
Now divide the data into training and test datasets:
x_train, x_test, y_train, y_test = cross_validation.train_test_split(x, y, test_size=0.2)
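The effect of test_size=0.2 can be checked on a small made-up array. This sketch uses train_test_split from sklearn.model_selection; the variable names and the random_state value are chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each; row i is [2i, 2i + 1]
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

print(x_tr.shape, x_te.shape)
```

The rows are shuffled before splitting, but each feature row stays paired with its own label.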
Several regression algorithms are available in sklearn; for this analysis we use Support Vector Regression (SVR).
clf = svm.SVR()
Now that we have the model to be used for the analysis, let us train it:
clf.fit(x_train, y_train)
Check the accuracy of the trained model on the test data (for sklearn regressors, score returns the coefficient of determination, R²):
confidence = clf.score(x_test, y_test)
print(confidence)
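The code above trains svm.SVR, although LinearRegression was also imported. Both share the same fit and score interface, so one can be swapped for the other. The sketch below fits LinearRegression on synthetic data generated from a known line (y = 3x + 2 plus noise); the data, seed, and variable names are assumptions for illustration, not the stock data from this article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# score returns R^2 on the held-out data; 1.0 would be a perfect fit
r2 = model.score(X_test, y_test)
print(model.coef_[0], model.intercept_, r2)
```

The fitted coefficient and intercept should come out close to 3 and 2, the values used to generate the data.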
We have uploaded all the commands used in this analysis separately. Please use below link to download the commands:
https://drive.google.com/open?id=1AvAQsvVhHjcxRYIrQNWXLPe7hhRRf0ga
Let us know if you need any help or have any questions. You can reach us at fromthegenesis@gmail.com or leave a comment.