Predicting Stock Prices: Linear Regression (Python)


Welcome to the introduction to the Linear Regression section of Machine Learning with Python.

This article is intended for readers who have a basic understanding of Linear Regression, perhaps from using another tool such as SAS or R for Linear Regression analysis. If you want to know more about Linear Regression, please click on the given link:

Python Libraries:

For Linear Regression analysis, the following libraries must be installed on your system:

  1. numpy
  2. scikit-learn
  3. matplotlib
  4. pandas

If they are not, use the commands given below to install them:

pip install numpy

pip install scikit-learn

pip install matplotlib

pip install pandas

We also need another very useful library, “quandl”, which provides free-to-use financial and economic datasets for analysis.

pip install quandl

To begin with Linear Regression: our goal is to find the equation of the best-fit line for the data, so that we can predict the value of the dependent variable from the values of the independent variables.
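As a minimal sketch of what "finding the best-fit line" means, the snippet below fits a straight line to synthetic data with numpy and uses it to predict a new value. The data, the true slope and intercept, and the variable names are made up for illustration and are not part of the stock example.

```python
import numpy as np

# Synthetic data: y is roughly 2*x + 1 plus a little noise (made-up values).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted line to predict the dependent variable for a new x.
x_new = 12.0
y_pred = slope * x_new + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={y_pred:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is all "best fit" promises here.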

Linear Regression is a supervised machine learning algorithm: it learns an equation (a statistical model) from labeled examples, and that model can then be reused to predict new values. How accurate those predictions are depends on how well a linear relationship describes the data.

Linear Regression is popularly used for modeling stock price data, so we will work through an example with financial data. We can use the sample financial data available through the “quandl” library.

Let us first import the libraries (we are using Spyder for the analysis, but you could also opt for Jupyter, PyCharm or any other interface):

import pandas as pd

import quandl

df = quandl.get("WIKI/GOOGL")


In case you face any issue with the import, try “Quandl” with a capital “Q” (import Quandl); older versions of the package used that module name.

The data we are about to model should look like this:
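If the download is not available to you (for example, no network access), one way to keep following along is a synthetic stand-in. The sketch below builds a data frame with the five adjusted columns this article works with; the prices are a random walk invented for illustration, not real quotes, and the date range and row count are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded data: random-walk prices, not real quotes.
rng = np.random.default_rng(42)
n_days = 500
dates = pd.date_range("2015-01-01", periods=n_days, freq="B")  # business days

close = 500.0 + np.cumsum(rng.normal(scale=5.0, size=n_days))
open_ = close + rng.normal(scale=2.0, size=n_days)
# High is at least the larger of open/close, low at most the smaller.
high = np.maximum(open_, close) + rng.uniform(0.0, 3.0, size=n_days)
low = np.minimum(open_, close) - rng.uniform(0.0, 3.0, size=n_days)
volume = rng.integers(1_000_000, 5_000_000, size=n_days).astype(float)

df = pd.DataFrame(
    {
        "Adj. Open": open_,
        "Adj. High": high,
        "Adj. Low": low,
        "Adj. Close": close,
        "Adj. Volume": volume,
    },
    index=dates,
)
print(df.head())
```

Results obtained on this stand-in say nothing about real stock prices; it only lets the remaining commands run.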

Our data set has 12 variables in total, but we do not need all of them. If we look closely at the data set, we find two types of variables: the regular (basic) variables, and variables with the prefix “Adj.”. We need only the variables with the “Adj.” prefix: the adjusted columns are derived from the basic columns, so keeping both the regular and the adjusted variables is redundant.

So, let us select the variables we need for our analysis:

df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

Now we have just the adjusted columns, 5 in total. For a better understanding of linear regression, we will manipulate the data a little to make it more suitable for analysis. HL_PCT is the day's high-low range as a percentage of the close, and PCT_change is the percentage change from open to close:

df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0

df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
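To check the arithmetic of these two derived columns, here is a small worked example on two made-up trading days. The frame name toy and its prices are hypothetical and chosen only so the numbers are easy to verify by hand.

```python
import pandas as pd

# Two made-up trading days.
toy = pd.DataFrame(
    {
        "Adj. Open": [100.0, 50.0],
        "Adj. High": [110.0, 51.0],
        "Adj. Low": [95.0, 48.0],
        "Adj. Close": [105.0, 49.0],
    }
)

# High-low range as a percentage of the close.
toy["HL_PCT"] = (toy["Adj. High"] - toy["Adj. Low"]) / toy["Adj. Close"] * 100.0
# Open-to-close move as a percentage of the open.
toy["PCT_change"] = (toy["Adj. Close"] - toy["Adj. Open"]) / toy["Adj. Open"] * 100.0
print(toy[["HL_PCT", "PCT_change"]].round(2))
```

For the first day the range is 15 on a close of 105, about 14.29%, and the price rose 5 from an open of 100, so PCT_change is 5%. For the second day the price fell 1 from an open of 50, so PCT_change is -2%.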

Now we keep only the columns we need, which gives a new data frame:

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]


Next, import a few more libraries that we need for the analysis:

import quandl, math

import numpy as np

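from sklearn import preprocessing, svm

# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides train_test_split
from sklearn import model_selection as cross_validation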

from sklearn.linear_model import LinearRegression

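We need numpy to convert the data into numpy arrays, which Scikit-learn can read. We will make one more data adjustment: missing values are filled with a placeholder, and a label column is created that holds the “Adj. Close” value forecast_out rows ahead (1% of the length of the data set):

forecast_col = 'Adj. Close'

df.fillna(value=-99999, inplace=True)

forecast_out = int(math.ceil(0.01 * len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)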
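To see what the negative shift does, consider this toy sketch. The ten prices are made up, and a horizon of 2 rows is assumed here purely to keep the table small (the article derives its horizon from 1% of the data length).

```python
import pandas as pd

# Ten made-up closing prices: 100.0, 101.0, ..., 109.0.
toy = pd.DataFrame({"Adj. Close": [float(p) for p in range(100, 110)]})

# Hypothetical horizon for the illustration only.
forecast_out = 2

# Each row's label is the closing price forecast_out rows later.
toy["label"] = toy["Adj. Close"].shift(-forecast_out)
print(toy)
print("rows without a label:", int(toy["label"].isna().sum()))
```

The last forecast_out rows have no future price to point at, so their label is NaN.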

We’ll then drop any rows that still contain NaN from the data frame:

df.dropna(inplace=True)


We are finally ready to build the model. Let us tag the independent and dependent variables:

x = np.array(df.drop(columns=['label']))

y = np.array(df['label'])

Standardize the independent variables:

x = preprocessing.scale(x)
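As a quick illustration of what preprocessing.scale does, the sketch below standardizes two made-up feature columns with very different magnitudes. The array name features and its values are hypothetical.

```python
import numpy as np
from sklearn import preprocessing

# Two made-up feature columns on very different scales.
features = np.array(
    [
        [1.0, 1000.0],
        [2.0, 2000.0],
        [3.0, 3000.0],
        [4.0, 4000.0],
    ]
)

# Each column is shifted to mean 0 and rescaled to unit standard deviation.
scaled = preprocessing.scale(features)
print(scaled.mean(axis=0))
print(scaled.std(axis=0))
```

One thing to keep in mind: scaling before the train/test split means the test rows contribute to the mean and standard deviation. Fitting a StandardScaler on the training rows only is a common alternative.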

Now divide the data into training and test datasets:

x_train, x_test, y_train, y_test = cross_validation.train_test_split(x, y, test_size=0.2)
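For reference, here is a small self-contained sketch of the split on made-up arrays. In current scikit-learn releases train_test_split lives in sklearn.model_selection; the array contents and the random_state value are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up samples with 2 features each, plus one target per sample.
x = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)
```

By default the rows are shuffled before splitting. For time-ordered data such as prices, passing shuffle=False keeps the test set strictly after the training set, which may be the more realistic setup.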

There are several classifiers and regression algorithms available in sklearn; for this analysis we will use Support Vector Regression:

clf = svm.SVR()

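Now we have the model that will be used for the analysis. Let us train our machine learning algorithm:

clf.fit(x_train, y_train)

Check the accuracy of the trained model on the test data (for a regression model, score returns the coefficient of determination, R²):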

confidence = clf.score(x_test, y_test)
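The whole train-and-score sequence can be tried end to end on synthetic data. The sketch below is not the stock analysis itself: the features, coefficients and noise level are made up, and it fits both SVR and the LinearRegression class imported earlier so the two scores can be compared.

```python
import numpy as np
from sklearn import preprocessing, svm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: the target is a linear mix of three features plus small noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 3))
y = 3.0 * x[:, 0] - 2.0 * x[:, 1] + 0.5 * x[:, 2] + rng.normal(scale=0.1, size=300)

x = preprocessing.scale(x)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit each model on the training rows and score it (R^2) on the held-out rows.
scores = {}
for name, model in [("LinearRegression", LinearRegression()), ("SVR", svm.SVR())]:
    model.fit(x_train, y_train)
    scores[name] = model.score(x_test, y_test)
    print(name, round(scores[name], 3))
```

Because the synthetic target really is linear, LinearRegression should score close to 1 here. On real price data neither model is guaranteed to do well, and a high score on one split is not evidence of reliable forecasts.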


We have uploaded all the commands used in this analysis separately. Please use the link below to download them:

Let us know if you need any help or have any query. Please do reach us @ or leave a comment.


