Seaborn (Python) – Data Visualization tool


In analytics, one of the most effective ways to analyze data is through statistical graphics. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

In other words, Seaborn's visualization capabilities make data analysis much easier. Because Seaborn is built on top of the matplotlib library, we can still call matplotlib directly for tasks that Seaborn does not cover.

Why Seaborn?

Seaborn offers a range of features that make statistical plotting easier than working with lower-level tools such as plain matplotlib. Some of these features are:

  • A function to plot statistical time series data with flexible estimation and representation of uncertainty around the estimate
  • Functions for visualizing univariate and bivariate distributions or for comparing them between subsets of data
  • Functions that visualize matrices of data and use clustering algorithms to discover structure in those matrices
  • High-level abstractions for structuring grids of plots that let you easily build complex visualizations
  • Several built-in themes for styling matplotlib graphics
  • Tools for choosing color palettes to make beautiful plots that reveal patterns in your data
  • Tools that fit and visualize linear regression models for different kinds of independent and dependent variables
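To make the theme and palette features concrete, here is a minimal sketch using two real Seaborn calls, `sns.set_style` and `sns.color_palette`. The bar values are arbitrary numbers chosen only for illustration, and the `Agg` backend is an assumption so the snippet also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: running as a script)
import matplotlib.pyplot as plt
import seaborn as sns

# Apply one of the built-in styles: "darkgrid", "whitegrid", "dark", "white", "ticks"
sns.set_style("whitegrid")

# Take five colours from the built-in "deep" palette; each colour is an (r, g, b) tuple
palette = sns.color_palette("deep", 5)

# Draw a few bars with matplotlib to see the style and palette applied
fig, ax = plt.subplots()
ax.bar(range(5), [3, 5, 2, 4, 1], color=palette)  # arbitrary illustrative heights
plt.close(fig)
```

Because Seaborn only changes matplotlib's settings here, the plain `ax.bar` call picks up the grid style as well.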

We will use a financial sample dataset to explore the Seaborn library; please click HERE to download the file.

Seaborn depends on NumPy, pandas and matplotlib; pip and conda install these packages automatically if they are missing. To install Seaborn with pip, run the following command in the terminal:

pip install seaborn

Or, if you use the Anaconda distribution, run:

conda install seaborn

Once the seaborn library is installed, we are ready to explore its capabilities.

import pandas as pd
import seaborn as sns

# If using Jupyter Notebook, the line below displays charts inline in the notebook
%matplotlib inline

# Load the data into a pandas DataFrame (reading .xlsx files requires the openpyxl package)
df = pd.read_excel("Financial Sample.xlsx")

# Set the style we wish to use for our plots
sns.set_style("darkgrid")

# Show the first 5 rows to check that the data loaded correctly
df.head()
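If the Excel file is not at hand, the following sketch builds a small stand-in DataFrame so the plotting commands below can still be tried. It is purely hypothetical: the rows are random numbers, and only the column names (`Country`, `Segment`, `Units Sold`, `Profit`) mirror the ones used in this article; the category values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'Financial Sample.xlsx': random values, illustration only
rng = np.random.default_rng(seed=0)
n_rows = 200
df = pd.DataFrame({
    "Country": rng.choice(["Canada", "France", "Germany", "Mexico"], size=n_rows),
    "Segment": rng.choice(["Government", "Midmarket", "Enterprise"], size=n_rows),
    "Units Sold": rng.integers(100, 3000, size=n_rows),
    "Profit": rng.normal(loc=20000, scale=15000, size=n_rows).round(2),
})
print(df.head())
```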

 

After running the commands above, we get the following output (the first five rows of the DataFrame):

 

In any data analysis, we generally explore the data by category and try different types of charts, such as bar charts or line charts.

Bar Chart Analysis:

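To compare units sold across countries with a bar chart, run the following command. By default, barplot draws the mean of the y variable for each category, so each bar shows the average units sold per row for that country: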

sns.barplot(x="Country", y="Units Sold", data=df)

To plot a different statistic for each category, pass it as the estimator argument. In the following command, we plot the standard deviation of units sold instead of the mean:

import numpy as np
sns.barplot(x="Country", y="Units Sold", data=df, estimator=np.std)

As a quick note, the black line crossing the top of each bar is an error bar showing the confidence interval for the estimate. By default, this is a 95% confidence interval computed by bootstrapping.
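The idea behind a bootstrapped confidence interval can be sketched in a few lines of NumPy. This is an illustration of the percentile bootstrap, not Seaborn's own code; `bootstrap_ci` is a hypothetical helper, the data is random, and 1000 resamples mirrors Seaborn's default `n_boot`. In seaborn 0.12 and later the interval can be changed with the `errorbar` parameter, for example `errorbar=("ci", 99)` or `errorbar="sd"`.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and record the mean of each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Keep the central `ci` percent of the bootstrap distribution
    tail = (100 - ci) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Hypothetical sample of units sold
units = np.random.default_rng(3).integers(100, 3000, size=50)
low, high = bootstrap_ci(units)
print(f"mean={units.mean():.1f}, 95% CI=({low:.1f}, {high:.1f})")
```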

Count Plot: Let’s now move on to a countplot. This is essentially a barplot whose estimator counts the number of occurrences of each category, so no y variable is needed.

sns.countplot(x="Segment", data=df)
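A quick way to see that a countplot is just counting rows is to compare its bar heights with pandas `value_counts`. The segment labels and rows below are hypothetical random data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: running as a script)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: 150 rows with randomly assigned, made-up segment labels
rng = np.random.default_rng(seed=4)
df = pd.DataFrame({
    "Segment": rng.choice(["Government", "Midmarket", "Enterprise"], size=150),
})

order = ["Government", "Midmarket", "Enterprise"]
fig, ax = plt.subplots()
sns.countplot(x="Segment", data=df, order=order, ax=ax)

# Each bar height is the number of rows in that category
counts_plotted = [patch.get_height() for patch in ax.patches]
counts_pandas = df["Segment"].value_counts().reindex(order)
plt.close(fig)
```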

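Boxplots: Boxplots show the distribution of quantitative data in a way that makes it easy to compare groups. The box spans the first to third quartile with a line at the median, the whiskers extend to the most extreme points within 1.5 times the interquartile range, and points beyond the whiskers are drawn individually as outliers.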

sns.boxplot(x="Segment", y="Profit", data=df)
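The numbers behind each box can be computed directly. The sketch below is a hypothetical helper, `box_stats`, that follows the standard Tukey rule with the 1.5 × IQR whisker length used by default; it is cross-checked against matplotlib's `cbook.boxplot_stats`, which matplotlib's box plots (and therefore Seaborn's) rely on. The profit values are random.

```python
import numpy as np
from matplotlib import cbook

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers reach the most extreme points still within whis * IQR of the box
    inside = values[(values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)]
    lower_whisker, upper_whisker = inside.min(), inside.max()
    # Points beyond the whiskers are drawn individually as outliers
    outliers = values[(values < lower_whisker) | (values > upper_whisker)]
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (lower_whisker, upper_whisker), "outliers": outliers}

# Hypothetical profit values for one segment
profit = np.random.default_rng(4).normal(20000, 15000, size=100)
stats = box_stats(profit)

# Cross-check against matplotlib's own box plot statistics
reference = cbook.boxplot_stats(profit)[0]
print(stats["whiskers"], (reference["whislo"], reference["whishi"]))
```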

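Violin Plot: A violin plot shows the same comparison as a box plot, but also draws a kernel density estimate of each group's distribution, so the shape of the data (for example, several peaks) becomes visible.

sns.violinplot(x="Segment", y="Profit", data=df)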
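The outline of each violin is a kernel density estimate. The sketch below is a from-scratch Gaussian KDE using Scott's rule for the bandwidth, which is Seaborn's default bandwidth rule; it is an illustration, not Seaborn's implementation (which also trims the curve near the data range). `gaussian_kde_1d` is a hypothetical helper and the profit values are random.

```python
import numpy as np

def gaussian_kde_1d(values, grid):
    """Gaussian kernel density estimate with Scott's rule bandwidth (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Scott's rule in one dimension: bandwidth = sample std * n ** (-1/5)
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # Place a normal bump on every observation and average them over the grid
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical profit values
profit = np.random.default_rng(5).normal(20000, 15000, size=200)
grid = np.linspace(profit.min() - 50000, profit.max() + 50000, 2001)
density = gaussian_kde_1d(profit, grid)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(f"area under the curve: {area:.3f}")
```

A violin plot draws this curve mirrored on both sides of the category axis.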

Seaborn offers several other useful plot types as well, which makes it a very helpful tool for any data analyst.

Conclusion: In this lesson, we have seen that Seaborn makes it easy to create different kinds of plots and to present data in an attractive way that is easier to read and understand.
