Exploring Seaborn: Part 1: Creating Visualizations with Scatter Plots and Line Plots

Abhi
7 min read · Aug 8, 2023


Cover Image made with Canva

In Python, we have a number of libraries for data visualization, including Matplotlib, Plotly, and Seaborn. Matplotlib is often called the mother of data visualization in Python. In this tutorial series, we will learn about Seaborn. We will cover almost everything you need to get started with it; practice will do the rest.

Note: This tutorial is part of our ongoing data visualization series.

Pre-Requisites for this Tutorial Series

  • A solid understanding of Python is required.
  • A good understanding of pandas, because we will frequently use it to get data into the right shape.
  • If you have prior experience with data visualization, especially with Matplotlib, that would be the cherry on top.
  • The most important pre-requisite is the will to learn.

What is Seaborn?

Seaborn is an open-source data visualization library for Python built on top of Matplotlib. It is a popular and powerful tool for data visualization in data science. According to the official documentation, Seaborn is a library for making statistical graphics in Python. It integrates closely with pandas data structures and helps you explore and understand your data.

How do I install Seaborn?

The first and most obvious requirement is to have Python installed on your system. Since this is a data visualization tutorial, I am not going into the details of installing Python and pip.

  • Activate your virtual environment.
  • Install Seaborn via the following command:
pip install seaborn
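
Once the install finishes, a quick sanity check is to import the library and print its version. This is only a minimal sketch; the version you see depends on your environment, and the variable name installed_version is just for illustration.

```python
import seaborn as sns

# If this import succeeds, Seaborn is installed in the active environment.
installed_version = sns.__version__
print("Seaborn version:", installed_version)
```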

For the sake of this tutorial, I am using Jupyter Notebook. You are free to use any IDE or notebook. Jupyter Notebook or Google Colab is recommended.

Pros and Cons of Seaborn

Before actually learning Seaborn, we should know about its benefits and limitations.

Pros

  • Seaborn provides a layer of abstraction over Matplotlib, making it simpler to use.
  • The default styling produces better-looking graphs with little effort.
  • Several built-in themes are available for styling the Matplotlib-based graphics.
  • Ability to visualize both univariate and multivariate data.
  • Support for visualizing various kinds of regression models.
  • Easy plotting of statistical summaries for time-series data.
  • Works smoothly with pandas, NumPy, and other Python libraries.
  • Some statistical plots, such as pair plots and joint plots, are available as single function calls, whereas Matplotlib would need more code for them.
  • It is a popular data visualization tool, and many companies use it. So you are learning an industry skill.
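
To make the first few points concrete, here is a minimal sketch on made-up numbers. The DataFrame and its hours and score columns are invented for this example and do not come from any dataset. It applies a built-in theme with sns.set_theme() and shows that a Seaborn plotting call hands back an ordinary Matplotlib Axes, so Matplotlib methods keep working.

```python
import matplotlib.axes
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data, made up purely for illustration.
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 58, 65, 71, 80]})

# Apply one of Seaborn's built-in themes to all plots drawn afterwards.
sns.set_theme(style="darkgrid")

fig, ax = plt.subplots()

# Seaborn draws onto the Axes we pass in and returns that same Axes object.
returned_ax = sns.scatterplot(data=df, x="hours", y="score", ax=ax)

# Because it is a regular Matplotlib Axes, Matplotlib methods still apply.
returned_ax.set_title("Toy example: score vs. hours")
```

In a Jupyter Notebook the figure is rendered automatically at the end of the cell.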

Apart from all these pros, Seaborn does have some cons.

Cons

  • We cannot plot 3D graphs directly using Seaborn.
  • We cannot make animations using Seaborn.
  • We cannot create interactive plots using Seaborn, as we can with Plotly.

Seaborn Roadmap

Seaborn plots are mainly classified into six categories:

  • Relational Plot
  • Distribution Plot
  • Categorical Plot
  • Regression Plot
  • Matrix Plot
  • Multiplots
Categorization of the Seaborn Plots

In this article series, we will take a detailed look at all of the above. For this part of the tutorial, we will cover Relational plots.

Relational Plots

Relational plots show the relationship between two or more variables. For example, you might want to use Seaborn to create a scatter plot or a line plot to show the relationship between continuous variables.

Scatter Plot

A scatter plot shows how two numerical variables are related. The dots on the plot, known as markers, help us see whether there is a connection between the variables and how strong it is. sns.scatterplot() is an axes-level function. We will discuss axes-level and figure-level functions in detail in the next tutorial.

Create a Scatter Plot

Here, I’m using Seaborn’s built-in tips dataset. It contains tipping information that a waiter recorded for each tip received over a few months of working at one restaurant.

import pandas as pd
import seaborn as sns
tips = sns.load_dataset('tips')
tips.head()
Preview of tips dataset

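In the code snippet above, we’re simply importing the required libraries and then loading the tips dataset from Seaborn. Seaborn offers a variety of example datasets, which load_dataset() downloads from an online repository, so an internet connection is needed the first time you load one. To see a list of available datasets, run the following code snippet: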

sns.get_dataset_names()

To make a scatter plot in Python using Seaborn, use the sns.scatterplot() function. There are two main ways to supply the data: pass arrays directly to the x and y arguments, or pass a DataFrame to the data argument and give the column names for x and y. By default, the markers on the plot will be blue.

sns.scatterplot(data=tips, x='total_bill', y='tip')
Scatter Plot in Seaborn

In the provided code, the X-axis represents ‘total_bill’ and the Y-axis represents ‘tip’ from the ‘tips’ dataframe.
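
The example above passes a DataFrame through data. For the other calling style, here is a small sketch that passes plain Python lists straight to x and y. The numbers and the names bills and tip_amounts are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up values, used only to show the calling convention.
bills = [10.5, 20.0, 15.3, 30.2]
tip_amounts = [1.5, 3.0, 2.0, 5.0]

fig, ax = plt.subplots()

# No DataFrame here: the sequences are passed directly to x and y.
sns.scatterplot(x=bills, y=tip_amounts, ax=ax)

# Plain lists carry no column names, so we label the axes ourselves.
ax.set(xlabel="total_bill", ylabel="tip")
```

Passing a DataFrame is usually more convenient, because Seaborn can then pick up the axis labels from the column names.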

How to Add Color in Scatter Plots with Hue?

Our scatter plot currently displays the relationship between two variables. To include another variable, we can use color. This is achieved using the ‘hue=’ parameter, which accepts the column label.

The ‘hue=’ parameter behaves differently depending on the type of variable passed:

  1. For categorical variables, each color represents a different category.
  2. For continuous variables, the color represents a gradient along the scale.

Let’s first use a categorical variable to see how we add more dimensionality to our data:

sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex')
Scatter Plot in Seaborn using the hue parameter on a categorical column

Since sex is a categorical column, we can notice that blue markers represent males and orange markers represent females.

Now, let’s use a continuous variable to see how we add more dimensionality to our data:

sns.scatterplot(data=tips, x='total_bill', y='tip', hue='size')
Scatter Plot in Seaborn using the hue parameter on a continuous column

Since size is a numeric column, Seaborn treats it as continuous: the marker colors follow a gradient along a color map, and the shade of each marker indicates where its value falls on the scale.
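
To compare the two behaviors side by side without downloading anything, here is a sketch on a handful of made-up rows that mimic the column names of the tips dataset. The values and the names toy, ax_cat, and ax_num are invented for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows that reuse four column names from the tips dataset.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 30.0, 35.0],
    "tip": [1.0, 2.0, 3.0, 3.5, 4.0, 5.0],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "size": [1, 2, 3, 4, 5, 6],
})

fig, (ax_cat, ax_num) = plt.subplots(1, 2, figsize=(10, 4))

# Text column: each category gets its own distinct color.
sns.scatterplot(data=toy, x="total_bill", y="tip", hue="sex", ax=ax_cat)

# Numeric column: colors are taken along a sequential color map.
sns.scatterplot(data=toy, x="total_bill", y="tip", hue="size", ax=ax_num)

# The legend of the categorical plot lists the category names.
cat_labels = [text.get_text() for text in ax_cat.get_legend().get_texts()]
print(cat_labels)
```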

How to Change Marker Size in Scatter Plots?

Seaborn also allows you to customize the size of markers using the size= parameter. By passing in a Pandas DataFrame column label, the sizes of the markers will adjust relative to the values in the column. Let’s understand this with an example.

sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex', size='size')
Scatter Plot in Seaborn using size parameter

Because of the size parameter, the marker sizes now vary with the values in the size column. With the default size range, however, the difference is hard to see.

Seaborn allows us to define the range of marker sizes by passing a tuple into the sizes= parameter. The tuple holds the minimum and maximum sizes, as shown below:

sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex', size='size', sizes=(1, 200))
Scatter Plot in Seaborn using sizes parameter
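
To see what the sizes tuple does, the sketch below reads the marker sizes back from the plot. It uses made-up rows and a (20, 200) range chosen only for illustration. As far as I can tell, for a numeric size column the smallest value gets the first number in the tuple and the largest value gets the second, measured as marker area in Matplotlib's units.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows; "size" plays the role of the party size column.
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 4.0, 6.0],
    "size": [1, 2, 4, 6],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=toy, x="total_bill", y="tip",
    size="size", sizes=(20, 200), ax=ax,
)

# The first collection on the Axes holds the plotted markers.
marker_areas = ax.collections[0].get_sizes()
print(marker_areas.min(), marker_areas.max())
```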

How to Change Markers in Scatter Plots?

We can change the markers by passing a variable into the style= parameter. This can also be combined with the hue= parameter you learned about previously. This way, the variables will be colored and styled differently.

sns.scatterplot(data=tips, x='total_bill', y='tip', hue='sex', style='time', size='size')
Scatter Plot in Seaborn using style parameter

We can see that this makes the resulting visualization much more accessible.
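
Here is a smaller sketch of the same idea on made-up rows, combining hue and style only. The data and the name legend_labels are invented for this example. Reading the legend text back shows that both variables end up in the legend.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows that reuse column names from the tips dataset.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "sex": ["Male", "Female", "Male", "Female"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"],
})

fig, ax = plt.subplots()

# hue controls the color, style controls the marker shape.
sns.scatterplot(data=toy, x="total_bill", y="tip", hue="sex", style="time", ax=ax)

# Collect the legend entries to see which levels were encoded.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```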

There are tons of parameters in the scatterplot() function. You can explore each of them in the Seaborn documentation.

Line Plot

Like scatter plots, line plots show how two numerical variables are related. Line plots usually display the overall trend or change over a continuous range of values, such as time. Conceptually, the dots of a scatter plot are connected with a single line, and the resulting graph is a line plot. It is one of the most basic forms of representing data.

To make a line plot in Python using Seaborn, use the sns.lineplot() function. To showcase lineplot(), we’ll use the ‘gapminder’ dataset, which is available as part of Plotly, another data visualization library in Python. We’ll discuss Plotly in a future tutorial, but for now, we’re solely interested in using the ‘gapminder’ dataset.

The ‘gapminder’ dataset contains statistics on social and economic development, such as life expectancy, population, and GDP per capita, for countries around the world.

You need to install Plotly (pip install plotly) to access the dataset.

import plotly.express as px
gap = px.data.gapminder()
gap.head()
Preview of ‘gapminder’ dataset
# Filter the dataset where country is India and store in india variable.
india = gap[gap['country'] == 'India']
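
If the filtering step is new to you, the sketch below breaks it into two parts on a tiny made-up table. The rows use gapminder-style column names, but the numbers are placeholders and not real gapminder values; toy_gap, mask, and india_rows are names chosen for this example.

```python
import pandas as pd

# Made-up rows with gapminder-style column names; the values are placeholders.
toy_gap = pd.DataFrame({
    "country": ["India", "India", "Pakistan", "Bangladesh"],
    "year": [1952, 1957, 1952, 1952],
    "lifeExp": [40.0, 42.0, 41.0, 39.0],
})

# The comparison produces a boolean Series with one True/False per row.
mask = toy_gap["country"] == "India"

# Indexing with that Series keeps only the rows where it is True.
india_rows = toy_gap[mask]
print(india_rows)
```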

Create a Line Plot

sns.lineplot(data=india, x='year', y='lifeExp')
Line Plot in Seaborn

The above line plot shows the trend of life expectancy in India from 1952 to 2007, the range of years covered by the dataset.

hue, style, and size Parameters

These parameters work the same way as in the scatterplot() function.

# Filter the dataset where country is India, Pakistan and Bangladesh
temp_df = gap[gap['country'].isin(['India','Pakistan','Bangladesh'])]

# Line plot
sns.lineplot(data=temp_df, x='year', y='lifeExp', hue='country', style='continent', size='continent')
Line Plot in Seaborn using hue, style, and size parameters

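The above line plot shows the trend of life expectancy in India, Pakistan, and Bangladesh from 1952 to 2007. Because all three countries are in Asia, the style and size parameters each have only one level here, so the lines differ only by color.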

The lineplot() function also has lots of parameters, but the good part is that most of them are the same as in the scatterplot() function. You can explore each of them in the Seaborn documentation.

In the next tutorial of this Seaborn series, we will discuss the difference between figure-level and axes-level functions in Seaborn, as well as the relplot() function. In upcoming tutorials, we will also dive into a plethora of additional features, including labels and legends.

I hope you find this tutorial interesting and learn something new. If you have any queries or suggestions, you can always connect with me on my Website or email me at abhi@getifyme.com. Thank you so much for your time.

Keep Coding, Keep Smiling 😀
