Introduction to Data Visualization with Python, Matplotlib and Pandas

Vinoth Saravanan
5 min read · Nov 17, 2019

Data visualization is often the final step of data analysis. We use visualization techniques to make a report more effective and easier to understand. Python offers many libraries for this, and one of the most popular is “Matplotlib”.

Visualization:

Visualizations are one of the easiest ways to analyze and absorb information. Visuals make a complex problem easier to understand and help in identifying patterns, relationships, and outliers in data.

What Is Python Matplotlib?

Matplotlib is a plotting library for the Python programming language, used mainly for 2D graphics; matplotlib.pyplot is the module that provides its plotting functions. It can be used in Python scripts, the Python shell, web application servers, and graphical user interface toolkits.

Matplotlib is easy to learn and use. This article covers the basics of data visualization, Pandas, and Matplotlib using a Jupyter notebook.

Next, let us move forward in this blog and look at the different types of plots available in Matplotlib.

Python Matplotlib: Types of Plots

Various plots can be created using Matplotlib. Some of them are listed below:

  1. Bar Graph
  2. Histogram
  3. Scatter plot
  4. Area plot
  5. Pie plot

Before diving into the plot types, let's see how to import the Matplotlib module.

Fig. 1 Importing module
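The import can be sketched as follows. The alias plt is only a widely used convention, assumed here because the later snippets in this article call plt.plot().

```python
# Import the pyplot interface of Matplotlib under its conventional alias.
import matplotlib.pyplot as plt
```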

Now, we are going to create a sample graph. Here the x variable and the y variable each hold a list of values. The plot() function draws a line graph from them, and the show() function displays it.

Fig. 2 Sample graph
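A minimal sketch of such a sample graph is given here. The values in x and y are placeholders chosen only for illustration, not the ones from the original figure.

```python
import matplotlib.pyplot as plt

# Placeholder values, chosen only for illustration.
x = [1, 2, 3]
y = [1, 4, 9]

# plot() draws a line through the (x, y) points and returns the line objects.
lines = plt.plot(x, y)

# show() displays the figure (inline in a Jupyter notebook).
plt.show()
```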

Output:

Fig. 3 Output

The above graph is not very meaningful on its own. So, we can add labels for the x-axis and y-axis as well as a title for the graph.

Fig. 4 Title and label
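Adding a title and axis labels might look like the sketch below. The data and the label texts are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Placeholder values, chosen only for illustration.
x = [1, 2, 3]
y = [1, 4, 9]

plt.plot(x, y)

# Describe the graph and both axes (the texts are hypothetical).
plt.title("Sample graph")
plt.xlabel("x values")
plt.ylabel("y values")

plt.show()
```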

Output:

Fig. 5 Output

Multiple lines in one graph:

Comparing two series in one graph is often more effective than showing them separately. When a graph has multiple lines, each line should be given a name so the reader can tell them apart. For this, we pass a label to each plot() call and then call the legend() function.

Fig. 6 Legend method
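A small sketch of two labelled lines with a legend follows. The data and the label names are hypothetical.

```python
import matplotlib.pyplot as plt

# Placeholder values, chosen only for illustration.
x = [1, 2, 3]
y1 = [1, 4, 9]
y2 = [1, 2, 3]

# Give each line a label so the legend can name it.
plt.plot(x, y1, label="First line")
plt.plot(x, y2, label="Second line")

plt.title("Two lines in one graph")
plt.xlabel("x values")
plt.ylabel("y values")

# legend() builds the legend box from the label= arguments above.
legend = plt.legend()

plt.show()
```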

Output:

Fig. 7 Output

Opening a .CSV file and creating a graph:

Now, we are going to read a .csv file and create a graph using more styling options, including line width, line color, base color, etc. In this example we take a world population report as our dataset (“countries.csv”).

Fig. 8 Reading .CSV file
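Reading the file with Pandas could be sketched like this. So that the snippet runs without the real file, it reads a tiny in-memory stand-in. The column names (country, year, population) are assumptions about the layout of countries.csv, and the numbers are placeholders, not real population figures.

```python
import io

import pandas as pd

# Stand-in for countries.csv so the sketch runs anywhere.
# Column names are assumed; the numbers are placeholders, not real data.
csv_text = """country,year,population
India,2000,100
India,2005,110
China,2000,120
China,2005,125
"""

# With the real file this would be: data = pd.read_csv("countries.csv")
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)  # (4, 3)
```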

To preview the data, we can use sample(), which returns a randomly chosen row (or n random rows with sample(n)). To see the first rows instead, head() returns the first 5 rows by default, and head(3) returns the first 3.

Fig. 9 Sample values
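The difference between head() and sample() can be seen in a short sketch. The DataFrame below is a hypothetical stand-in for the dataset, with assumed column names and placeholder numbers.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (placeholder numbers, not real data).
data = pd.DataFrame({
    "country": ["India", "India", "India", "China", "China", "China"],
    "year": [2000, 2005, 2010, 2000, 2005, 2010],
    "population": [100, 110, 120, 130, 135, 140],
})

first_rows = data.head(3)     # the first 3 rows, in their original order
random_rows = data.sample(2)  # 2 rows picked at random

print(first_rows)
print(random_rows)
```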

Output:

Fig. 10 Output

Data Frame vs Series:

A DataFrame is a two-dimensional data structure, i.e., data is arranged in a tabular fashion in rows and columns.

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). A Series is comparable to a single column in an Excel sheet.

We can find the type of an object using the “type()” function. So, type(data) returns “pandas.core.frame.DataFrame”.

If we access a single column of the DataFrame, we get a Series (“pandas.core.series.Series”). For example, data.country returns the values of the “country” column as a Series.

Fig. 11 DataFrame vs Series
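The distinction can be checked with a short sketch. The DataFrame is again a hypothetical stand-in with assumed column names and placeholder numbers.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (placeholder numbers, not real data).
data = pd.DataFrame({
    "country": ["India", "India", "China"],
    "year": [2000, 2005, 2000],
    "population": [100, 110, 120],
})

# The whole table is a DataFrame; a single column is a Series.
print(type(data))
print(type(data.country))

# Attribute access and bracket access select the same column.
same = data.country.equals(data["country"])
print(same)
```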

This basic understanding helps when making a graph from a given dataset. Now, we are going to take India’s population from the dataset. Using a simple filtering operation, we can select the rows for India and store them in a variable called “ind”.

Fig. 12 Filtering
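One common way to write such a filter is a boolean mask, sketched below. The DataFrame is a hypothetical stand-in with assumed column names and placeholder numbers.

```python
import pandas as pd

# Hypothetical stand-in for the dataset (placeholder numbers, not real data).
data = pd.DataFrame({
    "country": ["India", "India", "India", "China", "China", "China"],
    "year": [2000, 2005, 2010, 2000, 2005, 2010],
    "population": [100, 110, 120, 130, 135, 140],
})

# data.country == "India" yields True/False per row;
# indexing with that mask keeps only the True rows.
ind = data[data.country == "India"]

print(ind)
```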

We can try many styling techniques to create a better graph. What if we want to change the width or color of a particular line, or add some grid lines? That is where styling comes in! So, let me show you how to add style to a graph using Matplotlib. First, we import the style package from the Matplotlib library and then use the styling functions as shown in the code below:

Fig. 13 Styling

In the above example, the style module gives the graph a more polished look. Passing linewidth=5 to plot() sets the width of the line. Passing a single-letter color code as the third positional argument sets the line color. We can see that in

“plt.plot(year, pop, 'g', label='India', linewidth=5)”, where 'g' makes the line green. We can also add grid lines using the grid() function.
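Putting the styling steps together, a sketch might look like the following. The "ggplot" style name, the axis texts, and the grid color are assumptions. The DataFrame is a hypothetical stand-in with placeholder numbers rather than the real countries.csv.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import style

# Apply one of Matplotlib's built-in style sheets (the choice is an assumption).
style.use("ggplot")

# Hypothetical stand-in for the dataset (placeholder numbers, not real data).
data = pd.DataFrame({
    "country": ["India", "India", "India", "China", "China", "China"],
    "year": [2000, 2005, 2010, 2000, 2005, 2010],
    "population": [100, 110, 120, 130, 135, 140],
})

ind = data[data.country == "India"]
year = ind.year
pop = ind.population

# 'g' draws the line in green; linewidth=5 makes it thicker.
lines = plt.plot(year, pop, 'g', label='India', linewidth=5)

plt.title("Population of India")
plt.xlabel("Year")
plt.ylabel("Population")
plt.legend()

# Turn grid lines on and draw them in black ('k').
plt.grid(True, color='k')

plt.show()
```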

Conclusion:

This blog gave an introduction to data visualization with Matplotlib, including the basics of handling .csv files with Pandas and a few styling techniques.
