Skip to main content

Ordered & Mirrored Barplot

This article explains how to reproduce a mirror barplot with annotations, individual observations, custom style and nice features.

  • This plot shows the number of outgoing and incoming students in different countries. Each bar is the average of the country, and the points the values for each year.

  • A mirror barplot with individual observations using the Matplotlib library to visualize data about the Erasmus Program in European countries.

Libraries

  • matplotlib is used for creating the chart and add customization features
  • pandas is used to put the data into a dataframe and data manipulation
  • numpy is used for adding noise to the positions of each marker
# Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Dataset

For this reproduction, we're going to retrieve the data directly from the Github repo. This means we just need to give the right url as an argument to pandas' read_csv() function to retrieve the data.

## URLs
resume_url = 'https://raw.githubusercontent.com/nnthanh101/Machine-Learning/main/analytics/data/resume.csv'
erasmus_url = 'https://raw.githubusercontent.com/nnthanh101/Machine-Learning/main/analytics/data/erasmus.csv'

## Load Datasets
resume = pd.read_csv(resume_url)
data = pd.read_csv(erasmus_url)

Ordered Barplot

First, let's create the barplots without so much customization.

In order to have the barplots side by side, we just have to specify than the second one will be in negative (-resume['mean_send'] when using the barh() function). Moreover, since we ordered before our dataset by the mean_send column, it will automatically be in the right order (decreasing).

The alpha argument defines the opacity of the bars (between 0 and 1).

# Create a figure and axis with a specific size
fig, ax = plt.subplots(figsize=(6, 6))

# Create both barplots
ax.barh(resume['country_name'], resume['mean_rec'],
color='blue', alpha=0.3)
ax.barh(resume['country_name'], -resume['mean_send'],
color='darkorange', alpha=0.3)

# Add a title
ax.set_title('Number of Student', weight='bold')

# Display the plot
plt.show()

png

Remove spines and shift country names

In this step, we will remove the spines (border of the graph) and put the country names on each bar

  • remove label using ax.set_xticks([])
  • remove spines using ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)
  • add labels on the center using the text() function
# Create a figure and axis with a specific size
fig, ax = plt.subplots(figsize=(6, 6))

# Create both barplots
ax.barh(resume['country_name'], resume['mean_rec'],
color='blue', alpha=0.3)
ax.barh(resume['country_name'], -resume['mean_send'],
color='darkorange', alpha=0.3)

# Remove axis labels
ax.set_xticks([])
ax.set_yticks([])

# Removes spines
ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)

# Put country names on the center of the chart
for i, country_name in enumerate(resume['country_name']):
ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6)

# Add a title
ax.set_title('Number of Student', weight='bold', fontsize=9)

# Display the plot
plt.show()

png

Add individual points

The peculiarity of the points in this graph is linked to 2 things:

  • their position on the y-axis, which is different for each country
  • their opacity, which depends on the year concerned

In practice, we iterate over the rows of our data dataframe (thanks to iterrows()) that we haven't used until now, retrieving positions and opacity according to country name and year.

For visual purpose, we add a very small noise to the y_position so that the points are slightly burst.

# Create a figure and axis with a specific size
fig, ax = plt.subplots(figsize=(6, 6))

# Create both barplots
ax.barh(resume['country_name'], resume['mean_rec'],
color='blue', alpha=0.3)
ax.barh(resume['country_name'], -resume['mean_send'],
color='darkorange', alpha=0.3)

# Remove axis labels
ax.set_xticks([])
ax.set_yticks([])

# Removes spines
ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)

# Put country names on the center of the chart
for i, country_name in enumerate(resume['country_name']):
ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6)

# Add each observations, for each year and country
y_position = 0
for i, row in data.iterrows():

# Get values
sending = -row['participants_x']
receiving = row['participants_y']
y_position = row['y_position']
years = row['academic_year']

# Change alpha parameter according to the year concerned
year_alpha_mapping = {'2014-2015': 0.3,
'2015-2016': 0.4,
'2016-2017': 0.5,
'2017-2018': 0.6,
'2018-2019': 0.7,
'2019-2020': 0.9}
alpha = year_alpha_mapping[years]*0.6 # adjust as needed

# Add small noise to the y_position
y_position += np.random.normal(0, 0.1, 1)

# Add
ax.scatter(sending, y_position, c='darkorange', alpha=alpha, s=3)
ax.scatter(receiving, y_position, c='darkblue', alpha=alpha, s=3)


# Add a title
ax.set_title('Number of Student', weight='bold', fontsize=9)

# Display the plot
plt.show()

png

Add annotations for final chart

All that's missing is a few annotations, but the hard part's over!

Thanks to the text() function, you can easily add annotations of different sizes and styles.

# Create a figure and axis with a specific size
fig, ax = plt.subplots(figsize=(8, 6))

# Create both barplots
ax.barh(resume['country_name'], resume['mean_rec'],
color='blue', alpha=0.2)
ax.barh(resume['country_name'], -resume['mean_send'],
color='darkorange', alpha=0.2)

# Remove axis labels
ax.set_xticks([])
ax.set_yticks([])

# Removes spines
ax.spines[['right', 'top', 'left', 'bottom']].set_visible(False)

# Put country names on the center of the chart
for i, country_name in enumerate(resume['country_name']):
ax.text(0, i, country_name, ha='center', va='center', fontsize=8, alpha=0.6)

# Add each observations, for each year and country
y_position = 0
for i, row in data.iterrows():

# Get values
sending = -row['participants_x']
receiving = row['participants_y']
y_position = row['y_position']
years = row['academic_year']

# Change alpha parameter according to the year concerned
year_alpha_mapping = {'2014-2015': 0.3,
'2015-2016': 0.4,
'2016-2017': 0.5,
'2017-2018': 0.6,
'2018-2019': 0.7,
'2019-2020': 0.9}
alpha = year_alpha_mapping[years]*0.6

# Add small noise to the y_position
y_position += np.random.normal(0, 0.2, 1)

# Add
ax.scatter(sending, y_position, c='darkorange', alpha=alpha, s=3)
ax.scatter(receiving, y_position, c='darkblue', alpha=alpha, s=3)


# Label of Outgoing and Incoming students
ax.text(-6000, 24, 'Outgoing\nStudents',
color='darkorange', ha='center', va='center', weight='bold')
ax.text(6000, 24, 'Incoming\nStudents',
color='darkblue', ha='center', va='center', weight='bold')

# big title
ax.text(-7000, 9, 'Students\nexchanges\nin Europe',
ha='left', va='center', weight='bold', fontsize=14)

# description
text = '''Country ranking based on a
sample Erasmus programs.
Bars show the annual average
for the period, points show
the values for each year.'''
ax.text(-7000, 4.5, text, ha='left', va='center', fontsize=7)

# credits
text = '''Data: Data.Europa | Ordered & Mirrored Barplot'''
ax.text(-7000, 1, text, ha='left', va='center', fontsize=6)

# Academic year legend
ax.text(x=4200, y=11, s='Academic Year', fontsize=7, weight='bold')
y_position = 10 # start at the 10th bar
for year, alpha in year_alpha_mapping.items():

# Add the point
ax.scatter(4000, y_position, alpha=alpha, s=5, c='black')
ax.text(x=4200, y=y_position-0.2, s=year, fontsize=7)

y_position -= 1 # decrease of one bar for the next iteration

# Add a title at the top
ax.set_title('Number of Student', weight='bold', fontsize=9)

# Display the plot
plt.show()

png

References

The chart was originally made with R. This post is a translation to Python by Joseph B..

For more examples of advanced customization in barplot, check out this circular barplot. Also, you might be interested in creating a population pyramid plot.