Visualization of Data with Pie Charts in Matplotlib
Examples of how to create different types of pie charts using Matplotlib to visualize the results of database analysis in a Jupyter Notebook with Pandas
Photo by Niko Nieminen on Unsplash
The Challenge
While working on my Master’s Thesis titled “Factors Associated with Impactful Scientific Publications in NIH-Funded Heart Disease Research”, I used different types of pie charts to illustrate some of the key findings from the database analysis.
A pie chart can be an effective choice for data visualization when a dataset contains a limited number of categories representing parts of a whole, making it well-suited for displaying categorical data with an emphasis on comparing the relative proportions of each category.
In this article, I will demonstrate how to create four different types of pie charts using the same dataset to provide a more comprehensive visual representation and deeper insight into the data. To achieve this, I will use Matplotlib, a Python plotting library, to display pie chart visualizations of the statistical data stored in a Pandas DataFrame. If you are not familiar with the Matplotlib library, a good place to start is the Python Data Science Handbook by Jake VanderPlas, specifically the chapter on Visualization with Matplotlib, as well as matplotlib.org.
First, let’s import all the necessary libraries and extensions:
https://medium.com/media/adc3397352f29a008c0c3fb70bf5c57a/href
Next, we’ll prepare the CSV file for processing:
The mini dataset used in this article highlights the top 10 journals for heart disease research publications from 2002 to 2020 and is part of a larger database collected for the Master’s Thesis research. The columns “Female,” “Male,” and “Unknown” represent the gender of the first author of the published articles, while the “Total” column reflects the total number of heart disease research articles published in each journal.
https://medium.com/media/792a405ffb5dd5a144ff397b48e8a949/href
Image by the author, showing the output of the Pie_Chart_Artcile_2.py sample code above.
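As a rough illustration of this preparation step, the sketch below loads a small table with Pandas and builds the “Total” column. The journal names and counts are made-up placeholders, not the thesis data; the “Journal” column name and the use of an in-memory string instead of a real CSV path are assumptions. It also assumes “Total” is the sum of the three gender columns, which the original file may already contain.

```python
import io

import pandas as pd

# Placeholder CSV text standing in for the article's file. The journal names
# and counts are invented for illustration and are NOT the thesis data.
csv_text = """Journal,Female,Male,Unknown
Journal A,120,300,30
Journal B,80,210,10
Journal C,60,150,15
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Assumption: "Total" is the row-wise sum of the three first-author columns
gender_cols = ["Female", "Male", "Unknown"]
df["Total"] = df[gender_cols].sum(axis=1)

print(df)
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the path to the CSV.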
For smaller datasets with fewer categories, a pie chart with exploding slices can highlight a key category by pulling it out slightly from the rest of the chart. Each slice represents a portion of the total, with its size proportional to the data it represents. Labels can be added to each slice to indicate the category, along with percentages to show its share of the total. The exploded slice stands out without losing the context of the full data representation.
https://medium.com/media/27e1eb4a5677224c0a16522075fd712d/href
Image by the author, showing the output of the Pie_Chart_Artcile_3.py sample code above.
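For readers who cannot see the embedded code, here is a minimal sketch of an exploding-slice pie chart using the `explode` and `autopct` parameters of Matplotlib's `pie()`. The counts and the journal name are invented for illustration and may not resemble the author's data or styling.

```python
import matplotlib.pyplot as plt

# Made-up first-author counts for a single hypothetical journal
labels = ["Female", "Male", "Unknown"]
counts = [120, 300, 30]

# Offset of each slice as a fraction of the radius; only "Female" is pulled out
explode = [0.1, 0.0, 0.0]

fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(
    counts,
    labels=labels,
    explode=explode,
    autopct="%1.1f%%",  # print each slice's share with one decimal place
    startangle=90,      # start drawing from the top of the circle
)
ax.set_title("Journal A (illustrative data)")
```

In a Jupyter Notebook the figure is rendered when the cell finishes; in a plain script, `plt.show()` or `fig.savefig(...)` would be needed.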
The same exploding slices technique can be applied to all other entries in the sample dataset, and the resulting charts can be displayed within a single figure. This type of visualization helps to highlight the overrepresentation or underrepresentation of a particular category within the dataset. In the example provided, presenting all 10 charts in one figure reveals that none of the top 10 journals in heart disease research published more articles with female first authors than with male first authors, thereby emphasizing the gender disparity.
https://medium.com/media/a58b178de8293a2dbaad651dfcb370f2/href
Gender distributions for top 10 journals for heart disease research publications, 2002–2020. Image by the author, showing the output of the Pie_Chart_Artcile_4.py sample code above.
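One way to place several exploding pie charts in a single figure is a grid of subplots, drawing one pie per row of the DataFrame. The sketch below uses four hypothetical journals with invented counts rather than the ten journals from the article, and the grid size and column names are assumptions.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd

# Invented counts for four hypothetical journals (not the thesis data)
df = pd.DataFrame({
    "Journal": ["Journal A", "Journal B", "Journal C", "Journal D"],
    "Female": [120, 80, 60, 45],
    "Male": [300, 210, 150, 130],
    "Unknown": [30, 10, 15, 5],
})
gender_cols = ["Female", "Male", "Unknown"]

# Choose a grid large enough to hold one chart per journal
ncols = 2
nrows = math.ceil(len(df) / ncols)
fig, axes = plt.subplots(
    nrows, ncols, figsize=(4 * ncols, 4 * nrows), squeeze=False
)
axes_flat = axes.ravel()

# Draw one exploding pie per journal, pulling out the "Female" slice each time
for ax, (_, row) in zip(axes_flat, df.iterrows()):
    ax.pie(
        row[gender_cols].astype(float),
        labels=gender_cols,
        explode=[0.1, 0, 0],
        autopct="%1.1f%%",
        startangle=90,
    )
    ax.set_title(row["Journal"])

# Hide any unused panels when the journal count does not fill the grid
for ax in axes_flat[len(df):]:
    ax.set_visible(False)

fig.tight_layout()
```

For ten journals, setting `ncols = 5` would give a 2 by 5 grid with the same loop.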
A variation of the pie chart, known as a donut chart, can also be used to visualize data. Donut charts, like pie charts, display the proportions of categories that make up a whole, but the center of the donut chart can also be used to present additional data. This format is visually less cluttered and can make the relative sizes of slices easier to compare than in a standard pie chart. In the example used in this article, the donut chart shows that, among the top 10 journals for heart disease research publications, the American Journal of Physiology, Heart and Circulatory Physiology published the most articles, accounting for 21.8% of the total.
https://medium.com/media/e67487ffd1cb3b146d6b2dbcb32e7858/href
Image by the author, showing the output of the Pie_Chart_Artcile_5.py sample code above.
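A donut chart can be sketched in a few lines by giving the wedges a width smaller than the pie radius through `wedgeprops`, then writing extra information in the empty center. This is only one possible approach and may differ from the author's embedded code; the journal names and totals are invented placeholders.

```python
import matplotlib.pyplot as plt

# Invented article totals for four hypothetical journals (not the thesis data)
journals = ["Journal A", "Journal B", "Journal C", "Journal D"]
totals = [450, 300, 225, 180]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    totals,
    labels=journals,
    autopct="%1.1f%%",
    pctdistance=0.8,  # place the percentages inside the ring
    startangle=90,
    # A wedge width smaller than the radius (1.0) leaves a hole in the middle
    wedgeprops=dict(width=0.4, edgecolor="white"),
)

# Use the empty center to present additional data, here the overall total
ax.text(0, 0, f"{sum(totals)}\narticles", ha="center", va="center", fontsize=14)
ax.set_title("Share of articles per journal (illustrative data)")
```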
We can show additional information from the sample dataset by building on the previous donut chart and creating a nested version. The add_artist() method from Matplotlib’s figure module adds an extra Artist (such as a patch or text object) to the base figure. Like the earlier donut chart, this variation displays the distribution of publications across the top 10 journals for heart disease research. However, it also includes an additional layer that shows the gender distribution of first authors for each journal. This visualization highlights that a larger percentage of the first authors are male.
https://medium.com/media/e1c7d9b6c07990edb444379a57ac6fa0/href
Image by the author, showing the output of the Pie_Chart_Artcile_6.py sample code above.
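The nested layout described above can be approximated by drawing two pies of different radii on the same axes and then covering the center with a white circle. The sketch below uses `Axes.add_artist()` for the circle, which is an assumption about how the method is applied and may not match the author's code; the journals, counts, and colors are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Circle

# Invented counts (not the thesis data).
# Rows are hypothetical journals; columns are Female, Male, Unknown.
journals = ["Journal A", "Journal B", "Journal C"]
gender_labels = ["Female", "Male", "Unknown"]
counts = np.array([
    [120, 300, 30],
    [80, 210, 10],
    [60, 150, 15],
])
totals = counts.sum(axis=1)
gender_colors = ["tab:red", "tab:blue", "tab:gray"]

fig, ax = plt.subplots(figsize=(7, 7))

# Outer ring: one wedge per journal, sized by its total article count
ax.pie(
    totals,
    radius=1.0,
    labels=journals,
    autopct="%1.1f%%",
    pctdistance=0.85,
    startangle=90,
    wedgeprops=dict(edgecolor="white"),
)

# Inner ring: three wedges per journal in the same order as the outer ring,
# so each gender split sits directly beneath its parent journal wedge
ax.pie(
    counts.flatten(),
    radius=0.7,
    colors=gender_colors * len(journals),
    startangle=90,
    wedgeprops=dict(edgecolor="white"),
)

# Cover the center with a white circle to turn the nested pies into a donut
ax.add_artist(Circle((0, 0), 0.4, facecolor="white"))

# Build the gender legend from the first journal's three inner wedges
first = len(journals)
inner_wedges = ax.patches[first:first + len(gender_labels)]
ax.legend(inner_wedges, gender_labels, title="First author", loc="upper left")
```

Because both rings start at the same angle and the inner values are listed journal by journal, the inner wedges for each journal span exactly the same arc as the outer wedge above them.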
In conclusion, pie charts are effective for visualizing data with a limited number of categories, as they enable viewers to quickly understand the most important categories or dominant proportions at a glance. In this specific example, the use of four different types of pie charts provides a clear visualization of the gender distribution among first authors in the top 10 journals for heart disease research publications, based on the 2002 to 2020 mini dataset used in this study. It is evident that a higher percentage of the publications’ first authors are male, and none of the top 10 journals for heart disease research published more articles with female first authors than with male first authors during the examined period.
The Jupyter Notebook and dataset used for this article can be found on GitHub.
Thank you for reading,
Diana
Note: I used GitHub embeds to publish this article.
Visualization of Data with Pie Charts in Matplotlib was originally published in Towards Data Science on Medium.