Ads Top

Chart Templates Part 1: Sankeys

Recently, Rodrigo Calloni mentioned to me that he wanted to create a visualization for the upcoming 2018 FIFA World Cup. His idea was to create a sankey diagram showing the top 10 countries and the number of goals scored in each World Cup. But he hadn’t previously created a sankey and wondered if I could help. Fortunately, I had previously built a nice template for sankeys from the work by Olivier Catherin and Jeffrey Shaffer which I provided to him and resulted in the following visualization (click on the image to see the interactive version).

Templatizing Complex Charts
Unfortunately, some chart types, including sankeys, can pretty difficult to create, even when you have a detailed tutorial. They tend to include data scaffolding, data densification, unorthodox joins, table calculations, trigonometry, etc. So when I create one of these, I always templatize them as much as possible. These templates typically consist of a relatively simple Excel spreadsheet into which some data can be plugged and a Tableau workbook, which simply connects to the spreadsheet. In the end, the process becomes relatively plug-and-play.

The complexity of these charts can often be a deterrent for some people. So, in this series of blog posts, I’ll be sharing some Excel/Tableau templates for creating three different charts—Sankeys, Sunbursts, and Joy Plots. In each post, I’ll start by describing the chart, detailing its potential use cases, and providing a high-level technical description of how it’s built in Tableau. I’ll then provide as step-by-step tutorial on how to use my templates.

Before I move onto my first chart type, I want to take a moment to encourage you to drill into these chart types further. Part of the reason I am providing these templates is in hope that it will pique your interest in better understanding how they’re built. Taking apart complex charts is a really good way to learn some of the more advanced features of Tableau, so I highly encourage you to take some time to understand how these work under the covers.

Sankey Diagrams
We’re going to start with sankey diagrams. describes these as such: “Sankey Diagrams display flows and their quantities in proportion to one another. The width of the arrows or lines are used to show their magnitude, so the bigger the arrow [or line], the larger the quantity of flow.” These charts are named after Matthew Henry Phineas Riall Sankey (that's a lot of names!!) who first used this chart type to show energy efficiency of a steam engine as shown below. (Wikipedia).

So, the key use case for sankeys is for showing flow from one thing to another. One of the best use cases I’ve seen is colleges using sankeys to show graduating students’ majors and the proportion of those students who have gone into various professions. Unfortunately, sankeys are often misused in situations where there would be a much better alternative. Sure, sankeys look cool, but it simply isn’t always the best chart type. So, before I go any further, I just want to stress the importance of considering the best chart for your use case before using a sankey. Most of the time, a sankey will not be the best chart and that, in all honesty, is a really good thing because your better choices are most likely going to be much easier to create.

Implementation in Tableau
A number of different people have implemented sankey charts in Tableau. Some approaches consist of lines drawn from one point to another, but the one I’ll be discussing is the polygon sankey developed by Olivier Catherin and Jeffrey Shaffer. Since this method was developed, it’s become one of the most commonly used methods. The chart consists of three primary components. The first two are gantt bars which represent the source and target (from and to). The third is a set of polygons, sized by proportion, which show the flow from one thing to the next. These polygons tend to take the shape of a sigmoid curve—a type of curve which takes on an S-like shape—which are created by leveraging parametric equations. Building these polygons also requires data scaffolding and some complex table calculations, which I won’t go into here.

The Template
Before I share my template, I quickly want to mention the amazing work that Ian Baldwin, of the Information Lab, recently did which shows how to build a Sankey diagram in Tableau without any data prep beforehand. It's a brilliant post and well worth the read.

This sankey template is based directly on the method detailed by Olivier Catherin—I have not added any new innovations to the method. Rather, I’ve simply tried to turn it into a plug-and-play template. So, all the credit for this work goes directly to Olivier and Jeffrey.  I'd also be remiss if I didn't credit my colleagues at Bucknell University. I first came to understand the inner workings of this chart about a year ago when I took apart some of the sankeys used internally.

The template includes two components—an Excel spreadsheet and a Tableau workbook. The goal in developing these templates was to make it as easy as possible to plug in your own data. That being said, it does make certain assumptions. Primarily, it assumes that you already have aggregated your data. Modifying the template to allow for non-aggregated data would not necessarily be difficult, but I won’t be dealing with that here.

The Excel spreadsheet (you can find it here) has two sheets, Data and Model. Model is used to handle the data densification and parameter values needed to build the sigmoid polygons using parametric equations. If you didn’t understand that sentence, don’t worry—you don’t need to modify this sheet at all; just make sure it’s in your spreadsheet. The Data sheet will be used to populate your pre-aggregated data. It contains just four columns. You can add more if needed, but these are the four that are required by the Tableau template. The columns are as follows:

Link – The purpose of this column is simply to join each row in the Model worksheet to each row in Data worksheet for the purpose of creating the polygon curves. But don’t worry too much about this. You simply need to make sure that every row has a value of “link” in this column. Note: Strictly speaking, we could use a join calculation in Tableau to join these sheets together, but for simplicity sake, I often like to include a separate column in my data set.

Step 1 – This column defines the starting point of the flow (i.e. the “from”). So, to use my example of college graduates, this would be student majors.

Step 2 – This column defines the end point of the flow (i.e. the “To”). This would be the college graduates’ professions.

Size – This column contains the aggregated value for the flow. Again using the college student example, this would be the total number of students that majored in one subject (for example, Chemistry) and ended up working in a particular field (for example, Food & Drug).

Each flow from one thing to the other will contain a separate record. Here’s how the sheet looks with some sample data:

Once you have populated the Data sheet, then you need to connect it to Tableau. Start by downloading the template Tableau workbooks, which are on my Tableau Public page. I’ve actually created two Tableau workbooks—one for horizontal sankeys and one for vertical sankeys. Then edit the data source and connect it to your Excel template. The workbook should update automatically to reflect your data. From here, you can do whatever you like with the chart—change the colors, add filters, update tooltips, etc. just as you normally would. Here’s how it looks with the sample data shown above.

And that’s all there is to it. If you use this template to create your own sankey, I’d love to see it. But please don’t create one simply because you can. Be sure to closely scrutinize your use case and make sure it’s the right chart type first.

Ken Flerlage, April 13, 2018

No comments:

Powered by Blogger.