Creating an Arc Sankey in Tableau

I recently learned about the Xenographics website, which “is a collection of unusual charts and maps, managed by Maarten Lambrechts. Its objective is to create a repository of novel, innovative and experimental visualizations to inspire you, to fight xenographphobia and popularize new chart types.” I decided to take a look and see if there were any charts that might be interesting to visualize in Tableau. The first one that really caught my eye was the Sankey arcs diagram, which was originally designed by Till Nagel, Erik Duval, Andrew Vande Moere, Kristian Kloeckl, and Carlo Ratti circa 2012. A similar visualization type was also designed by Martin Wattenberg in 2002 for the purpose of visualizing structure in strings.


My first impression was that the chart was visually quite stunning, but it was also fairly informative, so I decided to see if I could create it in Tableau. After hacking away at it for a bit, I created the following arc sankey, which shows global migration between various regions of the world from 2005 to 2010. (Note: The chart excludes migration within the same region, e.g., Europe to Europe.) Click on the image to see the fully interactive visualization.


How Does it Work?
An arc sankey shows flow between multiple entities. Each entity is shown as a node on a horizontal line. The relationship or flow between any two entities is then visualized using a circular arc, and the width of that arc encodes the magnitude of the flow. Though not specifically noted on the Xenographics website or in the accompanying research, I’ve designed my arc sankey so that arcs at the top show flow from left to right and arcs at the bottom show flow from right to left, allowing us to clearly see the direction of flow.

In many ways, arc sankeys are very similar to regular sankeys, as both show relationships between entities and the magnitude of those relationships. The big difference is that regular sankeys can show relationships between two different sets of entities, while arc sankeys only show flow within a single set. The global migration example shown earlier is a viable use case for an arc sankey because each region is both a source and a target.

Like any non-standard chart, arc sankeys have a high potential for misuse. So I want to stress the importance of considering the best chart for your use case before using one. Most of the time, an arc sankey will not be the best chart. In all honesty, that could be a good thing because your better choices are most likely going to be much easier to create. And, even if you feel an arc sankey may be a good approach, you may still want to reconsider. In many cases, a regular sankey may actually be a better option (with the same entity on both sides) as it’s just easier to see the flow due to less overlap. To demonstrate this, I’ve also created a regular sankey showing global migration. Click on the image to view the fully interactive version.


I’ll let you be the judge on which one is easier to understand. I won’t belabor this point any longer—I’ll just ask that you carefully consider your options before choosing to create this type of chart.
 
Implementation in Tableau
The arc sankey consists of two primary components. First are the individual nodes which represent each entity. These are spaced equally along the x axis. They are created using simple filled circle shapes.

Second are the arcs which connect one node to another. These are drawn using data densification and parametric equations. I won’t go into the math involved in creating these, but feel free to download my workbook if you want to know how it works under the covers.
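To give a rough feel for the idea without opening the workbook, here is a minimal Python sketch of a parametric semicircle between two node positions. It is not the calculation used in my Tableau workbook; the function name `arc_points`, the parameter `t`, and the default of 50 points are assumptions made purely for illustration.

```python
import math

def arc_points(x_from, x_to, n_points=50):
    """Approximate a semicircular arc between two nodes on the x axis.

    Left-to-right flows are drawn above the axis (positive y) and
    right-to-left flows below it (negative y).
    """
    if x_from == x_to:
        raise ValueError("source and target must be different nodes")
    if n_points < 2:
        raise ValueError("need at least two points to draw an arc")

    center = (x_from + x_to) / 2
    radius = abs(x_to - x_from) / 2
    sign = 1 if x_to > x_from else -1  # top for left-to-right, bottom otherwise

    points = []
    for i in range(n_points):
        t = i / (n_points - 1)  # densification parameter, from 0 to 1
        x = center + (x_from - center) * math.cos(math.pi * t)
        y = sign * radius * math.sin(math.pi * t)
        points.append((x, y))
    return points

# Hypothetical node positions: "Thing 1" at x=1 and "Thing 8" at x=8.
top_arc = arc_points(1, 8)
bottom_arc = arc_points(8, 1)
```

Each value of `t` plays the role of one densified row: the more rows, the smoother the curve, at the cost of more marks to draw.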

As noted above, arcs at the top show a flow from left to right, while arcs on the bottom show flow from right to left. For example, in the sample shown below, the flow starting at “Thing 1” and going to “Thing 8” is shown at the top and the flow from “Thing 8” to “Thing 1” is shown on the bottom.


By doing this, we can ensure that there is no confusion over the direction of the flow. A nice byproduct of this is that it makes the entire chart look like a circle, which can be quite visually satisfying.

But, there are situations where there is no change from one entity to another. Perhaps “Thing 1” stays at “Thing 1” with no change. To deal with these, I’ve just created a circle around each node. Again, using the example above, notice the circle around “Thing 1”.
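As a rough illustration only, a circle around a node can be traced with the same kind of parameter sweep. The function name `loop_points` and the radius of 0.3 are assumptions for this sketch; they are not values taken from the workbook.

```python
import math

def loop_points(x_node, radius=0.3, n_points=40):
    """Trace a full circle centered on a node that sits on the x axis."""
    if n_points < 2:
        raise ValueError("need at least two points to draw a circle")

    points = []
    for i in range(n_points):
        t = i / (n_points - 1)     # parameter from 0 to 1
        angle = 2 * math.pi * t    # one full revolution
        points.append((x_node + radius * math.cos(angle),
                       radius * math.sin(angle)))
    return points

# Hypothetical position: "Thing 1" at x=1.
circle = loop_points(1)
```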

The Template
To make these easy to create, I’ve created a fairly simple template, which consists of two components—an Excel spreadsheet and a Tableau workbook. My goal is to make it as easy as possible to plug in your own data.

The Excel spreadsheet (you can find it here: Arc Sankey Excel Template) has four sheets: Arc, Order, Model, and ToFrom. Model handles the data densification required to plot the individual points needed to approximate the curves. Essentially, these are the parameter values in our parametric equations. While these could have been created using bins and table calculations, I chose not to do that, as it adds complexity within Tableau and often leads to performance issues. ToFrom is a single sheet whose purpose is to duplicate our data into two sets of “To” and “From” records. The good news is that both Model and ToFrom remain static. You need not make any changes to them; just make sure they are in your spreadsheet.

The Order sheet allows you to specify the order, left to right, in which your entities appear along the x axis. You’ll need to modify this sheet to include each unique entity and a numeric order.

Finally, the Arc sheet will be used to populate your data. It contains just four columns. You can add more if needed, but these are the four that are required by the Tableau template. The columns are as follows:

Join – The purpose of this column is simply to join each row in the Model sheet to each row in the Arc sheet. But don’t worry too much about this. You simply need to make sure that every row has a value of “link” in this column. Note: Strictly speaking, we could use a join calculation in Tableau to join these sheets together, but for simplicity’s sake, I often like to include a separate column in my data set.

From – The from/source of the flow.

To – The to/target of the flow.

Value – The value of the flow.
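To see what the Join column accomplishes, here is a small Python sketch of the same idea outside Tableau: every Arc row is paired with every Model row that shares the value “link”. The sample values, the `densify` helper, the `T` column name, and the assumption that the Model rows carry a matching Join value are all hypothetical and only meant to illustrate the join.

```python
def densify(arc_rows, model_rows):
    """Pair every Arc row with every Model row that shares its Join value."""
    return [
        {**arc, **model}
        for arc in arc_rows
        for model in model_rows
        if arc["Join"] == model["Join"]
    ]

# Made-up Arc rows using the four required columns.
arc_rows = [
    {"Join": "link", "From": "Thing 1", "To": "Thing 8", "Value": 10},
    {"Join": "link", "From": "Thing 8", "To": "Thing 1", "Value": 4},
]

# Made-up Model rows: five parameter values from 0 to 1.
model_rows = [{"Join": "link", "T": i / 4} for i in range(5)]

densified = densify(arc_rows, model_rows)
```

With two flows and five parameter values, the result has ten rows, one per point along each curve. This is why every Arc row needs the same Join value.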

Here’s how the sheet looks with some sample data:


Once you have populated the Arc and Order sheets, you need to connect the spreadsheet to Tableau. Start by downloading the template Tableau workbook from my Tableau Public page: Arc Sankey Template. Then edit the data source and connect it to your Excel template. The workbook should update automatically to reflect your data.

From here, you can do whatever you like with the chart: adjust the sizing of the arcs and/or nodes, change the colors, add filters, update tooltips, etc., just as you normally would.

If you'd like to better understand how the chart works under the covers, feel free to download the workbook and take it apart as needed.

Thanks for reading. If you create an arc sankey using this technique, I’d love to see it.

Ken Flerlage, October 14, 2019

15 comments:

  1. Can we have as many words as we would like in the Order sheet?

    1. Yes you should be able to use as many as you like. Just be aware that space could become an issue.

  2. What is 'From Abbr' and 'To Abbr'??

    1. Looks like that was in my original data set, but didn't make it to the Excel template. That was my mistake. If you are struggling, feel free to email me. flerlagekr@gmail.com

  3. Is it possible to adjust the height and not the thickness of the arc based on the value? Like measuring the positive or negative change from Thing 1 to Thing 2 and Thing 2 to Thing 3, etc.

    1. Perhaps, but I'd probably need to see an example. Can you email me? flerlagekr@gmail.com

  4. Hi Ken,

    Thanks for the great tutorial. I am excited to apply this to a dataset that I'm analyzing, but I'm a little stuck.

    When I tried to connect the 'Arc Sankey Excel Template' to the 'Global Migration' workbook by editing the data source, it gave this error message: [SQLSTATE:42601] '-' requires numerical argument. It won't allow me to go any further. I tried connecting to your template as-is without any modifications and the problem still existed.

    Would you know why, and how I can fix it? Thanks, and I look forward to hearing from you.

    Kathy

  5. I have subbed "Thing 1", "Thing 2", etc. with categories from my own dataset and their corresponding values. However, my centre circles are now misaligned with the start/end of the arcs. Any idea why that is and how to fix it? It seems that the value is somehow tied to the up/down position of the circles? Thanks!

    1. Try to synchronize your axes. If that doesn't work, feel free to email me. flerlagekr@gmail.com

    2. That worked! I never even thought of it because the axes were hidden. Thanks for the tip :)

