
More Sankey Templates: Multi-Level, Traceable, Gradient, and More!!


Sankey charts are often criticized in the data visualization community, largely because they are very regularly misused (perhaps more often than they are used properly), but I love them nonetheless. When used in the right way and for the right use case, they are incredibly insightful, not to mention visually stunning.


That said, these charts are pretty difficult to create in Tableau. There is no shortage of tutorials, but they inevitably get into some pretty tough table calculations and other tricky business. So, to make them a bit easier, last year I posted a template for creating sankeys, based on the polygon sankeys built by Olivier Catherin and Jeffrey Shaffer. To date, this has been one of my most popular blog posts. But I often receive questions from people asking how to make various adjustments. These have led to this post where I’m going to share six new sankey templates which attempt to address many of these questions. So let’s get started!!

Before I start, I want to say a big thanks to my brother, Kevin Flerlage (you'll hear from him directly soon) for his help in testing these templates. His feedback was critical to making these templates as useful as possible.

Quick Note: I’m going to run through these six new sankeys first before showing you how to create them. Many of them use the exact same template and process as my original template, but others require some slight adjustments. I’ll address all of this at the end of the blog.

Adjustable Whitespace
The sankey from my original post looked like this:



One problem with this is that there is a lot of space between each of the bars on the left and right sides, so a common question has been how to reduce that whitespace. I’ve generally provided a hacky solution which involves changing one of the calculated fields then adjusting the axes on all the sheets, but I’ve never loved that solution and really wanted to make it adjustable via a parameter. Unfortunately, this was not exactly a straightforward change and required some pretty fundamental changes to almost all of the calculations. But, in the end, I was able to create a sankey that allows you to adjust the amount of whitespace anywhere from zero, which will look like this:



…to 1, which, because the sankey is drawn on a scale from 0 to 1, means that it’s pretty much all whitespace (and completely useless). But you could make the bars really small, if you’d like:



In most cases, you will probably want something in between. It is important to note here that the whitespace defined in the parameter is the total amount of whitespace added to the bars on the left and right sides. That whitespace is distributed evenly between the bars. For example, if you specified a whitespace amount of 0.3 and you had four bars, the template would place 0.1 whitespace in the three gaps between them. But, if you only had two bars, the entire 0.3 whitespace would fill the single gap between them. This ensures that the bars’ sizes are not compressed which would lead to changing flow sizes from the left to the right.
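To make the arithmetic concrete, here is a minimal Python sketch of that distribution. It is not the Tableau calculation used in the template; the function name `layout_bars` is hypothetical, and it assumes the bars share whatever portion of the 0-to-1 scale is left after the whitespace is set aside.

```python
def layout_bars(values, whitespace):
    """Return (bottom, top) positions for each bar on a 0-to-1 axis.

    Hypothetical helper: the total whitespace is split evenly across the
    gaps between bars, and the bars share the remaining space in
    proportion to their values (an assumption, not the template's calc).
    """
    if not 0 <= whitespace <= 1:
        raise ValueError("whitespace must be between 0 and 1")
    total = sum(values)
    gaps = len(values) - 1
    # Four bars have three gaps; two bars have a single gap.
    gap = whitespace / gaps if gaps > 0 else 0.0
    scale = (1 - whitespace) / total
    positions = []
    cursor = 0.0
    for value in values:
        height = value * scale
        positions.append((cursor, cursor + height))
        cursor += height + gap
    return positions


# Four bars with 0.3 total whitespace: each of the three gaps is 0.1 wide.
four_bars = layout_bars([10, 20, 30, 40], whitespace=0.3)
# Two bars with the same setting: the single gap takes the full 0.3.
two_bars = layout_bars([10, 20], whitespace=0.3)
```

Because the same scale factor applies to every bar, a flow keeps the same thickness on the left and right sides no matter how many gaps each side has.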

Okay, so this one isn’t really a new template so to speak—it’s really more of a feature I’ve added to make it more flexible. That being said, I’ve included this feature in all the additional templates to follow.

New Look and Feel
A second comment I often get is that, when you highlight a flow, the bars’ labels are not highlighted, so it’s difficult to see where the flow is going, as shown below:


This has a pretty simple fix: you just need to adjust the highlight action so that the bars aren’t a target. But I wanted to come up with a different solution to this problem. One day while looking around the web for sankeys, I came across this beauty by Stefania Guerra (see Visualizing Ageing - Issue mapping for an ageing Europe for this and more of her related work).


There is so much to like about this sankey, but one thing that caught my eye was that the labels were separated and placed to the left/right of small thin colored bars. So, I decided to build a template that looks very similar to Stefania’s work.



The highlight actions are set up to not include the text as a target, so that means they’ll always be visible, thereby addressing the original problem. Plus, it just looks really nice.

Multi-Level Sankey
Another common request I receive is for help creating a multiple-level sankey so that a larger flow, with multiple steps, can be visualized. The design of the original sankey template is such that you can create a multi-level sankey by copying many of the calculated fields, copying the sheets, adjusting the table calculations, and adding them all to the dashboard. But that can prove to be a ton of work and an understanding of the table calculations used is pretty important. So I decided to create a multi-level sankey template (click the image to see the interactive version).


Those colors are pretty bold, but you’re free to change them however you like. You will notice that I chose to stop at 5 levels. I thought that would be enough, but if you need more, feel free to reach out to me and I can provide some direction on adding more.

Traceable Multi-Level Sankey
The multi-level sankey above was, admittedly, the first time I’d ever built a sankey with more than two levels and, as I was looking at it, there was one thing that bothered me about it: there’s no easy way to trace a record through the entire flow. Granted, a sankey is an aggregated chart, so its purpose is to give you an overall idea of the flow, at an aggregate level, but I just thought it would be nice to have the ability to trace a single item (or person or order or whatever) through the flow. So, to address this, I created a traceable multi-level sankey (click the image to see the interactive version).


In this sankey, every detailed record in your data set is drawn as its own separate flow. All of the record identifiers are available in the dropdown list (a parameter). When you select an item, it is highlighted in a different color. When you hover over a specific bar on any of the levels, it will highlight the entire flow, but hovering over the flow itself will highlight the individual record through the entire flow. The combination of these features gives you the best of both worlds.

I’d also point out that, if you look closely at the image, you can see that there is a faint white border around each individual flow. These were not intended and are not visible in Desktop. But, when published to Public or Server, they show due to how HTML5 Canvas renders adjacent polygons on the web. However, I really like them in this case as they add some depth to the chart and make it clear that there are multiple “strands” in each flow. Note: Keep this in mind as it will impact our next chart.

In addition to the ability to trace a single item through the flow, I created a version that allows you to trace an entire flow. You can select which step you'd like to trace, then which value. For example, below I've chosen to trace from the first step, then trace the D value.



Gradient Sankey
The final template is a gradient colored sankey. If you’re a regular reader of my blog, you probably saw my recent post on how to create gradient bar and area charts. So, after spending all this time experimenting with sankeys, I decided to see if I could do the same here.

In all of the sankeys I’ve shown previously, each flow is a single polygon in Tableau (except for the traceable sankey, of course). So, similar to my approach for creating gradient bar charts, my idea was to break each of those polygons into small slices then color each slice a slightly different color, creating the gradient effect. This required some changes to the underlying data model built into the template (I’ll get to that shortly). In the end, I was able to create this:



But there’s a problem. Notice those thin white lines between each individual polygon. This, of course, is the same problem noted earlier, but in this case, I don’t care for the way it looks. Unfortunately, we can’t eliminate the lines altogether, but there are a couple of things we can do to reduce the impact. First, we can create a slight overlap between each polygon in an attempt to cover up the white lines. So, I added a parameter that allows you to specify the amount of overlap you’d like to have. A value of zero will give you what I showed above, but you can then tune it a bit to reduce the impact of the white lines. For instance, here’s the sankey with 0.0073 overlap.



That kind of pushes it the other direction, creating darker lines instead of lighter ones, but I prefer this over the light lines. But what’s nice about the flexibility of this parameter is that you can really embrace the lines. For instance, you can make them a bolder dark color:



Or you can make the value negative to create larger white gaps:



But, if you really want a smooth color transition, there’s one more option. Jeffrey Shaffer’s website, Data + Science, includes a blog detailing methods to create high resolution images from your Tableau workbooks. If you’ve never read this, then do yourself a favor and go read it right now. It’s incredibly valuable when you need a high res image, be it for an image on your blog, a desktop background, or to print yourself some wall art. My personal favorite is the pixelratio trick, which essentially increases the number of pixels displayed. Interestingly, you can use this trick to help reduce some of the impact of the thin white lines. For example, here’s the sankey with a pixel ratio of 15:



The drawback of this, however, is that it increases the load time of the chart. This sankey is already somewhat complex, so that may not always be ideal.
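For readers who want to see the slicing and overlap ideas outside of Tableau, here is a rough Python sketch. It is not the data model built into the template: the function names are hypothetical, the logistic curve over the range -6 to 6 is just one common way to draw a sigmoid, and each slice is simplified to a four-corner polygon.

```python
import math


def sigmoid_path(y_start, y_end, t):
    """Height of a sigmoid-shaped curve at horizontal position t in [0, 1]."""
    # Logistic curve over x in [-6, 6], rescaled to run exactly from
    # y_start (at t = 0) to y_end (at t = 1). The range is an assumption.
    s = 1 / (1 + math.exp(-(12 * t - 6)))
    s0 = 1 / (1 + math.exp(6))
    s1 = 1 / (1 + math.exp(-6))
    return y_start + (y_end - y_start) * (s - s0) / (s1 - s0)


def slice_flow(left, right, slices=120, overlap=0.0):
    """Split one flow into `slices` polygons along the horizontal axis.

    left and right are (bottom, top) pairs for the flow on each side.
    A positive overlap stretches each slice past the start of the next
    one; a negative overlap leaves a gap instead.
    """
    polygons = []
    for i in range(slices):
        x0 = i / slices
        # Clamp so a slice never runs past the chart or behind its own start.
        x1 = max(x0, min(1.0, (i + 1) / slices + overlap))
        polygons.append([
            (x0, sigmoid_path(left[0], right[0], x0)),  # bottom-left
            (x1, sigmoid_path(left[0], right[0], x1)),  # bottom-right
            (x1, sigmoid_path(left[1], right[1], x1)),  # top-right
            (x0, sigmoid_path(left[1], right[1], x0)),  # top-left
        ])
    return polygons


# One flow from the 0.0-0.2 band on the left to the 0.5-0.7 band on the right.
slices = slice_flow((0.0, 0.2), (0.5, 0.7), slices=120, overlap=0.0073)
```

Each polygon would then get its own color, which is what produces the gradient; the overlap value only changes how far each slice reaches toward its neighbor.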

How to Use The Templates
Okay, now that you’ve seen all six new templates, let’s talk about how to use them. The first two templates (adjustable whitespace and new look and feel) follow the exact same process as shared in my original blog, so go check out the steps documented there. The remaining templates follow the same basic process with a few slight modifications.

Multi-Level
The multi-level sankeys require a slightly different template, which differs from the original in two ways. First, instead of two fields, Step 1 and Step 2, it now has Steps 1-5. The second difference is the addition of an ID field. This field is to be used to uniquely identify a record. The purpose of this field is to enable traceability in the traceable multi-level sankey. Strictly speaking, it isn’t required in the regular multi-level sankey, so it can be left blank, unless you’d like to have record IDs for filtering or other purposes.

If you’re building a traceable sankey, then once you’ve loaded your own data into Tableau, you’ll need to populate the Select ID parameter. You can use the Add From Field button on the parameter to populate the list from the ID field.

Before I move on to the gradient sankey, I would like to point out that my original blog suggested that you aggregate your data before populating the Excel template. This isn’t really necessary as the template is built to aggregate the data automatically. So, unless you are concerned about the number of records, feel free to populate the template with detailed data.
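If it helps to picture what aggregating detailed data means here, the following Python sketch rolls individual records up into one total per flow. The template does this with Tableau calculations, not Python; the function name is hypothetical, and the `Size` column name is an assumption about how the measure is labeled (records without it count as 1).

```python
from collections import defaultdict


def aggregate_flows(records, steps=("Step 1", "Step 2")):
    """Sum detailed records into one total per combination of step values.

    records is a list of dicts, one per detailed row. "Size" is an
    assumed column name for the measure; rows without it count as 1.
    """
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[step] for step in steps)
        totals[key] += record.get("Size", 1)
    return dict(totals)


# Illustrative detailed rows (made-up values), one per record ID.
detail = [
    {"ID": "001", "Step 1": "A", "Step 2": "X"},
    {"ID": "002", "Step 1": "A", "Step 2": "X"},
    {"ID": "003", "Step 1": "A", "Step 2": "Y"},
    {"ID": "004", "Step 1": "B", "Step 2": "Y"},
]
flows = aggregate_flows(detail)
```

Passing a longer `steps` tuple, such as all five step fields, groups the records by their full path through a multi-level sankey instead.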

Gradient
The process for creating the gradient sankey is the same as the original blog, but requires a slightly modified Excel template. However, the tab on which you populate your data will remain unchanged. The only difference in the template is an adjusted Model tab.

Coloring the flows with the gradient color requires a bit of explanation. If you look at the marks card, you’ll see that the flows are colored by Step 1, Step 2, and Polygon.


Step 1 and Step 2 are the from and to fields that define a flow overall; Polygon is a numeric field which identifies each of the 120 polygons used to break the flow into multiple pieces which are then colored individually. To color a flow, click the color card, then click Edit Colors. You’ll see something like this:


Select all 120 polygons for a given set of Step 1/Step 2 values.


Finally, select your favorite continuous or diverging color palette then click Apply. Tableau will then automatically distribute that palette across all 120 values, giving you the nice gradient color we’re looking for.



You’ll then need to repeat this step for all of your individual flows.
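To give a sense of what distributing a palette across 120 values looks like, here is a small Python sketch that blends between two colors in even steps. This is plain linear interpolation in RGB, used only as an illustration; Tableau’s own palette logic may well differ, and the function name and hex colors are made up for the example.

```python
def gradient(start_hex, end_hex, steps=120):
    """Return `steps` hex colors blending evenly from start_hex to end_hex."""

    def to_rgb(hex_color):
        # "#1f77b4" -> (31, 119, 180)
        digits = hex_color.lstrip("#")
        return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

    start, end = to_rgb(start_hex), to_rgb(end_hex)
    colors = []
    for i in range(steps):
        # t runs from 0.0 (first polygon) to 1.0 (last polygon).
        t = i / (steps - 1) if steps > 1 else 0.0
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(start, end))
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


# One color per polygon for a single Step 1/Step 2 flow.
palette = gradient("#1f77b4", "#ff7f0e", steps=120)
```

The list index lines up with the polygon number, so the first polygon gets the start color and the last polygon gets the end color.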

The Files
You can find all the Excel and Tableau templates using the following link. I’ve included the Tableau workbooks in both 2019.1 and 10.4 formats, so that you can use them even if you are not using the latest and greatest version of Tableau. If you need an even older version, then please reach out to me. 

All the Files

Using in Real-Life
Finally, before closing this out, I wanted to note that I often have people tell me they can’t use these templates because they can’t put their data in Excel. But that’s not exactly true. If you can get your data into the basic structure of the template, either through use of data prep tools, such as Tableau Prep, or custom SQL, then you can easily replace the data with your own. Not convinced? Here's a quick testimonial from my brother, Kevin Flerlage:


I recently utilized one of Ken’s Sankey templates at work, specifically the multi-level Sankey template. I simply structured my data using SQL to include the data points shown in his Excel template. I brought that into the Tableau workbook template as Custom SQL, replaced the “Data” sheet, and created a join calculation of 1 = 1. From there, all I had to do was to replace references of his column names to my column names.  The process was quite simple and it only took about 20 minutes from start to finish. 

If you need to do this and get stuck, please feel free to contact me and I’d be happy to help you out.  

Different Curve Types
Okay, one last thing before you go. All of these templates use a curve type called a sigmoid, but other types of curves can be used as well and, in many cases, these other curves can work better than sigmoids. If you want to learn how to leverage these curve types in your sankeys, then check out the amazing work of Chris DeMartini: More options for your Tableau Sankey Diagram.

Ken Flerlage, April 13, 2019



3 comments:

  1. Excellent article, Ken. Not really a fan of gradient fills, but the white-gapped gradient version is quite striking.

  2. Ken, I used your last Sankey template to create a multi-level (with 3 sets of curves!), and whilst it was a bit tricky, the layout of your templated dataset made it much less troublesome than I thought it would be. Now, with this multi-level template, I can maybe relax a bit more next time.

    One point I would like to add, is that the sankey I was making did not have 100% of step 1 flowing all the way through to step 4, and actually, new outside influences came in at step 2 and 3. I had to do some creative resizing to make it work (because I am hopeless at the math behind this!) - is this something that you think could be possible to create effectively with the calcs?

    Great work on this post, the gradients look pretty cool!

    Replies
    1. Thanks Stewart. That is certainly a challenge, especially the addition of new influences. I don't have any thoughts on how to approach that at the moment.

