Ads Top

Chart Templates Part 1: Sankeys

Recently, Rodrigo Calloni mentioned to me that he wanted to create a visualization for the upcoming 2018 FIFA World Cup. His idea was to create a sankey diagram showing the top 10 countries and the number of goals scored in each World Cup. But he hadn’t previously created a sankey and wondered if I could help. Fortunately, I had previously built a nice template for sankeys from the work by Olivier Catherin and Jeffrey Shaffer which I provided to him and resulted in the following visualization (click on the image to see the interactive version).



Templatizing Complex Charts
Unfortunately, some chart types, including sankeys, can pretty difficult to create, even when you have a detailed tutorial. They tend to include data scaffolding, data densification, unorthodox joins, table calculations, trigonometry, etc. So when I create one of these, I always templatize them as much as possible. These templates typically consist of a relatively simple Excel spreadsheet into which some data can be plugged and a Tableau workbook, which simply connects to the spreadsheet. In the end, the process becomes relatively plug-and-play.

The complexity of these charts can often be a deterrent for some people. So, in this series of blog posts, I’ll be sharing some Excel/Tableau templates for creating three different charts—Sankeys, Sunbursts, and Joy Plots. In each post, I’ll start by describing the chart, detailing its potential use cases, and providing a high-level technical description of how it’s built in Tableau. I’ll then provide as step-by-step tutorial on how to use my templates.

Before I move onto my first chart type, I want to take a moment to encourage you to drill into these chart types further. Part of the reason I am providing these templates is in hope that it will pique your interest in better understanding how they’re built. Taking apart complex charts is a really good way to learn some of the more advanced features of Tableau, so I highly encourage you to take some time to understand how these work under the covers.

Sankey Diagrams
We’re going to start with sankey diagrams. Datavizcatalogue.com describes these as such: “Sankey Diagrams display flows and their quantities in proportion to one another. The width of the arrows or lines are used to show their magnitude, so the bigger the arrow [or line], the larger the quantity of flow.” These charts are named after Matthew Henry Phineas Riall Sankey (that's a lot of names!!) who first used this chart type to show energy efficiency of a steam engine as shown below. (Wikipedia).


So, the key use case for sankeys is for showing flow from one thing to another. One of the best use cases I’ve seen is colleges using sankeys to show graduating students’ majors and the proportion of those students who have gone into various professions. Unfortunately, sankeys are often misused in situations where there would be a much better alternative. Sure, sankeys look cool, but it simply isn’t always the best chart type. So, before I go any further, I just want to stress the importance of considering the best chart for your use case before using a sankey. Most of the time, a sankey will not be the best chart and that, in all honesty, is a really good thing because your better choices are most likely going to be much easier to create.

Implementation in Tableau
A number of different people have implemented sankey charts in Tableau. Some approaches consist of lines drawn from one point to another, but the one I’ll be discussing is the polygon sankey developed by Olivier Catherin and Jeffrey Shaffer. Since this method was developed, it’s become one of the most commonly used methods. The chart consists of three primary components. The first two are gantt bars which represent the source and target (from and to). The third is a set of polygons, sized by proportion, which show the flow from one thing to the next. These polygons tend to take the shape of a sigmoid curve—a type of curve which takes on an S-like shape—which are created by leveraging parametric equations. Building these polygons also requires data scaffolding and some complex table calculations, which I won’t go into here.

The Template
Before I share my template, I quickly want to mention the amazing work that Ian Baldwin, of the Information Lab, recently did which shows how to build a Sankey diagram in Tableau without any data prep beforehand. It's a brilliant post and well worth the read.

This sankey template is based directly on the method detailed by Olivier Catherin—I have not added any new innovations to the method. Rather, I’ve simply tried to turn it into a plug-and-play template. So, all the credit for this work goes directly to Olivier and Jeffrey.  I'd also be remiss if I didn't credit my colleagues at Bucknell University. I first came to understand the inner workings of this chart about a year ago when I took apart some of the sankeys used internally.

The template includes two components—an Excel spreadsheet and a Tableau workbook. The goal in developing these templates was to make it as easy as possible to plug in your own data. That being said, it does make certain assumptions. Primarily, it assumes that you already have aggregated your data. Modifying the template to allow for non-aggregated data would not necessarily be difficult, but I won’t be dealing with that here.

The Excel spreadsheet (you can find it here) has two sheets, Data and Model. Model is used to handle the data densification and parameter values needed to build the sigmoid polygons using parametric equations. If you didn’t understand that sentence, don’t worry—you don’t need to modify this sheet at all; just make sure it’s in your spreadsheet. The Data sheet will be used to populate your pre-aggregated data. It contains just four columns. You can add more if needed, but these are the four that are required by the Tableau template. The columns are as follows:

Link – The purpose of this column is simply to join each row in the Model worksheet to each row in Data worksheet for the purpose of creating the polygon curves. But don’t worry too much about this. You simply need to make sure that every row has a value of “link” in this column. Note: Strictly speaking, we could use a join calculation in Tableau to join these sheets together, but for simplicity sake, I often like to include a separate column in my data set.

Step 1 – This column defines the starting point of the flow (i.e. the “from”). So, to use my example of college graduates, this would be student majors.

Step 2 – This column defines the end point of the flow (i.e. the “To”). This would be the college graduates’ professions.

Size – This column contains the aggregated value for the flow. Again using the college student example, this would be the total number of students that majored in one subject (for example, Chemistry) and ended up working in a particular field (for example, Food & Drug).

Each flow from one thing to the other will contain a separate record. Here’s how the sheet looks with some sample data:


Once you have populated the Data sheet, then you need to connect it to Tableau. Start by downloading the template Tableau workbooks, which are on my Tableau Public page. I’ve actually created two Tableau workbooks—one for horizontal sankeys and one for vertical sankeys. Then edit the data source and connect it to your Excel template. The workbook should update automatically to reflect your data. From here, you can do whatever you like with the chart—change the colors, add filters, update tooltips, etc. just as you normally would. Here’s how it looks with the sample data shown above.






And that’s all there is to it. If you use this template to create your own sankey, I’d love to see it. But please don’t create one simply because you can. Be sure to closely scrutinize your use case and make sure it’s the right chart type first. 


Update April 14, 2019: I've just posted a set of six new variations of this sankey template. For more, check out the post: More Sankey Templates: Multi-Level, Traceable, Gradient, and More!!


Ken Flerlage, April 13, 2018

68 comments:

  1. Hello,

    I can't open the twbx file in Tableau 10.2. Do you have an idea to help me please?
    Thanks a lot.
    Fred

    ReplyDelete
    Replies
    1. Here's a version in 10.2 format: https://www.amazon.com/clouddrive/share/kzUoHhPFFHmDQzQbvY4DcaCCrBzhfbud62psloc4Zwt

      Delete
  2. Thank you for this! Is there a way to add another flow to this diagram? I was able to get a sheet for the 3rd bar working but not for a second curve. Any help would be much appreciated!

    ReplyDelete
    Replies
    1. Yes, but you'd have to duplicate all of the calculated fields used to create the curve.

      Delete
    2. If I am to understand this correctly - one would need to duplicate the calculated fields. Is recreating the bars (bar3, bar4, bar5,... ) also necessary?

      Delete
    3. Yes, unfortunately, you'd need to duplicate just about all of it.

      Delete
    4. Not a problem. Would I be able to email you the work I am doing to confirm I am on the correct path?

      Delete
    5. Would you be willing to share how you modified the excel & tableau templates to accomodate additional flows? I'm trying to add 3 more flows & am having a hard time figuring out exactly what i need to duplicate. Thank you!

      Delete
    6. I actually just created a new set of templates, some of which are multi-level, so my advice would be to check those out. You can find them here: https://www.kenflerlage.com/2019/04/more-sankey-templates.html

      Delete
    7. Wow Ken, thank you SO MUCH. This is incredible & exactly what I needed!!

      Delete
  3. This saved me so much time! Thank you!

    ReplyDelete
  4. Looking forward to utilize this template!!

    ReplyDelete
  5. Thank you so much for this! I was able to show pre and post merger account locations very easily with this tool!

    ReplyDelete
  6. When I load my data in, t(Model) is colored red, which makes Sigmoid invalid, along with 5 other dependencies on Sigmoid. Path(Model) is also marked red and says it cannot be found in the database, the same as t(Model). How can I fix this issue?

    Path(Model) is also marked red and says it cannot be found in the database, the same as t(Model).

    ReplyDelete
    Replies
    1. Would you be able to send me an email with the details and perhaps a sample workbook? flerlagekr@gmail.com

      Delete
    2. Hi Ken, thanks for your response. I ended up having to right click on t(Model) and use Replace References and then did that for the Curve A Polygon as well, I think. I can't remember, but I was able to get it to work somehow.

      Alternatively, I downloaded the horizontal Sankey template again and made a separate Excel file to load in. This time there was no need for any weird fixing. It worked perfectly. I must have done something wrong the first time.

      Thank you so much for your response and this awesome template! It could not be any more convenient and does exactly everything I need, especially considering how I had no luck with any of the other tutorials I saw online.

      Delete
    3. Great. Glad you figured it out! And glad you found this useful.

      Delete
  7. This template was the most useful thing I've found yet when attempting to create a simple Sankey flow chart. I haven't got it working quite the way I want yet, but I have a very rough prototype that I can show to stakeholders to get their buy-in for more development work to be done. Thank you very much for posting these templates!

    ReplyDelete
  8. Just let you know that your templates are not working now (tableau 2019.1) - something to do with extract??? Thanks for supplying updated templates.

    ReplyDelete
  9. Ken, this article seems to be interesting but I did follow your advice by checking that informationlab article thingy. It's flawed as First step 9 seems to work but STep 10 is not working and is not producing sankey chart. Rather, it is just producing stacked bar charts. Do you know why or how? I would love to know why or how so I can progress to your article as your article is a bit advanced for me to learn first unless I try with informationlab and see how they tackle it before I take your version. Thanks for advice.

    ReplyDelete
    Replies
    1. Might be more than we can address here. Could you email me at flerlagekr@gmail.com

      Delete
  10. Hi Ken, how can i change your tableau horizontal template from polygon to just lines? I am basically looking to create a multi level decision tree and so i am not really bothered about the flow size...Will really appreciate your inputs on this..

    ReplyDelete
    Replies
    1. Hmmm, that's a tricky question. I'd probably need to know more about what you're trying to do. Any chance you could email me at flerlagekr@gmail.com?

      Delete
  11. Hi Ken,

    This is great- thank you! I was able to create a horizontal Sankey with one issue: the text in my "target" column is not displaying, no matter how large I make the display area. It is completely grey. Do you know what might be causing this?

    ReplyDelete
    Replies
    1. Can you resize the sheet to make it wider? Or change the text color? It could be a lot of things. Perhaps you could email me with further details? flerlagekr@gmail.com

      Delete
  12. Hi Ken,

    My colleague sent me this link and it seems your approach is an great starting point to learn Sankey.

    Issue: when I download your Tableau template (horizatol) and try to edit the data, it seems:

    The table "[TableauTemp].[Mode]$" does not exist.

    any idea what is the cause? This is after initial error where it ask for a newer version, which I download a trial 2019.1

    thanks ahead

    Harry

    ReplyDelete
    Replies
    1. Were you able to get it working then? If you'd like, I could downgrade it to a previous version. Could you email me at flerlagekr@gmail.com?

      Delete
  13. Nice blog.
    Really helpful template, used it and tweeted it:
    https://twitter.com/MelissaPerotti/status/1117787836168773633

    ReplyDelete
    Replies
    1. Glad it was useful, Melissa. FYI. I just posted a new set of sankey templates today, so be sure to check them out: https://www.kenflerlage.com/2019/04/more-sankey-templates.html

      Delete
  14. Hey Ken,
    The tool is very helpful to edit and get it up and running. Thank you so much.

    I want to sort my step 1 and step 2 values. I'm trying to understand the Curve A polygon field in rows on the Sankey chart but it gives me an error after clicking on edit table calculation saying "Invalid set function". Could you please help me on how to see the nested calculations under the rows "curve A polygon"?

    ReplyDelete
    Replies
    1. This kind of troubleshooting is always difficult without seeing a workbook. Any chance you could send me an email with the workbook? flerlagekr@gmail.com

      Delete
  15. Hi! I'm having some trouble attempting to modify the calculations for the polygons and then re-rendering the charts (the idea is to allow for each "arm" to have multiple parts to add an extra layer of complexity for flows from the same source:target, e.g. to differentiate between fruits/vegetables compared to total produce). I think I have an idea of how to approach this (it will require making copies of the calculated fields and calculating the maxes/mins based on an "intermediate" step as opposed to the normal step1/2). However before I get there I'm having trouble just replicating the sankey flow itself.

    Even from the original horizontal example template (I'm on 2018.3), if I try to create a new sheet and then input the same pills+marks you have on the example, I get errors in the "N* position min" and "curve a min" and "curve a polygon" calculations when I add them to the viz (says that they are missing required fields, but in the table calc options nothing is red, and I've tried adding every mark I could think of to detail to no avail). Any idea what could be wrong?

    The steps I'm following are:
    * new sheet
    * add step 1, step 2 (detail) (with sorting configuration Field/Asc/Size/Sum)
    * add min or max (detail)
    * set Marks type to Polygon
    * add Sum(path (model) (path)
    * add t (model) as dimension to columns
    * add Curve a max and curve a min (pill is red)
    * add curve a polygon to rows (pill is red)

    Any help would be greatly appreciated! Thanks for the awesome work.

    ReplyDelete
    Replies
    1. What you're trying to do sounds pretty similar to the traceable (or aggregate traceable) sankey template I documented here: https://www.kenflerlage.com/2019/04/more-sankey-templates.html. If so, check that out and see if it helps.

      I'm happy to help you get through these other issues, though it'll be difficult via these comments, so I'd suggest you email me at flerlagekr@gmail.com. Sankeys are difficult to troubleshoot, so I'll probably need you to include a packaged workbook.

      Delete
    2. Hi Ken,

      I think I'll shoot you an e-mail with some questions about re-working the calcs. In terms of recreating a chart I managed to do so only after I did the following (posting here in case anyone else faces the same issue):
      * Downloaded the newer horizontal template (the one with the adjustable whitespace)
      * Re-attempted the above, but still got the same errors for anything related to "*min" fields
      * However I noted that your "*min" calculations in the new template actually just reference the normal calc (e.g. "N1 position min" = "N1 position"), yet "N1 Position" would show up green as a mark and "N1 position min" as red (no idea what's going on there).
      * Replacing references to "N1 position min" to "N1 position" and likewise for other "*min" references seems to have solved it.

      Delete
  16. Calvin FriedrichMay 1, 2019 at 12:00 PM

    Ken,

    I have your template working with my own data and I love it! I have one question, though. Is there a way to customize the order of the Targets and Sources?

    ReplyDelete
    Replies
    1. Yes, but it is, unfortunately, a little involved because of the complex table calculations. Any chance you could send me an email with a mockup of what you're trying to do? flerlagekr@gmail.com

      Delete
  17. Hello,

    Just curious if there was a work around to the number of records as a measure - this seems to be an issue with the Ian, Olivier and Jeffrey models.

    Thank you in advance for your help.

    ReplyDelete
    Replies
    1. Not sure I understand the issue. What problem are you running into?

      Delete
    2. Hello,

      Thank you for the quick response.

      On this "The Information Lab" blog (https://www.theinformationlab.co.uk/2018/03/09/build-sankey-diagram-tableau-without-data-prep-beforehand/) there were a couple of folks who were having issues with the Sankey given that the "Chosen Measure" they selected was the Number of Units or CountD of IDs. Read every thread and there was no solution presented and this is the same issue I am having. I work for a hospital and basically we just want to use Sankey to chart the patient mix movement or change of a patient's insurance coverage year over year. i.e. Patient X has AETNA in 2016 and switched to MEDICARE in 2017. My dimensions will be the YEAR of coverage. Maybe Sankey is not the best approach?

      Thanks you for your time.

      Delete
    3. I'd need to know more about the use case to be sure if sankeys are a good option or not, but I see no reason why you wouldn't be able to use Number of Records. In fact, I just tried it with my template and it worked. Any chance you could send me an email at flerlagekr@gmail.com? I realize you can't share real data with me, but if you could mock up some fake data, then I'd be happy to show you how to do this.

      Delete
  18. Hi Ken,

    Awesome stuff! I'm really excited about using this for my data. I have a 200k row data set that this would be perfect for. How can I modify this to work with that dataset? I have 2 dimensions and a measure in mind to use, but need that level of detail for other parts of the report.

    Any advice would be greatly appreciated!

    ReplyDelete
    Replies
    1. Are you able to put the data in a structure like the Excel template?

      Delete
    2. Hi Ken,

      The data is in that same format, 2 dimension columns and a measure column. Essentially, one column is an employee level column, like vp, director, etc. and the 2nd dimension column is a 'result' column. The measure column is a count. I've added a "Link" column as well. I'm just worried that joining on link to 200k rows will inflate the data too much and cause performance issues.

      Delete
    3. Yes, that will cause the data to grow significantly. Is it possible to pre-aggregate it for the purpose of creating this specific chart? You could then create a second data source with all the detail.

      Delete
    4. The data was actually around 600k rows and I just went with it...ended up with around 60 million rows of data, but it wasn't actually that bad from a performance perspective!

      Thanks again!

      Delete
  19. Instead of replacing the data source, I believe it'd be possible to just click on the 'data source' tab at the bottom left corner and when it prompts to find the file you could just navigate to the updated excel sheet with new metadata right? Saves a bit of time making all the calculated fields again.

    This is so great and although I haven't dived in and played around with the template fully yet I do plan on testing it out with my own dataset soon. Looking forward to it, thanks Ken.

    ReplyDelete
    Replies
    1. Yes, if you have data in Excel, then you're correct. But, if your data is in a different format such as a database, then you'll have to do the replace thing.

      Delete
  20. Hi Ken,

    Thanks for this amazing template - it is exactly what I need. I have recreated everything using my data and checked all the calcs and the sorts etc. a few times but my second bar doesn't line up with the curves (the first does). Any inkling as to why that may be? Thanks, Patrycja

    ReplyDelete
  21. Never mind about my last comment - I figured out the problem. One of my table calcs for the curve polygon was directed at the wrong dimensions.

    ReplyDelete
  22. Ken,
    Can you provide details on the values portion of the template? In my data set I have assigned a percentage of contribution to each step. It is more of measure of impact at each step with the parts within each step totaling to 100%. I ended up adding up the percentages for each ID and using this for the size.

    Thanks,
    Luke

    ReplyDelete
    Replies
    1. Not sure I follow you exactly, but the Size field is meant to be used for whatever measure you're wanting to visualize. Ultimately, the workbook will sum everything up and break into part-to-whole relationships automatically, so it's not exactly necessary for you to do that yourself. But if your numbers are percentages, that should work perfectly fine.

      Delete
  23. Ken, I'm super unfamiliar with Tableu but have become familiar with your template at the top of this page, and I wanted to ask if you might be able to provide some guidance with regard to how I might be able to tie this to a graphic?

    I'm trying to show how involvement of levels of the spine change with respect to an exposure event. Have you ever seen the sankey diagram used in something like this? Thanks again!

    ReplyDelete
  24. How to highlight only the relevant sankey polygons. eg on the left there are 2 columns and on the left there are 5, the first left column only connects with 4 column on the right. So how to highlight only the 4 columns which are relevant on the right if I select the first column on the left?

    ReplyDelete
    Replies
    1. Might be difficult to address via these comments, but please feel free to send me an email at flerlagekr@gmail.com

      Delete
  25. Brilliant! Thank you!

    ReplyDelete
  26. Hi Ken--great post and this template has been a life saver [and provided some kudos to me with my stakeholders--thanks for that:)] but i have one question. When i use volume or number of say deals this works great as it is one to one--always the same deal, but if for instance I wanted to look at a Sales Open Pipeline from date 1 to date N , the values of the deals will change--Date 1 Deal A was worth 100K but by Date N it had changed to 200k etc--can the Sankey take this into account if the template is using one value field?
    Thanks again
    Greg

    ReplyDelete
    Replies
    1. This is an interesting request. Might be able to do something like that. Any chance you could email me so we can take it offline? flerlagekr@gmail.com

      Delete
  27. Hi Ken,
    Will do with an example or 2.
    Cheers
    Greg

    ReplyDelete
  28. Hi Ken,

    I went through your website on Sankey dashboard using Tableau. I would like to say thank you for your great work.

    I have implemented all the calculated fields from the template workbook in my main dashboard and used the excel sheet (model) to join with my master data set based on the common field named 'Link'.

    All works well but there is a problem as each record is getting multiplied 98 times and inflating my dataset. I already have million of records in my original dataset and with this Link join it's making the extract refresh impossible to work.

    Is there any way I can avoid multiplying my data. Please advise.

    Piyush

    ReplyDelete
    Replies
    1. Just addressed this with you via email, but I'll post an answer here as well, for the benefit of others. You'll need to find a way to aggregate your data before bringing it into Tableau. Sankeys, by their nature, are meant to be aggregated. Even if you wanted to show each individual record in the sankey, it wouldn't really be usable because the flows would be so small. So I'd advise you to aggregate it using Tableau Prep, SQL, or whatever you have available. That should reduce the size of the data enough to make it manageable.

      Delete

Powered by Blogger.