Ads Top

Creating a Sankey Funnel in Tableau


I’ve become fairy well-known in the Tableau community for my sankey templates, but everything I’ve created was based on the work of others ahead of me. In particular, Jeffrey Shaffer, who was the first to build a sankey in Tableau, and Olivier Catherin, who built upon Jeff’s work to create the polygon-based sankey that most of us use today. Of course, there are many others who have innovated with sankeys—Chris Love, Alexander Mou, Chris DeMartini, Ian Baldwin, and Merlijn Buit—but the work of Jeff and Olivier has been the basis of most of my templates. That being the case, I was thrilled to finally meet Olivier at the 2019 Tableau Conference and for all three of us to take a photo together.


Left to Right: Jeff, Olivier, Ken (Yes, it does appear that I have my eyes closed)

Olivier was even kind enough to bring us t-shirts and laptop stickers!


So, fresh off of spending some time with two fellow sankey-lovers, what better blog to write than one about a new type of sankey—what I’m calling a sankey funnel.

Sankey Funnel
So what is a sankey funnel? The idea first came from Terry Dehart and the Progress Bible team. He had used my multi-level sankey template to build something like this (I’m using some sample lead generation data, but the concept was the same).


While this wasn’t bad, what he really wanted was for the flows to end at appropriate spots. For example, after Rejection Reasons A, B, and C, the lead is dead, so there is no need to flow to the next step. So, ideally, we’d have something more like this:


The removes the visual clutter of the first version, making it much easier for us to focus on what’s happening as we flow from left to right. I’ve chosen the name “Sankey Funnel” because this sankey works in a similar fashion to a funnel chart, while allowing for a bit more detail about each phase.

After working with Terry a bit, we were able to develop a method to hide the unnecessary flows and both he and I were happy. But since that time, I’ve gotten a number of additional requests for sankeys with this capability, so I decided to provide a new template, as well as explain how you can add this functionality to sankeys built using my prior templates.

Sankey Funnel Template
If you’d just like to plug-and-play your data, you can use my template. The Tableau template can be found here: Sankey Funnel Template. And the Excel data file is the same as used in my multi-level sankey template (the link above takes you to a list of files—you’ll want the one called Sankey Template Multi Level.xlsx). The key difference in how you use this template is that some of your “Step” values will be NULL. For instance, the data for the sankey shown above looks like this:


Notice that there are numerous blanks in the file. So, when you want the flow to stop (for example, after Rejection Reasons A, B, and C), you just leave the rest of the steps NULL (blank in Excel). Calculations within the Tableau template will then take care of the rest for you. Other than this, the process for using the template is exactly the same as the multi-level template (See the How to Use The Templates section of the multi-level sankey blog).

I should also note that your NULLs can appear at the beginning as well. By doing this, you can have new values added into the flow in the middle of the process. For example, you could create something like this:


Adjusting an Existing Multi-Level Sankey
If you’ve already used my multi-level sankey template but want to apply this technique to it, you have a couple of options. First, you could put your data into the template as explained above. If your sankey is relatively simple and you haven’t built up a lot of extra functionality around it, then this might be the best approach. But, if you feel it will be too much work to create from scratch, the following steps will guide you through updating your calculated fields to hide the NULL flows. Before doing that, I need to share some of the underlying details of the calcs used in the workbook:

  • Each “Bar” sheet is driven by a calculated field called N1 Bar Position (for the first bar), N2 Bar Position (for the second), etc.
  • These calculations are based on additional “helper” calculations, which are all then grouped together in one of the Bar folders.
  • Each “Curve” sheet is driven by a calculated field called Curve 1-2 Polygon, Curve 2-3 Polygon, and so on.
  • Like the bar calculations, these also have helper calculations and are grouped together in one of the Curve folders.
  • Since the N Bar Position and Curve Polygon calculations drive the plotting of the bars and curves, our goal is to modify these such that, when the value is NULL, we return NULL so that nothing is plotted.
  • Because the curves connect two bars, we need to check both the starting point and the ending point for NULLs.
  • Finally, my approach accounts for both NULL values and empty strings (“”). Technically, these are two different things, but I’ve included both to reduce any confusion.


With the above out of the way, here’s how you can modify your calculations to account for NULLs:

1) Modify all N Bar Position calculations, adding an IF statement to check for NULLs. For example, N2 Bar Position will look like this:


Note: The statement after the ELSE is the same as what is already there in the template, so no changes are required except wrapping it in the IF/ELSE statement.

The above changes will need to be made to all N Bar Position calculated fields, swapping out Step 2 for the appropriate step.

2) Make similar changes to the Curve Polygon calculated fields. For example, Curve 2-3 Polygon should become.


Like the Bar Position calcs, the only change here is to add the IF/ELSE statement. It is, however, critical that the IF statement check for NULLs/blanks in both the source step (e.g. Step 2) and the target step (e.g. Step 3). Similar changes must be made to all of the Curve Polygon calcs.


As always, thanks so much for reading! If you have any thoughts or comments, please leave them in the comments section below.

Ken Flerlage, December 1, 2019


2 comments:

  1. Hi Ken, thanks a lot for this post--it's tremendously helpful! I am trying (with moderate success) to adapt your approach and your templates to my own dataset. I have 3 problems in particular, and would appreciate any tips you may have:

    1) I have 7 Steps in my Sankey Funnel, and have successfully created sheets for Bars 6 and 7. I can't do the same for the Curve 5-6 and Curve 6-7 sheets, as 'The table calculation requires a field that is missing'. I have updated the Colour pills and the Curve Polygon table calculation to compute using Steps 5 and 6, and 6 and 7 respectively, as I did in the Bars sheets, but this doesn't fix the problem.

    2) In a similar way to how there are rows exiting the funnel at each step (like 'Rejection Reason A' and 'Not Qualified Reason A' in your example), I am also trying to have rows joining midway along the funnel (i.e., new customers being introduced in Step 3). In my dataset, these rows currently have nulls for Steps 1 and 2 and non-null values from Step 3 onward.

    3) Is there a way to have one box at the final stage of the funnel 'holding' or aggregating all the customers at each stage lost through the funnel by bypassing the intermediary stages? Put differently, if I have 60 customers dropping out at Step 2, 40 at Step 3 and lose no further customers in the funnel, I'd like a box with those 100 customers without them having to move through red boxes at Step 4, Step 5, Step 6 and Step 7.

    Thanks in advance!
    Keith

    ReplyDelete
    Replies
    1. Hey Keith. Any way you could shoot me an email? These comments just don't allow enough flexibility. My email address is flerlagekr@gmail.com

      Delete

Powered by Blogger.