Ads Top

4 Ways to Connect Tableau to MongoDB


MongoDB is one of the most popular of a new breed of “NoSQL” databases. NoSQL databases are essentially the opposite of relational databases such as Oracle, SQL Server, MySQL, PostgreSQL, etc. They allow for the storage of unstructured and semi-structured data as well as the ability to maintain flexible schemas.

Here are a few key things to know about MongoDB:


  • Focuses on the storage of “documents” (as opposed to graph databases or other types of NoSQL databases).  
  • Data is stored in JSON format (technically, they store data in a binary representation of JSON they call BSON).
  • Built with developers in mind, so it has lots of tools, APIs, and drivers to meet the needs of virtually any developer. 
  • Because it does not require the creation of rigidly-defined schemas, it provides developers with lots of flexibility. The focus can be shifted from “schema-on-write” to “schema-on-read.” This creates much more agility for developers. 
  • Built with a distributed architecture so it is highly available, scalable, durable, and reliable out-of-the-box. 
  • It is open source, though MongoDB also sells licensed enterprise versions (more on that later).


Because of the above, MongoDB is a fantastic generalized database for any type of data from unstructured and structured data (and everything in between). It makes a great platform for small to medium data lakes. And it is one of the top choices of database for many modern developers.

Connecting Tableau to MongoDB
With the continued growth and popularity of the MongoDB platform, we as analytics professionals, will likely cross paths with it at some point and will need to connect Tableau (or other BI tools) to it. The important thing to remember here is that Tableau expects data to come in a relational format—tables with columns and rows that are related to other similarly structured tables. But MongoDB is not relational, so that immediately creates some challenges. Some restructuring of the data will inevitably be necessary for it to be consumable in Tableau. Thus, connecting to MongoDB is a bit trickier than connecting to more commonly used relational databases.

In this blog, I’m going to provide you with 4 options for connecting Tableau to your MongoDB data. For my examples, I’ll be using an instance of MongoDB Atlas, Mongo’s cloud database-as-a-service offering. You can connect to it using the following connection details:

Cluster Name: Mongo-shard-0/mongo-shard-00-00-pw3el.mongodb.net:27017,mongo-shard-00-01-pw3el.mongodb.net:27017,mongo-shard-00-02-pw3el.mongodb.net:27017

Username: nosql
Password: nosql

The cluster has a number of sample databases installed already, one of which is similar to Tableau’s Superstore, called sample_supplies; it contains a single collection called sales, which I’ll be using in the examples below.

Note: What we call Tables in relational databases are called Collections in MongoDB. Rows or Records are known as Documents.

Option 1: Export Data as JSON
The first option is to export data out of MongoDB into JSON files, then leverage’s Tableau’s native JSON connector. To do this, we’ll need to do the following:

1) On the computer running MongoDB, open a command line. If you cannot connect to the computer, as in the case with our Atlas instance, you’ll need to install the MongoDB utilities on your computer. To do this, download MongoDB and choose the custom install option. Deselect all options except for the Import/Export tools.

2) Open a command line and navigate to the directory in which the utilities were installed (on Windows, it should be something like this: C:\Program Files\MongoDB\Server\4.0\bin).

3) Run mongoexport using the following syntax (red text indicates that we need to plug in our own values here):


mongoexport --host <Cluster or Host Name> --ssl --username <Username> --password <Password> --authenticationDatabase admin --db <Database Name> --collection <Collection Name> --type json --out <Output JSON File Name>

To export the sales collection from the sample_supplies database in our Atlas cluster, use the following command:


mongoexport --host Mongo-shard-0/mongo-shard-00-00-pw3el.mongodb.net:27017,mongo-shard-00-01-pw3el.mongodb.net:27017,mongo-shard-00-02-pw3el.mongodb.net:27017 --ssl --username nosql --password nosql --authenticationDatabase admin --db sample_supplies --collection sales --type json --out c:\sales.json

4) Use Tableau’s JSON file connector and select the json file. Since the JSON file is not structured in a relational format, Tableau will prompt you to select which “schemas” you wish to include.


Once you select the schemas, Tableau will flatten the data, repeating fields in multiple rows, as shown below.


This is somewhat similar to if you were to join a Sales table to a Customer table. That customer’s data would then be repeated for each row.

5) Create your visualization.


Note: I’ve used the recommendations in Ryan Sleeper’s fabulous blog, 3 Ways to Make Beautiful Bar Charts in Tableau to create this bar chart. I’d highly recommend his suggestions to give your bar charts some extra appeal.

There are, of course, some problems with this method. First of all, it’s very manual—it requires you to export each collection into JSON files every time you want to update your visualization. This could, of course, be addressed by creating some automated process that exports the collections to JSON files on a regular basis, but that could prove to be a lot of work. Even with automation, however, performance could be a problem—instead of leveraging the horsepower built into the database, you are now reliant on a file sitting on a hard drive, which will not perform as well as a direct connection to the database. Finally, the JSON file connector is pretty limited. You cannot join multiple JSON files to each other and can’t even join to other types of files or databases, so you may struggle to piece together a data model that has the flexibility you need.

Option 2: Third Party ODBC
Multiple third-party providers have created ODBC drivers for MongoDB, including Simba, Progress, DevArt, CData, EasySoft, and others. Here are the steps for leveraging a third-party ODBC driver (I’ll be using the Simba driver but most are pretty similar):

1) Download and install the driver. In general, the installs for these drivers tend to be very straightforward.
2) Set up your ODBC data source. Here’s how I set up my connection to the MongoDB Atlas cluster using the Simba driver:


Note: I also opened the SSL settings and checked the “Enable SSL” box.

3) Use Tableau’s ODBC connector to connect to the new ODBC data source.


Notice that the driver automatically breaks the collection into multiple tables. This is because the driver must translate the JSON data into a relational format. This requires normalization of the collection into multiple tables. By default, the drivers will automatically perform this normalization, but some of them, including Simba, provide you with additional tools which allow you to modify the automatically-generated schemas.

4) Create your data model.


5) Create your visualization.

This method also has some drawbacks. For one, the third-party driver is just another tool you have to maintain. In addition, these drivers are not free, so there will be some cost involved. I’ve also found that these drivers can have mixed results, in terms of both the table normalization process and performance, especially when dealing with more complex JSON structures.

Option 3: The BI Connector
MongoDB offers a “Connector for Business Intelligence” which essentially flattens your MongoDB data, in real-time, so you can connect Tableau (and other tools) directly to the database. This connector is a separate install, which can be set up on one of the MongoDB servers or another server (it does not have to be installed on each user’s client or the Tableau Server). In the case of Atlas, it is built into the platform.

The connector leverages the MySQL wire protocol to translate MongoDB’s JSON structure into a flattened relational structure. Yes—I said “MySQL”—the end result of the connector is a sort of virtual MySQL database, which you can interact with just like any other MySQL database. But, because MongoDB collections can contain unstructured or semi-structured, a single collection may result in multiple tables, just like the third-party ODBC drivers.

Tableau has a native connector for the BI Connector, which leverages MongoDB’s own custom-built MySQL driver. You can, of course, also use the MySQL connector, but the BI connector makes it a bit easier to connect, among other advantages.

This is, in my opinion, the best way to consume your MongoDB data in Tableau as it is an automated, real-time solution that does not require any manual intervention or use of third-party products. But here’s the rub—for Production workloads, the BI connector is only available via MongoDB’s enterprise offerings, Enterprise Advanced and Atlas Enterprise Advanced. That said, if you have the budget available, then I highly recommend this option.

A quick note on custom SQL. You can use custom SQL with the BI Connector, but it does not support everything you may be accustomed to. So, if you need to write custom SQL, do your best to keep your SQL as simple as possible. Here's a reference of Supported SQL Functions and Operators.

Option 4: Create a Data Pipeline
A final option is to create a data pipeline or ETL workflow that moves data out of MongoDB and into some more structured database, such as a data warehouse. There are lots of options available for programmatically extracting data so you can then restructure and write it to another database platform. Or, if you do not want to write code, you can leverage either third-party ODBC or the BI Connector within an ETL workflow.

You might be asking why, if we’re moving data out of it for analysis, we’re even storing data in MongoDB in the first place. But remember, we’re only part of the puzzle here. As noted previously, MongoDB is great for application developers. For example, one of the huge advantages for them is that they can write data to the database in a loosely structured manner without spending too much time focused on pre-defining a structured schema. This can be a huge timesaver! So, even though it may be a bit more difficult for us to get data out, it has many benefits that can often outweigh these drawbacks.

Wrap-Up
I love MongoDB. It’s a fantastic NoSQL platform that provides a ton of flexibility for developers and lots of great features for the rest of us. I’m hopeful that the tips I’ve provided here help you to better understand how the platform works and allows you a few different options for consuming its data within Tableau. Thanks for reading!!

Ken Flerlage, July 21, 2019

No comments:

Powered by Blogger.