Getting Data and Preparing Data Sets for Analysis in Tableau
by Wyatt "Miller" Miller
Getting Data
Tableau offers a number of data resources, as well as instructional videos. Proceed to the Tableau Resources website (https://public.tableau.com/en-us/s/resources) and click the "Sample Data" tab to view available downloads.
The Preview Pane
Tableau "connects" to the data file; a read-only operation that ensures Tableau will not edit the data. To connect to a data set:
- Open Tableau
- Choose the data file type from the selection pane on the left
- Navigate and connect to the file.
A valid data set can also be drag-and-dropped into the workspace after opening a fresh instance of Tableau.
If importing an Excel spreadsheet with one worksheet, the data should automatically appear in the preview pane. If importing an Excel spreadsheet with multiple worksheets, click and drag the desired worksheet to the workspace from the left pane. The goal of this step is to have a preview of the desired data appear in the in the summary pane at the bottom.
Prepping the Data (Basic)
Data is imported semi-naively: Tableau will attempt to name and classify data columns on intake, but will not natively apply cleaning functions. The Data Interpreter is Tableau's automated data cleanup tool, which is activated by a checkbox to the left of the preview pane. Immediately after import, check the following:
- Column header names are imported correctly
- The Data Interpreter can correct this if the names of the columns appear in the data, or they can be changed manually
- Column datatypes are imported correctly
- The Data Interpreter can correct this if column data types can be ascertained correctly, or they can be changed manually by clicking on the data type icon:
- Globe - indicates a geographic variable
- Abc - indicates character data
- # - indicates numerical data
- Many more...
- The Data Interpreter can correct this if column data types can be ascertained correctly, or they can be changed manually by clicking on the data type icon:
- Data is generally consistent
- Arguably the largest can of worms - see "Prepping the Data (Advanced)" below. The Data Interpreter is relatively flexible; it will attempt to handle:
- Stripping out inconsistent data (e.g. column headers that were not properly parsed on intake)
- Stripping out blank observations (rows)
- Extracting sub-tables from within the same worksheet (will appear in the worksheet table to the left of the preview pane)
- Arguably the largest can of worms - see "Prepping the Data (Advanced)" below. The Data Interpreter is relatively flexible; it will attempt to handle:
- NULL responses are typically handled well by Tableau - just make sure these responses should actually be NULL (observe the data and check consistency)
NOTE: The Data Interpreter does NOT work on text (.csv) files as of version 10.5.2. You will receive the following error, "An error occurred while communicating with data source, '<my_file>'." In the details, you will see "Unable to establish connection: Data source 'Text file' has not been licensed." To correct this, simply save the .csv as an Excel file (.xls or .xlsx) and import.
Prepping the Data (Advanced)
A raw database pull will likely produce a well-prepared data set, assuming the database uses any type of normalizing methodology on data collection (e.g. 3NF or BCNF) - it is exceptionally rare for a database to not use some type of normalizing methodology.
However, if the data source is not clean (e.g. a government PDF designed for human consumption) or the data is organized in a non-concise manner (e.g. across several tables), Tableau is capable of several advanced data organization techniques. While these techniques are beyond the scope of this article, descriptions and resources are included below:
Joins
A classic data table join on a primary key. This is used when multiple tables contain observational data for the same observation (multiple rows for the same observation, typically across multiple tables).
Unions
Appending one data table to another. This is used when multiple columns have the same header.
Split
Splits a column into two columns. Tableau will automatically attempt to extract varying data types within the column (e.g. month and year), or custom splits can be performed.
Pivot
Note: Tableau pivot methodology operates oppositely to Excel's pivot table usage.
Pivots are useful when you need to transform from a wide-format table to a long-format table. Typically used when data has been collected using variable data as headers (e.g. collecting observations in columns labelled "2000, 2001, ..." instead of a single column "year"). The key question to ask yourself to determine if you need to pivot is, "does this data change the number of columns as data is collected?" If yes, then a pivot is likely called for.
The video in the tutorial link below provides an excellent example of usage.
Summary
- Install Tableau
- Download and connect to your data set
- Prepare the data (pop on the Data Interpreter)
- Ensure the column header names are imported correctly
- Ensure the column datatypes are imported correctly
- Ensure the data is generally consistent
- Perform advanced preparation techniques if necessary
- Joins
- Unions
- Splits
- Pivots