Introduction to Tableau: Section 1
by Wyatt "Miller" Miller
Introduction and Installation of Tableau Public
Introduction
Imagine if you took all the data imagery features of Excel, constrained and optimized the input for strict data analysis, and zhushed up the graphics. Say hello to Tableau - an industry leading business intelligence tool that focuses on data visualization and discovery. It makes your data very pretty, very quickly, and maintains the ability to interface with server and big data setups (at least with the paid Desktop version).
Tableau Public is free as of the date of this article.
Tableau Public has several limitations:
- The types of files you can import are limited (.csv, Excel, and Access files import just fine)
- No private database connectivity (all of your data is public)
- No saving of workbooks locally
- No automation (e.g. automated reporting)
- No security
- Limited to 10 million rows of data on any single connection
- A single account holder is limited to 10GB of public server storage
These limitations exist to encourage businesses to purchase Tableau flagship products, however they should not impact individual users wishing to practice data analysis.
Installation of Tableau Public
Perform the following steps to install Tableau Public:
- Go to the Tableau Public Website (https://public.tableau.com/s/).
- Enter a valid email address and click "Download the App."
- The appropriate file for your operating system and CPU architecture should download automatically.
- Proceed with installation normally.
Creating an account is not necessary for installation of Tableau Public at this time, however it is necessary to save a worksheet (which saves to the Tableau Public server).
Getting Data and Preparing Data Sets for Analysis in Tableau
Getting Data
Tableau offers a number of data resources, as well as instructional videos. Proceed to the Tableau Resources website (https://public.tableau.com/en-us/s/resources) and click the "Sample Data" tab to view available downloads.
The Preview Pane
Tableau "connects" to the data file; a read-only operation that ensures Tableau will not edit the data. To connect to a data set:
- Open Tableau
- Choose the data file type from the selection pane on the left
- Navigate and connect to the file.
A valid data set can also be drag-and-dropped into the workspace after opening a fresh instance of Tableau.
If importing an Excel spreadsheet with one worksheet, the data should automatically appear in the preview pane. If importing an Excel spreadsheet with multiple worksheets, click and drag the desired worksheet to the workspace from the left pane. The goal of this step is to have a preview of the desired data appear in the in the summary pane at the bottom.
Prepping the Data (Basic)
Data is imported semi-naively: Tableau will attempt to name and classify data columns on intake, but will not natively apply cleaning functions. The Data Interpreter is Tableau's automated data cleanup tool, which is activated by a checkbox to the left of the preview pane. Immediately after import, check the following:
- Column header names are imported correctly
- The Data Interpreter can correct this if the names of the columns appear in the data, or they can be changed manually
- Column datatypes are imported correctly
- The Data Interpreter can correct this if column data types can be ascertained correctly, or they can be changed manually by clicking on the data type icon:
- [Globe Icon] - indicates a geographic variable
- Abc - indicates character data
- # - indicates numerical data
- Many more...
- The Data Interpreter can correct this if column data types can be ascertained correctly, or they can be changed manually by clicking on the data type icon:
- Data is generally consistent
- Arguably the largest can of worms - see "Prepping the Data (Advanced)" below. The Data Interpreter is relatively flexible; it will attempt to handle:
- Stripping out inconsistent data (e.g. column headers that were not properly parsed on intake)
- Stripping out blank observations (rows)
- Extracting sub-tables from within the same worksheet (will appear in the worksheet table to the left of the preview pane)
- Arguably the largest can of worms - see "Prepping the Data (Advanced)" below. The Data Interpreter is relatively flexible; it will attempt to handle:
- NULL responses are typically handled well by Tableau - just make sure these responses should actually beNULL (observe the data and check consistency)
NOTE: The Data Interpreter does NOT work on text (.csv) files as of version 10.5.2. You will receive the following error, "An error occurred while communicating with data source, '<my_file>'." In the details, you will see "Unable to establish connection: Data source 'Text file' has not been licensed." To correct this, simply save the .csv as an Excel file (.xls or .xlsx) and import.
Prepping the Data (Advanced)
A raw database pull will likely produce a well-prepared data set, assuming the database uses any type of normalizing methodology on data collection (e.g. 3NF or BCNF) - it is exceptionally rare for a database to not use some type of normalizing methodology.
However, if the data source is not clean (e.g. a government PDF designed for human consumption) or the data is organized in a non-concise manner (e.g. across several tables), Tableau is capable of several advanced data organization techniques. While these techniques are beyond the scope of this article, descriptions and resources are included below:
Joins
A classic data table join on a primary key. This is used when multiple tables contain observational data for the same observation (multiple rows for the same observation, typically across multiple tables).
Unions
Appending one data table to another. This is used when multiple columns have the same header.
Split
Splits a column into two columns. Tableau will automatically attempt to extract varying data types within the column (e.g. month and year), or custom splits can be performed.
Pivot
Note: Tableau pivot methodology operates oppositely to Excel's pivot table usage.
Pivots are useful when you need to transform from a wide-format table to a long-format table. Typically used when data has been collected using variable data as headers (e.g. collecting observations in columns labelled "2000, 2001, ..." instead of a single column "year"). The key question to ask yourself to determine if you need to pivot is, "does this data change the number of columns as data is collected?" If yes, then a pivot is likely called for.
The video in the tutorial link below provides an excellent example of usage.