Becoming a Tableau Desktop Specialist – Part 7: Apply analytics to a worksheet
Oct 24, 2019Christine Rietmann
Hi, everyone! This week we have a lot of stuff to learn. Our topic ‘Apply analytics to a worksheet’ will deal with sort options, reference & trend lines, table calculations, bins & histograms, calculated fields, and parameters. For all these topics there still exist good blog posts. So, I hope I can give you a good overview, but I also will refer to other sources to go into detail or to look at more examples.
Check out the further posts of this series to learn more about it:
Tableau offers a lot of different sort-options which can help you to lend more expression to your insights, to structure your visualization, and to just make things clearer. The most common ways to sort your data are sorting alphabetically or by values, but you can also sort your data manually. Especially when you have a complex visualization that includes more than one dimension and measure, your filter-options become more complex and include also nested sorting.
Let’s look where to find the different options.
One of the easiest ways to sort your view is to use the symbols from the toolbar. If you have more than one dimension in your view, you should know that Tableau will sort your data by the deepest dimension by default. (You can handle this by first selecting the dimension which you want to sort.) For the following example, the category is the deepest dimension and it is sorted ascending.
The same schema applies to measures. If you have more than one measure in your view, Tableau will sort your data based on the first (or left) measure in your view.
When hovering over your axes, headers or field labels a small icon for sorting appears. This offers you a quick sorting, but it is important to understand how Tableau sorts the view. In all cases, one click sorts ascending, two clicks sort descending and three clicks clear the sort.
Look here to find well-explained examples.
You can also sort your viz manually. Legends and headers can be sorted by dragging and dropping. Keep in mind that if you change your order in the legend you also change the position of the marks. To sort your data manually you drag a header and drop it into the right place. A black line will show you the place where your header will be placed.
The sort menu allows defining your sorting. It is only available for discrete dimensions or measures. To open the sort menu just right-click the data field and choose ‘Sort’.
Now the following window appears:
Then you can choose between several options.
I think the most options are self-explanatory, only the option ‘Sort By Nested’ was totally new to me. Nested means that you sort every single pane and only the value for this pane is crucial for the sorting order. In return non-nested means that you look at the overall value for the dimension. The same source as above includes a good example.
The generic term ‘reference lines’ includes bands, distributions, box plots and reference lines itself. All these analytic instruments can help you to highlight a value or a range of values.
To add a reference line you must switch from the data pane to the analytics pane.
Now you can drag the field into the view and set the scope of the item. Let me explain this by a quick example.
Imagine you want your reference line to show the average of the sales. Then you can, for instance, choose if the average should be computed over all years (Table) or for every single year (Pane). You could even compute the Average for every region and every year (Cell – which wouldn’t make sense in this case).
If your view includes more than one measure you also have to choose on which measure the computation should base.
Now you have a lot of options to define your reference line. I think the options require no explanation, but if you want to look them up you can do it here.
The difference between the types is exactly what their names suggest.
The reference line is a single line, which can be computed or set as a constant.
Reference bands shade an area behind the marks in the view again based on constants or computed values.
Reference distributions add a gradient of shading to indicate the distribution of values along the axis. Distribution can be defined by percentages, percentiles, quantiles or standard deviation.
A box plot is a standardized model to show distribution. This blog post provides a detailed explanation of box plots.
Often we do not just want to analyze our data, but to get useful information to make decisions for the future. Trend lines show whether the values of one variable can predict the values of another variable. In other words, we try to find out whether one variable has an impact on another variable. So, trend lines are statistical regression models. In Tableau, we have five different types of trend lines. For learning more about the different types look here or on the Tableau website.
To explain how to add a trend line I want to show the relationship between sales and profit per month. I assume that if we sell more products in a month we will earn more profit.
2. Select trend lines from the analytic pane.
3. Drag the trend line into the view and chose a type by dropping it on the chosen field.
The next step is to check how ‘good’ your trend line fits your data and how meaningful and reliable the information you get actually is. When you hovering above your trend line you get information about the regression equation, the P-value, and the R-Squared.
More information is available when you right-click on the Trend Line and say ‘Describe Trend Line’ or ‘Describe Trend Model’.
An established limiting value for the p-value is 5%. So, in general we conclude that if the p-value < = 5% the alternative hypothesis is significant. The R-squared describes model accuracy and can take any value between 0 and 1. A value of one would indicate a perfect linear correlation.
Take these rules with care and inform yourself about the statistical background before you conclude!
To edit your Trend Line right-click on the Trend Line and then chose ‘Edit Trend Lines’
Table calculations compute values within your view. So, your computation will be based on the fields in your view. This is contrary to a calculated field which is based on the whole data set. Table calculations are predefined calculations that give you the power to create advanced computations without knowing the underlying syntax. In Tableau, we can use Quick Table calculations or define our computations using a table calculation menu.
To be honest I don’t feel competent enough yet to explain how table calculation works. This is why I would like to recommend this tutorial from Andy Kriebel. It has helped me a lot to understand how table calculations work and how I could start to work with them without just experimenting.
To get familiar with all the computation options I can recommend the visualization from Lilach Manheim which you can find here.
Other sources I read:
We can easily create bins from a continuous measure. Therefore, you right-click on the measure and choose ‘Create’ and then ‘Bins’.
The following window will open where you can define the size of your bins. Tableau always suggests an optimal bin size, based on Min, Max, Difference between min and max and the number of values.
Your bins appear as a new dimension on the data pane.
Bins are set as a dimension by default because the values are now limited and discrete. Nevertheless, you can convert this discrete dimension into a continuous one. This is what you need if you want to create a histogram.
As I already mentioned the first step to create a histogram is to convert your bin dimension into a continuous dimension.
The icon which symbolized your bins will change the color from blue to green (you remember, right 😁). After that, create the following view:
The last step is to change the aggregation of your measure ‘profit’ from SUM to COUNT.
Your view will update and then look like this:
If you are interested in creating histograms I can recommend the two following articles:
Often our data set doesn’t provide all the data fields we need. Calculated fields give us the opportunity to create new data fields based on our data without changing the underlying data. They can include the following elements (I found the explanation of the elements here):
Fields: These consist of the existing fields in your data source and can include other calculated fields and sets. In formulas, fields appear in orange text and in square brackets.
Functions: These are built-in operations that can be performed on data, such as COUNT, DATETIME, SUM, FIND, and DAY. Functions appear as light blue text in formulas.
Operators: These include the standard mathematical operators like +, –, and *. Operators show up as black text in formulas.
Parameters: These are placeholder variables that you can use in a formula so that the actual value is specified at runtime. Parameters appear as purple text and also in square brackets in formulas.
Comments: You can add comments to your formulas to provide a level of documentation. Comments show up as grey text in formulas. Items you enter after two forward slash marks are considered comments.
Let’s look at how to create a calculated field. Click on the small black arrow in your data pane on the right side next to the header ‘dimension’ and say ‘create calculated field’.
Now you can enter a new name and start writing down your formula. An often used and easy example is the profit ratio. It will look like this:
Next to the typing field, you can see a list of available functions. When you select a function the grey space next to the list will show an explanation of the function.
After clicking ‘OK’ the new field will appear on the data pane. It shows in front of the icon an equal sign. So, all calculated fields have this equal sign to differentiate from the data fields from the data source.
Parameters generate interactivity and flexibility. They allow the end-user to interact with the report/visualization. I think the most common way is to use parameters in calculated fields (to replace constant values), but you can also use them in filters or reference lines.
For now, let us focus on parameters that add interactivity to your dashboard. Imagine you want to let your dashboard users decide whether to show sales or profit by Sub-Category.
The following steps are necessary.
The window provides serveral options to define your parameter. This website explains all the different editing options.
2. Create a new calculated field using the parameter:
3. Drag the calculated field into the view
4. Turn on the parameter control
5. Interact with your visualization
I hope you enjoyed reading. I am happy about your feedback, you can reach out to me on twitter or in the comments section below!