Scatter Plots & Reference Lines/Bands
Jul 26, 2018 | Klaus Schulte
Some weeks ago this Tweet by Rody Zakovich caught my eye:
Rody Zakovich (@RodyZakovich), May 31, 2018
Rody also described the how-to in his Tweet:
Based on this (and the further discussion in the Tweet) I was able to recreate Rody’s magic in my #makeovermonday week 29 viz on average salaries by team in the NBA (click for an interactive version) with its cool reference bands:
Looks nice. Thanks Rody 😊!
Scatter plots can be described like this (source: Wikipedia):
A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded, one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
In Tableau (as well as in Excel or other BI tools) it is very easy to add a fourth and fifth variable by sizing the bubbles or by using different shapes for the objects in the scatter plot.
Besides the data, reference lines are often used in scatter plots to build quadrants that make the data easier to interpret or help to formulate strategies, as in the famous Boston Consulting Group analysis. As an example, I have used them this way in my #vizforsocialgood visualization for the UN SDG Action campaign (click for an interactive version).
Mathematically speaking, a horizontal reference line displays all x-values for a given y, and a vertical reference line displays all y-values for a given x. You can add them easily in Tableau by dragging them in from the Analytics pane and setting the constants.
In Rody’s viz and in my NBA salaries viz, the white borders between the colored bands (I’ll come to these bands later) are nothing other than reference lines that show a given result of a function of both variables. Rody divides career points by games, and I divide total salary per team by the number of players.
To give an example, the upper line in my viz shows all combinations of total salary by team and number of players that lead to an average salary by team of $5 million; the lower line shows all combinations that lead to an average salary by team of $4 million.
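As a rough sketch of the idea (not the workbook behind the viz), the points on such a line can be generated by fixing the target average and solving for total salary at each player count. The function name `iso_average_line` and the range of 10 to 20 players are assumptions made purely for illustration.

```python
def iso_average_line(target_average, player_counts):
    """Return (players, total_salary) pairs that all share one average salary.

    Because average = total_salary / players, every point on the line
    satisfies total_salary = target_average * players.
    """
    return [(n, target_average * n) for n in player_counts]


# Hypothetical roster sizes; the real axis range depends on the data.
player_counts = range(10, 21)

upper_line = iso_average_line(5_000_000, player_counts)  # $5 million average
lower_line = iso_average_line(4_000_000, player_counts)  # $4 million average
```

Because the function here is a simple ratio, these lines are straight and pass through the origin; only their slope differs.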
For my #makeovermonday week 30 viz on paid maternity leave I wanted to try another calculation. The data provided allowed me to create this scatter plot with ‘curvy’ reference lines that display fully paid weeks of maternity leave as the product of the total average payment rate and the total number of weeks.
To draw each reference line I created data in Excel. I thought 100 points would be enough to get a nicely curved line. As an example, look at this data for the ‘10 fully paid weeks’ line:
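The same kind of data could also be generated with a few lines of Python instead of Excel. This is a minimal sketch under stated assumptions: the column names, the range of 10 to 60 total weeks, and the payment rate expressed as a fraction (1.0 = 100%) are all illustrative and not taken from the original spreadsheet.

```python
def full_paid_weeks_line(full_weeks, min_weeks, max_weeks, n_points=100):
    """Return n_points rows along the curve payment_rate * total_weeks = full_weeks."""
    step = (max_weeks - min_weeks) / (n_points - 1)
    rows = []
    for i in range(n_points):
        total_weeks = min_weeks + i * step
        rows.append({
            "line": full_weeks,                        # which reference line this row belongs to
            "point_order": i + 1,                      # drawing order for the line's path
            "total_weeks": total_weeks,
            "payment_rate": full_weeks / total_weeks,  # solve the product for the rate
        })
    return rows


# Hypothetical axis range of 10 to 60 total weeks.
line_10 = full_paid_weeks_line(10, min_weeks=10, max_weeks=60)
```

Each row keeps a point order so that a line mark can connect the points in sequence.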
Like Rody in his viz, I brought the reference line data into Tableau, exported the reference viz as an image (Worksheet > Export > Image) and added this image to my scatter plot as a background image. Make sure that you set identical start and end values in your original viz and in your ‘reference viz’ when fixing the axes; otherwise the background image will not line up with the data points.
To sum up: reference lines that display a given result of a function of both variables can be used in a scatter plot, provided that a function of both variables makes sense (it wouldn’t, for example, in the BCG analysis mentioned above).
To create reference bands out of my reference lines, I duplicated the reference line data for every line except the first and the last one. I also had to reverse the point order in the copied data and build a combined point order from 1 to 200 to create my bands as polygons.
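One way to read these steps is sketched below: a band is the polygon enclosed by two neighbouring lines, so one line is walked forwards (points 1 to 100) and the other backwards (points 101 to 200). An inner line borders two bands, which would explain why its data is needed twice. The helper names `curve` and `band_polygon` and the axis range are hypothetical.

```python
def curve(full_weeks, n_points=100, min_weeks=20.0, max_weeks=60.0):
    """Illustrative reference line: (total_weeks, payment_rate) pairs."""
    step = (max_weeks - min_weeks) / (n_points - 1)
    return [
        (min_weeks + i * step, full_weeks / (min_weeks + i * step))
        for i in range(n_points)
    ]


def band_polygon(lower_line, upper_line, band_id):
    """Join two lines into one closed outline with a combined point order."""
    # Walk the lower line forwards, then the upper line backwards,
    # so that the outline does not cross itself.
    outline = list(lower_line) + list(reversed(upper_line))
    return [
        {"band": band_id, "point_order": i, "x": x, "y": y}
        for i, (x, y) in enumerate(outline, start=1)
    ]


band = band_polygon(curve(10), curve(20), band_id="10-20 weeks")
```

With 100 points per line, the combined point order runs from 1 to 200, which is what a polygon mark needs as its path.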
Then I went through the same steps (building the reference viz, exporting it as an image and adding it to my viz as a background image).
After some formatting it finally looked like this:
I hope this blog post will help you reverse-engineer my viz and build your own ‘advanced’ reference lines!