Manifesto of Internal Data Densification
Sep 5, 2019
Klaus Schulteby Klaus Schulte & Rosario Gauna
Data Densification – you will have already come across this topic, when you have gone beyond ‘Show Me’ in Tableau:
For all these mentioned and unmentioned use cases, you have used Data Densification to enrich the original data by adding extra points.
There are a lot of good blogs around describing the techniques that can be used for Data Densification. We can highly recommend for example Ken Flerlage’s video “An Introduction to Data Densification” as part of the #ThinkDataTuesday campaign by Tableau and the accompanied blogpost. Tableau greats like Jeffrey Shaffer, Jonathan Drummey, and Joe Mako have also talked and written about the topic.
In Ludovic Tavernier’s and my (Klaus) session Surprise Me – Creating Advanced and Unique Charts in Tableau at Tableau Conference Europe in Berlin earlier this year, Ludovic explained the concept of Data Densification with this slide:
Both techniques perform Data Densification on row level. The main advantage is that you could drill down to row level in your visualization and the design would still work. The biggest disadvantage is, obviously, that you have to duplicate your data. When you already have a ton of data, you probably don’t want to do this.
But: Do we really have to? Let’s try something different…
The common techniques of Data Densification allow users to drill down to row level. There are for sure use cases, when you visualize row level data, for example when the data is pre-aggregated. We have seen this for example in many #makeovermonday datasets.
But in many, if not even in most (real world) visualizations, we visualize aggregations, often based on transactional data, like practiced many times with Tableau Superstore. And this is the key to a technique, that we have named Internal Data Densification.
While Data Densification densifies and multiplies the data by a factor n using a model dataset, Internal Data Densification creates a densificator from the original data and splits up the data into n partitions (these partitions do not necessarily have the same number of records). E. g. for n=2 we would double the data with Data Densification and build two (smaller) partitions with Internal Data Densification.
This can be done based on any dimension or measure in the original data, that has at least two distinct (for doubling the data) members, and that’s why this technique only works for aggregations.
This is important: Internal Data Densification is based on the assumption, that you always have enough members for the desired number of “copies” in your densification dimension for the level of detail in your view.
Let’s come back to this assumption after introducing the techniques.
Let’s give an example with Tableau Superstore data and try to “double” the data with Internal Data Densification. The visualization we have in mind is to show category sales 2018 by month.
EXAMPLE 1
Using [Region] as densificator with an IF-Statement:
[Region_Dens] = IF [Region]="West" THEN 1 ELSE 2 END
Creating a view with MONTH(OrderDate) on columns, Category on rows and COUNTD(Region_dens) on label shows us, whether this densificator is valid:
Yes, we have two distinct values (1, and 2) in each period for each category.
Would it also work for Sub-Category?
No, we have periods with only one value (e. g. region West or one of the other regions) and some periods without any sales.
EXAMPLE 2
Using [Order Date] as densificator with Modulus:
[OrderDate_Dens] = DATEPART('day',[Order Date]) % 2
Modulus calculates a remainder. In this case, the formula above is calculating a “0” for even days (e. g. 8 = 2 x 4, remainder = 0) and a “1” for odd days (e. g. 7 = 2 x 3, remainder = 1).
You can use Modulus with any other dividend to define your densificator.
When we look again at MONTH(OrderDate) on columns, category on rows and COUNTD(OrderDate_dens) on label, we can see that [OrderDate_Dens] is also a valid densificator.
EXAMPLE 3
What if we needed four copies of the data? We can again use [Order Date] as our densificator:
[OrderDate_Dens_4x] = STR(DATEPART('day',[Order Date]) % 2)+"-" +STR(DATEPART('week',[Order Date]) % 2)
This formula is combining a “0” for even and a “1” for odd days with a “0” for even and a “1” for odd weeks and is giving us four possible combinations (0-0,0-1,1-0,1-1).
Is it valid? Yes, it is!
EXAMPLE 4
For the visualization we have in mind here (category sales 2018 by month) the number of “copies” you can create with this data is limited. We managed to create 10 copies by combining two densificators, but not more.
Superstore is not the perfect dataset for this technique. Perfect would be for example production data, where we have logs that come in regularly like every minute or every 10 seconds or something like that.
Trying modulus with data from my (Klaus) solar power plant (where the data comes in every 5 minutes), we managed to put together 3,456 valid combinations within minutes using modulus with days, hours and minutes on MONTH(Time)- and “Dimension”-level.
Enough “copies” to draw some nice curves 😎.
Ok, let’s head over to some use cases to demonstrate the capabilities of this technique. And these are massive!
For our first use case, we want to create a Sigmoid-Style Line Chart. You can of course argue the purpose of such a chart – but hey, this is just to demonstrate the capabilities of Internal Data Densification.
To create this, we are leveraging Internal Data Densification in combination with range & bins-technique. To do this, we need one extra “copy” of our data.
We want to show category profits for all years. Let’s check first if there is a valid densificator and try again Modulus with day of Order Date:
[Densificator] = DATEPART('day',[Order Date]) % 2
Yes, we have a valid densificator!
For the range & bins-technique we have to set the range of 49 first. This is the number of points we want to use for our sigmoid-style curve. We call this calculated field path…
[Path] = [Densificator] * 48
…and this gives us either a 0 or a 48. When we now create a bin and set the bin size to 1, we should get our 49 points.
Let’s check: Yes, 49 marks (you have to select “Show missing values” to get this)!
We have split up our data into two parts, therefore we have to redefine our Profit using a LOD:
[Profit LOD] = { FIXED MONTH([Order Date]),[Category]:SUM([Profit])}
We want to select YEAR(Order Date) as a filter at a later point. Make sure to set the filter as a context filter, otherwise you would have to add YEAR(Order Date) to this Fixed-LOD.
You can now use [Path] and [Profit LOD] and follow for example this blog by Toan Hoang to create curvy line charts.
Check out the curvy lines using this technique on Tableau Public.
In this exercise, I (Rosario) used the technique to create a radial chart with weekly sales in 2018.
In this viz, we want to show weekly sales and two data points are required per week. Let’s again check first if there is a valid densificator and try Modulus, but now with the ASCII value of the first character of the Product Name field:
[Densificator] = ASCII([Product Name]) % 2
Yes, we get a valid densificator.
We have split up our data into two partitions, therefore we have to redefine our weekly sales with the following LOD:
[Weekly Sales] = {FIXED [Week]: SUM([Sales])}
In this exercise, using the technique allows each of the partitions to use different values for the radius:
[Radius] = 0.15+(ATTR([Data_Dens]) * (sum([Weekly Sales] / {FIXED: MAX([Weekly Sales])}))
Resulting in:
Play with the interactive version or download the workbook here.
Just a few weeks ago, I (Klaus) wrote about how to encode change in line charts.
The blogpost describes how the data was tripled to achieve this. Again, let’s try it with Internal Data Densification.
In this viz, we want to show category sales for all years. Let’s check first if there is a valid densificator and try again Modulus with day of Order Date:
No, we have some months with only two distinct remainders and therefore the densificator isn’t valid.
What about this:
[Densificator] = int(Sales) % 3
Yes, with Modulus of integer Sales, we get a valid one.
We have split up our data into three partitions, therefore we have to redefine our Sales with the following LOD:
[Sales LOD] = { FIXED YEAR([Order Date]), MONTH([Order Date]),
[Category]: SUM([Sales])}
Here, we’ve included YEAR(Order Date) in our LOD and can take year and category just as a dimension filter.
The rest is just finding and replacing both the old densificator (“Table Name”) and the old measure (Sales) with our new ones. That’s it!
You can find and download the updated workbook on Tableau Public.
In this exercise, I (Rosario) used the technique to avoid pivoting data with three measures (Low, High and Price) contained in each daily record by stock. Thus, I could simplify the necessary calculations.
The challenge in question was launched by the unique and incredible Andy Kriebel and corresponds to the 45th week of 2017 of #WorkoutWednesday. In that week, he challenged us to build a beautiful display style of a portfolio of stocks, an idea originally coming from Jeffrey Shaffer. This exercise was the first time where I experienced the benefits of using the Internal Data Densification technique. Below I share a summary of the main steps of the solution, applying the technique as I use it today.
In this viz, we want to show the 52-week low and the 52-week high of a stock in the last year and the current price. Three data points are required per Stock (or Symbol). Let’s again check first if there is a valid densificator:
[Marks_Dens] = CASE DATEPART(‘day’, [Date]) % 3 WHEN 0 THEN “Low” WHEN 1 THEN “High” WHEN 2 THEN “Close” END
Yes, we get a valid densificator.
We have split up our data into three partitions, therefore we get the value of the minimum price and the maximum price per stock in the last year and the current price per stock (the stock, in the table is identified with the name of Symbol) with the following LOD calculations:
[52-Week Low] = {FIXED [Symbol] : MIN([High])}
[52-Week High] = {FIXED [Symbol] : MAX([High])}
[Price] = {FIXED [Symbol]: SUM(IIF([Date] = {FIXED: MAX([Date])}, [Close], NULL))}
Subsequently, for each of the three possible densifier values, associate a value within the x-axis.
[Mark Axis Value] = CASE [Marks_Dens] WHEN "Low" THEN0.004 WHEN "High" THEN0.996 WHEN "Close" THEN ([Price] -[52-Week Low]) / ([52-Week High] -[52-Week Low]) END
Finally, for each of the three possible densifier values, associate a measure with the label you wish to display.
[Mark Label Value] = CASE [Marks_Dens] WHEN "Low" THEN [52-Week Low] WHEN "High" THEN [52-Week High] WHEN "Close" THEN NULL END
In this exercise, using the technique allows us to show for each of the stocks three marks (thanks to densification) and also for each of the marks to use independent measure values.
The updated version of the viz can be found here.
We both think, that Internal Data Densification is a very powerful technique, that allows you to create any chart in Tableau that would normally require the addition of extra points. We’ve shown four use cases but could have added many more.
Internal Data Densification can be used only under certain assumptions. Like described in our use cases, there has to be a valid densificator. Furthermore, it works only on aggregation level and best with large and granular data sets. Therefore, it is a nice addition to the existing Data Densification techniques, because these data sets are the cases, where Data Densification would ‘hurt’ the most. As mentioned before: when you already have tons of data, you don’t want to multiply it. For small or/and pre-aggregated data sets, you won’t have a problem to multiply your data.
That means, you have to know your data! We don’t recommend using this technique with live data, where you cannot be sure about what data comes in. Use it with completed/closed data sets or in cases, where you have full control over extract refreshes. Make sure that your densificator will work in ANY case!
That said we hope you enjoyed reading. Please use this technique and share your use cases with us on Twitter (@ProfDrKSchulte and @rosariogaunag) or in the comments below.
Thanks for feedback along the way to Jeffery Shaffer, Ken Flerlage and Ludovic Tavernier.
Rosario & Klaus
COMMENTS