posted: November 16, 2019
tl;dr: Different ways the humble IF statement has been implemented across various data visualization tools...
One of the aspects of my job that I enjoy is getting the opportunity to work with a variety of technologies. Often tools which do the same job will have fairly different ways of doing the same operation. For some reason, the tools’ makers chose to solve the same problem in different ways, rather than looking to an already-existing solution as a model. This is often called “reinventing the wheel”.
Recently I’ve been doing some data analysis work and using several data visualization tools: Tableau, Amazon QuickSight, and Google Data Studio. They all do fundamentally the same thing: take a table of data, typically from a database, and display a visualization of it, such as a graph or chart. They each have their strengths and weaknesses, but they all come with their own internal domain-specific languages for manipulating the tabular input data into a final form for display. All three tools invented their own languages. The differences can be seen by examining the humble IF statement, which (in some form) also appears in nearly every higher-layer computer language.
The tools chose not to adopt the IF statement syntax of today’s most widely used languages. Data analysis and visualization also can be done entirely in Python and JavaScript. Python has, arguably, the simplest syntax with the least amount of keywords and punctuation, although indentation is a must (you should be indenting your code for readability anyway):
if condition1:
statements1
elif condition2:
statements2
...
else:
statementsN
JavaScript looks very similar, after taking into account the additional punctuation it requires, chiefly because it is a curly-braces-block language:
if (condition1) {
statements1;
} else if (condition2) {
statements2;
...
} else {
statementsN;
}
Python and JavaScript predate all three of the data visualization tools, although they were not as popular when each tool got started as they are now. So the tools can’t necessarily be faulted for failing to adopt the syntax of Python or JavaScript. There were, however, similar tools around at the time that could also be used to create visualization of tables, namely spreadsheets. Here’s the syntax of the IF statement in Microsoft Excel, which nearly matches the syntax of the prior dominant spreadsheet program, Lotus 1-2-3:
IF (condition1, value1, IF (condition2, value2, ... IF (conditionN-1, valueN-1, valueN)...))
This syntax is not bad in the simplest case of a single if-then-else clause, but it gets progressively more difficult as the number of else-if clauses grows, because additional IF statements must be embedded within the else clause of the previous IF statement. At the end all the IF statements must be terminated with closing parentheses, which is error-prone. Inserting a new else-if clause in the middle requires editing both the middle and end of the statement. So I would not hold up this IF statement as a model, but it’s interesting that the three data visualization tools chose not to simply adopt the syntax of the world’s most popular spreadsheet, as many users could be expected to know that syntax.
Tableau, as the name suggestions, is a table visualization tool. The parent company was acquired earlier this year by Salesforce for over $15 Billion. The table of data for each Tableau visualization can come from an Excel spreadsheet but more typically comes from a database.
Tableau is a mature, rich tool with many features and capabilities. Tableau's designers took an interesting approach to implementing the IF statement: they implemented multiple of them, allowing the user to pick-and-choose their preferred one. One of them resembles the one-conditional case of an Excel IF statement, although it is called the IIF statement:
IIF (condition1, value1, value2)
That syntax only works for a single conditional. For the more general case of multiple conditionals, the Tableau folks must not have liked the Excel syntax because they invented this IF statement:
IF condition1 THEN value1 ELSEIF condition2 THEN value2 ... ELSE valueN END
This syntax almost uses Python’s keywords, except for the change of “elif” to “ELSEIF” and the pesky, required “END” at the end. It has minimal other punctuation, so it’s pretty clean and straightforward. Tableau has great autocomplete and immediate syntax checking, which help the user create long IF statements. There also is a CASE statement, but it’s not as general purpose as the CASE statement we’ll look at below. All in all, Tableau has a decent implementation, although the existence of both IIF and IF will cause some confusion.
Amazon QuickSight is fairly new to the scene. Their tool designers chose not to follow any IF statement model that I am aware of. Instead, they invented this syntax for their “ifelse” statement:
ifelse(condition1, value1, condition2, value2, ... , valueN)
My guess is that this was an attempt to fix the main deficiency of Excel’s IF statement by pairing up each condition and value clause and sticking the final else value at the very end. It does eliminate a lot of typing, the complexity of embedding additional keywords, and the challenge of closing all the parentheses at the end. The main downside is that it is new syntax to most users, including a somewhat strange keyword, “ifelse”, which stands in contrast to the much more widely used phrase of “else if”.
Google Data Studio took a different approach. There is no IF statement, as if/then/else functionality is provided by a general-purpose CASE statement with this syntax:
CASE
WHEN condition1 THEN value1
WHEN condition2 THEN value2
...
ELSE valueN
END
There’s that pesky END statement again 😉. If that syntax looks familiar to you, even though you haven’t used Google Data Studio, it’s because the tool designers chose not to reinvent the wheel and to instead copy the SQL CASE statement. This is actually a smart choice, as often the data coming into Google Data Studio is coming from a tabular database, and SQL is the most widely-adopted query language for tabular databases. So many Google Data Studio users will be familiar with this syntax.
I’ve just focused on the IF statement in this post. Each of these tools’ languages has many more keywords and built-in functions, of course. I prefer not to reinvent the wheel and to instead look to prior art whenever possible, so it’s interesting to me to see how others have approached the same problem.
Related post: Reinventing the wheel: an example of not doing so