(TL;DR) As we’ve hurried down the empirical path of measurement with heads full of rationalist theory, we’ve arrived at an interesting intersection: people are becoming more skeptical of data measurement, collection, and results, and that is a good thing for data professionals!
This week I was listed as a contributor to Avinash Kaushik’s Blog post titled: A Great Analyst’s Best Friends: Skepticism & Wisdom!
How to be justifiably skeptical of summary statistical integers, charts, and measurement methods has been an interest of mine over the last year. Mostly I work on the empirical results of data; collection, planning, and design have not especially been in my purview. However, I can infer things about the process that produces the results I work with. So I have been learning about scientific methods from impactful philosophers like Hilary Putnam, who passed away less than a month ago, with Martha Nussbaum writing the HuffPost Arts & Culture commentary. (That last link is a great read!)
But I’m not writing this post to talk about those aspects; this portion might all be considered least significant bit first. I’m sharing it because it was a point of pride for me and an indicator that I am heading in the right direction on my journey. As a skeptic, I always have a certain degree of self-doubt.
What I really want to talk about is Point #10:
Have we accounted for the attenuation of data as it was collected via a specific medium?
Nominal, Ordinal, Interval and Ratio data (N.O.I.R. for short) can each be collected through various mediums. Nominal data can be the result of a survey medium, but so can ordinal, and the same goes for ratio. In each case the medium’s attenuation can range from high to low, with a possibility of zero if done really well. Let me give an example:
Let’s look at the common business metric:
Net Promoter Score(NPS).
NPS quickly asks customers to respond to one question: “How likely is it that you would recommend our company/product/service to a friend or colleague?”
Here is a template example from the popular survey software, SurveyMonkey.
The likelihood is assessed on an 11-point scale in which the respondent answers using buttons ranked ordinally from 0-10. The results are then summarized as ratios comparing each of three categories to the total response size. In the example below, we are taking an NPS score from 100 respondents. I’m not going to go into survey methodologies, sample distributions, and sampling statistics in this post, but these do matter and should be detailed in the measurement plan.
Promoters – 75:100
Passive – 15:100
Detractors – 10:100
An NPS score is then calculated by subtracting the percentage of Detractors from the percentage of Promoters. In this case it’s 75 – 10 = a score of 65.
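The arithmetic above fits in a one-line function. A minimal sketch using the example counts from this post:

```python
# Minimal sketch of the NPS calculation described above.
# Counts are the example figures from the post (100 respondents).
def nps(promoters: int, passives: int, detractors: int) -> int:
    """Net Promoter Score: % promoters minus % detractors."""
    total = promoters + passives + detractors
    return round(100 * (promoters - detractors) / total)

print(nps(75, 15, 10))  # -> 65
```

Note that the passives drop out of the numerator entirely; they only dilute the percentages through the total.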
Ok, so that is how it works, and there is no shortage of issues I’ve written about with the NPS survey method even before thinking about how attenuation will affect these results. (See the Postscript on why NPS is a bullshit metric.)
Back to the main point – Attenuation!
Now we know our company has a positive NPS score of 65 but what is the strength of this signal? Let’s look at some common mediums that deliver the NPS question:
- E-mail: NPS results from 100 answered e-mails
- Market Research: NPS question asked as part of a larger survey question set
- Transactional: NPS question asked at the end of every business transaction
- In-app Software: NPS question asked using in-app technology like a pop-up
Each medium will have its own fluctuation in signal strength. Imagine if you could ask the NPS question through your software to 100 people on a Monday upon first login, mid-week upon third login, or on a Friday afternoon. Remember the medium here: all someone has to do is click a radio button and the result is saved. Noise over this medium could appear as:
- Consumer just wants the pop-up off the screen and clicks a number
  - Account for this by using time-delayed response measures
  - Account for this by not prompting the user suddenly upon first login
- Consumer is new and clicks a zero or a 1 because it’s your lowest option
  - Account for this by adding an N/A option
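The mitigations above can be sketched as a simple response filter. This is a minimal sketch only: the field names, the N/A convention, and the 3-second threshold are my own illustrative assumptions, not part of any NPS standard.

```python
# Sketch of the "account for" ideas above: filter out likely-noise
# responses before scoring. Thresholds and field names are hypothetical.
def clean_responses(responses, min_seconds=3.0):
    """Keep only answers where the user plausibly read the question."""
    kept = []
    for r in responses:
        if r["score"] is None:                     # N/A option: exclude entirely
            continue
        if r["seconds_on_screen"] < min_seconds:   # likely a pop-up dismissal click
            continue
        kept.append(r)
    return kept

raw = [
    {"score": 9, "seconds_on_screen": 8.2},
    {"score": 0, "seconds_on_screen": 0.4},    # just closing the pop-up
    {"score": None, "seconds_on_screen": 5.0}, # chose N/A
]
print(len(clean_responses(raw)))  # -> 1
```

The point is not these particular thresholds; it is that the measurement plan should state, up front, which clicks count as signal and which count as medium noise.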
Ok, so I hope our target practice has been useful here in at least getting you to think about one of the ways to be skeptical of data measurement, collection and results.
But why does it matter?
It matters because eventually, someone is going to present these figures as facts! Or they are going to use a summary statistical integer to combat your idea or point. As a data professional who has been studying the craft of measurement and analysis for some time, both points bother me, but the latter so much that I made a slide about it in my Data Storytelling and Collection presentation. Here it is:
There are two things contrasted in this slide:
#1 – Jonathan Franzen quoting the 1996 tit-for-tat between:
- Adam Hochschild’s 1996 article – “Taken Hostage at the Airport”
- John J. Mcmenamin’s response (President, Turner Private Networks) – “Seek Shelter if the Medium Is the Irritant”
- Adding his personal commentary about being “mugged by a norm”, which presumably refers to the 95% span of the normal distribution, so I added the 68%, 95%, and 99.7% (one-, two-, and three-sigma) points to my slide.
#2 – Edward Tufte, a well-known data visualization professional, lamenting the same point about 20 years later.
- Side Note: When I had responded to his tweet with what Franzen had written many years prior and how I was adding this tweet to my slide, he blocked me. Fairly disappointing for me since this is the only interaction we’ve ever had and I would be attending his course two months later and putting a significant amount of my personal time into a 3 Part Overview.
Presentation of the slide:
I put these two together on the same slide so the audience can understand the arc of this conversation. However, my real point here concerns what you should never do as a data professional. That is: use a summary statistic to defang a profound argument or philosophical point.
Let me put it another way. Suppose everything John says is 100% true: that 95% of air travelers say CNN enhances the airport experience, that 89 percent believe it makes the time spent in an airport more worthwhile, and that CNN wants to provide discretionary, not intrusive or forced, viewing with audio adjusted to only six decibels above ambient noise levels.
None of those numbers negates the point that the din has increased in an already noisy environment of shuffling feet, luggage wheels, cart beeps, chit-chat, food bites, beverage sips, and gate agents, and my list doesn’t even include modern devices! Now you are going to put TV noise into that mix with no option to turn it off?! That is what Adam is talking about. He’s not talking about how satisfied people are with the noise. He’s talking about the general principle of commercializing every aspect of airport idle time. Of course he was right, as we’ve now put advertising inside the bins at the security checkpoint and in other nonsensical places.
Showing Adam a number that says people are satisfied with their force-feeding of noise still doesn’t address his suggestion that airports create a TV room just like they have yoga rooms, meditation rooms and prayer rooms. Adam may well have been a man ahead of his time in this regard.
Why is this important?
When I call a meeting, I prepare hand-outs for the audience and, as Tufte recommended, I try to make it a sanctuary for the material. Typically this means I know a great deal my audience does not, and I’ve spent time in the data trenches where the audience hasn’t. If I were to rebuke an executive, who is making a philosophical counter-point backed by years of experience, gut instinct trained through the practice of a craft, and savvy business expertise, with only a summary statistic, then we both lose. A number derived under a false premise cannot get us to truth.
So be skeptical my friends. Your data professionals will appreciate it. :]
Sharpen your skills with this exercise
Recently, Polygraph undertook arguably the largest Hollywood script analysis to date.
Results here: http://polygraph.cool/films/
As you read the article, review the techniques Kaushik published.
I have commented especially on this graph here:
Can you tell what’s odd here? I put the arrows in to help.
If you need a hint or the answer, click this link and read my comments which Polygraph liked.
Postscript: Why NPS is a Bullshit Metric
From my post in Sept 2014 – https://www.linkedin.com/pulse/20140922215049-14473158-is-nps-a-bs-metric
The tough part about NPS is that the methods aren’t bullshit but the score is. Let me defend my snotty remark before you dismiss me out of hand. :]
That post covers what I’d call methods, measurements and actions. All of these are good points and something that any survey could really deliver on. However, NPS is unique because of the ‘S’, the Score, which is commonly a summary statistical integer: 23%, 25%, 21%, etc. I call bullshit on NPS for the following reasons:
#1. NPS scores are based on the number of responses. This number has to be controlled for the score to be meaningful, and it is rarely published alongside the summary statistic. I’ve seen response rates vary from Q1: 164 to Q2: 554, a 237% increase, but an overall NPS decrease from Q1: 39% to Q2: 34%. An educated rebuttal to me pointing this out would be: “Yeah, but that doesn’t matter. 50% of 100 is still 50, 50% of 200 is still 100, and 50% of 300 is still 150. It doesn’t matter how many responses you get because 50% is 50% is 50%.” I would contest that, because a promoter response of 9 or 10, about 20% of what’s available, is actually hard to get. Remember that under an NPS model, people who respond with an 8 are considered passive. So under a random-probability test you’d have roughly a 20% positive chance, a 20% passive chance, and a 60% negative chance. I understand that NPS isn’t based on random probability, but over time you can see that you’re much more likely to fill your Detractor bucket than your Promoter bucket. So because the response rate increased 237% between quarters, it’s pretty obvious to me why you have the 5-point overall NPS decrease between Q1 and Q2. The causal explanation for your decrease is in your damn methods! Nothing more.
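The bucket asymmetry above is easy to check with a quick simulation. On a 0-10 scale the detractor bucket (0-6) covers 7 of the 11 options while the promoter bucket (9-10) covers only 2, so under uniform random clicking the expected score is roughly (2 − 7) / 11 × 100 ≈ −45. A minimal sketch, assuming uniform random responses, which real respondents certainly don’t give:

```python
import random

# If respondents clicked 0-10 uniformly at random, the detractor
# bucket (0-6) would dominate the promoter bucket (9-10) and the
# NPS would be strongly negative.
random.seed(42)

def nps_from_scores(scores):
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

scores = [random.randint(0, 10) for _ in range(100_000)]
print(round(nps_from_scores(scores)))  # close to (2 - 7) / 11 * 100, i.e. about -45
```

The scale is tilted against you by construction, which is exactly why more responses tend to drag the summary score down.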
#2. NPS scores don’t show variation. It would make much more sense to me if the Q2 score with 554 respondents were broken down into 5 groups of roughly 100. Then, along with the summary statistic, the variation could be shown: error bars, standard deviations, median moving range, Shewhart control charts, something! Anything! Showing people a summary statistic is all but useless without an understanding of the variation. “We had a 34% NPS score.” Ok… compared to what?! 34% with expected variation between 20% and 40%? Ok, now you are telling me something; namely, we’re still positive. But 34% with variation between -10% and positive 55% sounds to me like we’re hitting something and then drawing a bullseye around it.
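The subgrouping idea above can be sketched in a few lines. The response mix here is invented purely to land near the 34% quarterly figure from my example; the point is the split into five subgroups and the spread, not the particular numbers.

```python
import random
import statistics

random.seed(7)

def nps_from_scores(scores):
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

# Hypothetical quarter of 554 respondents: 277 promoters (9s),
# 188 passives (8s), 89 detractors (3s) -> overall NPS of about 34.
population = [9] * 277 + [8] * 188 + [3] * 89
random.shuffle(population)

groups = [population[i::5] for i in range(5)]  # five interleaved subgroups
subgroup_scores = [nps_from_scores(g) for g in groups]

print(f"overall NPS: {nps_from_scores(population):.0f}")
print(f"subgroup range: {min(subgroup_scores):.0f} to {max(subgroup_scores):.0f}, "
      f"stdev {statistics.stdev(subgroup_scores):.1f}")
```

Reporting the range and standard deviation alongside the summary is the cheapest possible antidote to the bullseye-drawing problem.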
#3. NPS scores don’t account for context/comments. As I’ve leafed through NPS results I commonly see: Survey score: 5 – Comment: “Haven’t used the product long enough to know.” The problem with NPS is that this counts against you! Some NPS surveys even include a 0 option. Did someone select the zero because they don’t have the product? Don’t care? Haven’t used it long enough? With NPS, context doesn’t matter: if it’s a zero, it counts against you. If it’s a 1, it counts against you.
Alright, this is a long enough post but for these reasons I call bullshit on the SCORE not on the methods. NPS is totally valuable for:
-Knowing who your promoters are and following up with them.
-Knowing who is passive and working on converting them to promoters.
-Knowing who is a detractor and reaching out to them to find out why.
But then again, you can pretty much do that with any survey; can’t you?