Metrics - A little knowledge can be a dangerous thing (or 'Why you're not clever enough to interpret metrics data')

Posted by Jason Crease on Simple Talk, 3 May 2012

At RedGate Software, I work on a .NET obfuscator called SmartAssembly. Several of its features use a database to store data such as exception reports and name-mappings. The user is given the option of using either a SQL Server database (which requires them to have Microsoft SQL Server) or a Microsoft Access MDB file (which requires nothing). MDB is the default option, but power-users soon switch to a SQL Server database because it offers better performance and data-sharing.
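
To make the two options concrete, here is a minimal ADO.NET sketch of what the choice amounts to. The connection strings, database name and file name are illustrative, not SmartAssembly's actual configuration:

```csharp
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

static class ReportStore
{
    // Illustrative only: these strings are not SmartAssembly's real config.
    // Both options hide behind IDbConnection, so the reporting code is the same.
    public static IDbConnection Open(bool useSqlServer)
    {
        if (useSqlServer)
            // Requires the user to run and administer a SQL Server instance.
            return new SqlConnection(
                "Server=.\\SQLEXPRESS;Database=Reports;Integrated Security=true");

        // The zero-setup default: a Jet/Access .mdb file on disk.
        return new OleDbConnection(
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=Reports.mdb");
    }
}
```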

In the fashionable spirit of optimization and metrics, an obvious product-management question is 'Which is more popular: SQL Server or MDB?'

We've collected data on exactly this, using our 'Feature-Usage-Reporting' technology (available as part of SmartAssembly) and, more recently, our 'Application Metrics' technology:

Parameter     Number of users   % of total users   Number of sessions   Number of usages
SQL Server    28                19.0               8115                 8115
MDB           114               77.6               1449                 1449

(As a disclaimer, please note that SmartAssembly has far more than 132 users. This data is just a selection from one build.)

So, it would appear that SQL Server is used by fewer users, but used more often. Great.

But here's why these numbers are useless to me:

Only the original developers understand the data

What does a single 'usage' of 'MDB' mean? Does it happen once per run? Once per option change? On clicking the 'Obfuscate Now' button? When running the command-line version, or just the UI version? Each question could skew the data 10-fold either way, and the answers are known only to the developer who instrumented the application in the first place. In other words, only the original developer can interpret the data; product-managers cannot interpret the data unaided.
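
To make that concrete, here is a hypothetical sketch (the method names are invented, not SmartAssembly's) of three equally plausible placements of the same reporting call, each producing incomparable numbers:

```csharp
using System;

static class Metrics
{
    // Stand-in for the real reporting call.
    public static void Report(string feature)
    {
        Console.WriteLine("usage: " + feature);
    }
}

class ObfuscatorUi
{
    // One 'usage' per run of the application...
    void OnStartup()          { Metrics.Report("MDB"); }

    // ...or one per change of the storage option...
    void OnStorageOptionSet() { Metrics.Report("MDB"); }

    // ...or one per click of 'Obfuscate Now'. Same metric name, three
    // placements, three incomparable counts; only the developer who chose
    // the placement knows which one the data means.
    void OnObfuscateClicked() { Metrics.Report("MDB"); }
}
```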

Most of the data is from uninterested users

About half of the people who download and run a free trial from the internet quit it almost immediately. Only a small fraction use it enough to make informed choices. Since the MDB option is the default, we don't know how many of those 114 users were CHOOSING to use MDB, and how many JUST HAPPENED to be using the MDB default for their 20-second trial.

This is a problem we see across all our metrics: are people using X because it's the default, or because they actually want to use X? We need to segment the data further, asking what percentage of each group meets our criteria for an 'established user' or an 'informed user'. You end up spending hours writing sophisticated and dubious SQL queries to segment the data, along the lines of the sketch below. Not fun.
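
Here's what that segmentation looks like, sketched in LINQ rather than SQL. The Session shape and the 20-session cutoff for an 'established user' are invented, and that arbitrariness is exactly the problem:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Session
{
    public Guid UserId;
    public string Storage;   // "SQL Server" or "MDB"
}

static class Segmentation
{
    // 'Established user' = more than 20 sessions: a cutoff as dubious as
    // any other. Move it and the SQL Server/MDB split moves with it.
    public static Dictionary<string, int> EstablishedUsersByStorage(
        IEnumerable<Session> sessions)
    {
        return sessions
            .GroupBy(s => s.UserId)               // one group per user
            .Where(g => g.Count() > 20)           // drop the 20-second trialists
            .GroupBy(g => g.First().Storage)      // split the survivors by storage
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```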

You can't find out why they used this feature

Metrics can answer the when and the what, but not the why. Why did people use feature X? If you're anything like me, you often click random buttons in unfamiliar applications just to explore the feature-set. If we listened uncritically to metrics at RedGate, we would eliminate the most important and most complex features, the ones people actually buy the software for, leaving just big buttons on the main page and the About box.

"Ah, that's interesting!" rather than "Ah, that's actionable!"

People do love data. Did you know you eat 1201 chickens in a lifetime, but just 4 cows? Interesting, but useless. Often metrics give you a nice number: '5.8% of users have 3 or more monitors'. But unless the statistic is both SURPRISING and ACTIONABLE, it's useless.

Most metrics are collected, reviewed with lots of cooing, and then forgotten. Unless a piece of data could change things, collecting it is useless.

People get obsessed with significance levels

The first thing lots of people do with this data is run a t-test to get a significance level ("Hey! We know with 99.64% confidence that people prefer SQL Server to MDBs!"). Believe me: other sources of error and misinterpretation in your data are FAR more significant than your t-test could ever comprehend.
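
To see why, run the numbers from the first table through a one-proportion z-test (a simpler stand-in for the t-test) against a 50/50 null, assuming every user and every session is an independent trial, which is already false:

```csharp
using System;

class Significance
{
    // One-proportion z-test against a 50/50 null. It assumes independent
    // trials, already false for multiple sessions from the same user.
    static double Z(double successes, double n)
    {
        double p = successes / n;
        return (p - 0.5) / Math.Sqrt(0.25 / n);
    }

    static void Main()
    {
        // By users (28 of 142): z ~ -7.2, so MDB 'wins' decisively.
        Console.WriteLine(Z(28, 142));
        // By sessions (8115 of 9564): z ~ +68, so SQL Server 'wins' decisively.
        Console.WriteLine(Z(8115, 9564));
        // Both are overwhelmingly 'significant', in opposite directions.
        // The test cannot see defaults, trialists or opt-in bias.
    }
}
```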

Confirmation bias prevents objectivity

If the data appears to match our instinct, we feel satisfied and move on. If it doesn't, we suspect the data and dig deeper, plummeting down a rabbit-hole of segmentation and filtering until we give up and move on. Data is only useful if it can change our preconceptions. Do you trust this dodgy data more than your own understanding, knowledge and intelligence? I don't.

There are always multiple plausible ways to interpret or action any data

Let's say we segment the above data and get this:

Post-trial users (i.e. those using a paid version after the 14-day free-trial is over):

Parameter     Number of users   % of total users   Number of sessions   Number of usages
SQL Server    13                9.0                1115                 1115
MDB           5                 4.2                449                  449

Trial users:

Parameter     Number of users   % of total users   Number of sessions   Number of usages
SQL Server    15                10.0               7000                 7000
MDB           114               77.6               1000                 1000

How do you interpret this data? It's one of:

  1. Mostly SQL Server users buy our software. People who can't afford SQL Server tend to be unable to afford or unwilling to buy our software. Therefore, ditch MDB-support.
  2. Our MDB support is so poor and buggy that our massive MDB user-base doesn't buy it. Therefore, spend loads of money improving it, and think about ditching SQL Server support.
  3. People 'graduate' naturally from MDB to SQL Server as they use the software more. Things are fine the way they are.
  4. We're marketing the tool wrong. The large number of MDB users represent uninformed downloaders. Tell marketing to aggressively target SQL Server users.

To choose an interpretation you need to segment again. And again. And again, and again.

Opting-out is correlated with feature-usage

Metrics collection tends to be opt-in, which skews the data even further. Between 5% and 30% of people choose to opt in to metrics (often called a 'customer improvement program' or something like that). Casual trial-users who are uninterested in your product or company are less likely to opt in, and this group is probably also more likely to be MDB users. How much does this skew your data? Who knows?
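
A toy model shows how large the distortion can be. The opt-in rates below are invented, purely to illustrate the mechanism:

```csharp
using System;

class OptInSkew
{
    static void Main()
    {
        // Invented numbers: suppose the true split is 20% SQL Server, 80% MDB,
        // and engaged power-users opt in at 30% while casual trialists opt in at 5%.
        double trueSql = 0.20, trueMdb = 0.80;
        double sqlOptIn = 0.30, mdbOptIn = 0.05;

        double observedSqlShare =
            (trueSql * sqlOptIn) /
            (trueSql * sqlOptIn + trueMdb * mdbOptIn);

        // Prints 0.6: a 20% minority reports in as a 60% majority.
        Console.WriteLine(observedSqlShare);
    }
}
```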

It's not all doom and gloom.

There are some things metrics can answer well.

  1. Environment facts. How many people have 3 monitors? Have Windows 7? Have .NET 4 installed? Have Japanese Windows?
  2. Minor optimizations. Is the text box big enough for average user input?
  3. Performance data. How long does our app take to start? How many databases does the average user have on their server?

As you can see, questions about who-the-user-is rather than what-the-user-does are easier to answer and action.
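
These are also the cheapest questions to instrument. Here is a minimal sketch of the kind of probes involved (a real reporter would, of course, also need the user's consent, per the section on opting out):

```csharp
using System;
using System.Globalization;
using System.Windows.Forms;   // reference System.Windows.Forms.dll

class EnvironmentFacts
{
    static void Main()
    {
        // Who-the-user-is questions have unambiguous, one-line answers:
        Console.WriteLine(Screen.AllScreens.Length);            // how many monitors?
        Console.WriteLine(Environment.OSVersion);               // which Windows?
        Console.WriteLine(Environment.Version);                 // which CLR version?
        Console.WriteLine(CultureInfo.CurrentUICulture.Name);   // Japanese Windows?
    }
}
```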

Conclusion

  1. Use SmartAssembly. If not for the metrics (called 'Feature-Usage-Reporting'), then at least for the obfuscation/error-reporting.
  2. Data raises more questions than it answers.
  3. Questions about environment are the easiest to answer.
