The Onion recently released an article entitled “Cogmap, dismayed over poor data quality of other vendors, releases one man’s opinion in data sampling errors”. Link below!
Flurry recently released a delightful set of Mobile 2013 data, but they did not caveat it enough for my taste. (I will soon be guilty of doing the same thing, FWIW.)
So I thought I would complain, because, in short, when I see an ad network release data, my first question on every single slide is: "How is this skewed by network composition?" And usually the answer is: "I bet it is skewed a lot."
A couple of examples:
This is kind of set up as "the growth of mobile," but really it is the growth of Flurry. What we need to add to this is a retail-storefront data point: how much of this is "same store sales" versus "new sales"? I will say, Flurry is big. This data is probably pretty good, but it is fair to say that many of their biggest customers are tracking a lot more data than they used to. That alone could cause in-application events to skyrocket even while actual application usage remained static. Those big jumps could be a change to the way Angry Birds tracks events, or the installation of a new app, or genuine broad market growth. It would be nice to know which.
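To make the "same store sales" question concrete, here is a toy sketch (all numbers invented, nothing to do with Flurry's actual figures) of how total tracked events can grow even when nothing about the underlying market changed, just because the network added apps:

```python
# Decompose year-over-year growth in tracked events into "same store"
# growth (apps already on the network) vs "new store" growth (apps that
# joined the network). All numbers are made up for illustration.

last_year = {"app_a": 100, "app_b": 50}                # events (millions) tracked last year
this_year = {"app_a": 130, "app b": 60, "app_c": 90}   # app_c joined this year
this_year = {"app_a": 130, "app_b": 60, "app_c": 90}

same_store = sum(this_year[a] - last_year[a] for a in last_year)
new_store = sum(v for a, v in this_year.items() if a not in last_year)

total_growth = sum(this_year.values()) - sum(last_year.values())
assert total_growth == same_store + new_store

print(same_store, new_store)  # 40 90 -- most of the "growth" is network expansion
```

In this toy case the headline number nearly doubles, but only 40 of the 130 units of growth came from apps that existed in both periods; the rest is the storefront getting bigger.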
Much later, we saw this slide which has similar issues:
They disclose that this data comes from Flurry, so this is really more about their network composition. One of two things is true: they either have Facebook or they don't. If they have Facebook, then their sample over-indexes on Social Networking (i.e. they capture most of the social networking activity happening on phones, but they don't capture all of the other activity). If they do not have Facebook (as seems likely from this graph), they are missing most of the social networking activity people perform on their phones. Instagram? If they disclosed the apps they track, we could better understand the relevance of this data.
There is a more interesting problem here: they are comparing apples to oranges. They pull the television time from the Bureau of Labor Statistics, the web browsing time from comScore and Alexa, and the mobile app numbers from their own data. I assume the Bureau of Labor Statistics does a good job controlling for people who don't consume any TV at all, which pulls the average down substantially. I assume comScore does an OK job controlling for this, once again pulling the average down. And I assume Flurry does a terrible job controlling for this; they are not really that kind of company, so why would they be? So I suspect this comparison is quite wrong.
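The selection-bias point is easy to see with arithmetic. A population survey averages over everyone, including the zeros; an ad network only ever sees its active users, so its "average" is inflated by construction. A toy illustration (invented numbers, not anyone's real data):

```python
# Why "average minutes per user" depends entirely on who counts as a user.
# A BLS-style survey includes people who watch zero TV; a network like
# Flurry only observes people actively using apps it tracks.

minutes = [0, 0, 0, 120, 180, 240]  # daily minutes for six people; three consume nothing

avg_whole_population = sum(minutes) / len(minutes)   # survey-style: zeros included
active = [m for m in minutes if m > 0]
avg_active_only = sum(active) / len(active)          # network-style: active users only

print(avg_whole_population)  # 90.0
print(avg_active_only)       # 180.0 -- double, from the same underlying behavior
```

Same people, same behavior, and one methodology reports twice the number of the other. Put a survey-style average next to an active-users-only average on one chart and the comparison is meaningless.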
I know people in glass houses shouldn't throw stones; I just can't help it. If you are still looking for the Onion article about me, go buy a book from the Onion using my affiliate link. You owe me a nickel for falling for that.