#108 The death of accuracy: an obituary
In this week’s episode of The Measure Pod, Dan and Bhav reflect on insights gained from MeasureCamp London, and dive into the thought-provoking topic of “the death of accuracy.” They talk about the future of accuracy in data, and the complexities surrounding the concept of data accuracy, questioning whether tracking 100% of data is truly accurate or even beneficial.
Show notes
- Dan’s GA4 Immersion 6-week cohort training
- More on MeasureCamp
Share your thoughts and ideas on our Feedback Form.
Follow Measurelab on LinkedIn and LeanConvert on LinkedIn.
Join the CRAP Talks Slack community and follow them on LinkedIn.
Music composed by Confidential – check out their lo-fi beats on Spotify.
Transcript
“We don’t trust the data” is not a throwaway question. There is a fundamental issue here at an organisation level, and that needs to be addressed with the highest level of seriousness.
Bhav
I don’t think accuracy is a useful term to throw around in this context anymore.
Dan
[00:00:00] Dan: Welcome to The Measure Pod. This is episode 108, and we recorded this on the 24th of September, just after MeasureCamp London. So if you were there, thank you for joining our live episode, which should come out in the next couple of weeks. But in the meantime, me and Bhav are back talking about the death of accuracy: an obituary.
[00:00:30] Dan: And this is a bit of a clickbaity headline, but this is a conversation I’ve been wanting to have for a little while. It’s just the two of us, and we were talking about whether or not tracking a hundred percent of data is accurate, this concept of accuracy, and whether or not that’s a good thing, a bad thing, whether it exists anymore. I’ll probably introduce ourselves.
[00:00:47] Dan: I’m Daniel Perry-Reed. I’m a principal analytics consultant and trainer at Measurelab, and Bhav is director of experimentation and analytics at LeanConvert. So Bhav, we’ve just finished the episode, and it’s always a bit weird when it’s just us two at the end of the chat, going back and sort of reviewing it.
[00:01:03] Dan: But what do you think? I mean, I kind of threw this on you really with not a lot of prep time and I appreciate your time for this. What do you think as we’ve kind of wrapped up the conversation on this?
[00:01:12] Bhav: I’m going to refrain from saying my staple, like, oh, I really love this episode, because it seems kind of self-serving given that it’s just you and I, and we, you know, we held the conversation.
[00:01:22] Bhav: The topic was an interesting one because, you’re right, we decided just in the last few hours that we were going to talk about this. Luckily, we had this conversation come up at MeasureCamp on Saturday, so I was already somewhat prepped and primed to be thinking about this topic.
[00:01:39] Bhav: I hadn’t anticipated the rabbit hole we were going to go down, but I think it’s an important topic to talk about because it is an ongoing problem. It’s always going to be an ongoing problem. You know, the reliability and accuracy of data, whatever you want to call it, has been a problem for nearly 20 years.
[00:01:55] Bhav: And it’s going to continue to be a question for the foreseeable future. So I think it’s really around, and I know we summarised it a lot in the call, how do you reassure people and give them comfort in the fact that actually, even if the data is not accurate, that doesn’t mean it’s not usable? They’re two different things.
[00:02:15] Bhav: So it was a nice episode to record. It was nice to actually be challenged on a topic like this, because I hadn’t really given it due diligence prior to this episode, and maybe to some extent the discussion we had. So I think it’s an interesting topic to be discussing. I don’t think it’s a topic that’s going to go away anytime soon, so I’m glad you brought it up.
[00:02:35] Dan: And of course we answer all of these questions. We give everyone the exact answer. You just have to listen to the end, sign up to the newsletters, join our mailing lists and give us your personal details, right?
[00:02:43] Dan: And then you can get all the magic answers as we’re now data magicians. Of course. All right. Anything you want to plug or talk about before we jump into the episode?
[00:02:51] Bhav: I’ve just got a Whoop, and I’m curious and interested about all the data I’m about to start collecting. So for context, I’ve been doing boxing for the last four months, but I’ve not been able to track any of my boxing sessions because I can’t wear my Samsung watch.
[00:03:08] Bhav: Because it keeps clicking and it stops recording midway through the session. And, you know, you want to talk about inaccuracy of data, try recording your boxing session with a Samsung watch or an Apple Watch or whatever under your gloves. Actually, the Whoop worked the entire session because there are no buttons on it, and I was able to record my very first boxing session. And apparently I burnt 750 calories.
[00:03:32] Dan: Wow. Amazing. I find that fascinating. We did an episode way back in the early days of this show that we called The Quantified Self, and we talked about Whoop because I joined Whoop for a little while with a trial.
[00:03:42] Dan: And there’s a bunch of us at Measurelab who joined a little group and we all saw each other’s stats. It’s slightly intrusive, actually; you wake up in the morning and you can see each other’s recovery scores. A bit of an odd one, but we definitely made it a competition, of course.
[00:03:55] Bhav: So I should say thanks to Bree. Bree’s one of our co-founders at LeanConvert. She got so sick of seeing me record manual activities on Strava and just putting up a selfie with no session details, like calories burned, heart rate, anything like that, she was just like, I’m just going to give you a Whoop so that you can track your data. And so thanks, Bree, this one’s for you.
[00:04:15] Dan: Awesome. I have nothing professional to plug, but as an avid video gamer, I have to say that my gateway video game, Broken Sword, has just been re-released. It’s called Broken Sword Reforged, and they’ve re-released a 1990s game in 4K, up-res’d, and it’s a beautiful, beautiful game,
[00:04:34] Dan: re-released in all of its glory. And I just wanted to plug it because I’m having a great time revisiting this point-and-click adventure game from the nineties that got me into video gaming. So there we go, a little plug for that. All right, well, without further ado, we’ve done a really long intro for a very short episode. So enjoy the show.
[00:04:53] Dan: A lot of people have been kind of, you know, ignoring the whole concept of cookie banners and stuff until Google started enforcing it. So there are a lot of companies out there that, all of a sudden, let’s take Google Analytics as an example, have gone from having a hundred percent of data in there, or at least what they perceived as
[00:05:06] Dan: a hundred percent of data, to now having 60, 80, 20 percent of data, whatever their opt-in rate is. And I’m getting asked this question of, what can I do with this data now? It’s not accurate. Can I trust this data when we can’t validate it? So that’s where we’re starting with this. It’s either going to take us 10 minutes to figure this out and get an answer,
[00:05:22] Dan: or this conversation is never going to end. But I just wanted to get your perspective: are you having these conversations, and what are your thoughts on the topic so far?
[00:05:30] Bhav: Yeah, I think the conversations I’m having are very similar. They come about in very different ways, you know, usually through, oh, this doesn’t match what I have here, or this isn’t exactly the same as here.
[00:05:40] Bhav: And so I get this conversation come up more than I would like. But I’m glad it does come up, because I think it’s one of those ones where it’s going to have to be an education piece. We’re going to have to continuously talk about it. You know, I don’t think we’re ever going to reach 100 percent of people understanding the nuances that go into data collection.
[00:05:58] Bhav: And I’m talking here about the likes of your GA data. I think your transactional database, you know, your customer database, these ones will always be your source of truth. So when we talk about trusting the data and whether it’s accurate, I think we’re talking about a few different things here. Because, let’s say, you have a bag of marbles, right?
[00:06:20] Bhav: There are a hundred marbles in the bag, right? Some percentage of them are black, some percentage of them are white, right? And it’s still accurate, right? So that’s that. But then you take 30 out, right? Is the number of marbles in that bag accurate? The answer is still yes. It’s not complete, but what you have is still accurate. Does that make sense?
[00:06:42] Dan: It does. But the question, and the thing we can’t validate, is whether the marbles that are left are a representative sample of the original total audience size. Because if you took 30 white marbles out, all of a sudden the ratio changes.
[00:06:55] Dan: But the thing about that is because you’ve removed the white marbles in this example, you can’t validate what it was before to know that it’s different now. So how do you address that kind of problem?
[00:07:04] Bhav: Yeah, I agree, and I think that’s a slightly different problem, but what’s left in the bag is still accurate, right? So when you get, oh, I don’t trust this data, you know, I like to start with, well, why don’t you trust it? Like, oh, we just don’t think it’s accurate. So why don’t you think it’s accurate? And if the answer is, because it doesn’t represent the full overall population size, that’s a very different conversation.
[00:07:25] Bhav: Another answer is, I just don’t think it’s right, right? Because it’s actually what you’ve collected. Assuming it’s set up correctly, assuming your tags are firing, assuming that you’re not losing customers midway through the journey because it’s broken and they’re restarting the journey through, like, you know, a payment page which sits on a slightly different domain or something like that, whatever it might be, as long as the setup is correct, whatever you track should
[00:07:52] Bhav: be accurate. Will it be a hundred percent in line with your backend? Let’s say you don’t know how many people didn’t make it to the site. But if your transactional data is in line with what you see in the backend, it may not be exactly right, but you can isolate just what you have. And this is why it’s so important that I try to capture order ID or user ID in GA, because then I can match it up and say, look, well, this is this person, this is their order.
[00:08:18] Bhav: This is what we have in the backend, this is the order, and one plus one equals two. Now, that for me is a very different conversation. Like, look, what we have is accurate. We’ve cross-referenced it, we’ve checked it against the database, and anything we capture is right. Or, you know, it’s wrong because it’s off by X amount because of, I don’t know, currency conversions or something like that. But generally, what we have is correct.
[00:08:42] Bhav: Now, to your point, which is, what about the marbles that are thrown away? Right, and this is the hairy one, because you never really know the answer to this question. You don’t really know how many marbles have been chucked away, if all you did was stick your hand into this bag of marbles, pull out a handful and chuck them away.
[00:08:58] Bhav: You don’t know if you’ve got, you know, 10 percent of them, 20, 30, 40 percent of them. And I don’t know if there’s ever going to be a right way of doing it, because you’re going to miss people for all sorts of reasons. And I just don’t think there’s a way around it.
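The marbles example above can be sketched numerically. All numbers here are invented for illustration: the point is that every marble left in the bag is counted correctly, yet the observed ratio no longer matches the original population once the loss is non-random (as consent refusals may well be).

```python
# Hypothetical bag of 100 marbles: 60 black, 40 white.
bag = ["black"] * 60 + ["white"] * 40
true_white_share = bag.count("white") / len(bag)   # 0.40

# Non-random loss: the 30 removed marbles all happen to be white
# (analogy: consent refusals clustering in one kind of user).
remaining = ["black"] * 60 + ["white"] * 10
observed_white_share = remaining.count("white") / len(remaining)

# Every remaining marble is counted correctly ("accurate"),
# but the sample is no longer representative of the original bag.
print(f"true white share:     {true_white_share:.2f}")
print(f"observed white share: {observed_white_share:.2f}")
```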
[00:09:12] Bhav: Now, the way I overcome this, and I can’t do it for sessions, but I go back to using my transactions data. I like to compare, transactionally, what I have in something like GA or Amplitude or whatever, versus what I’ve recorded in my transactions table. Now, if what I have from a
[00:09:34] Bhav: transactions point of view matches what I have in my backend, I use that as a proxy. And again, it’s not perfect, but statistically, as a proxy, I can say, okay, well, we roughly captured 90 percent of orders. We roughly captured 90 percent of add-to-baskets, or 80 percent, 85 percent, whatever.
[00:09:52] Bhav: So can we assume that the sessions we’ve captured are that same proportion of the total? So that’s how I would do it. And then my second point around this is, how do you get past that? This partially came up at MeasureCamp on Saturday when we were there, where someone was asking the same question, and they were talking about stakeholders pushing back and saying, we don’t trust the data.
[00:10:18] Bhav: Stakeholders, you know, are not unreasonable people. I think what they’re looking for is reassurance. And I mentioned this on Saturday: I think they’re looking for reassurance that, yes, this isn’t a hundred percent accurate, but I just need to find a way to believe that what I’m seeing here is reliable.
[00:10:35] Bhav: So it’s about reliability, that what you have is reliable and good enough, and reassurance that whatever difference you have is going to be consistent.
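The order-ID matching Bhav describes could look something like the following sketch. All IDs and figures are invented: the idea is just to compute the share of backend orders that the analytics tool also saw, and then use that match rate as a rough proxy for how much of everything else (sessions, add-to-baskets) was captured.

```python
# Backend transactions table is the source of truth; the analytics tool
# (GA, Amplitude, ...) misses some orders, e.g. non-consenting buyers.
backend_orders = {"A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10"}
analytics_orders = {"A1", "A2", "A3", "A5", "A6", "A7", "A8", "A9", "A10"}

# Share of real orders the analytics tool also captured.
capture_rate = len(analytics_orders & backend_orders) / len(backend_orders)
print(f"capture rate: {capture_rate:.0%}")   # 90%

# Proxy step: assume sessions were captured at roughly the same rate,
# and scale the tracked session count back up accordingly.
tracked_sessions = 45_000
estimated_sessions = tracked_sessions / capture_rate
print(f"estimated sessions: {estimated_sessions:,.0f}")   # 50,000
```

As Bhav says, this is not perfect, but it turns "we lost some data" into a number you can reason about.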
[00:10:44] Dan: So you touched on it there, but I think this is the stakeholder management piece. I think a lot of people in the analytics space, when this subject comes up, jokingly diminish it in some way as a valid question.
[00:10:59] Dan: And we might be like, oh, your data has never been accurate, huh? I bet you didn’t know that. And just talking about things like, you know, let’s take Google Analytics again as an example: it’s based on JavaScript, it’s based on the browser, on ad blockers not being there; it’s based on so many factors.
[00:11:14] Dan: And thinking it was accurate or was a hundred percent, and now it’s changed, that’s an issue in itself. And I’m not saying you’re saying this, but I don’t think it’s fair to be like, oh, it never was accurate. Because I think people used to be able to use data to make decisions, or at least they perceived it that way.
[00:11:31] Dan: And now they think that that is not possible, for whatever reason. Let’s take consent management as a prime example, whereby the data has changed. You know, the arrows are red and facing downwards because they’re now tracking 30, 40 percent less quote-unquote data than they used to have.
[00:11:47] Dan: So I’m thinking, okay, to communicate this with stakeholders, the key part here, and I think we touched on it there, is around giving them reassurance. But how does someone that has, let’s say, been in the industry for 20 years, and has been using Google Analytics for the best part of that,
[00:12:04] Dan: and has been using Google Analytics in a certain way, whose director level, whose C-suite are using dashboards with things like total sessions and bounce rate from Google Analytics, whether we agree with that or not, that’s not relevant, and all of a sudden we’re saying to them, okay, well, total sessions is a number you can’t use anymore because
[00:12:20] Dan: we can’t track it in the same way. How do you go about approaching that conversation, whereby essentially stuff that’s been static for a long time has now changed, and we’re having to explain that this number has to change, in everything from executive-level data all the way down to optimising media campaigns? And measuring things like cost per acquisition is now difficult, or different to do.
[00:12:42] Dan: And maybe we have to get used to new sorts of benchmarks, I suppose. But how do you approach that? This is hard, this is change management. This isn’t technically explaining what’s happened, because I think we’re all really good at explaining the technicalities and the change, but how do you actually do the change management side?
[00:12:58] Bhav: Oh, you’re asking a good question here, Dan. I think, okay, let’s first take solace in the fact that it’s probably unlikely that there’s a CEO somewhere who’s spent 20 years looking at data and then suddenly had to shift and, you know, be like, well, why has my bounce rate suddenly changed?
[00:13:17] Bhav: But your point stands: how do you manage that? And I think, as analysts, the way we communicate this is not through single charts, right? Or single pieces of insight. And this is the thing that really irks me. I’ve seen this before, very recently actually, where the client was asking about particular data and it wasn’t matching.
[00:13:46] Bhav: And the approach that was taken was very much, oh, here’s a Google Sheet with the data that shows you that this is down by this much, and this is down by this much, and if you look at this next number, the percentage is exactly the same, therefore it’s going to be consistent. For me, that’s not reassurance.
[00:14:04] Bhav: That’s the analyst kind of going, oh, here’s a chart, or here’s a couple of numbers, and they say this; interpret it, make an inference of it. And I think what you have to realise, when you’re communicating with someone who’s senior and you’re being challenged on data integrity, is that you have to take it seriously. For one, I’m not saying, oh, it’s never been correct.
[00:14:31] Bhav: You know, I don’t like that approach, because I think most people know that. So it’s a case of, how do you take this very serious question and give it the respect it needs? And for me, the way I go about it, and continue to go about it, is by turning it into a very formal, proper piece of documentation, one that captures the purpose of the document.
[00:14:55] Bhav: I’m going to just shift slightly and talk about the hierarchy of documentation, and, as an analyst, recognising when a chart is enough versus when a PowerPoint is enough, and so on. So if we think about the hierarchy, the lowest form of data analysis, or of answering requests, is when someone asks you something and you say, here’s a chart, here’s a number, right?
[00:15:17] Bhav: And it’s sent over Slack or an email or something like that. It’s the lowest form of insight and analysis work. It’s just, literally, here’s a question, here’s an answer. Then you have your data and your dashboards. This now starts to move more into, like, you know, we’re going to give you more information, but you’re still not going to get more insight at this point.
[00:15:37] Bhav: This is just, here’s a dashboard, you’ve asked for it, you know, we’ll identify some trends, and we’ll do some things in there that may help you see standard deviation, or, you know, whether you’re within some type of range or something like that, but again, it’s still fairly basic. Then you start to go up.
[00:15:50] Bhav: Now you’re talking about PowerPoint presentations, right? This is where you want to tell a story. So for me, a PowerPoint presentation is a great way to do a piece of analysis on a campaign or an experiment that you’ve run or some, some analysis that you’ve been asked to do about the performance or, you know, customers or something like that.
[00:16:08] Bhav: And this is now very informative, and it’s still much better than a dashboard or a single chart or a number. But it still doesn’t have the same level of gravitas attached to it. We then move to the top of the hierarchy, which is a white paper or a memo or whatever you want to call it.
[00:16:28] Bhav: Now, I pull this out when I really need to influence change, right? It’s a piece of writing, written as a document, in a memo format, right? Like, Amazon live by memos, and one of my previous companies used to live by memos, though I feel like sometimes they overused them.
[00:16:50] Bhav: For me, a good memo should drive change. So now, going back to the point: if someone’s asking, we don’t trust the data, and you’re going back to them with the lowest level of insight, through a chart or a number or something like that, it shows the user that you don’t really respect the question.
[00:17:08] Bhav: Whereas actually, if you turn it into a white paper, and I’m not saying just write white papers willy-nilly, right, but if there is a serious issue about this and people are really raising it, do it with due diligence, right? Write it in a paper. Show trends over time, show the delta between your actuals, what you have in GA, and your transactional data.
[00:17:27] Bhav: Whatever you can capture that you trust, compare it against GA. Then talk about the causes of the differences between the two platforms. Like, why might we be losing data? And explain that actually we’re losing data because of cookie consent, blah, blah, blah. If you have heat-mapping software, show that actually most people just hit okay,
[00:17:46] Bhav: but there is still a percentage of people who are clicking on “I do not consent”, right? And then you move this question from a very throwaway question to something that, as an organisation, you take very seriously. So, sorry, I’ve given you a very long-winded answer here, Dan, but I wanted to really get into the question.
[00:18:08] Bhav: Like, “we don’t trust the data” is not a throwaway question. There is a fundamental issue here at an organisation level, and that needs to be addressed with the highest level of seriousness that you can possibly imagine. I’m not saying that a document does that, but I’m saying it implies that you’re taking the question very seriously.
[00:18:29] Dan: I agree, and I think this is one of the very important skills that maybe a lot of analysts are not flexing or learning or developing right now, which is that kind of, I don’t want to say creative writing, but it’s writing, it’s writing it down. So it’s not about forming an email or building a dashboard or responding to a comment in a Jira ticket.
[00:18:48] Dan: This is about curating a document that influences multiple stakeholders, tells a story and drives action, right? I think this is a key part of that. And you only really get used to that by doing it, figuring it out, getting peer reviews and getting people to comment on it.
[00:19:05] Dan: Google Docs are a good way of doing that, you know; people can add comments and you can collaborate. It’s a really great way. I’ve been doing something recently, and what I’m thinking towards is a specific situation that I’m working on, without naming names. There’s a company that, let’s say, has been looking at GA data for a long time.
[00:19:19] Dan: They’ve been using a cost per acquisition, that’s sort of like a North Star metric, right, for their marketing activity. What’s happened is they’ve had a single target cost per acquisition, and once they started respecting consent, that cost per acquisition really changed. They were still tracking all the costs, right,
[00:19:39] Dan: because that’s over in the ad platforms, which are not affected by consent. But the acquisitions were pulled from Google Analytics or Google Ads, which have to respect consent. So all of a sudden their cost per acquisition shot up, and they were like, what the hell’s going on now?
[00:19:53] Dan: The actual performance never changed, but the measurement of it changed. And then what happened is Google, at least after a little while, introduced their modelling capabilities to model out those conversions and kind of reintroduce all the fuzzy data, which maybe is a conversation for a different time, whether that’s useful or not.
[00:20:11] Dan: And then it kind of came back down, but it’s at a different level now. And the thing is, they didn’t know that this was happening. They didn’t really understand. So there’s an education piece here: they didn’t know that it would have changed, and they didn’t know why it changed again
[00:20:24] Dan: once Google introduced all the modelling. And I’ve done a white paper, as you said, to go through that explanation. But what they’re now struggling with as a business, as a leadership team, focusing on the performance marketing side of things, is, how does one re-evaluate what a target CPA is?
[00:20:45] Dan: If you’ve always had, let’s say, a 30 pound CPA, and now the measurement has to be different, how does one go about changing that, or addressing that? The explanation of why has now been, let’s say, ticked, done. What happens next? How do they work with that? Because I think the marketers are struggling with this,
[00:21:05] Dan: the business owners are struggling with this, the analytics people are struggling with this, and no one’s really clear on what that new benchmark is, or where to go next, or how to move beyond the conversation of “we don’t know”. It’s odd, it’s hard, it’s difficult, and there’s no one answer.
[00:21:24] Dan: So what’s next? I’m sorry, I’m asking this as if you have all the answers here, but I just want to see what you think about this, and whether you have an approach for what comes next.
[00:21:32] Bhav: I don’t have one, because I’ve not been in the situation where a company has had to basically re-evaluate their entire target CPAs because consent has changed the underlying way they’re calculated.
[00:21:46] Bhav: And you’re now dealing with a slightly inflated, well, maybe very inflated, CPA. So the way I would approach this situation is, I would look at the source of truth. So let’s say the company deals with purely website data, right?
[00:22:05] Bhav: Assume that, you know, there are no offsite transactions happening, right? It’s an e-commerce organisation. And let’s assume that they don’t track just new customer CPA; they track CPA of any conversion. Actually, forget that, let’s just say they’re doing new customer CPA, right?
[00:22:21] Bhav: What you have in your database is your customer database and your transactional database. You will have your complete list of first-time orders. Now, with those orders, you won’t necessarily have a marketing channel, right? However, what you do have is your total cost, which, again, just quoting you, remains unaffected.
[00:22:45] Bhav: So now you’ve got two pieces of information that have not been affected by consent. What you have is your total spend data coming from your ad platforms and you have your total transactions data, new customer, existing customer, whatever it is, you’ve got that data somewhere. And these two have been unaffected by consent.
[00:23:02] Bhav: So I would take that and I would calculate a global CPA, and just see how that trend has been doing, right? Again, it’s not perfect, but, you know, stick with me. Then I would take the CPA that’s coming out of my platforms, like GA, and I would literally stack it on top of my actual CPAs. And if the actual CPA hasn’t moved much, but the GA4 CPA is changing, you can start to identify the trends and the patterns within that data.
[00:23:33] Bhav: And you can start to say, the CPA that we have coming out of GA was historically always X percent higher than our actual CPA. And then, if that obviously shifts to, like, a Y percent difference, we can say, okay, well, we have seen a shift. The CPA has gone up by this amount. So let’s model it backwards and say, actually, if we apply that same uplift to what we see in GA at a channel level, we can then potentially start to be able to say, well, this is the CPA being reported by GA,
[00:24:10] Bhav: but the true CPA is actually 10 percent lower, or 12, 15, 20 percent lower, whatever it might be. So I would treat it as a modelling exercise. And then you can do that at a channel level, right? Of course, you are using the same constant, which may or may not be true, but you can always tweak it and flex it.
[00:24:30] Bhav: You could actually say, maybe the Facebook CPA needs to be a little bit more aggressive, a little bit harsher, than the one coming from PPC ads. So that’s how I would do it. I don’t have a perfect solution, you put me on the spot here, but if I had to think about the problem, I would go back to: what do I know to be true, what is the thing that I suspect to be foul play, and how can I reconcile the two?
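The reconciliation Bhav outlines could be sketched as follows, with invented figures: spend from the ad platforms and orders from the backend are unaffected by consent, so the global CPA built from them is trustworthy; GA’s CPA is inflated because it misses non-consenting converters, and the ratio between the two gives a correction factor you can apply back to GA’s channel-level CPAs.

```python
# Invented figures. Spend and backend orders are consent-proof;
# GA's order count is lower because some converters declined tracking.
total_spend = 30_000.0
backend_orders = 1_000
ga_orders = 800

true_global_cpa = total_spend / backend_orders   # 30.00 -- trustworthy
ga_global_cpa = total_spend / ga_orders          # 37.50 -- inflated
correction = true_global_cpa / ga_global_cpa     # 0.8

# Apply the same constant per channel (Bhav notes you might flex this,
# e.g. harsher for Facebook than for PPC). Channel CPAs are invented.
ga_channel_cpa = {"ppc": 35.0, "facebook": 42.0}
adjusted_cpa = {ch: cpa * correction for ch, cpa in ga_channel_cpa.items()}
print(adjusted_cpa)   # ppc ~28.0, facebook ~33.6
```

The single correction factor is the simplifying assumption here; in practice, as Bhav says, you would tweak it per channel.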
[00:24:57] Dan: For sure. And Google Analytics is not a source of truth, and you can quote me on that one. Google Analytics is definitely not a source of truth.
[00:25:03] Bhav: And I don’t think anyone expects it to be. As I said, it just goes back to, is this, you know, is the number, are the numbers that we’re seeing reliable and are they directionally accurate?
[00:25:13] Dan: So can I maybe shift this conversation on slightly? One of the things that I’m always talking about with our clients is, does having 100 percent of data tracked in something like Google Analytics matter? If we come back and think about it,
[00:25:29] Dan: it matters if we use the data, if it’s affecting decisions, if it’s actionable, and if it’s applied to something, right? Let’s say I’m optimising media campaigns. Or, what I’m really interested to get from you is the idea of experimentation and testing. If we can’t track 100 percent of visitors to understand which 50 percent was in which arm of an A/B test, or which 10 percent group was withheld from seeing an experiment,
[00:25:50] Dan: for example, does that change how we approach experimentation? And I suppose we can apply that to marketing as well, because now we can’t actually see the total audience; we’re looking at, let’s say, 60 percent, and then we’re testing within that 60 percent. Does that change the actionability of the data,
[00:26:09] Dan: now that we don’t have visibility? Or is it essentially the same, and it doesn’t affect our approach, or our analysis, or our testing capabilities?
[00:26:17] Bhav: So I'm not sure if this answer will be in line with how you treat media campaigns, but let's look at it purely through an experimentation and A/B testing lens.
[00:26:29] Bhav: I take a lot of comfort, and I mean a lot of comfort, in statistics, right? When we launch an experiment as a 50/50 test, even if you have consent issues and you're only running the experiment on 60 or 70 percent of users, or whatever it might be, the people who are exposed to it have been exposed at a 50/50 level.
[00:26:58] Bhav: And we run these checks constantly in experimentation. We're looking for something called sample ratio mismatch. So if there was an issue with consent mode, and for some reason it was affecting the experiment, we would see it in the allocation of traffic. You can run statistical tests to say: in this group we've got 5,000 users,
[00:27:19] Bhav: and in this one we've got 5,500 users. Is that just randomisation and sheer chance? Or is there a systematic error causing you to have 500 additional users in one group? You can run a statistical check to say, actually, this is outside of what we would expect, and therefore you have got some bias in this test.
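The sample ratio mismatch check Bhav describes can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. The 5,000 vs 5,500 split is the example from the conversation; the 0.001 significance threshold is a common convention for SRM alerts, not a universal rule.

```python
import math

def srm_check(users_a, users_b, expected_ratio=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test for a two-arm traffic split.

    Returns (p_value, srm_detected). With one degree of freedom the
    chi-square survival function is erfc(sqrt(x / 2)).
    """
    total = users_a + users_b
    expected_a = total * expected_ratio
    expected_b = total * (1 - expected_ratio)
    chi2 = ((users_a - expected_a) ** 2 / expected_a
            + (users_b - expected_b) ** 2 / expected_b)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < alpha

# The 5,000 vs 5,500 example from the conversation:
p, mismatch = srm_check(5_000, 5_500)
print(f"p = {p:.2e}, SRM detected: {mismatch}")
```

For a 10,500-user experiment, a 5,000/5,500 split is wildly unlikely under a true 50/50 allocation, so the check flags it; a 5,000/5,050 split would pass as sheer chance.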
[00:27:39] Bhav: And we do this all the time, we run these checks. So regardless of the fact that 20, 30, 40 percent of people are missing, it goes back to the marble bag analogy: the people who are in the marble bag, as long as they've been split 50/50, it doesn't affect the validity of the experiment.
[00:28:00] Bhav: The experiment stays valid, and that's the whole beauty of experimentation: what you're trying to do is control for external factors or confounding variables. By randomising, you're holding everything else constant except for the one thing you want to introduce and change, which could be some content, a feature, pricing, whatever you're testing.
[00:28:23] Bhav: And then the uplift that you see, or don't see, is statistically valid, provided you let the experiment run for long enough and it's a properly powered experiment. So if anyone's listening to this and they're worried about consent affecting their experimentation, generally I don't think the experiment will be affected by cookie consent issues.
[00:28:43] Bhav: I think cookie consent issues are more of a problem for marketing channels. So I don't know what you think about this. I know you wanted to take this question on from a marketing perspective and a campaign analysis perspective. What are your views on it?
[00:29:01] Dan: Yeah, it's really hard, because the one thing I keep going around in circles with is that you can't validate this, right? You can't validate data anymore. I don't want to sound old, but back in the day I used to do a lot of work validating marketing ad clicks
[00:29:18] Dan: against sessions on a website, and there was an acceptable level, right? The running consensus was that around 50 percent of display clicks were accidental. So if we had a 50 percent session-to-click rate, then that's good, that's fine, we can move on. And for PPC it was more like 90 or 95 percent.
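That old validation step could be sketched like this; the benchmark bands (roughly 50 percent for display, 90 to 95 percent for PPC) are the rules of thumb Dan mentions, and the band widths and traffic numbers are hypothetical.

```python
# Rule-of-thumb session-to-click benchmarks from the conversation:
# ~50% of display clicks were accidental, so ~50% is acceptable there,
# while PPC should land around 90-95%. Band widths are hypothetical.
BENCHMARKS = {
    "display": (0.40, 0.60),  # acceptable sessions/clicks band
    "ppc":     (0.85, 1.00),
}

def validate_channel(channel, sessions, clicks):
    """Return (ratio, verdict) for a channel's session-to-click rate."""
    ratio = sessions / clicks
    low, high = BENCHMARKS[channel]
    if low <= ratio <= high:
        return ratio, "within expected band"
    return ratio, "outside expected band: investigate"

print(validate_channel("display", 5_200, 10_000))
print(validate_channel("ppc", 6_000, 10_000))
```

The PPC example lands at 60 percent and gets flagged, which is exactly Dan's point: under consent-gated tracking you can no longer tell whether that drop is breakage or simply untracked, non-consenting users, so the benchmark itself stops being meaningful.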
[00:29:34] Dan: But now that we can't do that, there's really no way for me to know if we are even accurately, and I use that term deliberately because I want to come back to it in a moment, accurately able to measure as much as we can measure, because I don't know what a good percentage difference between sessions and clicks is now with ad platforms. The thing is, the consent rate might be different.
[00:29:52] Dan: And we're talking about the relative sample size. If you've got people coming from Instagram, they might accept your cookie banner at a different rate than if they come from paid search, or depending on whether they're on desktop or mobile, or what part of the country, or which country they're in. And because you can't then
[00:30:06] Dan: compare that, you don't know the benchmark, and you don't know how to validate it at all. I'm really stuck: there's no validation anymore. I haven't done validation in a long time, even on things like conversions. Let's say it's e-commerce purchases against order IDs: okay, we've got 60 percent of purchases in GA4.
[00:30:22] Dan: Great. That could be fine, or that could be a bug. We could be technically missing some, but it could also just be the number because of consent. So what's really hard is coming to terms with that as an analyst first, but then also communicating it back to stakeholders and saying, well, I can't convince you that this is okay now, because I can't validate the data.
[00:30:44] Dan: Before, I compared two data sets together and we moved on. So it's really difficult. Here's what I've been thinking about a lot, and I'd love to get your perspective on it: this term, accuracy. There's something here which I think is the seed of a blog post or something more in depth. The reason I titled this episode the death of accuracy is because I don't think accuracy is a useful term to throw around in this context anymore.
[00:31:08] Dan: Because I know what you're saying, that the split of marbles in the bag is still accurate, but I don't know if we can be certain of that. We don't know. I can't validate it. I don't know how many marbles there were to begin with. So I'm wondering if there's an idea there, to move away from this concept of accuracy, because "is it accurate or not"
[00:31:25] Dan: is the wrong question. Instead: is it directional? Maybe directional is a better word. Can I still use this data? Is it actionable? Is it directional? Is it telling me what I need to see? Is it showing me change where change is important, whether it's up or down? I'm trying to find a better way of talking about this without asking, is it accurate?
[00:31:43] Dan: Can I trust it? Because actually, I don't know if it's accurate, and I can't tell you yes or no there. So I'm trying to come up with a new way of approaching this, and at the moment I've settled on this idea of directionality: is the change directional?
[00:31:59] Dan: Now for our long-time listeners, it's no secret that I like to run training courses all around Google Analytics, Tag Manager, and everything in between. Check out the full list of courses over at measurelab.co.uk forward slash training to see all the courses and workshops we have available, everything from
[00:32:13] Dan: learning Google Analytics 4 to Google Tag Manager and data visualisation, plus shorter workshops covering small, specific areas of interest such as user-provided data and generative AI for data analysis, as well as lots of other stuff. That's measurelab.co.uk/training for all the details, or you can click the link in the show notes, or if you're watching this, scan this QR code.
[00:32:31] Bhav: But then someone's going to ask you, how do you know if the direction is accurate, right? You always fall back into the same trap. It's, again, a really interesting one. I don't think it's a case of shying away from the question; I think it's a case of us, as people, just doing the best we can to answer that question as well as possible.
[00:32:52] Bhav: And it's going to be through constant QA'ing, checking and testing when you implement your GA setup. You can't test for every single browser, every single mobile device, every single browser version, right? You have to do the best you can, and if you've done what you can, you can demonstrate that. Again, some of it comes down to doing sanity checks. I like to look at the entry page report, because I've seen a checkout page with 500, 5,000 entries into the site, and something's clearly wrong there, because people don't enter the site via a checkout page. It turned out that cross-domain tracking wasn't set up properly, so they were losing the session. And that is an accuracy problem with a very easy solution.
[00:33:44] Bhav: We fixed that fairly quickly, but the problem is you have to know where to look with some of these things, right? And I don't think there's a right or wrong way. Nico, a friend of mine and an ex-co-founder, and I set up a way to debug problems. You know when someone says to you, conversion is down, right?
[00:34:06] Bhav: There's a step-by-step process that Nico and I put together: okay, is it actually down? Is it seasonal? Is it this or that? I can't remember exactly what it was. It's on our website; we haven't looked at it for quite some time. I think you should follow the same sort of steps when someone says to you, I don't trust the data.
[00:34:26] Bhav: Okay, let's understand what you mean by that. Of course, you have to do some level of incisive questioning. You and I have been talking quite a lot recently about asking better questions, and "I don't trust the data" is one of those ones where, okay, it's not so much a question, it's more of a statement.
[00:34:39] Bhav: So making better statements is probably a good way to think about it. How do you turn this into something that you can go and check? Because as an analyst, there's nothing more frustrating than "I don't trust the data". Give me something to go on, right? I'm not a magician.
[00:34:54] Bhav: I don't know. Tell me why you don't trust the data. Tell me what led you to believe there's a problem here. Show me something. Point someone in the right direction by saying, oh, well, this number used to look like this, and suddenly we're seeing this, so now we don't trust it.
[00:35:08] Bhav: Okay, thank you, that's all I needed. That wasn't too hard. Now I know something, I know where to look, so it gives me a starting point. When I get a question like "I don't trust the data", or "there's something wrong with the website", or "conversion is down", I go through and ask: is it actually down?
[00:35:27] Bhav: Yes or no. Is there a marked shift in marketing channels? Yes or no. And then I go through this very methodical, step-by-step process of things I want to check off until I arrive at some type of answer. But I will also draw a line. I'm not going to spend two weeks going down a rabbit hole when, even if I do find the cause by day two, it's, oh, it's resolved itself.
[00:35:53] Bhav: Then it's like, well, okay, who cares? Why did it happen? I'll ask, of course, but I'm not going to kill myself over it if it wasn't a fundamental shift, if it was just someone changing something, and I don't really care if it's back to normal.
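The methodical triage Bhav describes could be sketched as an ordered checklist that stops at the first check that explains the drop. The check names, thresholds and the context values passed in are hypothetical stand-ins, not his and Nico's actual process.

```python
# Hypothetical ordered triage for "conversion is down": run cheap checks
# first and stop at the first one that explains the change.
def is_actually_down(ctx):
    # More than a 10% relative drop against baseline counts as "down".
    return ctx["conversion_rate"] < 0.9 * ctx["baseline_rate"]

def is_seasonal(ctx):
    # Same week last year was at or below today's rate.
    return ctx["same_week_last_year_rate"] <= ctx["conversion_rate"]

def marketing_mix_shifted(ctx):
    # Paid traffic share moved by more than 10 points.
    return ctx["paid_traffic_share_delta"] > 0.10

CHECKS = [
    ("not actually down", lambda ctx: not is_actually_down(ctx)),
    ("seasonal pattern", is_seasonal),
    ("marketing mix shift", marketing_mix_shifted),
]

def triage(ctx):
    for label, check in CHECKS:
        if check(ctx):
            return label
    return "unexplained: escalate to tracking/QA investigation"

example = {
    "conversion_rate": 0.020,
    "baseline_rate": 0.030,
    "same_week_last_year_rate": 0.019,
    "paid_traffic_share_delta": 0.02,
}
print(triage(example))
```

The value of the ordering is exactly the line Bhav draws: the cheap explanations get ruled in or out before anyone commits to a two-week rabbit hole.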
[00:36:05] Dan: There's two things there. First of all, you're forgetting from our conversation with Stine Rasmussen in the last episode that we are data magicians now. Data wizards, I think you said. So we are magicians and we can make the numbers match, right? Surely. But okay, let me rephrase this. That's the last question I have for you, and then we can wrap this up, or at least pause it for another conversation.
[00:36:23] Bhav: But when did I become the guest on this episode, Dan? I feel like at some point in this episode you shifted me from co-host to guest.
[00:36:32] Dan: I respect your opinion enough to want to ask you this stuff, right? I've managed to pin you down in a schedule to record this podcast. That's good enough for me.
[00:36:39] Dan: So look, I'm not expecting a definitive answer. It's just good to get the conversation going between us, and if we're thinking about it and don't have the answers, I'm sure everyone else is thinking the same kind of thing too. For the last thing, I'm going to come back to a very specific feature,
[00:36:55] Dan: which comes up all the time because it's a Google Analytics feature. My question for you, and it still comes back to accuracy, is: is something like data modelling, which fills in the gaps to model back up to having a hundred percent of data, or at least quote-unquote a hundred percent, a good feature for us to use as an industry?
[00:37:15] Dan: Because there are two sides to this. On one hand, it's not accurate. It's a black box. It's machine learning. We don't know how it works. The numbers are back to being quote-unquote higher, but it's kind of papering over the cracks rather than dealing with the issue, rather than using the data we can actually validate and contrast, right?
[00:37:31] Dan: On that side, the accurate bit is no longer the bit we're looking at; we're looking at an estimation as the total. Or are you on the other side of the fence, where looking at this kind of modelled data to account for the missing portion is a really clever, valid way of using
[00:37:47] Dan: the data we can trust and know is accurate to go forward? I'm stuck in two minds. I speak to some clients who say, absolutely switch it off, we don't want any of this estimated data on top, we just want to look at the accurate bit, even if it's partial. And other people say, well, obviously we'll use it, because we don't want to look at partial data, we need to do year-on-year comparisons, et cetera, and this is the closest we can get. So what's your perspective on this approach to dealing with partial data?
[00:38:13] Bhav: I think this is a company problem. Let me rephrase that: I think this is a problem where the company needs to decide the path forward, and I don't think there's a right or wrong answer on this.
[00:38:26] Bhav: If you as an organisation decide that you want to work with modelled data, because you need a more realistic CPA to ensure that you're not over-investing or under-investing, then that's fine. But if you're an organisation that prefers to work with actuals and not modelled data, that's fine too, for whatever reason. It could be that you're operating in a space that's quite sensitive, like finance; I think the finance sector probably doesn't use much modelled data for people they don't track.
[00:38:53] Bhav: Maybe I'm wrong, I don't know, but there are probably higher levels of regulation in place that mean they can't trust anything they didn't build and don't understand. I know that, for example, most pharmaceutical companies refuse to use R and Python.
[00:39:08] Bhav: And the main reason is that they still use old legacy statistical platforms, like MATLAB or whatever it might be, because the packages that come out of R and Python are unregulated and unmoderated. So I think this is very much a business problem. But if you held a gun to my head, I'm actually pro-modelling, depending on the situation.
[00:39:35] Bhav: And this is a little bit outside of my pay grade or knowledge zone. I'm not Google, I didn't build this algorithm, but I know Google have a very extensive R&D operation. We all know that Google invests millions and millions, probably even billions, into research and development.
[00:39:56] Bhav: So if they've built some type of statistical model to model out your data and try to account for missing data, they've probably got enough data, data they're not sharing or making available, to know that it works. That's probably why it's a black box: they've probably got that data, and cookie consent is really just for
[00:40:22] Bhav: organisations. Google don't care; they probably capture everything, and they know how many people clicked on your ad, and I'm sure they're doing some smart things. So I would say that, and I say this even more recently having seen the power of the, help me out again, the HyperLogLog algorithm.
[00:40:40] Bhav: Having compared the HyperLogLog algorithm to actual BigQuery data and seen that it's only within two, two and a half percent, it's really close, right? So if you're the type of person that's going to scrutinise a one and a half or two percent discrepancy, then maybe you shouldn't be using models, and you should just use what you have and accept the fact that you're going to be missing 20, 30 percent, whatever it might be.
[00:41:11] Bhav: And I think you just need to come to terms with that. So not a perfect answer for you on this one, Dan.
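To make the comparison Bhav describes concrete, here's a minimal HyperLogLog sketch checked against an exact distinct count. This is a textbook-style implementation, not Google's or BigQuery's, and the register count (2^14) and the 100,000-user data set are arbitrary illustrative choices; the point is that the approximate count lands within a couple of percent of the truth.

```python
import hashlib
import math

# Minimal HyperLogLog: split each 64-bit hash into a register index
# (first B bits) and a "rank" (position of the leftmost 1 in the rest),
# then apply the standard harmonic-mean estimator.
B = 14                  # 2^14 registers => roughly 0.8% typical error
M = 1 << B

def _hash64(item):
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")

def hll_estimate(items):
    registers = [0] * M
    for item in items:
        h = _hash64(item)
        idx = h >> (64 - B)                 # first B bits pick a register
        rest = h & ((1 << (64 - B)) - 1)    # remaining bits give the rank
        rank = (64 - B) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / M)
    estimate = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * M and zeros:       # small-range correction
        estimate = M * math.log(M / zeros)
    return estimate

exact = 100_000
approx = hll_estimate(f"user-{i}" for i in range(exact))
print(f"exact {exact}, approx {approx:.0f}, "
      f"error {abs(approx - exact) / exact:.2%}")
```

With a few kilobytes of registers instead of a set of 100,000 user IDs, the estimate typically sits within one or two percent of the exact count, which is the order of discrepancy Bhav reports seeing against BigQuery.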
[00:41:15] Dan: I think the data is directional. We can use it because it’s directional, right? That 2 percent doesn’t matter.
[00:41:21] Bhav: Don't get me wrong, I love direction, I love the concept, and I use the phrase "directionally accurate" all the time.
[00:41:28] Bhav: It's a core part of my language, because I think it asks the right thing: is this data directionally accurate? Yes? Well, then who cares if it's off by some small percentage.
[00:41:41] Dan: Amazing, I love that. I'm going to skip past the triggering word you used back there, MATLAB, because that brings me right back to 2008, 2009, when I used it at university.
[00:41:49] Dan: But I think that's enough for today. So thank you for guesting on your own podcast, Bhav. It was nice to grill you with some questions around accuracy, directionality, directional accuracy and everything in between. Look, this is not something we're going to solve in one episode, and I'd love to hear other people's approaches and thinking around this.
[00:42:07] Dan: If you're an analyst, if you work in analytics and you're out there dealing with this same stuff, and you've got an idea or an opinion or something you'd like to share with us or talk about, join the CRAP Talks Slack community and share it with us, please. Because I want to know, not just to talk about it on this podcast, but because I do this for a living too.
[00:42:22] Dan: And I think any kind of information we can share with each other is just going to be really useful. All right, with that, I think we'll leave it there. I'll speak to you soon, Bhav. Thanks for joining.
[00:42:33] Bhav: Lovely, lovely, lovely to be here again.
[00:42:35] Dan: Awesome, see you soon. That's it for this week. Thank you for listening. We'll be back soon with another episode of The Measure Pod. You can subscribe on whatever platform you're listening to this on to make sure you never miss an episode, and you can leave us a review on any of those platforms. We're also over on YouTube, if you want to see our lovely faces and our lovely guests' faces while we do this, so make sure to subscribe
[00:42:56] Dan: to the Measurelab channel so you never miss an episode as they come out. If you leave us a review, that would be hugely appreciated. You can do that on most podcast applications, or there's a form in the show notes where you can leave feedback directly to me and Bhav. Thank you for listening, and we'll see you on the next one.