#45 Forecasting analytics with machine learning (with Richard Fergie)
This week Dan and Dara are joined by Richard Fergie (creator of Forecast Forge) to chat all about forecasting with machine learning models, and the flaws in Google Analytics 4’s machine learning approach to watch out for.
Here are Richard’s rules for using any machine learning forecast model (sketched in code after the list):
- No machine learning is allowed until you’ve plotted the data!
- Make the simplest forecast that you can
- Look at where this simple forecast is wrong, and correct for it
- Repeat the process until you get fed up or you hit diminishing returns
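If you want to try that loop in code rather than a spreadsheet, here’s a minimal Python sketch of the four rules. It assumes a hypothetical daily `sessions.csv` file with `date` and `sessions` columns; nothing here is Forecast Forge’s own code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rule 1: no machine learning until you've plotted the data.
df = pd.read_csv("sessions.csv", parse_dates=["date"], index_col="date")
df["sessions"].plot(title="Daily sessions")
plt.show()

# Hold back the last 8 weeks so the forecast can be scored against reality.
train, holdout = df.iloc[:-56], df.iloc[-56:]

# Rule 2: the simplest forecast you can make -- repeat the last observed week.
last_week = train["sessions"].iloc[-7:].to_numpy()
naive = pd.Series(list(last_week) * 8, index=holdout.index)

# Rules 3 and 4: measure where it's wrong, correct for it, and repeat.
mape = ((holdout["sessions"] - naive).abs() / holdout["sessions"]).mean()
print(f"Naive forecast MAPE over the holdout: {mape:.1%}")
```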
Check out Forecast Forge for details on pricing and how it all works – https://bit.ly/3aB1Zco.
Reach out to Richard on Twitter – https://bit.ly/3aGtxgC.
In other news, Dan plays more games, Dara goes outside and Richard emigrates!
Follow Measurelab on LinkedIn for all the latest podcast episodes, analytics resources and industry news at https://bit.ly/3Ka513y.
Intro music composed by the amazing Confidential (Spotify https://spoti.fi/3JnEdg6).
If you’re liking the show, please show some support and leave a rating on Spotify.
If you have some feedback or a suggestion for Dan and Dara, fill in the form https://bit.ly/3MNtPzl to let them know. Alternatively, you can email podcast@measurelab.co.uk to drop them a message.
Transcript
[00:00:00] Dara: Hello, and welcome back to The Measure Pod, a podcast for people in the analytics world where we talk about all things analytics. I’m Dara, I’m MD at Measurelab.
[00:00:25] Daniel: I’m Dan, I’m an analytics consultant and trainer at Measurelab.
[00:00:28] Dara: So today we’re joined by Richard Fergie. Who’s the creator of Forecast Forge, which is something we’re going to talk about in this episode, and he’s also a consultant data scientist. So Richard, first and foremost, it’s really, really great to have you on The Measure Pod, welcome.
[00:00:42] Richard: Yeah thank you for inviting me.
[00:00:44] Dara: So Richard, what we always do when we have a guest on is ask the same question, which is: how did you get into analytics? And this is your chance really to give our listeners a bit of an overview of you and your journey in analytics.
[00:00:59] Richard: So I started off working in paid search, Google Ads, Bing Ads and whatever those were known as back in the day. And that’s really good because it’s very data driven and it’s also very actionable, in terms of you’re not having to go through like 10 layers of committees in order to get something done. Generally, if you are the guy on the account, you can see an opportunity and go straight after it. I made a few embarrassing mistakes caused by not measuring the right things. I was fairly new in my career, people were like, make this number bigger without spending too much money. And I was like, okay, and I didn’t really question how valid that number was, or whether it reflected things like leads, or high quality leads, or anything like that.
[00:01:39] Richard: So trying not to have to make those mistakes again and have those embarrassing conversations led me more into analytics and tagging and how do we track these things. How do we make it as closely related to actual business goals as we can, rather than just the easy-to-measure proxies that a lot of people use. And then as you get more into analytics, there’s a whole host of other data analysis and data challenges that you can run into, which has led me more into machine learning and data science, and most recently forecasting. That’s the really quick version: from paid search to web analytics, and then into data science from there.
[00:02:19] Dara: That’s great, and you just brought back some painful memories for me talking about making mistakes. It’s the best way to learn, but unfortunately when it comes to getting numbers wrong, especially if it’s early in your career, it’s a baptism of fire.
[00:02:30] Richard: I would rather not have made those mistakes, but I don’t feel like those were unusual mistakes to make, you know. I think the industry’s grown massively since then; there’s probably thousands of people making the same mistakes this week.
[00:02:41] Dara: And you mentioned forecasting, so in the brief intro I gave you, we talked about Forecast Forge. So do you want to explain a little bit about what Forecast Forge does? Because that’s probably going to spur a lot of Dan’s and my interest, and our questions.
[00:02:54] Richard: Yeah, so the background to that comes out of some of my data science work, where I would be asked, can you do some data science? Can you do some machine learning powered forecasting? And I’d be like, yeah, of course I can do that. And then the client would send me some numbers over, maybe I’d request some extra data, and I’d do all the number crunching. I’d be coding up models and testing them against each other, and I’d present the results back to them and they’d be like, oh wow Richard, this is great, this is great, but it’s wrong here. I’d be like, well, what do you mean it’s wrong? I’ve done all the data science here, it can’t possibly be wrong. How can you say that my thing is wrong when you haven’t even crunched any numbers to tell me that? I wouldn’t be that rude, but that was essentially my feeling at the time. And they’d be like, we’re launching the autumn sale then. Or we’ve got some new models going live on the site then, or we’re running some TV ads at this time of year, or we’re shifting spend between these channels, and all these things are going to change the numbers in the forecast.
[00:03:48] Richard: The forecast is wrong, and in some ways that’s on me for not asking the right questions earlier in the consulting engagement. But in other ways it’s very, very difficult to ask all the right questions at the right time. So in some ways a more iterative approach is the correct answer, but in other ways, getting hold of that specific knowledge that only people within the business have, and integrating it into the model, gives you a better forecast than using a better algorithm or something cutting edge. So I began to think the algorithmic side of things is less important, and it’s more to do with the feature selection side of things. What information is necessary to make a good forecast? How can you enable that to happen?
[00:04:38] Richard: From my experience of work, I meet a lot of people who are capable, analytical thinkers, and they’re really good with spreadsheets, but they’re not comfortable with code. So the barrier is more to do with the coding skills than it is to do with figuring stuff out, which is where Forecast Forge comes in. It’s a tool to enable them to bring this extra information into a spreadsheet, which can then be passed on to one of these machine learning forecasting algorithms. We’ll add in the weather forecast data so we can better calibrate our barbecue sales, depending on when we think the first heat wave is going to hit. There’s all kinds of this business specific stuff that they can add in, and because it’s a spreadsheet they can do that themselves rather than have to code it up, which is a much more niche set of skills at the moment. And that gives better forecast results than running some fancy deep learning algorithm that’s been trained for 30 hours.
[00:05:32] Daniel: It does sound like you’ve described to me right there, Richard, a competent analytical thinker, but with no proficiency on the coding side of things. It’s definitely something that I’ve seen and admired.
[00:05:44] Richard: There’s a lot of people where they’re not comfortable in that, or they’re at a stage in their career where like taking the time out to learn that sort of thing is just not a good use of their time. Now I’ve run like coding training sessions and for some people it’s like, so you’ve shown me how to do this, but it’s really slow. If you just do it this way for three months, then it won’t be really slow by the end. And they’re like, what? I’ve gotta be really slow for like two and a half months before I can get back up to full productivity and that’s just not a good trade off. If you’re very early in your career, I think it is a good trade off, but as you’re more senior, then that’s a big chunk of time and value that you’re not delivering, isn’t it.
[00:06:19] Daniel: I mean, it’s that idea of starting again when you’re already proficient at something, it’s daunting to the best of us. But that’s my question actually, Richard, around the Forecast Forge tool. So this is a spreadsheet add-on that does all this crazy stuff behind the scenes; do you have to have an awareness of what it’s doing? Or can beginners go in and forecast whatever they want, to whatever end they want? Who’s it aimed at? Who’s the best person to be using this tool?
[00:06:42] Richard: That’s a really good question. So one of the reasons it’s called Forecast Forge is because you need to hammer at it in order to get the best results. It’s not something where you paste these numbers out of Google Analytics into this column, press the button, and get a brilliant forecast. But it is something where, if you’re used to spreadsheets and you’re used to thinking about data, you can be like, okay, so I’ve made the simple forecast and it looks like it’s wrong here. Why is it wrong there? Okay, maybe it’s because of this; let’s try and add some data in that will help inform the algorithm of whatever that thing is, and then you can sort of experiment and change things as you go. So you can be a beginner in certain respects, in that you don’t need to know about machine learning or a lot about forecasting, but you need to be curious and determined to get the most out of it. It’s a tool rather than a solution, I guess.
[00:07:40] Dara: So it’s putting the tool in the hands of people that have that business context that you mentioned earlier, because you gave the example of when you created one of your first models for a client, you were missing out on all that context that they were able to add. So is that the idea that you give them this tool so they don’t need to know exactly how it’s working, but they’re going to have the best understanding of whether the model looks sensible or whether it’s taken into account all of the differences that are specific to their business and their market.
[00:08:06] Richard: Yeah, exactly. So I was talking to a user the other week, and a big thing with everyone’s forecast at the moment is trying to control for the effects of COVID and project that into the future. If you’re doing year over year comparisons at the moment, it’s really, really difficult. I have a sort of simple way that I advise people to start with that. But this guy was working on some sites for used car dealerships and, you know, he found out that the best way to control for it was to consider when car dealerships were allowed to be open and when they were forced to be closed. That’s really, really obvious if you are working in that industry, but it’s not something that I would build into a tool at all unless I was specifically building a tool for car dealerships.
[00:08:48] Richard: As you get to know more about your clients’ businesses, or the businesses you work in, there’s tons of that stuff that’s really, really obvious if you’re in that business but not generalisable outside of it, which is why I think getting a better forecasting result is more about that specific information than a fancier algorithm.
[00:09:08] Dara: So how customisable is it? Like can the end user, can they add in different campaign dates?
[00:09:13] Richard: All the data that you add in is a column in the spreadsheet. So if you’re talking about campaign dates, you could have a column where it’s a zero for days when the campaign isn’t running and a one for days when the campaign is running; that would be a really simple way of doing that. Or it could be the amount spent on the campaign each day, or, depending on what the campaign is, the reach of the campaign on a given day, or the reach of the campaign for that week split out across the seven days. So there’s lots of different ways of doing it, but yes, all the extra information that you need to add has to be converted into a column in the spreadsheet. That’s one of the challenges for the users: figuring out the best way of doing that, and how to encode the information so it fits into one of these columns.
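To make that concrete, here’s a hypothetical pandas sketch of those three encodings. The campaign dates, spend figure and reach number are all invented for illustration, and none of this is Forecast Forge’s own code:

```python
import pandas as pd

# One row per day, matching the date column of the forecast spreadsheet.
dates = pd.date_range("2022-01-01", "2022-12-31", freq="D")
regressors = pd.DataFrame(index=dates)

# Encoding 1: a 0/1 flag for days when the campaign is running.
running = (dates >= "2022-06-01") & (dates <= "2022-06-21")
regressors["campaign_running"] = running.astype(int)

# Encoding 2: daily spend (a flat figure here, but it could vary by day).
regressors["campaign_spend"] = 0.0
regressors.loc[running, "campaign_spend"] = 500.0

# Encoding 3: weekly reach split evenly across the seven days.
weekly_reach = 70_000
regressors["campaign_reach"] = 0.0
regressors.loc[running, "campaign_reach"] = weekly_reach / 7
```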
[00:09:59] Daniel: Is there like a common type of data that people end up forecasting? You mentioned Google Analytics throwing that in there a minute ago. And obviously we work with Google Analytics day to day. So for us, obviously we are thinking in terms of applications of how to forecast the web analytics or app analytics data. But traditionally, I suppose, or commonly from your experience of people using your tool, are they pulling out web traffic data to predict things like campaign performance or spend or return on investment? Or is it something completely different that’s nothing to do with the web analytics side?
[00:10:28] Richard: The algorithm that is used behind the scenes is not specific to web analytics data, that would work fairly well on any data that has certain features like annual seasonality, weekly seasonality and then certain types of trend. Most of my users are using it with either web analytics data or online data from online marketing platforms, be that like Google Ads or Facebook or other ones like that.
[00:10:55] Daniel: That’s really interesting. The way I talk about it is that Google Analytics is always looking at the past. Whatever you’re doing in there, take attribution modelling for example, you’re always modelling or looking at what happened up to three months ago. And it’s really good to understand what has happened, but it doesn’t really do you any favours in telling you what’s to come, and I think that’s the biggest blind spot of Google Analytics, or at least Universal Analytics today. So it’s really interesting how these two things can tie together, because like I said, a lot of marketing analytics is looking backwards; very seldom do you get to do that forward looking work. But there is a question tied in here somewhere, Richard. With the introduction of Google Analytics 4 they’re leaning into the modelling like crazy, and not just the data-driven attribution or the behaviour modelling and things like that. They’re doing predictive modelling for anomaly detection, and they’re forecasting things like conversion and churn within the standard licence of GA (Google Analytics). I just wondered what your thoughts are on that, or if you’ve done any validation to see if it’s any good at all.
[00:11:54] Richard: It’s a really exciting development, I think. As you say, a big problem with web analytics data, or data analytics in that respect, is that it’s all backwards looking, and at some stage when people are deciding what actions to take, they are converting that into a forecast, even if they don’t make it explicit. I think what often happens is people are like, oh, so this is what happened in the last three months, let’s assume that’s going to be what happens for the next three months, and then we’ll make decisions accordingly. Making the forecasting step more explicit is a very good thing, I think. Obviously that’s competition for me, and competition that’s offered for free as part of Google Analytics, so in some ways that’s bad news. But in other ways, what I’m selling with Forecast Forge is for people to be able to add in the extra information that they know as specific owners of the business.
[00:12:45] Richard: In some cases, Google will be able to use their fancy machine learning to figure out a lot of that. Their assessment of seasonality, say, might be more sophisticated. I don’t know how their data sharing would work for that forecast, but maybe they’re using data from other similar sites to help make a better forecast for your site as well. But what they don’t know is things like what’s in your marketing plan, or when you’re going to take these actions in the future. That information is not available to them, and as far as I’m aware there’s not a way for you to give it to them at the moment, so it’s not used in those forecasts. So there’s always going to be a bit of a blind spot there. That’s also true with the backward looking view as well. If you had a massive spike two weeks ago because you suddenly featured in the newspapers, they don’t know what that is or why it happened, and that can affect the uncertainties of the forecast in the future in a way that is possible to control for if you have a bit more customisability of the input. So I think it’ll be a real test of whether my hypothesis is right, because I’m sure that their machine learning is more sophisticated and better than mine, because they have a whole warehouse full of top level PhDs working on it, whereas I do not. But they also don’t have the ability to include this extra context, which is what I think is important for getting the best forecasts.
[00:14:04] Daniel: And of course it’s a black box, and there’s absolutely no way to validate it or to see what code they’re using and what they’re accounting for. I think that’s the other side of this: no matter how sophisticated it is, there’s no way for us to know what they’re doing and how they’re approaching things. We’re only going on the snippets they have on the help centre.
[00:14:19] Richard: What I often advise new users of Forecast Forge to do is to hold out some data, say, give the Forecast Forge algorithm data through until the end of 2021, and forecast the first six months of 2022, and then you can compare what actually happened against what the forecast suggested. And that’s a really good way for spotting when the default forecasts are wrong and starting to think, okay well what was going on during those periods? And how can I use this to improve the forecast? I haven’t heard of anyone like automatically archiving the GA4 forecasts so they can even compare how good they were against the actual values. I’m sure Google are doing that internally, but as you say that’s all a black box and their evaluations, they are hidden from us.
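Nothing stops you doing that archiving yourself, though. A minimal sketch, assuming you can export each forecast into a DataFrame with hypothetical `date` and `predicted` columns (the file name is made up too):

```python
import os
from datetime import date
import pandas as pd

ARCHIVE = "forecast_archive.csv"  # hypothetical archive file

def archive_forecast(forecast: pd.DataFrame) -> None:
    """Append today's forecast to a running archive so it can be scored later."""
    snapshot = forecast.copy()
    snapshot["made_on"] = date.today().isoformat()
    snapshot.to_csv(ARCHIVE, mode="a", header=not os.path.exists(ARCHIVE), index=False)

def score_archive(actuals: pd.DataFrame) -> pd.DataFrame:
    """Compare every archived forecast against what actually happened.

    `actuals` is assumed to have 'date' and 'actual' columns.
    """
    archive = pd.read_csv(ARCHIVE, parse_dates=["date"])
    merged = archive.merge(actuals, on="date")
    merged["abs_pct_error"] = (merged["actual"] - merged["predicted"]).abs() / merged["actual"]
    # Average error per forecast snapshot, so you can see if you're improving.
    return merged.groupby("made_on")["abs_pct_error"].mean().reset_index()
```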
[00:15:05] Daniel: And the results of them too. We don’t know how accurate they are, even by their own standards.
[00:15:10] Dara: But I guess it is by its nature going to be more generic. They’re trying to come up with something, even if it is sophisticated, that’s going to work across such a huge number of websites. So for anyone who really wants to be able to tailor it more to their actual business, in my view at least, they’re not going to get very far with the modelling in GA4. It might give them an indication.
[00:15:30] Richard: Don’t get me wrong, there are some advantages to that approach. You know, if you’re in the automotive sector, say, they can see the data from hundreds of sites in that sector, and that is going to be pretty useful for identifying sector trends, which is a big part of what a good forecast can be. Even if you had that data, Forecast Forge doesn’t have a good way of integrating it. It’s what’s called a univariate forecasting algorithm, which means it forecasts one thing at a time, using the historical data from that thing to make your future predictions.
[00:16:01] Dara: Quite a different question now, but from working directly with users of Forecast Forge, and maybe in your consultancy work as well, do you find that clients typically act on the forecasts? Or do they tend to be more of an almost budget setting exercise? Because I’ve experienced that, where you go through the process of building the forecast and then that’s the end of it. You know, it just gets done maybe once a year, as part of either target setting or budget setting or both.
[00:16:29] Richard: So on the budget setting one, I think people have changed the budgets based on the output of these forecasts, because that’s one of the things where you can change it and say, okay, well if we increased our budget for this channel, what do we think our return on it would be? And yes, it’s also used for target setting, and I believe the act of setting targets can affect a lot of people’s actions, but that’s way downstream of the bit I’m working on. A lot of what the Forecast Forge forecast does, in a way, is say, okay, this is what it’s going to look like if you keep doing what you’re doing.
[00:16:59] Dara: What about the frequency of re-forecasting Richard? I know it might differ depending on what it is you’re forecasting and what the business needs are, but is there a kind of typical frequency you’d recommend?
[00:17:09] Richard: This is something I get asked a fair amount, and I think of it as a sort of two step process. There’s the step where you build the forecasting model, which is where you decide what training data to include and what features you want to add. Are you including the budgets for this channel? How are you dealing with lockdowns? All those kinds of decisions. And then there’s updating that data and updating the forecast off the back of that. I think that second step can be done very frequently, you could do that daily if you wanted to, but the first step is much less frequent. I don’t think you should be changing the inputs to the model every day, partly because I don’t think that’s going to give you a much better forecast, and partly because that can change what is forecasted quite a lot, which is not something you want to happen every day.
[00:17:57] Richard: But one of the reasons the Forecast Forge algorithm is the way it is, is because I don’t want a situation where you add in an extra day of data and then the forecast completely changes. I think that would be bad usability from a machine learning algorithm point of view. So yes, update the training data and the outputs of that as frequently as you like, but be much more cautious about updating the model. I’d do that quarterly at most, I think.
[00:18:21] Daniel: Something I’ve been thinking about, Richard, is the fact that there’s less observable data; people are collecting less, with things like cookie banners wiping out a lot of the data that we can collect through tools like Google Analytics. That isn’t the question, but it’s about how we can use machine learning, and I’m wondering if people have been using your tool to do this: to forecast around the implementation of a consent management system, so that they can see, okay, based on what we would’ve got in terms of page views or traffic or users to the site, since we’ve introduced the cookie banner we’re actually seeing 80% less, or 10% less. It’s a really interesting concept to use the same tools in exactly the same way, but to model out how much traffic we would have actually got if we could still track everyone. I know Google are doing something along those lines at the moment, but this is where I’ve been using very basic forecasting tools.
[00:19:08] Richard: I haven’t heard of any of my users doing that. It is something you could do, at least in the short term; over the longer term it would be harder to measure the effect. If you implemented it last quarter and you want an estimate of the effect of it, then yes, that is something that could work. What I’m seeing more of is people forecasting the business metrics directly and trying to link that with, say, marketing inputs. So they’ll try and forecast revenue based on media spend, say, rather than trying to forecast clicks or impressions. Obviously there is a relationship between media spend and impressions, media spend and clicks, and media spend and revenue. But as you move down through that funnel, the ease of tracking that relationship gets worse and worse, particularly with things like Consent Mode or the Apple privacy changes (App Tracking Transparency – ATT). It’s a lot harder to tie revenue directly to an impression, say.
[00:20:06] Richard: What the forecasting machine learning is doing there in this case is trying to learn what that relationship is between say revenue and media spend or revenue and marketing activity, which is, yeah, it’s not the exact use case that you were thinking of, but it’s sort of another way of looking at that same problem of like, we can’t track these bits in the middle so well, but perhaps we can use machine learning to figure out the relationships between inputs and outputs in a way that’s still useful for us.
[00:20:31] Daniel: This is all the MMM (Media Mix Modelling) stuff, considering there’s less ability to track multi-channel attribution and attribution models as a whole, tracking the individual user and all those touchpoints. So using the aggregates, the daily aggregates, the hourly, whatever we’re aggregating up to, we’re trying to infer influence and basically doing attribution, but without that user ID. I think it’s a really interesting approach, because whether we like it or not, that is becoming more and more part of the analysis, part of the learning, in terms of the marketing activity or whether you go on sale or not. It’s all part of it now, which never used to be the case, even what, five years ago.
[00:21:04] Richard: I’m very excited about Media Mix Modelling as an approach. It is kind of old school in a way, and the industry’s being forced into it because of these privacy changes. But I think in a lot of ways it is superior to click attribution, because it answers the incrementality questions directly, which a lot of people avoid, particularly a lot of marketing agencies, when they can tie revenue directly to a click without worrying about whether that person would have bought anyway.
[00:21:34] Dara: You need a lot of data for it, I believe. Is that right, that to do effective MMM (Media Mix Modelling) you need a pretty huge amount of data?
[00:21:41] Richard: It depends on the randomness of your baseline. For some businesses you look at their revenue charts and it’s a very, very regular series: down at the weekends, up in the week, down at the weekends, up in the week. The forecast for that can be very, very accurate and very, very precise without a lot of work, so any variation from that baseline is much easier to tie back to a particular activity. In other cases, it’s all over the place for no reason that anyone can see at all. You know, it was raining in Surrey and windy in Manchester so they’re up 30%. There’s so much randomness going on that in those situations you do need a lot of data. You might need three years’ worth of data, which is often a lot harder. Or you might need to run quite dramatic experiments, like, we’re going to cut our spend on these channels in these areas, in order to get statistically reliable results.
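As a toy illustration of what an MMM is doing under the hood (not Richard’s algorithm, and the CSV and column names are invented): regress revenue on adstocked media spend, with day-of-week dummies standing in for that regular weekday/weekend baseline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def adstock(spend: np.ndarray, decay: float = 0.5) -> np.ndarray:
    """Carry a fraction of each day's spend effect over into following days."""
    out, carry = np.zeros(len(spend)), 0.0
    for i, s in enumerate(spend):
        carry = s + decay * carry
        out[i] = carry
    return out

# Hypothetical daily data: 'date', 'revenue', 'search_spend', 'social_spend'.
df = pd.read_csv("mmm_data.csv", parse_dates=["date"])
X = pd.DataFrame({
    "search": adstock(df["search_spend"].to_numpy()),
    "social": adstock(df["social_spend"].to_numpy()),
})
# Day-of-week dummies capture the "up in the week, down at the weekends" baseline.
X = X.join(pd.get_dummies(df["date"].dt.dayofweek, prefix="dow"))

model = LinearRegression().fit(X, df["revenue"])
# Coefficients on the spend columns are a crude read on incremental revenue per unit spend.
print(dict(zip(["search", "social"], model.coef_[:2].round(2))))
```

If the baseline is as noisy as the second case Richard describes, those coefficients will be wide of the mark without much more data or deliberate spend experiments, which is exactly his point.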
[00:22:33] Dara: Going back to the tracking consent point for a second, with all the changes in privacy policies and guidelines, could you do something similar to what you talked about with COVID, where you map out the different lockdowns? Could you do something to adjust for those changes in user consent? Because otherwise, presumably, if you’re modelling based on historical data, you’re going to have had more observable data at any point in the past than you would have today, tomorrow, next week and next year. So do you think theoretically you could adjust for that?
[00:23:04] Richard: I was looking at a forecast the other day where, twice during the training period, they’d changed their conversion tracking, which had, you know, increased the number of conversions that they got. And in their forecast through to 2023, the algorithm had basically assumed that they would change their conversion tracking again in a way that got them more conversions; that is what it had learned. It hadn’t really learned anything about the actual business performance, just that every now and then they change their conversion tracking and then they get a few more conversions. So the problem you described is definitely an issue. It theoretically can be controlled for in a similar way to COVID, but how you would do that might be tricky, because the audience it affects changes over time. As more people install ad blockers or different versions of iOS roll out, you’d also be trying to control for how those proportions change within your audience. So it’s almost like you’re having to forecast that, and then use it as an input into your other forecast. I wouldn’t want to say it was impossible, but there’s going to be a bit of bodging going on there. How much of the output is the machine learning, and how much of it is just the user saying, oh, I think it’s going to be this? Part of the thing with Forecast Forge being customisable is people can definitely put their thumb on the scale if they want to.
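One way to attempt that bodge, sketched with invented dates and consent rates: maintain an estimated “tracked share” series, extend it into the future with your best guess, and feed it in as a regressor column.

```python
import pandas as pd

dates = pd.date_range("2021-01-01", "2023-12-31", freq="D")
tracked_share = pd.Series(1.0, index=dates, name="tracked_share")

# Hypothetical: consent banner launched 2022-03-01, ~70% of users consent after.
tracked_share.loc["2022-03-01":] = 0.70
# Forward guess: slight erosion as ad blockers and new iOS versions roll out.
tracked_share.loc["2023-01-01":] = 0.65

# Either feed tracked_share in as a regressor column, or approximate the
# "full tracking" series by dividing what you observed by the tracked share:
# observed_sessions / tracked_share  ~  what you'd have measured pre-banner.
```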
[00:24:16] Daniel: So what’s next for Forecast Forge, where do you want to take it into the future?
[00:24:20] Richard: So recently Google released something for Media Mix Modelling called Lightweight Media Mix Modelling (LightweightMMM), which is interesting from my point of view because it runs fast enough that it can complete within an HTTP request-response cycle. It’s still a pretty long time, but it’s measured in minutes rather than hours. So I’m trying to figure out the best way to integrate that into Forecast Forge, because people are using Forecast Forge for Media Mix Modelling sort of applications, but it’s not specifically designed for them, and this new tool would be better for that. That would open up a lot of opportunities for people on the paid media side of things in particular. So that’s coming. I always like talking to people and finding out the weird and wonderful things they’re doing with it, stuff that I would never have thought about, which really validates my thinking in putting it in a spreadsheet where the users can do all their amazing stuff with it. And sometimes that gives me ideas, like maybe I should blog about that so that other people can use these ideas.
[00:25:17] Richard: Someone once asked me if Forecast Forge would work with something like the Solver plugin in, say, Google Sheets or Excel, which would mean you could bring in other aspects of machine learning, like minimise this loss, minimise my forecast error on these metrics, which would give it a lot more flexibility and power from a machine learning point of view. But the Solver plugin is basically doing trial and error, because it doesn’t understand how changing the inputs changes the outputs of the forecast. All it can do is change the inputs, see the outputs change, and then change the inputs again, which is really, really slow.
[00:25:51] Dara: So Richard, back to the use of Forecast Forge, can you give an overview of the process that a user would follow to actually set up and create a forecast for themselves?
[00:26:02] Richard: Yeah, okay. So the first thing is they need to get the training data for the thing that they want to forecast. Maybe there’s even a step before that of deciding what the thing you want to forecast is; try and pick something where the result actually matters, so you’re not wasting your time. So get the training data for that. I have a rule that I provide, so it’s a Forecast Forge rule and a rule for all my other data science projects, that no machine learning is allowed until you’ve plotted the data. So draw the charts of what the historical data looks like, and you’ll be able to use your eyes to see, oh, well this is generally going up, or this is generally going down. What are the big spikes? How did COVID affect things? All of these are things that you’ll probably see if you just plot it.
[00:26:43] Richard: Then the first step would be to make the simplest forecast that you can. I would do this with a holdout set, as I described earlier: keep some data back from the algorithm, like the first six months of 2022, or the last 12 months, or whatever it is. Pick an amount that’s relevant to the time period you’re trying to forecast. If you’re trying to forecast a week, then maybe you’d keep a few weeks back; if you’re trying to forecast a year, you’d want to keep a year back. Hide that from the algorithm, make the simplest forecast that you can, and then you can compare what actually happened with what the forecast suggests.
[00:27:17] Richard: Maybe a simple forecast is, I’m just going to extend it with a straight line, or I’m going to assume that next week is going to be exactly the same as last week. Or if you’re using Forecast Forge, that can be the basic, don’t-add-in-any-extras Forecast Forge forecast. And then you can compare that with what actually happened and, more to the point, you can measure how bad the error is: how wrong is it each day, how different are the totals at the end. Then you can tell whether anything fancy you do on top of this works or not.
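The “extend it with a straight line” baseline, for example, is only a few lines of numpy. This assumes the same kind of hypothetical daily `sessions.csv` as the sketch in the show notes above:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("sessions.csv", parse_dates=["date"], index_col="date")
train, holdout = df.iloc[:-56], df.iloc[-56:]  # hide the last 8 weeks

# Fit a linear trend to the history and project it over the holdout period.
t = np.arange(len(train))
slope, intercept = np.polyfit(t, train["sessions"].to_numpy(), deg=1)
forecast = intercept + slope * np.arange(len(train), len(df))

daily_error = holdout["sessions"].to_numpy() - forecast  # how wrong is it each day?
print("Gap in the totals:", daily_error.sum())           # how different are the totals?
```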
[00:27:48] Richard: And then the next thing is to look at where or when this simple forecast is wrong. In some cases it just diverges from the actuals over time: it starts off very close, and then by the end it’s quite far off. Other times there’ll be specific points, where it’s good up until now and then it goes wrong, or it’s good for a period, then bad for a period, and then back to being good again. You need to identify or ask questions around what was going on in those periods. Did something change? Was it a COVID lockdown? Did we change our marketing? Did the tracking break on the site? Ask those questions, and then you can start thinking about, okay, these are the things that are important that the default forecast isn’t learning; how can I incorporate this information? How can I include it in the forecast? Which in Forecast Forge means turning it into one of these what I call regressor columns, where you need to numerically specify that information for each day.
[00:28:42] Richard: And then you can repeat the process: run it against your holdout data, see whether your error is any better, see where the new forecast diverges from the actuals. You can keep going with that forever really, because you’re pretty much never going to get a zero error metric, but eventually you’ll get fed up with it, or you’ll hit diminishing returns, or you’ll be like, yeah, this is good enough. And then you can start deciding, okay, what are the actions from this?
[00:29:04] Dara: Amazing, brilliant, Richard, thank you for all of that. This is the point in the show where we switch gears and we talk about the wonderful lives we live outside of analytics. So I’m going to put you in the hot seat first. What have you been doing outside of work to wind down lately?
[00:29:19] Richard: Well, not so much winding down, but we’re really busy at the moment because we’re trying to emigrate to Canada in September. So we’ve been doing forms, paperwork, house sales and all the rest. It doesn’t feel like there’s been a lot of winding down, but it’s definitely not analytics I’ll tell you that.
[00:29:35] Dara: Not so much winding down, but exciting.
[00:29:37] Richard: Yeah it feels like it’s definitely going to happen now regardless of how ready we are for it when the time comes to get on the plane.
[00:29:43] Dara: What about you, Dan? What have you been doing outside of work?
[00:29:45] Daniel: Nothing quite as grand I can say. But I found a new video game this weekend. It’s called Road 96 and it’s a really, really, really good sort of narrative adventure game. And you play the game multiple times and you play as teens trying to escape this warring country. Each time you play, it’s a different point in time and you play as a different character trying to escape the country and each time you play you find more about the backstory of the economic and the political situation and the decisions you make in one character’s journey have a knock on effect and they affect the next journey. If you’re interested in single player, narrative games, I can’t recommend it highly enough and it’s, I don’t want to say free, but it’s on Game Pass, if you’ve got Game Pass it’s there, I can’t imagine it’s too much money.
[00:30:25] Dara: Can I get it on the Commodore 64?
[00:30:28] Daniel: Yeah, you could try.
[00:30:29] Richard: It sounds pretty retro, but I guess that’s just the 96 in the name.
[00:30:33] Daniel: Yeah well, Road 96 is the name of, or the number of, the road to escape the country. So it’s relevant to the game.
[00:30:40] Dara: It sounds educational as well, so you’re learning something while you play.
[00:30:43] Daniel: Look Dara, you’re not going to play this game. Whenever I mention a game on here, there’s no point pretending. But yeah, a really good game, I recommend it. Dara won’t play it, maybe someone listening might.
[00:30:53] Dara: Well mine is a bit, I was going to say boring. It’s not boring. I’ve been enjoying the sunshine. So I think we’re all currently as we’re recording this, we’re all sitting in rooms baking. So over the weekend I was just out and about dog walking, running, sorting out the garden. Just making the most of the sunshine. I’m looking forward to getting out of this hot room again and getting back outside. One final question for you, Richard, and then you’re off the hook and we’ll wrap up but where can people find out more about you and about Forecast Forge?
[00:31:19] Richard: So Forecast Forge is at www.forecastforge.com; there’s some product information on there and a few blog posts. Then I am most active on Twitter, username @RichardFergie. Those are probably the two best places to find me. But yeah, if you’ve got any specific questions, pinging me on Twitter is probably the easiest way.
[00:31:37] Dara: Amazing, and what about you, Dan?
[00:31:39] Daniel: LinkedIn and as always my website danalytics.co.uk.
[00:31:42] Dara: And for me, LinkedIn is the easiest. That’s it from us for this week, to hear more from me and Dan on GA4 and other analytics related topics, you can check out our previous episodes which are all in our archive over at measurelab.co.uk/podcast. Or you can obviously find them all in whatever podcast app you’re using to listen to this.
[00:32:03] Daniel: And if you want to suggest a topic or someone for us to speak to, there’s a Google Form in the show notes, or you can email podcast@measurelab.co.uk to get in touch with us both directly.

[00:32:13] Dara: Our theme music is from Confidential, you can find a link to their music in our show notes. I’ve been Dara, joined by Dan and also by Richard. So on behalf of all of us, thanks for listening and see you next time.