#47 Digging into Snowplow (with Jordan Peck)
This week Dan and Dara are joined by Snowplow’s Jordan Peck to chat shop about what Snowplow is, how and where it’s used, and of course how it differs from Google Analytics. Jordan is a friend of Measurelab’s, and the team saw his presentation at the latest MeasureCamp in London, which spurred a lot of this conversation.
See Snowplow’s website for more details on who they are and what they do – https://bit.ly/3b9hwkc.
Read Dara’s review of Jordan’s talk (and others) – https://bit.ly/3besPY4.
For good measure, here’s Mark Rittman’s DJ Soundcloud profile for some House tracks! – https://bit.ly/3PIYMap.
In other news, Dan is pointing-and-clicking, Dara is watching a tour, and Jordan is gigging!
Follow Measurelab on LinkedIn for all the latest podcast episodes, analytics resources and industry news at https://bit.ly/3Ka513y.
Intro music composed by the amazing Confidential (Spotify https://spoti.fi/3JnEdg6).
If you’re liking the show, please show some support and leave a rating on Spotify.
If you have some feedback or a suggestion for Dan and Dara, fill in the form https://bit.ly/3MNtPzl to let them know. Alternatively, you can email podcast@measurelab.co.uk to drop them a message.
Transcript
[00:00:00] Dara: Hello, and welcome back to The Measure Pod, a podcast for people in the analytics world to talk about all things, surprise, analytics. Welcome to episode number 47. I’m Dara, I’m MD at Measurelab. I’m joined as always by Dan, who is an analytics consultant also at Measurelab. And we’re also very pleased to have Jordan Peck from Snowplow joining us on The Measure Pod today. So firstly, hey Jordan, welcome to The Measure Pod, thanks for joining us.
[00:00:42] Jordan: Hi Dara, hi Dan. Thanks for having me, great to be here.
[00:00:45] Dara: And you’re also here on the hottest day the UK has ever seen. So I think we probably all deserve a bit of credit for sitting in hot rooms recording a podcast today when it’s 40 degrees.
[00:00:55] Jordan: It’s pretty damn hot, like I’ve escaped the worst of it I think in my little sheltered apartment up in Leeds, but it still hit 39, 40 degrees and it’s not been pleasant.
[00:01:08] Dara: What we usually do at the beginning when we’ve got a guest on is basically just ask you to give a little bit of an overview of your analytics journey. So how did you get into analytics, and a whistle stop tour leading you up to where you are today.
[00:01:19] Jordan: So, I work at Snowplow now as a strategic solutions architect. So my first sort of start of my career around 10 years ago was in digital marketing. So I got into SEO around about 2012, worked in offsite. So link building and outreach, and did that for a couple of agencies in Leeds in the north of England. There were some pretty big, well known, household name clients on some pretty cool projects. Like SEO is kind of inherently technical and I always found like, understanding, alright, so we’ve built these links, so we’ve done this sort of thing, we targeted this keyword, what’s that actually led to? Started like investigating some of those things, and a couple of people gave me a bit of access to some digital analytics stuff.
[00:01:59] Jordan: From that point in digital marketing kind of got a bit broader, went to a couple of other digital marketing agencies, more as a general digital marketing person. So I did some social media ads, helped out on PPC accounts. Got access to like Google Analytics accounts, Google Tag Manager, started getting involved in those sorts of things. Helped on like email and CRM campaigns and how all of the data would maybe flow through those projects, offsite, onsite, all that kind of stuff. And then about five, six years ago started focusing in on web analytics. So did a lot of Google Analytics, GTM (Google Tag Manager), a lot of implementations, audits, analysis.
[00:02:33] Jordan: And then at one of my previous agencies, basically we were thinking that Google Analytics wasn’t really enough, we wanted to be able to send data off to an email marketing platform or to generate a better 360, you know, customer view. Like when was the last time this person interacted with an ad? When was the last time they received an email? When was the last time they made an offsite purchase? GA (Google Analytics) really just isn’t set up for doing that. That’s not a dig on Google Analytics, that’s just not what the platform’s designed for. We started looking around for open source tools or other solutions that might be a bit better than that, get access to our data, and we came across Snowplow, this is back in 2016 or so, as a solution.
[00:03:10] Jordan: We implemented it on one of our customer’s sites, did some really cool stuff. We built some automated email campaigns based on what kind of things customers had in their cart versus like how much they bought before. If they are part of certain organisations, we had the data in Amazon Redshift. So we built like some really cool dashboarding systems, which was an incredible learning experience for me. So I went from a web analyst who lived in GTM (Google Tag Manager) and GA (Google Analytics), to learning Amazon Redshift SQL, and understanding how to write pure proper JavaScript, because there were no GTM (Google Tag Manager) templates for how to deploy a Snowplow tag, and understanding like how batch ETL processes work and what happens when you’ve got a data loading job and it goes down, how would you then refill all of that with the correct data and not duplicate stuff. And all this kind of weird and wonderful stuff, which was so far away from what I’d been doing before that I found it absolutely fascinating.
[00:04:01] Jordan: Went from there on to become a Senior Data Analyst at another agency. Again, focusing a lot on web analytics, digital analytics in general. Ran a team for a little while, and then back in 2020 saw an opportunity at Snowplow themselves and joined as a solutions architect and now a strategic solutions architect. So I work with our enterprise customers, our advanced, more complex customers who have the more complicated requirements, want to do more sophisticated things and help and advise them on how to use Snowplow in order to do those things.
[00:04:31] Dara: I think really the main reason we wanted to bring you on the show is because you’ve been through the journey of having worked with GA (Google Analytics) and GTM (Google Tag Manager) extensively. And now you’re, well, you have both worked with Snowplow as a customer and now you’re at Snowplow. And I guess we’ll probably end up talking maybe about some other tools as well, because what prompted this was, we saw you speak at MeasureCamp and you had quite a provocative title for your presentation, which was post GA (Google Analytics) world, which obviously piqued our interest because we’re working with GA (Google Analytics) day in, day out, and now GA4, and a lot of our work at the moment is GA4 migrations. So we found your talk really interesting, because it, I guess, presented an idea that because of some of the aspects of GA4, with UA (Universal Analytics) being sunsetted completely, now’s the time maybe for certain customers to think about whether GA (Google Analytics) is going to continue to be the right solution for them going forward.
[00:05:20] Dara: So found your talk really interesting, and even though you talked about Snowplow, you also talked about some other kind of newer players in the space as well, that are taking a privacy centric approach. If we start at least with Snowplow and for the benefit of listeners, can you give us a quick overview of maybe what Snowplow offers? You’re probably sick of getting asked this question, but what are some of the kind of key differences between Snowplow and maybe GA (Google Analytics)?
[00:05:44] Jordan: Yeah, totally. Essentially Snowplow is a platform for creating and collecting behavioural data. What the hell does that mean Jordan? So behavioural data is data which can be generated on any platform that a business or a client or a customer, where their users interact with them. So the first party platform. So in obvious cases, that’s a website or mobile apps, that could be desktop applications, that could be support systems where they’re interacting, or email systems. Snowplow provides a way of creating the data in the format that you wanted from the get go, and then creating that data and then sending it through a cloud-grade, enterprise-grade pipeline, and storing it in a data warehouse for you to have direct access to every single data point that you collect.
[00:06:33] Jordan: So there’s a couple of things that make Snowplow unique in that respect. So, first of all, you define the data structures and the data formats that you want to collect upfront. The format is not dictated to you by the tool. This is very different from what Google Analytics does. Google Analytics events of category, action, label are dictated to you, and that is the schema of how the data is stored. Snowplow’s approach is the opposite of that. You say, this is what I think a user looks like: a user has a user ID, the user ID is a 20 character random integer number. They have the date they signed up, they have their product tiering. They have if they’re a customer or not, etc. You define all of those.
[00:07:13] Jordan: And then when you collect information about a user, you’ve defined upfront the format you want the data to be in. So you create the data to match the format that you want at the point of collection. And then that is stored in a data warehouse, cloud data warehouse. We support Amazon Redshift, Snowflake and BigQuery on GCP (Google Cloud Platform). We’re also completely real time: in BigQuery on GCP (Google Cloud Platform), you can have an event happen on a website and have it be queryable in BigQuery in five seconds, ish, depending on how much data you’re sending in, but five to 15 seconds is generally the magnitude that we see. And yeah, we enable you to track and create data on any of those platforms that you may have. So if your main use case is a desktop application or, we work with Strava, their main use cases are smartwatches and smartphones.
[00:07:59] Jordan: We have customers creating data from backend applications, so server-side applications. When you perform an action on the website, that might actually go back to a server-side environment and trigger something. You can actually send an event from that server-side application. We support like 20 different programming languages for server-side apps. Ruby, Python, Java, name one, we’ve got it. The pure idea behind Snowplow is you understand the analysis, the models that you want to do with your behavioural data, you define your data structure upfront. So you know what data you want and in what shape, to enable you to do those analyses or build those models. And then you create the data at the point of creation, at the point of collection, in that format as you defined, and the data flies through your pipeline in real time in that format.
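To make the idea of defining a structure upfront concrete, here is a minimal Python sketch of the user structure Jordan describes (a 20-digit user ID, a sign-up date, a product tier, and a customer flag). It is an illustration under stated assumptions, not Snowplow's own schema tooling: the field names, the allowed tier values, and the `validate_user` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tier names, chosen only for this example.
ALLOWED_TIERS = {"free", "pro", "enterprise"}


@dataclass(frozen=True)
class UserEntity:
    """The shape agreed upfront for every user record."""
    user_id: str        # 20-digit identifier, kept as a string
    signed_up: date     # date the user signed up
    product_tier: str   # one of ALLOWED_TIERS
    is_customer: bool   # whether the user is a paying customer


def validate_user(raw: dict) -> UserEntity:
    """Reject any record that does not match the agreed structure."""
    user_id = raw.get("user_id")
    if not (isinstance(user_id, str) and len(user_id) == 20
            and user_id.isascii() and user_id.isdigit()):
        raise ValueError("user_id must be a 20-digit string")

    signed_up = raw.get("signed_up")
    if not isinstance(signed_up, str):
        raise ValueError("signed_up must be an ISO date string")
    signed_up_date = date.fromisoformat(signed_up)  # ValueError if malformed

    tier = raw.get("product_tier")
    if tier not in ALLOWED_TIERS:
        raise ValueError("product_tier is not an allowed value")

    is_customer = raw.get("is_customer")
    if not isinstance(is_customer, bool):
        raise ValueError("is_customer must be a boolean")

    return UserEntity(user_id, signed_up_date, tier, is_customer)
```

The point of the sketch is that a record either matches the structure agreed in advance or is rejected at collection time, rather than being reshaped later in the warehouse.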
[00:08:47] Jordan: It’s kind of interesting we just had our offsite last week, whole company offsite last week in Dublin, which was a lot of fun. But one of the sessions that we discussed there was this world of data engineering and data collection is basically all based around the idea of like data extraction. Someone actually mentioned a really interesting point. They’re like about 30 years ago, someone came up with the phrase “data is the new oil”. And like that is now precipitated around the data industry so much, including people who work in web analytics, you know, this idea that data is just sort of like lying around you and you have to go do some data mining, you have to do some data extraction, you have to send it to a data pipeline. You know, even like the vernacular around how people talk about data, like it’s just some data, an analyst’s job is to go out and find it and get it and collect it, extract it, replicate it, do something with it.
[00:09:39] Jordan: And in GA (Google Analytics) world you have a spade shaped like Universal Analytics events’ category, action, label, and you have to scoop the data up with that category, action, label shaped spade. The way we perceive the world is, you know what data you want, you know how you want it to be because you know the models and the analytics and the use cases and the, you know, real time activation or whatever it is that you want to do. Well, create the data in that format from the start: you create the user data structure, you create the product data structure, you create X number of data structures that you need for your different platforms and use cases. And get the data in that format from the get go, ensure it comes through as you expect, and then you’re in the best position possible to go activate those use cases, rather than having to extract out of analytics.
[00:10:28] Jordan: We have customers who have GA360, they move data on a daily basis into BigQuery, then they have another daily job that extracts from BigQuery and moves it to like Snowflake. So the data’s already two days out of date. I mean the Universal Analytics schema in BigQuery is just like beyond a nightmare. They have to completely like do all this weird transformation to get the stuff somewhat looking like what they want for their, you know, churn propensity model or their likelihood to buy model. They have to do all of that and then they get really substandard data and their models perform really poorly. Like decide, these are the models that we think will move the needle for our business, decide the features and the things that you need and the columns and the fields you need for that, how those fields and properties need to look, and go get them straight away. Create the data in the format that you want it from the start, and then we’ll make sure that it stays up, we’ll make sure that it gets there in real time, we’ll make sure that it’s the format that you said it had to be.
[00:11:22] Dara: Do you think there’s a difference in the typical customer that would be suitable for Snowplow versus GA (Google Analytics) or any other analytics solution or is it the case that actually it could be any customer, but it depends on where they’re at in their journey.
[00:11:35] Jordan: That’s a great question. Let me put it this way, if you don’t have people in your team or your organisation who can write SQL against your data warehouse, Snowplow really isn’t the tool for you, right? Like, because that is the main point of consumption that we deliver. If you haven’t got people who can do that, and you are not prepared to invest in people who can do that, then sorry, Snowplow is not for you. We turn away a lot of deals, we turn away a lot of potential deals anyway, on the basis of those sorts of things. And that’s not to say that those people are dumb or wrong or stupid, that isn’t necessarily the point. They haven’t reached that level of technical and data maturity and capabilities yet to gain the most value from Snowplow.
[00:12:11] Jordan: I’m not here to sell Snowplow, like we have salespeople. But I’m also not here to say that Google Analytics or GA4 as a thing is bad or that they do a terrible job. They honestly do a fantastic job and completely reinvented the industry and they always have done a fantastic job. People can get lots of value out of Google Analytics, smart people have done incredible things with GA (Google Analytics). That being said, there’s an inherent limitation, I would say, in the way that GA (Google Analytics) works, in that it is built to be a mass market tool.
[00:12:40] Jordan: So one of our co-founders was in a lucky position to actually speak to a product manager who worked at Google Analytics. And he said, do you have any idea how hard it is to roll out a change to Google Analytics? The www.google-analytics.com/collect endpoint gets hit untold billions of times a day, right. The interface is used by several hundred thousand, probably millions of individual users every single day. You have any idea how hard it is to change the core data model, to make a change to the report interface, to perform an upgrade? They just can’t, and it’s not really reasonable to expect a company like Google, even a company like Google, to be able to make a tool like Google Analytics as flexible and as composable and as customisable as Snowplow is. I would say, if you have really custom requirements, like really complex user journeys. Like tests in services on the website, in store consultations, order online, buy offline, all those kinds of user journeys, that’s really hard to get GA (Google Analytics) to do really well. If you’re someone like that, Snowplow, we think, is a fantastic fit.
[00:13:45] Jordan: The other thing that I didn’t really touch on, but another thing that’s really important to our value proposition, is because we deploy to the cloud, we’re not purely hosted. What we do is, the customer comes along and they are running on AWS. What they do is they provision an AWS sub-account, basically a brand new AWS account, grant us access, and then we deploy our infrastructure into that sub-account. All of our core tech is open source, probably should have mentioned that as well. You can go find all of this on GitHub, all of the various GitHub repositories that we have for our JavaScript tracker, for our BigQuery loader, for our enrichment program, our validation app, all those things are available open source, and we deploy all of this into your own cloud. So it’s all your own infrastructure, the data never leaves your own cloud. So one of the really big value adds that we see amongst privacy conscious customers is this ability that all of the infrastructure, data processing infrastructure and all of their data storage lives on their own infrastructure, lives in their own AWS or GCP (Google Cloud Platform) account.
[00:14:49] Jordan: It never gets sent to a third party that you don’t know what on earth they’re going to do with it, never gets sent to, you know, a black box server that you don’t know exactly what they’re doing or who’s accessing it. It always lives inside your own cloud account. So with some of the really big privacy regulations and rulings that have happened in the EU, Snowplow becomes a really attractive option. Because rather than sending data off to google-analytics.com/collect and not really knowing what on earth’s going to happen, and then just looking at the data in a report and getting some weird export into BigQuery, you know exactly where it is, you know exactly what processes have been run on every individual event, you can look up the code, and people do look up the code, open source, and you can see the data.
[00:15:34] Jordan: You can go into BigQuery, you can see all of the individual files, you can control who accessed them, you can control retention policies. You can say like after 90 days or a year, delete them because we don’t have good reason to maintain access to the data after that period of time. That’s always been a very popular feature of our deployment model. We call it private SaaS, it is SaaS software, but it’s privately deployed into your own environment. That’s always been popular, it’s becoming more popular as privacy regulations become more front of mind. But yeah, if you’ve got really complex products, complex user journeys, complex analyses that you want to run, that makes Snowplow generally a good fit. And if you’re very conscious about where data is stored, who’s got access and data ownership, Snowplow becomes very popular there as well.
[00:16:17] Daniel: Jordan, you mentioned that it’s like an analytics maturity thing where, you know, you can’t be running a Snowplow instance if you’ve got no data engineers or SQL analysts or people that can actually make the most of the software, or the platform rather, and that makes a lot of sense. One thing I’d ask though, and I asked this when we chatted to a guy called Liam a couple of weeks ago on the podcast and had the same kind of conversation around Adobe Analytics, Google Analytics, and some of the overlap there. But one of the things that he mentioned that I think may be the case here as well is that even if there’s an obvious progression that Snowplow is more customisable, it’s more privacy secure, it’s a better fit for our business and it can track all the things that we want to track.
[00:16:53] Daniel: Inherently you are still probably going to have a marketing team that wants to use Google Analytics, and it’ll still be there in the background. And that inherently means that there’s going to be a kind of, well, my report says this, your report says that. How much does that factor into Snowplow or is that kind of, so off the radar for you guys that it’s like, the marketers will probably keep using Google Analytics, but this is a warehouse and this is the activation through other methods. I mean, how does that conversation normally go with your clients?
[00:17:17] Jordan: That’s an excellent point. And historically, I would say, it has certainly been a challenge how to address that. Like, do we just say we aren’t going to tackle your marketing use cases, you keep using the tools you’re familiar with? Or do we try and broaden out feature sets in order to address those things? So it’s a very prevalent question. What we generally recommend customers do is use a BI tool like Tableau or Looker, some of the most popular ones we see. Their teams will be used to taking in reporting requests and questions, and creating dashboards and self-serve interfaces so that those internal users can still get the answers that they want. Because there’s no reason why Snowplow can’t give you traffic by channel, traffic by device, top landing pages, all the stuff that GA (Google Analytics) is really good at, it can totally do that. It does take a little bit more work, you’re not offloading that processing work to Google or Adobe, you’re not passing that work off to another party, you do have to do this yourself a little bit.
[00:18:09] Jordan: So answering these types of questions is much more straightforward. And then you can hook up Looker. You can say, this is my date column, this is my traffic column, this is my channel column, and split it out and you get a nice line chart and it looks like a GA (Google Analytics) report. But there’s loads more marketing use cases, right, than just how many visits from organic search have I had this week, right? There’s a lot more. We’ve had to decide to work a lot with partners recently on how to better support that. So there’s a couple of ways we are really looking at tackling the marketing aspect of this, and this is extremely close to my own experience, having been a digital marketer who had to do ranking reports and top landing pages and conversion rate by device, all that kind of stuff.
[00:18:47] Jordan: So there’s kind of two ways that we see this, that you’re going to want to use Snowplow data. It’s generally accepted, I don’t think there’s many people in the industry who will debate that Snowplow data is generally of a much higher quality. We can be resistant to ad blockers, we’re resistant to things like ITP by Apple, where cookies are truncated at seven days, we can use that. We track much deeper data, we track 130 columns out of the box for every single event, plus all of the custom columns that you want to add on. So it’s not generally disputed that Snowplow data is richer and more detailed and better quality out of the box.
[00:19:22] Jordan: And a lot of people want to say like, right, well I want that Snowplow data. I want that in Marketo, or I want that into Google Ads so I can follow my retargeting ads that way, or I want it into my CRM so that, you know, our sales team know who are the most engaged prospects, that’s a very, very, very common question. And the old answer used to be, well, you’ve got the data, you’ve got access to the tech, off you go. That’s fine from our perspective as long as you think that you’re going after certain maturity level companies, but even then, that’s not really a reasonable response. So what we basically decided, and what we’ve focused a lot of engineering and technical resources on over the last few months, is integrating with things like Google Tag Manager server-side, and what are known as reverse ETL tools.
[00:20:03] Jordan: So we’ve built a GTM (Google Tag Manager) server-side integration where essentially there’s a Snowplow client and there’s a Snowplow tag. So depending on your Snowplow deployment model, if you’re running full open source or you have us manage it for you, you can basically send an event out of the browser, into GTM server-side, have the Snowplow client collect it, and then a Snowplow tag will forward that event onto Snowplow, as it would normally do. It then goes through our pipeline and gets plugged into your warehouse. Using that Snowplow client though, it populates the common event model in GTM (Google Tag Manager) server-side. So you can then use Facebook for GTM (Google Tag Manager) server-side, you can use Google’s Google Ads tag in GTM (Google Tag Manager) server-side.
[00:20:39] Jordan: We’ve even built Snowplow-authored GTM (Google Tag Manager) server-side tags for things like Braze and things like Iterable for marketing automation and things like that. So this really means that you can have Snowplow tracking happen on your site, have GTM (Google Tag Manager) server-side sitting in between the website and the vendors. Have the Snowplow event come in and then literally just cherry pick the properties that you need off the Snowplow payload. The Snowplow payload is really big, like between 130 and 150 properties, depending on how much you’ve customised it. You can literally take the Snowplow event and cherry pick: I want page URL, timestamp, this cookie ID and maybe like the event name, forward that onto Facebook Ads, do the same for Google Ads. You can use GTM server-side essentially as this Snowplow event relay, this event forwarder, so that you can now get the same Snowplow events into all of your other vendors.
[00:21:28] Jordan: Your CRM team can look in say Salesforce Marketing Cloud, and go, I want to create an audience of people who have looked at the website in the last seven days, have a lifetime value of, etc., and send these people a particular email. The reverse ETL option is kind of a batch alternative to this. So the reverse ETL option says, I have some data in a data warehouse, I’m going to run a query: show me all of the users who have purchased more than five times, but their average order value was lower than £20, and are based in this particular geographic region.
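The cherry-picking step Jordan describes can be sketched in a few lines. This is a minimal Python illustration only: real GTM server-side tags are not written in Python, and the property names and the `cherry_pick` helper below are hypothetical stand-ins rather than actual Snowplow field names.

```python
# Hypothetical names for the handful of properties a downstream
# vendor needs out of a payload with well over a hundred fields.
FORWARDED_FIELDS = ("page_url", "timestamp", "cookie_id", "event_name")


def cherry_pick(event: dict, fields=FORWARDED_FIELDS) -> dict:
    """Return only the requested properties that are present in the event."""
    return {name: event[name] for name in fields if name in event}


# Usage: a large incoming event is reduced to the subset to forward.
incoming = {
    "page_url": "https://example.com/pricing",
    "timestamp": "2022-07-19T10:00:00Z",
    "cookie_id": "abc-123",
    "event_name": "page_view",
    "browser_language": "en-GB",   # dropped before forwarding
    "screen_resolution": "1920x1080",  # dropped before forwarding
}
outgoing = cherry_pick(incoming)
```

Keeping the forwarded set as an explicit allow-list means each vendor receives only the properties someone deliberately chose to share.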
[00:22:03] Jordan: A reverse ETL tool will essentially take that query and in a batch fashion, hourly, daily, whatever you need, move that table that’s generated from the warehouse and sync it to your chosen destination. So this works really well for things like CRMs if you want to update user records or customer records in your CRMs, also works really great for a bunch of marketing tools. If you want to do retargeting based on email or customer match in Facebook or Google, for instance, this works fantastically. So you can create any kind of audience that you can think of inside your data warehouse based on Snowplow or any data. Join all of that with your Snowplow data, come up with the most complex and clever segment that you can think of and then ask a reverse ETL vendor to sync that to Google Ads as a customer match audience to then be retargeted to. So those are the two real approaches that we’ve taken to being able to activate Snowplow data into marketing tools.
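The audience query Jordan describes (more than five purchases, average order value under £20, in a given region) might look something like the sketch below. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for a warehouse; the `orders` table, its columns, and the helper names are assumptions made for this example, and the sync to a destination such as Google Ads is left out.

```python
import sqlite3

# Hypothetical schema: one row per order, with the buyer's region on the row.
SEGMENT_SQL = """
SELECT user_id
FROM orders
WHERE region = ?
GROUP BY user_id
HAVING COUNT(*) > 5 AND AVG(order_value) < 20
ORDER BY user_id
"""


def build_segment(conn: sqlite3.Connection, region: str) -> list:
    """Run the audience query and return the matching user IDs."""
    return [row[0] for row in conn.execute(SEGMENT_SQL, (region,))]


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table with a few illustrative orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (user_id TEXT, region TEXT, order_value REAL)"
    )
    rows = (
        [("a", "UK", 10.0)] * 6    # frequent, low-value, in region: matches
        + [("b", "UK", 50.0)] * 6  # frequent but high-value: excluded
        + [("c", "UK", 10.0)] * 3  # low-value but too few orders: excluded
        + [("d", "FR", 10.0)] * 6  # matches on behaviour, wrong region
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


segment = build_segment(demo_connection(), "UK")
```

In a batch set-up, a scheduler would rerun a query like this hourly or daily and hand the resulting list to whichever reverse ETL tool does the syncing.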
[00:22:57] Jordan: In terms of marketing analytics, like I said earlier, you can do all the sort of normal quote, unquote, web analytics that you would ever want to do. You can do new versus returning, you can do top exit pages, you can do conversion rates, AOV, all that kind of stuff that you would be kind of used to. And the hard thing that comes there is understanding the reports that are used commonly around the business already. So you’ve got an e-commerce team, you’ve got a marketing team, you’ve got an organic team, probably all of them are using GA (Google Analytics) but for different things, right. Go around those business units, figure out what reports, what custom dimensions, what filters, etc. they’ve got up in their analytics account and then figure out the most logical way to recreate that in Snowplow.
[00:23:39] Jordan: I’ve never come across an instance where some sort of custom filter or a custom dimension or whatever it is, can’t be recreated in Snowplow. But I would potentially argue, and I sometimes have argued with customers on this, that just recreating GA (Google Analytics) with Snowplow doesn’t seem like the best use of your time. Universal Analytics at least is like, what? A 15 year old, like close to 20 year old piece of tech. Why would you come to us and pay us a bunch of money to get access to a piece of software to recreate such an outdated piece of software, right? So actually, part of our job is to help challenge the thinking, and help customers come around to a better understanding of like what better kind of reports, what better way of looking at data will help activate their use cases better.
[00:24:22] Daniel: Yeah those reverse ETL tools are really interesting. I just see that kind of the death of the CDP, because you know, you’ve got a data warehouse and you’ve got a reverse ETL tool but that is the kind of fundamental or the full thing, right? I mean, that is the kind of end to end solution. So I’m with you on that and I think that’s a really exciting avenue to be continuously exploring for Snowplow is that reverse ETL side, because in a sense, you’re kind of taking the use case of something like Google Analytics that works well with the Google world in the Google ‘walled garden’, and then just kind of making it agnostic. And you’re saying, well, this is data based on your business and it’s agnostic to your marketing channels. And I think that’s a really good position to be, rather than trying to retrofit a Google product in the Google world for marketing that Google doesn’t manage.
[00:24:59] Jordan: It’s really interesting you bring that up Dan, we refer to this as what we call the composable CDP, the idea that the warehouse lives at the centre of your business. You might be using tools like Fivetran or Stitch to ETL data out of NetSuite for your accounting data, Shopify for your e-commerce data, you know, CRMs, whatever. That’s totally fine, that’s what you should do. Use Snowplow for creating the best quality behavioural data, landing in the same data warehouse, right?
[00:25:26] Jordan: So use Snowplow to collect behavioural stuff from your mobile phone apps and your websites, much like a CDP like Segment would be doing. The data then ends up in the warehouse first, in the best quality format it can be. You then join that with your purchase history, your e-com history, all that good stuff. And then you use a reverse ETL tool to sync out to those downstream applications. You get to choose your own data warehouse, you get to choose what data you bring in and sync in, and you get to choose how all of those things are then synced downstream, without a single vendor being the sort of guardian of all of that. We’ve got a couple of good partners in that space, we’ve got great relationships with all the major data warehouses, great partnership with Snowflake and Databricks, and a couple of really good reverse ETL vendors, and yeah, like we feel that is a huge upheaval in the CDP space.
[00:26:12] Jordan: It’s a really big upheaval, which I don’t think the CDPs really saw coming, and a few of them are pivoting and doing some other things, which is fair enough, and some of them are really, really interesting. But yeah, like we think the composable CDP is a really powerful solution.
[00:26:24] Daniel: I really like that whole world and how that’s changing right now, that whole landscape. I find that the conversations I have with people that venture into the CDP world and especially the people that are kind of producing the content and spouting the CDP rhetoric, I find that is very much for the people that want to scratch the itch. It’s basically like the next stage or the kind of advanced marketer rather than the data engineer or data scientist. And I think this is, I need a solution that does this, I don’t have a data science team or a data engineering team or kind of, you know, SQL analysts to be able to do this. I need to throw money at a problem that isn’t hiring obviously, to be able to kind of make this thing work.
[00:26:56] Daniel: It’s very much the refocus of the industry around privacy and compliance first of all, as you’ve mentioned, Jordan. But also the other side that the kind of role of the data engineer being more widely adopted and people training in the space and because there are more data engineers now than there ever have been. So the idea that there’s more people with the skills to be able to do this themselves then why would I pay a third party when there’s ambiguity? Like you were saying about Google Analytics of how they’re processing data, but also we can just do this ourselves. And I think Census, Hightouch, those things are fascinating. We spoke to a guy called Mark Rittman a couple of weeks ago and he works in this space, the kind of big data space, the data pipeline and engineering space and he’s saying that everything is going agnostic and having this idea of the composable CDP is like, you know, it’s basically a modern data stack, isn’t it? It’s ultimately a marketing modern data stack, but people give it fancy different names, but it’s collecting data that’s useful for you, activating it in ways that’s useful for you, and staying compliant with privacy laws and just reasonable decency.
[00:27:49] Dara: Can you tell us a bit Jordan about the pricing model for Snowplow?
[00:27:53] Jordan: As I mentioned, all of the Snowplow components are open source, and they are generally used by very data sophisticated businesses who see value in data ownership, but also the value in being able to build high quality data applications. So like CNN or any publisher that uses Snowplow, there are several. Their main value is how they can recommend content to other users. The thing that keeps you on the site, that keeps driving ad revenue for them as a publisher that keeps users engaged, it keeps advertisers happy. That’s all powered basically by how well they can recommend content to users. They see the value in having a data application powered by some of the high quality behavioural data, and they are willing to invest time, money, people, platforms, tech into making those data applications work very well.
[00:28:45] Jordan: So every release that we do as a commercial business, we, you know, run by the open source community. The open source community is a big part of our business. If you don’t fancy running open source, if you don’t fancy spinning up 70 different AWS services and managing their uptime and having data engineers on call over the weekend for when they fall over, you can come to Snowplow, so we’re a commercial open source software business. And you could essentially have us manage it for you. So you basically offload the management and support and all that stuff, all the stuff that you don’t want to do, basically. You offload that to us and we deploy our software into your cloud and we run and manage it for you. This means, for instance, like at the end of last year when there was this enormous vulnerability in this incredibly popular Java library, we managed to roll out something like 70 or 80 patches across all of our applications, across 150 to 200 customers in the space of a week because we were on that, because obviously it was a very public vulnerability. We found out what the issue was, we combed through our code base, we found where we needed to roll updates out. Our engineers rolled the updates out, and then our support team were able to roll them out across all of our customers, basically without our customers’ intervention at all.
[00:30:01] Jordan: There’s a value to that, right? You offload that responsibility to us. So we cater to businesses of all sizes. We work with early stage startups, we work with big enterprises that you’ll see on the high streets and you’ll hear about every day. We don’t have a hosted option yet I would say, I think our senior leadership team have been thinking about something like a hosted option where we essentially host all of it. Almost become like a bit of a Google Analytics while still offering the users the flexibility to create their own data structures, etc. So that’s quite a technical undertaking based on how our tech is set up. So I’m not sure that will come around any time really quickly.
[00:30:40] Jordan: If you’re interested, you can go to try Snowplow. So you can go to try.snowplowanalytics.com, which is essentially our free trial. It’s a really, really small version of Snowplow that we host. We spin up a Postgres database for you. We give you a tracking code, we give you an endpoint you can send data to. You can send data to that, you have 14 days and you essentially can send data to Snowplow and see what the data looks like and understand it, you can try it out.
[00:31:04] Jordan: The other option is what I actually displayed in my MeasureCamp talk, if you guys remember, which is our Snowplow Open Source Quickstart, which is essentially Terraform scripts. So if you are a bit more techy, you can download a bunch of scripts from GitHub and run two commands, and you can spin up an entire Snowplow pipeline in a matter of minutes. It’s not quite production grade, if you wanted to run it open source fully on a high traffic production website, you’d probably have to do some tweaks and scale up some of the servers, but it will spin up, and you guys can attest I did it in session over phone wifi, over a VPN and it spun up in a matter of minutes. So if you’re a little bit more techy and you want to understand how those components work and what they do and what it’s actually like to run a Snowplow pipeline in your own AWS account, Snowplow Open Source Quickstart is also available to you.
[00:31:56] Dara: Amazing, thank you, Jordan. Just to shift gears now, and you’re almost off the hook here. What we’d like to do to end the show is ask what you like to do outside of work. So when you’re not hard at work talking about Snowplow, what do you like to do to wind down?
[00:32:08] Jordan: Ah, what do I do to wind down?
[00:32:10] Dara: This is the hardest question you get asked.
[00:32:12] Jordan: It really is, I’m really big into dance music these days and have been for quite a while. I’m an amateur bedroom DJ, although other sort of commitments over the last while have limited my time to get my decks out and play a bit of music. But I do like doing that, I like going to events and going to gigs. I like doing that a lot, still a big football fan, as you can probably tell by my thick cockney accent, I’m a Chelsea fan and have been since I was eight years old. It’s been a very interesting few months to be a Chelsea fan, but we now have a new owner with Mr Todd Boehly. And yeah, I like to follow a lot of sports. I’ve been to cricket a few times up in Headingley in Leeds this year. But honestly, one of the big things is tech.
[00:32:49] Daniel: So two things, Jordan. So you said, you know of Mark Rittman, did you know he is an amateur DJ too? He’s got a Soundcloud profile that I’ll ping your way, it’s really quite fun to listen to. So I think there’s some more commonalities between you two there. The other question I have is whenever I talk about anything to do with video games, Dara has this kind of blank expression on his face. And I’m just wondering, as you were talking about football, I almost saw the same thing. So I just wanted to check in with Dara, did you understand any of that?
[00:33:12] Dara: I know who Chelsea are yeah. My dad’s actually a Chelsea fan, so I do know.
[00:33:16] Jordan: Oh for real?
[00:33:16] Dara: Yeah, yeah. Has been for a long time. He also has a cockney accent just like you. What about you Dan, what have you been up to?
[00:33:27] Daniel: It’s another video game Dara, so apologies in advance. But probably my favourite video game of all time is a point and click adventure game called Broken Sword, and Broken Sword 2. Point and click adventure games, they kind of died off a bit in the early noughties where people moved onto consoles. However, since the kind of rise of the smartphone, they’ve kind of had this huge resurgence and they’re back in the mainstream and they found a new home on the touchscreen. So they’ve just re-released the remastered version of Broken Sword 2 for Android, and I’ve just been playing that again, and it’s one of those games that I revisit every two years or so, and I could probably write you the walkthrough guide off the top of my head. Don’t test me on that, because I would. Anyway, Broken Sword 2, Android, has just had a huge update and a kind of remastered version, which I couldn’t recommend enough. How about you, Dara?
[00:34:07] Dara: Well, I didn’t listen to any of that, but I’m sure it was great. I’ve been watching the Tour de France. I’m not a cyclist and I’m not a huge cycling fan, but I watch the Tour de France every year. It’s just such a spectacle and I’ve got such respect for these guys who are putting themselves through absolute hell, we’re complaining about it being 38, 39 degrees and we’re sitting inside doing a podcast, and these guys are cycling up and down mountains. So yeah, it’s quite a sight. It’s just something about the Tour de France, I think it’s just that kind of special event isn’t it. Last question for you, Jordan. Where can people find out a bit more about you?
[00:34:40] Jordan: People can find me on social. So I’m on Twitter, I’m on LinkedIn, quite a lot. In amongst the nonsense that you see on LinkedIn, I actually get quite a lot of value out of LinkedIn. So, yeah, I’m on LinkedIn and I’m also on a couple of public Slacks. So I’m on the Measure Slack, so the MeasureSlack community, I hang around there a lot. There’s another Slack called Locally Optimistic.
[00:35:01] Dara: Great, we can include those in the show notes. What about you, Dan?
[00:35:04] Daniel: Same old, so LinkedIn, Twitter and danalytics.co.uk.
[00:35:08] Dara: And it’s probably just LinkedIn that is the main place for me. Okay, that’s it from us for this week. As always, to hear more from Dan and me about GA4 and all things analytics, you can check out previous episodes in our archive at measurelab.co.uk/podcast, or just use the app that you’re listening to this in.
[00:35:26] Daniel: And also, as always there’s a Google Form in the show notes. If you want to send any feedback, if you have any requests or if you have any people that you’d like us to talk to. Alternatively just email podcast@measurelab.co.uk, and that goes right into Dara’s and my inbox.
[00:35:40] Dara: Our theme music is from Confidential, you can find a link to their music in our show notes. I’ve been Dara joined by Dan and by Jordan. So on behalf of all of us, thanks for listening and see you next time.