Behind the Cloud: Connecting GA4 to BigQuery

In this episode of Behind the Cloud, Matt explores the practical side of Google Analytics 4 (GA4) and its free export feature to BigQuery. We learn about export limits, batch exports, and streaming options for GA4 data. The episode also touches on setting up BigQuery within a GCP project.

Watch below 👇 or head over to our YouTube channel

Transcript

Introduction to GA4 and BigQuery Export

[00:00:00] Matt: Hello and welcome back to Behind the Cloud. This week we’re going to delve into something a little bit more practical. When Google released GA4 and eventually replaced, for most people, Universal Analytics, they also moved a fantastic feature from behind a 360 paywall into general availability. That feature is a free export of GA4 data to BigQuery.

[00:00:25] Matt: This is not just a great way of backing up your data and making it owned by yourself, it also importantly opens up opportunities for more bespoke analysis and reporting. You could begin to add more data sources into BigQuery and join this data with your GA4 data to unlock more holistic views and answer more important business questions.

[00:00:45] Matt: Now there are limits to the number of events that can be exported with the standard tier of GA4, 1 million a day for batch exports, which will be absolutely fine for most smaller businesses. But it’s worth noting that if your GA4 property regularly exceeds the export limit, the daily export will be paused and any days missed during that period will not be reprocessed.

[00:01:07] Matt: So keep a close eye on those numbers. There’s also an option to set up a stream of GA4 data to BigQuery, which has no limit. This is really useful if you really want more immediate access to the data within BigQuery, but it’s worth noting these intraday tables might be subject to change. So for longer term reporting, the daily, batch data is going to be a better option.

Setting up BigQuery in GCP

[00:01:28] Matt: Anyway, enough about the why, let’s see how to set this up practically. For brevity, I’m not going to go into how to set up a GCP project in this video. If you think that a video on how to set up a GCP project would be useful, then please let us know, and we can go ahead and get one of those created. So step one, we need to make sure that BigQuery is activated in our GCP project.

[00:01:50] Matt: In this instance, the quickest way for us to activate BigQuery is just to go to the left hand menu. Thank you. Head over to BigQuery and you’re going to see a page up here that asks you to enable it if you want to be able to use BigQuery. So BigQuery enable may take a minute or two just for that API to be enabled within your GCP project.

[00:02:11] Matt: And there we go, we can see now we have our GCP project and then these are just places for us to save queries and notebooks, etc. But we’re not going to worry about that for this particular video. Next, we’re going to head over to Google Analytics. Once we’re in our analytics property that we want to be sharing with We’re going to head down to the settings in the bottom left.

Linking GA4 to BigQuery

[00:02:29] Matt: You see a little cog click admin further down the page. We’re going to see product links. And this is where you can link various Google services to your GA4, either sending data to or from, and then inside of that, we can see BigQuery links. So we’re going to click BigQuery links. And then in the top corner, you can see this link button, we’re going to click link. And then we can select the BigQuery project that we want to link our GA4 data to. 

[00:02:56] Matt: So that’s going to be the project that we just enabled BigQuery in a moment ago. This project right here, confirm. 

Configuring Data Location

[00:03:07] Matt: Next is going to ask you for where you want the data to be located. This is important for a couple of reasons. And Google cloud has data centres and servers all over the world. So you really, you want to consider where the data needs to be located for regulatory purposes? Do you want the data close to where you are going to be analysing it? So you reduce any sort of latency. For example, if you’re in the US you wouldn’t necessarily want to have your data in India as it would take longer to process and query.

Selecting Data Streams and Events

[00:03:35] Matt: Click next, and this is where we’re going to configure our data streams and events. So this is deciding which data streams we want to be backing up into BigQuery. And do we want all of the events within those data streams, or we just want a selection of them? This can be useful when you’re trying to limit the number of events that are going out.

[00:03:53] Matt: Obviously, if you’re not on 360, you have this million events daily event limit for the batch exporting. So consider this part carefully. For example, within this particular GA4 property, we have two data streams. We have a production and a development data stream. So I want to remove this development data stream because I don’t need to be exporting that data.

[00:04:17] Matt: You can also add events to exclude that you don’t want to be sent over or you can specify them just by typing the name in if you’ve got that to hand. You can decide if you want to include advertising identifiers for mobile app streams and then you can also decide what type of export type you want.

So as I’ve already described daily being those once a day, drops of previous days, data streaming being continuous exports of data, but may be subject to change as more events come in and sessions are stitched together, et cetera, et cetera.

Choosing Export Types

[00:04:51] Matt: So for our use case, I only need to do it daily, but you could select both or one or one of the other. You can also export a daily user table or daily user data. This would actually create two tables. So you’d have a pseudonymous ID table. And you’d have a user ID table.

The pseudo ID will contain a row for every pseudonymous identifier and the data for user is updated when there’s a change to one of these fields, the second table, the user ID table will contain a row for every user ID and the data for this user is updated when there is a change to one of these fields, data for unconsented users can be exported to this table if it includes a user ID. So we can export that just for, just for demonstration purposes.

[00:05:38] Matt: I’m going to click next, and it’s just going to give us an overview of everything that we’ve, we’ve done. So we’ve linked up to our Brian prod project. The date is going to be exported to the US. That’s going to change that to the EU as I didn’t do so. We are exporting our daily data, we don’t need streaming data for this particular project. We’re also exporting our user data labels. Great, so we’ll submit that and then go ahead and create a link to BigQuery. And if everything is okay, you will see this link created on the right hand side.

Once you have the link created, you can come in and you can see the totals of events that are being sent. So if you aren’t sure how many events are going to be sent in your first link, you can keep coming in here over a couple of days, check, you will get notified if there’s any breaches to the daily limits. And as I say, it’s worth noting that any regular breaches of the daily limit are going to lead to pausing of the export.

[00:06:35] Matt: And then you’re not gonna be able to backdate that data that you’ve missed in the, in the, in the intervening days. So it’s worth, it’s something worth keeping an eye on and considering how you’re going to handle it. You start reaching those limits and that should be that. So in the next, within the next 24 hours, we should start to see data dropping into our BigQuery table.

Viewing Data in BigQuery

[00:06:54] Matt: We won’t be able to see it in the project we’ve just set up because it may take 24 hours for that data to appear in our project. But if we look in this, in this, Other projects, you will see that we’ve got our GA4 data coming through. We have our event table, which is our batch data. And we have our intraday table, which is our streaming data inserts.

[00:07:12] Matt: And there we have it. We have our GA4 data batch loading into BigQuery and we have our user data set up to be batch loaded into BigQuery. We can now go on and set up some nice analysis or join it with other data sources and build visualisations or whatever you want to do with that data. I hope this was useful for you. Please let me or Measurelab know if you have any questions or suggestions. We welcome the feedback. If you haven’t done so already, please like and subscribe to keep up to date with new videos that are going to be coming out very soon. Thanks.

Share:
Written by

Matthew is Head of Engineering and Technology at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.

Subscribe to our newsletter: