How to build an interactive GA4 BigQuery data schema
What and why?
One of the great features of Google Analytics 4 (GA4) is the ability to pass data into BigQuery (BQ). There are many benefits to this which have already been covered in some of our previous blogs (i.e. 10 reasons to export your GA4 data to BigQuery). Passing data to BigQuery is no longer just available to enterprise GA360 customers, but to anyone using GA4 for free.
What I wanted to create was a simple interactive way to explore how GA4 data is saved in BigQuery. I found the GA360 interactive schema from E-Nor (now Merkle – Cardinal Path) by Justin Marciszewski to be a handy way to know where different elements can be found in the BQ schema, what I’d need to unnest to get to them, and at times found fields I hadn’t noticed were available (most recently total.sessionQualityDim).
GA4 has a new schema, different to that of the current schema if you are used to GA360/BigQuery, so this is hopefully useful for new and old GA/BigQuery users.
Finally, as well as outputting an interactive schema I wanted to learn what R was all about and what else may be possible.
How?
The accompanying blog from Justin has a nice to follow process on how the GA360 interactive map was put together.
In the interest of saving time, I skipped the scraping and tidying of data and simply created a data frame manually using the GA4 documentation.
Below is a summary of the steps involved to create the interactive schema using R, along with links to documentation I found helpful and full code on our GitHub for you to be able to recreate this for your own GA4 dataset.
Step 1
Install RStudio on my laptop from a local CRAN Mirror. With R being open source, this is a network of servers to be able to download and install the latest versions of R.
Step 2
Once installed, I then needed to figure out the basics of R. This video tutorial was really helpful in getting a basic understanding of R, from variables to data frames and packages.
Step 3
After I had understood the basics I then needed to install the collapsible tree package. This only needs to be done once for your laptop or whenever you update the version of R.
install.packages("collapsibleTree")
Code language: R (r)
Step 4
Once installed I needed to call the library associated with this package.
library(collapsibleTree)
Code language: R (r)
Step 5
Then I needed to declare the data frame. The same as the previous steps, this was pulled together manually to save time. A future update I can make will be to look at ways this can be automated.
ga4_ <- data.frame(
node = c(
NA,'event_','event_',.....
),
leaf = c(
'event_','event_date',......
),
variableType = c(
'RECORD','STRING',.....
)
)
Code language: R (r)
The full data frame can be found on our GitHub.
Step 6
The next step was to call the package referencing the data frame. This has been developed on from earlier iterations to add a custom fill, height, width and ensure the tree starts in a collapsed state. Other fields can also be declared as part of this with more details here.
collapsibleTreeNetwork(ga4_, attribute = "variableType", collapsed = TRUE, zoomable = FALSE, fill = "#66E251", height = 700, width = 1200)
Code language: R (r)
Step 7
The final step was to export this to html. This was simply done via the “Export > Save as Web Page” feature in the viewer tab of RStudio.
You can see the final interactive tree output that we have done over on the GA4 BigQuery Export Schema page.
Summary
At the moment, GA4 is still being developed so I fully expect the current schema to change a few times yet.
In the meantime, I hope this can help anyone who is either transitioning from GA360 to GA4, or to those who are completely new to GA4 and are keen to understand how the data from BigQuery is stored.
Please do get in touch with us if you have any suggestions for future updates, or to let us know how you are using it!