Behind the Cloud: GCP foundational best practice

In this episode of Behind the Cloud, Matt explains how to organise GCP projects by specific use cases, how to automate processes, establish clear naming conventions, and follow best practices around security, tagging, and more.


Transcript

Introduction

[00:00:00] Matthew: Hello and welcome to another episode of Behind the Cloud. We’ve had a little bit of a brief hiatus but now we’re back and we’re going to do another series of videos. Today we’re going to kick off with what arguably should have been the first video, which is how to build strong foundations on Google Cloud Platform. These are the rules, structures and policies that you should put in place to make sure that it’s got strong foundations that can scale and that aren’t going to give you headaches down the line.

[00:00:22] Matthew: So let’s kick off starting with structure. There are various ways to structure things within the Google Cloud Platform. I think we’ve already touched on it in a previous video, but you have organisations sitting at the very top. So if you can set up an organisation, I would advise you do so.

Setting Up Organisations and Projects

[00:00:36] Matthew: And then within those organisations, you have various different Google Cloud projects. These are the containers, and within those containers you have your services. Structure it in that way. Keep things focused. There’s a tendency for people to want to stuff loads of different solutions into a single GCP project, where it’s really not necessary and it gets a bit confusing.

[00:00:53] Matthew: You don’t pay for an instance of a GCP project, you pay for services, so you may as well split them out to be single use and structure them in that way.

Using Folders for Organisation

Also make use of folders, because folders are a great way of organising different types of projects, different teams even. Splitting disparate parts of the business into different folders and putting your GCP projects into each of those folders makes it really nice and organised.
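
One way to picture the structure described here is as a small tree: an organisation at the top, folders beneath it, and single-purpose projects inside each folder. The sketch below is illustrative only. Every folder and project name in it is made up, and it models the hierarchy in plain Python rather than calling any Google Cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """A folder groups related projects, e.g. by team or business area."""
    name: str
    projects: list[str] = field(default_factory=list)


@dataclass
class Organisation:
    """The organisation sits at the top of the hierarchy."""
    name: str
    folders: list[Folder] = field(default_factory=list)

    def project_paths(self) -> list[str]:
        """Return one 'organisation/folder/project' path per project."""
        return [
            f"{self.name}/{folder.name}/{project}"
            for folder in self.folders
            for project in folder.projects
        ]


# Hypothetical example: all names below are invented for illustration.
org = Organisation(
    name="example.com",
    folders=[
        Folder("marketing", ["mkt-web-analytics-prod", "mkt-ads-reporting-prod"]),
        Folder("finance", ["fin-billing-export-prod"]),
    ],
)

for path in org.project_paths():
    print(path)
```

Each project here does one job, so its place in the tree says what it is for and which part of the business owns it.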

Automate Processes

[00:01:14] Matthew: Next, automate. Whenever you’re thinking of a solution, thinking of adding in new data, thinking of creating a pipeline, thinking of building out reporting tables, think, can I automate this? And the answer is almost certainly yes within GCP. What you don’t want to do is have loads of little manual processes that you have to be undertaking across the platform, because as you grow, as you scale, as you bring in more data, as you build more advanced projects, those small manual tasks are going to get bigger and bigger.

[00:01:44] Matthew: And down the line, you’re either going to find yourself expending a lot of time and effort doing these manual processes or with a pretty hefty and expensive migration task on your hands.

Naming Conventions

Next, naming conventions and how you name things, not just at a project level, but down into the services, down into the tables, down into the datasets and columns, etc.

[00:02:01] Matthew: You need a joined up approach to this across the organisation, because if you don’t, it will very quickly descend into chaos.
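
A shared convention is easier to keep when it can be checked automatically. The sketch below is a minimal example, not a recommended standard. The `<team>-<purpose>-<environment>` pattern and the list of environments are assumptions made for illustration; the only real rule encoded is GCP's own project ID format (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen).

```python
import re

# GCP's rule for project IDs: 6 to 30 characters, lowercase letters, digits
# and hyphens, starting with a letter and not ending with a hyphen.
PROJECT_ID_RULE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

# Hypothetical house convention: <team>-<purpose>-<environment>.
ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}


def check_project_id(project_id: str) -> list[str]:
    """Return a list of problems; an empty list means the ID passes."""
    problems = []
    if not PROJECT_ID_RULE.fullmatch(project_id):
        problems.append("does not meet GCP project ID rules")
    parts = project_id.split("-")
    if len(parts) < 3:
        problems.append("expected <team>-<purpose>-<environment>")
    elif parts[-1] not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment '{parts[-1]}'")
    return problems


for candidate in ["mkt-web-analytics-prod", "Google Analytics", "mkt-web-staging"]:
    print(candidate, "->", check_project_id(candidate) or "ok")
```

A check like this could run wherever projects are requested or created, so a nondescript name is caught before it exists.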

[00:02:08] Matthew: One of the biggest things I notice is very nondescript names for projects. It might just say something like Google Analytics, or it may be just a random name, and that’s very difficult for someone coming to it without any eyes on it previously to understand what that project is.

[00:02:21] Matthew: So try to come up with a naming convention that gets a message across very quickly. There are loads of resources online for finding the best ways of doing it. Also think about it within your services, with your tables, datasets and column names. You want to label as much as you can, so make sure you all have a joined-up approach to how you do so.

Security and Least Privilege

Moving on to security. The principle of least privilege is really important here. Make sure that people only have access to the services that they need to have access to in order to perform their role.

[00:02:48] Matthew: And that goes for people, and it goes for other platforms or services. It reduces risk and attack surface, attack surface being the places of ingress that somebody could feasibly use to cause you issues with some sort of attack.

Privileged Access Manager

[00:03:04] Matthew: A lot of the time you’ll see agencies have been brought in. They’ve set up and done a job for you, but their emails and their access have remained. I’m not saying the agency is nefarious, but just having their emails in there, and having other points of ingress, can be a problem.
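
Both problems, overly broad roles and leftover external accounts, can be spotted by reading a project's IAM policy. The sketch below is a rough illustration that works on a policy dictionary in the `bindings` / `role` / `members` shape that `gcloud projects get-iam-policy --format=json` prints. The function name, the sample policy and the `example.com` domain are all hypothetical, and the check only covers `user:` members.

```python
# Basic roles grant very wide access and are a common least-privilege gap.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def audit_policy(policy: dict, allowed_domain: str) -> list[str]:
    """Flag members with basic roles and users outside the allowed domain."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        for member in binding.get("members", []):
            # Members look like "user:alice@example.com".
            kind, _, identity = member.partition(":")
            if role in BASIC_ROLES:
                findings.append(f"{member} holds broad basic role {role}")
            if kind == "user" and not identity.endswith("@" + allowed_domain):
                findings.append(f"{member} is outside {allowed_domain}")
    return findings


# Hypothetical policy, invented for illustration.
sample_policy = {
    "bindings": [
        {
            "role": "roles/owner",
            "members": ["user:alice@example.com", "user:contractor@agency.example"],
        },
        {"role": "roles/bigquery.dataViewer", "members": ["user:bob@example.com"]},
    ]
}

for finding in audit_policy(sample_policy, "example.com"):
    print(finding)
```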

[00:03:19] Matthew: Google has actually released a new feature for IAM within GCP called Privileged Access Manager. This allows you to give people a level of access but put a time frame on it.

[00:03:29] Matthew: So you could make someone viewer or owner for a certain amount of time and then it will automatically expire when they’re done. 
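
Privileged Access Manager has its own configuration, which is not shown here. As a separate and simpler illustration of time-limited access, the sketch below uses a different mechanism, IAM Conditions, and builds a role binding whose condition stops matching after a given date. The helper name, member and role are hypothetical, and the function only constructs the binding dictionary; it does not apply it to any project.

```python
from datetime import datetime, timedelta, timezone


def expiring_binding(role: str, member: str, days: int, now: datetime) -> dict:
    """Build an IAM binding that lapses `days` after `now` (timezone-aware)."""
    expires_at = (now.astimezone(timezone.utc) + timedelta(days=days)).strftime(
        "%Y-%m-%dT%H:%M:%SZ"
    )
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": f"expires-{expires_at[:10]}",
            "description": "Temporary access; lapses automatically.",
            # IAM Conditions expression: true only before the expiry time.
            "expression": f'request.time < timestamp("{expires_at}")',
        },
    }


# Hypothetical grant: two weeks of read access for an external contractor,
# using a predefined role rather than a basic one.
binding = expiring_binding(
    role="roles/bigquery.dataViewer",
    member="user:contractor@agency.example",
    days=14,
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
print(binding["condition"]["expression"])
```

The design point is the same as in the video: the expiry is written into the grant itself, so nobody has to remember to remove it.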

Good Data Practices

[00:03:35] Matthew: Try to follow good data practices. Only bring in the data you need. There’s no need to be bringing in everything just because you can. Over the years that will build up and become more and more expensive.

[00:03:43] Matthew: Also try to stick with good practices like clustering, like partitions, like lifecycle management on Google Cloud Storage resources. And this will keep query costs and storage costs down over the long term.
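
To make the lifecycle management point concrete, the sketch below builds a Cloud Storage lifecycle configuration that moves objects to a cheaper storage class after a while and deletes them later. The helper name and the 30 and 365 day values are assumptions chosen for illustration, and the right numbers depend on how the data is actually used. BigQuery partitioning and clustering are set on the tables themselves and are not shown.

```python
import json


def build_lifecycle(nearline_after_days: int, delete_after_days: int) -> dict:
    """Return a lifecycle config: demote old objects, then delete them."""
    if nearline_after_days >= delete_after_days:
        raise ValueError("objects must be demoted before they are deleted")
    return {
        "rule": [
            {
                # Move objects to a cheaper storage class once they age.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": nearline_after_days},
            },
            {
                # Remove objects entirely once they are no longer needed.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


# Hypothetical retention periods, for illustration only.
config = build_lifecycle(nearline_after_days=30, delete_after_days=365)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and applied to a bucket with the Cloud Storage tooling.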

[00:03:56] Matthew: Try to keep it tidy. Over years of experimentation and different projects, adding this in, adding that in, things can get really messy. So don’t be afraid to delete or, say, archive. Whenever you’re setting up a new project, try to give it a clear definition as to what it is, what it’s for, and in what circumstances it can be deleted.

[00:04:14] Matthew: Is it a temporary thing? Is it something that needs to be permanent? Clean up your GCP projects. Clean up your resources within those projects when they’re no longer needed. 

Monitoring and Budget Alerting

[00:04:23] Matthew: Set up monitoring. Monitor data use, monitor costs, monitor services. Set up budget alerting, so you can understand if thresholds have been passed, and if things are more expensive than you want. Understand what people are doing, what services they’re accessing. If you have multiple different GCP projects, you can sync all that data into another GCP project, put that in BigQuery and then out to Looker Studio or something along those lines. And that is going to be a much more accessible place for a wider set of stakeholders at the company.

[00:04:51] Matthew: But make sure you do it, make sure you monitor, make sure you have good visibility on what’s going on within the platform.
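
The idea behind budget alerting is simple: compare spend against a budget and report which thresholds have been passed. The sketch below shows only that logic in plain Python. The function name, the default 50%, 90% and 100% thresholds, and the figures are assumptions for illustration; in practice the alerts themselves would be configured in Cloud Billing.

```python
def crossed_thresholds(
    budget: float, spend: float, thresholds: tuple[float, ...] = (0.5, 0.9, 1.0)
) -> list[float]:
    """Return the budget fractions that current spend has reached or passed."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in sorted(thresholds) if ratio >= t]


# Hypothetical monthly figures, for illustration only.
monthly_budget = 1000.0
for spend in (100.0, 920.0, 1000.0):
    passed = crossed_thresholds(monthly_budget, spend)
    print(f"spend {spend:.0f}: thresholds passed = {passed}")
```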

Tagging, Labelling, and Descriptions

[00:04:56] Matthew: Tag, label and describe everything. You can’t really move within GCP without coming across some sort of description box, label or tag. It could be on any given service, but it’s also on every column, all the schemas within BigQuery, all the datasets.

[00:05:10] Matthew: Do the naming, do the work to tag, to label, because it’s going to make things a lot easier for the wider organisation to understand what things are, what they can use, what they can’t use, what this data says and what it doesn’t say, those kinds of things.
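
Labelling is another habit that holds up better when it is checked. The sketch below is a minimal example that validates a set of resource labels. The required keys (`owner`, `environment`, `purpose`) are a hypothetical house rule, and the patterns cover only the ASCII part of GCP's label format: lowercase letters, digits, underscores and hyphens, up to 63 characters, with keys starting with a letter.

```python
import re

# ASCII subset of GCP's label format (keys must start with a lowercase letter).
LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")
LABEL_VALUE = re.compile(r"[a-z0-9_-]{0,63}")

# Hypothetical house rule: every resource carries these labels.
REQUIRED_KEYS = ("owner", "environment", "purpose")


def check_labels(labels: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the labels pass."""
    problems = [
        f"missing required label '{key}'" for key in REQUIRED_KEYS if key not in labels
    ]
    for key, value in labels.items():
        if not LABEL_KEY.fullmatch(key):
            problems.append(f"invalid label key '{key}'")
        if not LABEL_VALUE.fullmatch(value):
            problems.append(f"invalid value for '{key}'")
    return problems


good = {"owner": "data-team", "environment": "prod", "purpose": "web-analytics"}
bad = {"Owner": "Data Team"}
print(check_labels(good) or "ok")
print(check_labels(bad))
```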

[00:05:23] Matthew: And it’s also going to make it a lot easier to clean up when you need to clean up. It also unlocks the power of things like Dataplex. With all of this other metadata that’s sitting around the various services and data, Dataplex becomes much more useful and can be a really powerful tool of governance and management.

[00:05:38] Matthew: It gives you a sort of semantic understanding of what’s going on within the organisation’s data and services.

Documentation

[00:05:43] Matthew: It’s also important to document. Make sure you’ve got some sort of internal wiki or some sort of internal shared resource that talks about things like the naming conventions, the tags, the services, what GCP projects exist and how it’s all structured.

Keep it up to date. That way, if somebody leaves the company who’s been really integral to the GCP journey so far, not all of that knowledge leaves with them and there’s at least a starting point.

Finally, keep up to date. There are lots of new services and products coming online all the time, and lots of changes to existing ones. Generative AI and Google’s focus on AI is really pushing lots of new updates to all of these different services.

If you can make somebody in the organisation responsible for keeping up to date with things, that’s going to help you not fall behind and not be using creaking legacy ways of doing things for the long term, which is going to lead to expensive migrations down the line.

[00:06:29] Matthew: That’s it for this week. Like I said before, we’re starting up a new series of these videos again. We had a little bit of a break, but we’re back on it now. So, keep an eye out over the next couple of weeks for the next video. And as always, if you enjoyed the video, please like and subscribe.

[00:06:42] Matthew: If you have any suggestions for videos that you’d like us to do, or any comments at all, please just reach out to us at Measurelab. Thanks.

Written by

Matthew is the Engineering Lead at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.
