Data warehouse: Repair broken tables in BigQuery
At Measurelab, we love a bit of data warehousing using BigQuery; in fact, we are obsessed with finding the best approach to managing data warehouses in the most optimal way possible. So in this series, we want to share with you our knowledge and expertise to show what a fantastic and dynamic tool BigQuery is.
As you may know, BigQuery has some super powerful features, which allow us to build, manipulate and even run machine learning algorithms within the interface. However, we’re all human, and things can go wrong- so what do you do when that happens?
Imagine one morning you’re on the BigQuery interface and want to simply do an aggregation of your table to check the volumes, but you forget to clear down the options before running.
Just as the code finishes running, you realise months of daily updates have been lost as the table was accidentally overwritten and, even worse, any reporting in dashboard tools like Data Studio or Tableau using this table will now fail.
Before you look to completely re-run all your historical data which could both end up costing a lot of your time and increase your BigQuery spend, Google Cloud Platform has some great tools which allow you to recover and revert your datasets.
What can you do?
- If you have deleted a table, you have two days to recover.
- If you have an error on your table, you have seven days to recover.
And what isn’t possible?
- Remember at present you cannot restore a deleted dataset
The syntax
The syntax itself for reverting the tables is relatively simple and is not expensive to run.
- The interval can be seconds/hours/days; however, it cannot be over seven days due to the reversion time limit.
- You must remember when you are reverting if you have added correct data in the period you are reverting to you will lose this, so it’s crucial to check when the error occurred if possible.
- You can either use Data Manipulation Language (DML)* or the output functionality to overwrite the table you are reverting to, without a destination, you will simply see the data in a temp table.
Here is a link to the script on Github
Then voilà, check your table has reverted and if all looks well, have a tea and relax in the knowledge your reports feeding on this table will be back up and running when they next process (phew).
Whilst we should always be super careful when amending or overwriting tables if something does go wrong, remember to look at this syntax before you re-run the full history.
* And what is DML I hear you say?
DML stands for Data Manipulation Language. DML statement is used to insert, update or delete the records on tables and is super powerful when creating tables. We will be talking about these fantastic tools in a future post.
If you need any support or want to find out more about how Measurelab can help you leverage the full capability of your BigQuery data or want to know more about the analytics services and training services we offer, please contact us at hello@measurelab.co.uk or lace@measurelab.co.uk.