Data pipeline optimisation with Google Cloud and Dataform
In a recent engagement, we worked with a client to transform their data pipelines, tackling performance and cost inefficiencies in their Google Cloud BigQuery environment. The work culminated in a comprehensive optimisation strategy built on Dataform, improved SQL practices, and tailored solutions, delivering significant performance gains and cost savings. Here’s a deep dive into the highlights of our project.
Identifying inefficiencies in BigQuery workflows
We began by analysing the existing data architecture, identifying key areas of inefficiency:
- Long view chains: Multiple layers of views increased complexity and degraded performance.
- Lack of incremental logic: Transformations processed entire datasets unnecessarily, increasing costs.
- Unoptimised tables: Absence of partitioning and clustering led to full table scans.
Transitioning to Dataform
To address these challenges, we transitioned from BigQuery Scheduled Queries to Dataform, unlocking the following benefits:
- Improved maintainability: Modular SQL scripts made workflows easier to debug and maintain.
- Version control: Seamless integration with Git allowed better collaboration and change tracking.
- Automated dependency management: Dependencies were automatically managed, minimising manual intervention and errors.
- Incremental processing capabilities: Processing focused only on new or updated data, enhancing efficiency (a sketch of such a model follows this list).
- Assertions for data quality: Automated checks ensured consistency and reliability throughout the pipeline.
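To make the pattern concrete, here is a minimal sketch of an incremental Dataform model in a hypothetical file such as definitions/orders_daily.sqlx. The table and column names (raw_orders, order_id, customer_id, event_timestamp, amount) are placeholders rather than the client's schema. The config block marks the table as incremental, partitions and clusters the output, and declares data-quality assertions, while ref() lets Dataform track the dependency on the source table automatically.

```sqlx
config {
  type: "incremental",                       // process only new data on each run
  bigquery: {
    partitionBy: "DATE(event_timestamp)",    // partition the output table by day
    clusterBy: ["customer_id"]               // cluster for cheaper filtered reads
  },
  assertions: {
    nonNull: ["order_id", "event_timestamp"],
    uniqueKey: ["order_id"]                  // automated data-quality checks
  }
}

SELECT
  order_id,
  customer_id,
  event_timestamp,
  amount
FROM ${ref("raw_orders")}   -- ref() registers the dependency in the pipeline DAG

-- On incremental runs, read only rows newer than what the table already holds
${when(incremental(),
  `WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM ${self()})`)}
```

On a full refresh Dataform rebuilds the whole table; on scheduled incremental runs, the when(incremental(), ...) filter limits the read to rows newer than the current maximum timestamp, which is where most of the cost reduction comes from.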
Key optimisation strategies
- Query structure improvements: Refactored nested subqueries into Common Table Expressions (CTEs), improving readability and debugging efficiency (a sketch follows this list).
- Consolidation of intermediate tables: Combined redundant views into fewer tables, reducing complexity and repeated computation.
- Implementation of incremental tables: Processed only new or updated data, significantly reducing data scanned and associated costs.
- Partitioning of tables: Partitioned tables based on relevant date fields, improving query performance.
- Eliminating full table scans: Restructured incremental tables so refreshes no longer scan entire source tables, ensuring more targeted data processing (see the partitioning sketch after this list).
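As a hedged illustration of the CTE refactor (the table, columns, and business rule below are placeholders, not the client's logic), each intermediate step gets a name that can be queried and debugged on its own:

```sql
-- Hypothetical example: logic previously written as nested subqueries,
-- rewritten as named CTEs so each step can be inspected in isolation.
WITH daily_orders AS (
  SELECT
    customer_id,
    DATE(event_timestamp) AS order_date,
    SUM(amount) AS daily_total
  FROM `my-project.sales.orders`           -- placeholder source table
  GROUP BY customer_id, order_date
),

active_customers AS (
  SELECT customer_id
  FROM daily_orders
  GROUP BY customer_id
  HAVING COUNT(DISTINCT order_date) >= 5   -- placeholder business rule
)

SELECT
  d.customer_id,
  d.order_date,
  d.daily_total
FROM daily_orders AS d
JOIN active_customers AS a
  ON d.customer_id = a.customer_id;
```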
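And a sketch of the partitioning strategy, again with placeholder names: the output table is partitioned by date and clustered on a common filter column, so downstream queries that filter on the partition column read only the relevant partitions instead of the whole table.

```sql
-- Hypothetical DDL: materialise the reporting table partitioned by day
-- and clustered on the most common filter column.
CREATE TABLE IF NOT EXISTS `my-project.reporting.orders_daily`
PARTITION BY order_date
CLUSTER BY customer_id
AS
SELECT
  customer_id,
  DATE(event_timestamp) AS order_date,
  SUM(amount) AS daily_total
FROM `my-project.sales.orders`
GROUP BY customer_id, order_date;

-- Because the filter is on the partition column, BigQuery prunes every
-- partition older than seven days instead of scanning the full table.
SELECT
  customer_id,
  SUM(daily_total) AS total_last_7_days
FROM `my-project.reporting.orders_daily`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY customer_id;
```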
Cost savings analysis
Our optimisation efforts translated into substantial cost savings:
- Data processed for key tables: BigQuery costs were reduced by over 40% annually, driven by a 92% reduction in data processed for some large output tables.
- Dashboard optimisation: Dashboard redesigns also contributed to query optimisation. Separating lightweight and heavy charts onto different pages and consolidating scorecards eliminated redundant queries and unnecessary filters, cutting the data read from the final output tables by hundreds of GBs per dashboard load.
- Operational efficiencies: Reduced manual dependency management, debugging, and new model setup saved approximately 32 engineer hours per month, cutting operational costs by close to 20%.
Conclusion
This project demonstrates how targeted optimisations can transform data pipelines, improving performance while dramatically reducing costs. Leveraging tools like Dataform and best practices in SQL and BigQuery, we delivered a smarter, more efficient solution tailored to the client’s needs.
Resources
- Read our blog post on Dataform for BigQuery: A basic end-to-end guide.
- Watch our Behind the Cloud series on YouTube.