You’re sitting on a SQL time bomb: defuse it with Dataform

As a marketing team’s maturity evolves, disparate data sources centralise, increasingly sophisticated SQL queries begin to pull together insights and the stars align. In time data becomes central to all they do and they rely heavily on accurate and timely data to make informed decisions that drive campaign success and maximise customer engagement.  All is well until…BOOM, something breaks, critical reporting is not functioning as expected, stakeholders across the business are beginning to lose the trust in the data you worked so hard to instil.  The culprit? Outdated SQL operations and stacked scheduled queries.

Bottlenecks in knowledge, a lack of any sort of dev/prod workflow and no versioning unfortunately meant this was only a matter of time, it was a ticking time bomb…

The fragile foundation of stacked scheduled queries

Stacked scheduled queries involve a series of dependent SQL queries that run on a set schedule, each building on the results of the previous one. Any updates or edits are often made directly on live ‘data products,’ adding to the risk of failure. On top of this, these workflows often lack robust version control (with maybe some good eggs manually backing up SQL in a git repo). This makes it challenging to track changes or allow multiple team members to work concurrently without conflicts. To be clear here this approach is fine, up until a point. If you have one or two data sources and a couple of data models built on top, unpicking that basic tapestry should not be too much of a challenge should things go wrong. I will add a caveat to my caveat here though, if you can build your processes using Dataform from the off, definitely do so as it saves you a rebuild down the line. 

To summarise, some of the main issues with basic SQL approaches:

  • Single Points of Failure: If one query in the chain fails, it can disrupt the entire data pipeline, leading to incomplete or inaccurate data.
  • Lack of Transparency: As the number of queries grows, understanding the data flow becomes increasingly complex, making it harder to troubleshoot issues.
  • Scalability Challenges: Stacked queries are not built to efficiently handle large data volumes or adapt to changing business needs.
  • Manual Intervention: These systems often need manual oversight to ensure they run properly, increasing the risk of human error.

The hidden costs of outdated practices

You might think, so what. If it breaks I know where the problem will be, I built this masterpiece! But there are costs that you may not initially consider. 

It may take time to fix issues, delaying decisions and potentially having big impacts on how agile you can be as a business. What if an error in reporting is not obvious, something somewhere in the stack has gone wrong but it has not broken the downstream reporting. You may well be taking those errors into account in budgeting, forecasting, spending and compliance decisions. 

Anyone who has worked with stakeholders and data knows how hard fought buy-in and trust is when it comes to data. One undetected issue or misinterpreted value could well undo all the work undertaken to instill that trust. 

Should an problem arise there is the sticky issue of fixing it. Digging through long complex SQL statements chained together is a lengthy and difficult process, especially if you did not originally put that SQL together. Over the lifetime of a solution many many hours will be dedicated to trying to understand the SQL before a fix can even be undertaken. This is to not even mention all the repeated pieces of SQL being written over and over by individuals working in a silos.

  • Delayed Decision-Making: Inaccurate or incomplete data can delay decision-making processes, reducing business agility.
  • Financial Losses: Errors in reporting can lead to financial inaccuracies, affecting budgeting, forecasting, and compliance.
  • Eroded Trust: Stakeholders may lose confidence in the data, undermining the credibility of the marketing analytics team and the organisation as a whole.
  • Loss of Time: Both analysts and engineers often spend excessive time trying to understand complex query structures and rewriting code that has already been created many times before, leading to inefficiencies and wasted resources.

Embracing modern data engineering with Dataform

To prevent these issues, organisations need to adopt modern data engineering platforms like Dataform or its older brother dbt (I think this small blog by dbt does the best job of explaining why these tools were created), both of which are particularly suited to managing marketing data operations. The choice of tool can depend on your current data location. Dataform is tightly integrated with BigQuery and is free, making it an ideal choice if most of your data is already centralised there. Being Google Cloud partners, Dataform is where we spend a lot of our time.

So how does the adoption of Dataform help with the fragility and the costs of outdated SQL processes?

1. Automated dependency management

Dataform automatically manages dependencies between datasets and queries. This ensures that if one dataset changes, all dependent datasets are updated accordingly, eliminating single points of failure. Additions like compiled graphs make it super easy to see and manage dependencies.

Image of a dataform compiled graph showing 4 nodes compiled of 1 source, one output and 2 assertions.
Image taken from Google documentation on Dataform https://cloud.google.com/blog/products/data-analytics/transform-sql-into-sqlx-for-dataform/

2. Version control integration

By integrating with version control systems like Git, Dataform allows teams to track changes, collaborate effectively, and roll back to previous versions if necessary. It also allows the team to work on development versions of code bases without fear of breaking production level data products. 

3. Testing and validation

Dataform enables teams to include tests within the data workflow. This ensures data quality by validating datasets at each stage of the pipeline. These are called assertions and can be configured to inform you of issues, or stop a pipeline altogether.

4. Scalability and performance

Designed to handle large-scale data operations, Dataform optimises query execution and resource usage, allowing your data infrastructure to grow with your marketing data needs.

5. Improved transparency

With a clear visual representation of data workflows, Dataform makes it easier to understand and troubleshoot the data pipeline, reducing downtime and increasing efficiency. Dataform also moves you to a modular approach to SQL, making it much simpler to understand and update when necessary.

6. JavaScript for flexibility

Dataform supports the use of JavaScript (JS), providing additional flexibility to transform data beyond the capabilities of standard SQL. The ability to use JS variables and functions for repeated operations makes it easier to create reusable code, reducing redundancy and saving time. This utility allows for more dynamic transformations and custom logic, giving teams greater power to tailor their data workflows to specific requirements. If you so wished you could write your entire data modelling processes in JS!

The road to a resilient SQL data infrastructure

Transitioning to a modern data engineering platform might seem challenging, but the long-term benefits are well worth the effort. Here are a few things to consider to get you started

  • Assess your current infrastructure: Identify the pain points and vulnerabilities in your existing SQL operations.
  • Educate your team: Invest in training to ensure your team is proficient in modern data engineering tools and practices.
  • Plan the migration: Develop a step-by-step plan to transition from stacked scheduled queries to a platform like Dataform.
  • Iterate and improve: Use Dataform’s features to continuously test, monitor, and optimise your data workflows.

Conclusion

The risks of relying on outdated SQL operations are too big to ignore. Continuing to use stacked scheduled queries means taking unnecessary risks with your data integrity and marketing success. Adopting modern data engineering platforms like Dataform is not just a technical upgrade—it’s a strategic move that can protect your organisation’s future.

Measurelab’s advice? Take action now to modernise your SQL operations and build a strong, reliable data infrastructure that supports your marketing success. Reach out to Measurelab who can help with any Dataform questions or projects you may have in mind!

Share:
Written by

Matthew is Head of Engineering and Technology at Measurelab and loves solving complex problems with code, cloud technology and data. Outside of analytics, he enjoys playing computer games, woodworking and spending time with his young family.

Subscribe to our newsletter: