Why I'm Shifting My Data Team from Writing SQL to Training Agents

My team includes data engineers, analysts, data scientists, and product managers. On the surface, we do everything — building the data warehouse, generating reports, running models, handling requests.

But look closer, and you find an uncomfortable truth: 80% of our time goes to one thing — taking orders and writing SQL.

A product manager fires off a message: “Can you pull last week’s numbers for this feature?” An analyst opens their editor, spends half an hour understanding the ask, tracking down the right tables, writing the query, verifying definitions, pulling the data, and pasting it into the chat. Then waits for the next request. Five to eight times a day.

The Problem Isn’t Efficiency

I’ve always treated efficiency as a core team goal. The whole point of a data platform is to maximize how effectively people use data.

“Build more dashboards, build better self-serve query tools” — I’ve done all of that. It helped a little, but nothing fundamentally changed. Because the root issue isn’t headcount or tooling. It’s that the entire working model is unsustainable.

First, the data-pull model can’t scale. Business demand always grows faster than you can hire analysts. This year headcount is frozen, but request volume is up 30% from last year. Analysts become the bottleneck — not because they’re not working hard, but because they’re trapped in low-value repetitive work. The insight analysis that actually requires human judgment? There’s never time for it.

Second, engineering throughput can’t keep up. Data engineers spend their days writing ETL and producing data. But look at what they’re actually building: most of it is the same thing with slight variations, such as different dimensions, different metric definitions, or different time windows. The average development cycle is too long, and the business can’t wait.

Third, data silos can’t persist. Users listen to podcasts on their phone, audiobooks in the car, music on a smart speaker. Three surfaces, three separate data systems, fragmented user profiles. Ask “what does this user like?” and you get three different answers.

These three problems together point to one conclusion: the traditional data team’s way of working will rapidly lose ground in the AI era.

The Turning Point

In my previous piece on ChatBI, I ended with a line: the real goal isn’t getting machines to write SQL — it’s getting machines to understand data. The direction was clear then, but how to actually get there wasn’t.

What pushed me to act was OpenAI publishing their internal data agent approach in early 2026.

Our own ChatBI had been running for over a year, hitting 90%+ accuracy on datasets with 8 joined tables and thousands of fields. Standard queries it handled well. But the moment a question got slightly more complex — involving two tables that look similar but mean different things, or needing to understand the precise definition of an internal metric like “immersive DAU” — it started guessing.

Through the ChatBI project, we’d already learned that knowledge engineering matters more than model selection. We put 70% of our effort into table descriptions, rule libraries, synonym dictionaries, and example query banks. But all of that was manual and scattered, with a hard dependency on engineers and analysts continuously maintaining it.

OpenAI’s approach gave us the framework. They built six layers of context for their agent: table schemas and query history, human-annotated business definitions, field-level semantics extracted by parsing ETL code, organizational knowledge from internal documents, error-correction memory accumulated through use, and finally a live warehouse query fallback.

With all six layers, the agent understands data like a senior analyst who’s been at the company for three years. It knows this table only includes first-party traffic, that field had a data gap in December 2025, and you need to filter out test device IDs when querying in-car data.

With context, SQL generation stops being a guessing game.

What We’re Building

Once I understood this, I redesigned the team’s technical direction: rebuild how the data team works from the ground up, not just bolt an AI tool onto the existing workflow.

The core architecture is three engines.

The first engine is Data Context — the foundation for everything else. We’re building a six-layer context system: auto-collected table metadata and query logs (Layer 1), analyst-annotated metric definitions and business rules (Layer 2), field-level semantics generated by LLM parsing of ETL code (Layer 3), organizational knowledge extracted from internal documents (Layer 4), error-correction memory built up through agent usage (Layer 5), and live warehouse query fallback (Layer 6).
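To make the layering concrete, here is a minimal sketch of how per-table context might be assembled before the agent writes a query. It is an illustration, not our production code: the table name `dwd_play_events`, the `ContextLayer` class, and the `live_probe` function are all hypothetical, and treating Layer 6 as "only consulted when nothing is stored" is an assumption about what fallback means.

```python
from dataclasses import dataclass
from typing import Callable

# A lookup takes a table name and returns the notes a layer holds for it.
Lookup = Callable[[str], list[str]]


@dataclass(frozen=True)
class ContextLayer:
    number: int
    name: str
    lookup: Lookup


def from_dict(notes: dict[str, list[str]]) -> Lookup:
    """Wrap a static mapping as a lookup (a real layer would query a store)."""
    return lambda table: notes.get(table, [])


def format_section(number: int, name: str, notes: list[str]) -> str:
    bullets = "\n".join(f"- {note}" for note in notes)
    return f"[Layer {number}: {name}]\n{bullets}"


def build_context(table: str, layers: list[ContextLayer], live_probe: Lookup) -> str:
    """Collect notes from Layers 1-5 in order; use the live probe only if all are empty."""
    sections = []
    for layer in sorted(layers, key=lambda item: item.number):
        notes = layer.lookup(table)
        if notes:
            sections.append(format_section(layer.number, layer.name, notes))
    if not sections:
        # Assumed behavior: Layer 6 is a fallback, not a default.
        sections.append(format_section(6, "live warehouse query", live_probe(table)))
    return "\n\n".join(sections)


# Hypothetical notes for one hypothetical table.
LAYERS = [
    ContextLayer(1, "metadata and query logs", from_dict(
        {"dwd_play_events": ["columns: user_id, device_id, surface, played_at"]})),
    ContextLayer(2, "annotated definitions", from_dict(
        {"dwd_play_events": ["DAU counts distinct user_id per day"]})),
    ContextLayer(3, "code-level semantics", from_dict(
        {"dwd_play_events": ["app traffic only; excludes in-car and third-party channels"]})),
    ContextLayer(4, "organizational knowledge", from_dict({})),
    ContextLayer(5, "error-correction memory", from_dict(
        {"dwd_play_events": ["filter out test device IDs before counting"]})),
]


def live_probe(table: str) -> list[str]:
    # Stand-in for sampling the warehouse directly.
    return [f"no stored context for {table}; inspect a sample of rows first"]


print(build_context("dwd_play_events", LAYERS, live_probe))
```

The point of the structure is that each layer can be maintained by a different process (collectors, analysts, an LLM pass, the agent itself) while the agent sees one ordered block of text.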

The most underestimated layer is Layer 3, code-level semantics. A table’s real meaning isn’t in its metadata — it’s in the code that produces it. A schema tells you what columns exist; only the ETL code tells you that this table contains only app traffic and excludes in-car and third-party channels. We use an LLM to batch-parse Spark and SQL scripts and auto-generate table descriptions, no manual maintenance required. After this layer went live, the agent’s ability to distinguish between similar-looking tables improved dramatically.
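A rough sketch of what such a batch pass could look like follows. Everything here is assumed for illustration: the prompt wording, the regular expression used to find the target table, the script path, and the `llm` callable, which stands in for whatever model client you use and is faked below so the example runs on its own.

```python
import re
from typing import Callable

# Finds the table an ETL script writes to. Covers only the common
# "INSERT OVERWRITE TABLE x" and "INSERT INTO [TABLE] x" forms.
TARGET_RE = re.compile(
    r"INSERT\s+(?:OVERWRITE\s+TABLE|INTO(?:\s+TABLE)?)\s+([\w.]+)",
    re.IGNORECASE,
)

PROMPT = (
    "Read the ETL script below and describe the table it writes.\n"
    "State which traffic sources, filters, and time ranges it includes or excludes.\n\n"
    "{script}"
)


def describe_tables(scripts: dict[str, str], llm: Callable[[str], str]) -> dict[str, str]:
    """Return {target_table: description} for every script with a recognizable target."""
    descriptions = {}
    for path, script in scripts.items():
        match = TARGET_RE.search(script)
        if match is None:
            continue  # No target table found; leave this script for manual review.
        descriptions[match.group(1)] = llm(PROMPT.format(script=script))
    return descriptions


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call, so the sketch is runnable offline.
    if "surface = 'app'" in prompt:
        return "App traffic only; in-car and third-party channels are excluded."
    return "No source filter found in the script."


scripts = {
    "jobs/app_play_daily.sql": (
        "INSERT OVERWRITE TABLE dws.app_play_daily\n"
        "SELECT dt, COUNT(DISTINCT user_id) FROM dwd.play_events\n"
        "WHERE surface = 'app' GROUP BY dt"
    ),
    "jobs/cleanup.sql": "DELETE FROM tmp.scratch WHERE dt < '2025-01-01'",
}

print(describe_tables(scripts, fake_llm))
```

In practice the generated descriptions would be written into the Layer 3 store and regenerated whenever the script changes.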

The second engine is Skills & Tools — standardized, reusable capability units. A Text-to-SQL skill built once can serve ad-hoc queries, dashboards, and operational analysis. The goal is 40%+ tool reuse, shifting the team from “build everything from scratch for every request” to “assemble existing capabilities.”
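One way to picture this is a small registry where a skill is defined once and invoked by several consumers. The sketch below is hypothetical throughout: the registry, the skill bodies, and the caller names are placeholders, and the reuse metric (the share of registered skills used by more than one caller) is just one assumed definition, since there are several reasonable ways to measure reuse.

```python
from collections import defaultdict
from typing import Callable

SKILLS: dict[str, Callable[..., str]] = {}
CALLERS: dict[str, set[str]] = defaultdict(set)  # skill name -> callers seen so far


def skill(name: str):
    """Decorator that registers a function as a named, reusable skill."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        if name in SKILLS:
            raise ValueError(f"skill {name!r} is already registered")
        SKILLS[name] = fn
        return fn
    return register


def invoke(name: str, caller: str, **kwargs) -> str:
    """Run a skill and record which consumer used it."""
    CALLERS[name].add(caller)
    return SKILLS[name](**kwargs)


@skill("text_to_sql")
def text_to_sql(question: str) -> str:
    # Placeholder body; a real skill would call the model with data context.
    return f"-- SQL for: {question}"


@skill("chart_spec")
def chart_spec(title: str) -> str:
    # Placeholder body for a second, dashboard-only skill.
    return f"line chart: {title}"


def reuse_rate() -> float:
    """Assumed metric: fraction of registered skills used by more than one caller."""
    shared = sum(1 for name in SKILLS if len(CALLERS[name]) > 1)
    return shared / len(SKILLS)


# The same Text-to-SQL skill serves three different consumers.
invoke("text_to_sql", caller="ad_hoc", question="last week's DAU")
invoke("text_to_sql", caller="dashboard", question="daily plays by surface")
invoke("text_to_sql", caller="ops_analysis", question="retention by cohort")
invoke("chart_spec", caller="dashboard", title="Daily plays")

print(reuse_rate())
```

Recording the caller at invocation time is what makes a reuse target measurable at all; without it, "assemble existing capabilities" stays a slogan.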

The third engine is the Memory System — connecting long-term user preferences, real-time intent, and cross-device identity. This engine directly powers search and recommendations, making the product feel like it knows you better the more you use it. The data team is no longer just a back-office report-generation department — it’s a core engine driving user experience.

What We Got Wrong

This hasn’t been a smooth ride.

One mistake: we exposed every tool to the agent upfront. The functionality was complete, but the agent made more errors. The reason: overlapping capabilities across tools. Humans pick the right tool from experience; the agent got confused. We simplified — reduced tool count, merged overlapping functions — and accuracy went up.

Another mistake: writing prompts that were too rigid. Early on we gave the agent highly detailed step-by-step instructions: do this first, then do that. What we found was that while many analysis problems share a general shape, the details vary enormously. Over-specified instructions led the agent down the wrong path. We switched to describing the goal without dictating the path, letting the model reason through its own execution steps. Results improved significantly.

The biggest mistake was conceptual: thinking that deploying ChatBI counted as “AI transformation.” ChatBI answers the What — what was last week’s DAU? What the business actually needs is the Why and the How — why did numbers drop last week? What should we do next?

Going from Text-to-SQL to an Analysis Agent is a full system rebuild: you need to add a Code Interpreter, build an analysis reasoning chain, and support multi-step inference. Writing the SQL becomes one step among many rather than the whole job.

What’s Changing on the Team

The most interesting shift isn’t in the technology — it’s in team roles.

Before, the core skill for data engineers and analysts was writing SQL and building reports. Now, they’re developing two new capabilities: annotation and evaluation.

Through annotation, they become the agent’s teachers. Their domain knowledge is exactly the context the agent needs most: what’s the precise definition of this metric? What’s the difference between these two tables? What caused last month’s data anomaly? This knowledge used to live in analysts’ heads. Now it needs to be made explicit, structured, and fed to the agent.

Through evaluation, they become the agent’s examiners. We’ve built an evaluation system using real business questions as test cases, with human-written “ground truth” SQL and answers that we compare against the agent’s outputs. Every model upgrade or context update runs through this test suite. The work requires deep domain knowledge — it’s not something anyone can do.
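A stripped-down version of such a test suite might look like the sketch below. It uses an in-memory SQLite database so it runs anywhere; the `plays` table, the two test questions, and the `fake_agent` lookup are invented for illustration, and comparing result sets while ignoring row order is an assumed scoring rule rather than a description of our exact grader.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str
    truth_sql: str  # Human-written reference query.


def result_set(conn: sqlite3.Connection, sql: str) -> Counter:
    # Compare rows as a multiset so row order does not affect the verdict.
    return Counter(conn.execute(sql).fetchall())


def evaluate(conn: sqlite3.Connection, cases: list[EvalCase],
             agent: Callable[[str], str]) -> float:
    """Return the fraction of cases where the agent's SQL matches the reference result."""
    passed = 0
    for case in cases:
        expected = result_set(conn, case.truth_sql)
        try:
            actual = result_set(conn, agent(case.question))
        except sqlite3.Error:
            continue  # SQL that fails to run counts as a miss.
        if actual == expected:
            passed += 1
    return passed / len(cases)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (user_id INTEGER, surface TEXT, is_test_device INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [(1, "app", 0), (2, "app", 0), (3, "app", 1), (4, "car", 0), (5, "car", 1)],
)

CASES = [
    EvalCase("How many users played in the app?",
             "SELECT COUNT(DISTINCT user_id) FROM plays "
             "WHERE surface = 'app' AND is_test_device = 0"),
    EvalCase("How many users played in the car?",
             "SELECT COUNT(DISTINCT user_id) FROM plays "
             "WHERE surface = 'car' AND is_test_device = 0"),
]

# Stand-in agent: the second answer forgets to filter out test devices.
AGENT_SQL = {
    "How many users played in the app?":
        "SELECT COUNT(DISTINCT user_id) FROM plays "
        "WHERE is_test_device = 0 AND surface = 'app'",
    "How many users played in the car?":
        "SELECT COUNT(DISTINCT user_id) FROM plays WHERE surface = 'car'",
}


def fake_agent(question: str) -> str:
    return AGENT_SQL[question]


accuracy = evaluate(conn, CASES, fake_agent)
print(accuracy)
```

Scoring on results rather than on SQL text lets the agent phrase a query differently from the analyst and still pass, while a missed business rule, like the test-device filter here, still fails.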

Honest truth: there’s been friction. Some colleagues who were used to “take the order, write the SQL” found it hard to adapt. They felt that “annotating data” wasn’t what a data engineer should be doing. What they needed to realize was: every definition you annotate, every error you correct, makes the agent smarter. And a smarter agent can free you from five to eight repetitive data pulls a day — so you can do the work that actually requires your judgment.

That’s your competitive edge going forward.

A Formula

Something I keep coming back to:

An individual’s ultimate value = expertise × AI leverage

If expertise is zero, AI leverage multiplied by zero is still zero. If expertise runs deep, AI becomes a 10,000x amplifier.

The formula works for individuals and teams. A data team’s expertise is its understanding of the business, its intuition about data, its sensitivity to definition nuance. None of that gets replaced by AI. But if you’re still expressing that expertise through manually written SQL, you’re applying it at 1x leverage when 10,000x is available.