Why I’m Moving My Data Team from Writing SQL to Training Agents

My team includes data engineers, analysts, data scientists, and product managers. On the surface, we do everything — building the data warehouse, generating reports, running models, handling requests.

But look closer, and you find an uncomfortable truth: 80% of our time goes to one thing — taking orders and writing SQL.

A product manager fires off a message: “Can you pull last week’s numbers for this feature?” An analyst opens their editor, spends half an hour understanding the ask, tracking down the right tables, writing the query, verifying definitions, pulling the data, and pasting it into the chat. Then waits for the next request. Five to eight times a day.

The Problem Isn’t Efficiency

I’ve always treated efficiency as a core team goal. The whole point of a data platform is to maximize how effectively people use data.

“Build more dashboards, build better self-serve query tools” — I’ve done all of that. It helped a little, but nothing fundamentally changed. Because the root issue isn’t headcount or tooling. It’s that the entire working model is unsustainable.

First, the data-pull model can’t scale. Business demand always grows faster than you can hire analysts. This year headcount is frozen, but request volume is up 30% from last year. Analysts become the bottleneck — not because they’re not working hard, but because they’re trapped in low-value repetitive work. The insight analysis that actually requires human judgment? There’s never time for it.

Second, engineering throughput can’t keep up. Data engineers spend their days writing ETL and producing data. But look at what they’re actually building: much of it is the same thing with slight variations, such as different dimensions, different metric definitions, different time windows. The average development cycle for a new request is too long. The business can’t wait.

Third, data silos are unsustainable. Users listen to podcasts on their phone, audiobooks in the car, music on a smart speaker. Three surfaces, three separate data systems, fragmented user profiles. Ask “what does this user like?” and you get three different answers.

These three problems together point to one conclusion: the traditional data team’s way of working will rapidly lose ground in the AI era.

The Turning Point

In my previous piece on ChatBI, I ended with a line: the real goal isn’t getting machines to write SQL — it’s getting machines to understand data. The direction was clear then, but how to actually get there wasn’t.

What pushed me to act was OpenAI publishing their internal data agent approach in early 2026.

Our own ChatBI had been running for over a year, hitting 90%+ accuracy on datasets with 8 joined tables and thousands of fields. Standard queries it handled well. But the moment a question got slightly more complex — involving two tables that look similar but mean different things, or needing to understand the precise definition of an internal metric like “immersive DAU” — it started guessing.

Through the ChatBI project, we’d already learned that knowledge engineering matters more than model selection. We put 70% of our effort into table descriptions, rule libraries, synonym dictionaries, and example query banks. But all of that was manual and scattered, with a hard dependency on engineers and analysts continuously maintaining it.

OpenAI’s approach gave us the framework. They built six layers of context for their agent: table schemas and query history, human-annotated business definitions, field-level semantics extracted by parsing ETL code, organizational knowledge from internal documents, error-correction memory accumulated through use, and finally a live warehouse query fallback.

With all six layers, the agent understands data like a senior analyst who’s been at the company for three years. It knows this table only includes first-party traffic, that field had a data gap in December 2025, and you need to filter out test device IDs when querying in-car data.

With context, SQL generation stops being a guessing game.

What We’re Building

Once I understood this, I redesigned the team’s technical direction: rebuild how the data team works from the ground up, not just bolt an AI tool onto the existing workflow.

The core architecture is three engines.

The first engine is Data Context — the foundation for everything else. We’re building a six-layer context system: auto-collected table metadata and query logs (Layer 1), analyst-annotated metric definitions and business rules (Layer 2), field-level semantics generated by LLM parsing of ETL code (Layer 3), organizational knowledge extracted from internal documents (Layer 4), error-correction memory built up through agent usage (Layer 5), and live warehouse query fallback (Layer 6).
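To make the layering concrete, here is a minimal sketch of how notes from different layers might be collected and rendered into one context block for a single table. The layer names follow the list above; everything else (the `ContextLayer` enum, the `ContextStore` class, the table name `dwd_play_events`, the sample notes) is a hypothetical illustration, not a description of the production system.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class ContextLayer(IntEnum):
    """The six layers, numbered as in the text above."""
    METADATA_AND_QUERY_LOGS = 1
    ANALYST_ANNOTATIONS = 2
    CODE_LEVEL_SEMANTICS = 3
    ORGANIZATIONAL_KNOWLEDGE = 4
    CORRECTION_MEMORY = 5
    LIVE_WAREHOUSE_PROBE = 6  # a runtime capability, so it usually holds no stored notes


@dataclass
class ContextStore:
    """Hypothetical in-memory store: table name -> layer -> list of notes."""
    notes: dict = field(default_factory=dict)

    def add(self, table: str, layer: ContextLayer, note: str) -> None:
        self.notes.setdefault(table, {}).setdefault(layer, []).append(note)

    def build_context(self, table: str) -> str:
        """Render every note for a table, ordered by layer number."""
        layers = self.notes.get(table, {})
        lines = [f"Context for {table}:"]
        for layer in sorted(layers):
            for note in layers[layer]:
                lines.append(f"[L{layer.value} {layer.name.lower()}] {note}")
        return "\n".join(lines)


store = ContextStore()
# Table name and notes are made up for illustration.
store.add("dwd_play_events", ContextLayer.CODE_LEVEL_SEMANTICS,
          "Contains app traffic only; excludes in-car and third-party channels.")
store.add("dwd_play_events", ContextLayer.METADATA_AND_QUERY_LOGS,
          "Columns: user_id, device_id, event_time.")
store.add("dwd_play_events", ContextLayer.CORRECTION_MEMORY,
          "Filter out test device IDs when querying in-car data.")

context_text = store.build_context("dwd_play_events")
print(context_text)
```

Notes can be added in any order; the rendered context always lists them from Layer 1 upward, so the agent sees raw structure first and corrections last.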

The most underestimated layer is Layer 3, code-level semantics. A table’s real meaning isn’t in its metadata — it’s in the code that produces it. A schema tells you what columns exist; only the ETL code tells you that this table contains only app traffic and excludes in-car and third-party channels. We use an LLM to batch-parse Spark and SQL scripts and auto-generate table descriptions, no manual maintenance required. After this layer went live, the agent’s ability to distinguish between similar-looking tables improved dramatically.
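A rough sketch of what the Layer 3 step could look like, under stated assumptions: the prompt wording, the `describe_table` function, the table names, and the ETL script are all invented for illustration, and `fake_llm` is a stand-in so the example runs offline. In practice the `llm` argument would be whatever model client you use.

```python
from typing import Callable

# Hypothetical prompt; the real wording would need tuning.
PROMPT_TEMPLATE = """You are documenting a data warehouse table.
Read the ETL code that produces `{table}` and describe, in plain language:
1. Which rows the table contains (filters, excluded channels).
2. What each output column means and how it is derived.
Only state what the code supports.

ETL code:
{etl_code}
"""


def build_prompt(table: str, etl_code: str) -> str:
    """Fill the template with the table name and its producing code."""
    return PROMPT_TEMPLATE.format(table=table, etl_code=etl_code.strip())


def describe_table(table: str, etl_code: str, llm: Callable[[str], str]) -> dict:
    """Return a Layer 3 record; `llm` is any function mapping prompt -> text."""
    description = llm(build_prompt(table, etl_code))
    return {"table": table, "layer": 3, "description": description.strip()}


# Hypothetical ETL script: the WHERE clause carries meaning the schema does not.
ETL_SQL = """
INSERT OVERWRITE TABLE dwd_play_events
SELECT user_id, device_id, event_time
FROM ods_play_events
WHERE channel = 'app'
"""


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs without network access.
    return "Contains app traffic only (channel = 'app')."


record = describe_table("dwd_play_events", ETL_SQL, fake_llm)
print(record["description"])
```

The point of the sketch is the input, not the model: the filter `channel = 'app'` appears nowhere in the schema, so a description built from the code can say something a description built from metadata cannot.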

The second engine is Skills & Tools — standardized, reusable capability units. A Text-to-SQL skill built once can serve ad-hoc queries, dashboards, and operational analysis. The goal is 40%+ tool reuse, shifting the team from “build everything from scratch for every request” to “assemble existing capabilities.”
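One way to picture “build once, assemble everywhere” is a small registry that records which scenarios call each skill. This is a sketch only. The `SkillRegistry` class and the two placeholder skills are hypothetical, and the post does not define how reuse is measured, so the definition used here (share of registered skills that served at least two scenarios) is an assumption.

```python
from collections import defaultdict
from typing import Callable


class SkillRegistry:
    """Hypothetical registry: a skill is registered once and called from many scenarios."""

    def __init__(self) -> None:
        self._skills: dict[str, Callable[..., str]] = {}
        self._used_by: dict[str, set[str]] = defaultdict(set)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._skills[name] = fn

    def run(self, name: str, scenario: str, *args, **kwargs) -> str:
        """Run a skill and remember which scenario used it."""
        self._used_by[name].add(scenario)
        return self._skills[name](*args, **kwargs)

    def reuse_rate(self) -> float:
        """Assumed metric: share of skills that served two or more scenarios."""
        if not self._skills:
            return 0.0
        reused = sum(1 for name in self._skills if len(self._used_by[name]) >= 2)
        return reused / len(self._skills)


def text_to_sql(question: str) -> str:
    # Placeholder: a real skill would call the agent.
    return f"-- SQL for: {question}"


def render_chart(title: str) -> str:
    # Placeholder for a charting skill.
    return f"<chart: {title}>"


registry = SkillRegistry()
registry.register("text_to_sql", text_to_sql)
registry.register("render_chart", render_chart)

registry.run("text_to_sql", "data_pull", "last week's DAU")
registry.run("text_to_sql", "reporting", "weekly DAU trend")
registry.run("render_chart", "reporting", "weekly DAU trend")

rate = registry.reuse_rate()
print(rate)  # 0.5: text_to_sql served two scenarios, render_chart only one
```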

The third engine is the Memory System — connecting long-term user preferences, real-time intent, and cross-device identity. This engine directly powers search and recommendations, making the product feel like it knows you better the more you use it. The data team is no longer just a back-office report-generation department — it’s a core engine driving user experience.
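A minimal sketch of the cross-device part of this idea, assuming an identity map already exists. The device IDs, user IDs, and listening events below are invented, and the sketch covers only long-term preference aggregation; real-time intent is out of scope here.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from per-surface device IDs to one user identity.
IDENTITY_MAP = {
    "phone-1": "u42",
    "car-7": "u42",
    "speaker-3": "u42",
    "phone-9": "u77",
}

# Hypothetical listening events: (device_id, content_type, minutes).
EVENTS = [
    ("phone-1", "podcast", 30),
    ("car-7", "audiobook", 45),
    ("speaker-3", "music", 20),
    ("phone-1", "audiobook", 10),
    ("phone-9", "music", 15),
    ("tablet-0", "music", 99),  # device with no known identity
]


def unified_profiles(events, identity_map):
    """Sum listening minutes per content type across all of a user's devices."""
    profiles = defaultdict(Counter)
    for device_id, content_type, minutes in events:
        user = identity_map.get(device_id)
        if user is None:
            continue  # unresolved device: skip rather than guess an owner
        profiles[user][content_type] += minutes
    return profiles


profiles = unified_profiles(EVENTS, IDENTITY_MAP)
print(profiles["u42"].most_common(1))  # [('audiobook', 55)]
```

Taken per surface, this user looks like a podcast listener on the phone, an audiobook listener in the car, and a music listener on the speaker. Only the merged profile shows that audiobooks dominate overall.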

What We Got Wrong

This hasn’t been a smooth ride.

One mistake: we exposed every tool to the agent upfront. The functionality was complete, but the agent made more errors. The reason: overlapping capabilities across tools. Humans pick the right tool from experience; the agent got confused. We simplified — reduced tool count, merged overlapping functions — and accuracy went up.
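One simple way to find merge candidates is to tag each tool with the capabilities it offers and flag pairs whose tags overlap heavily. This is a sketch of that idea, not the process we followed: the tool names, capability tags, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap between two capability sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(tools: dict, threshold: float = 0.5) -> list:
    """Return (tool_a, tool_b, score) for pairs at or above the threshold."""
    pairs = []
    for (name_a, caps_a), (name_b, caps_b) in combinations(sorted(tools.items()), 2):
        score = jaccard(caps_a, caps_b)
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 2)))
    return pairs


# Hypothetical tool catalog with capability tags.
TOOLS = {
    "run_sql": {"query_warehouse", "return_rows"},
    "run_sql_preview": {"query_warehouse", "return_rows", "limit_rows"},
    "search_docs": {"search_wiki"},
}

candidates = overlapping_pairs(TOOLS)
print(candidates)  # [('run_sql', 'run_sql_preview', 0.67)]
```

Here `run_sql_preview` could become a `limit` parameter on `run_sql`, leaving the agent one fewer near-duplicate to choose between.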

Another mistake: writing prompts that were too rigid. Early on we gave the agent highly detailed step-by-step instructions: do this first, then do that. What we found was that while many analysis problems share a general shape, the details vary enormously. Over-specified instructions led the agent down the wrong path. We switched to describing the goal without dictating the path, letting the model reason through its own execution steps. Results improved significantly.
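To illustrate the difference, here is a hypothetical pair of prompts. Neither is the prompt we actually used; the wording, the `goal_prompt` helper, and the constraints are invented to show the contrast between dictating steps and stating a goal with guardrails.

```python
# Early style: a fixed procedure the agent must follow, whatever the question.
RIGID_PROMPT = """Step 1: Find the DAU table.
Step 2: Filter to last week.
Step 3: Group by day.
Step 4: Return the result as a table."""


def goal_prompt(question: str, constraints: list[str]) -> str:
    """Later style: state the goal and the guardrails, not the route."""
    lines = [
        "Goal: answer the question below with numbers you have verified.",
        f"Question: {question}",
        "Constraints:",
    ]
    lines += [f"- {c}" for c in constraints]
    lines.append("Decide for yourself which tables to inspect and in what order.")
    return "\n".join(lines)


prompt = goal_prompt(
    "Why did DAU drop last week?",
    ["Use the official metric definitions.", "Exclude test device IDs."],
)
print(prompt)
```

The rigid version cannot answer a “why” question at all, because its four steps only ever produce a daily count. The goal version leaves room for the agent to compare segments, check for data gaps, or look at a different table.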

The biggest mistake was conceptual: thinking that deploying ChatBI counted as “AI transformation.” ChatBI answers the What — what was last week’s DAU? What the business actually needs is the Why and the How — why did numbers drop last week? What should we do next?

Going from Text-to-SQL to an Analysis Agent is a full system rebuild: you need to add a Code Interpreter, build an analysis reasoning chain, and support multi-step reasoning. It’s a different job entirely from writing SQL.

What’s Changing on the Team

The most interesting shift isn’t in the technology — it’s in team roles.

Before, the core skill for data engineers and analysts was writing SQL and building reports. Now, they’re developing two new capabilities: annotation and evaluation.

Through annotation, they become the agent’s teachers. Their domain knowledge is exactly the context the agent needs most: what’s the precise definition of this metric? What’s the difference between these two tables? What caused last month’s data anomaly? This knowledge used to live in analysts’ heads. Now it needs to be made explicit, structured, and fed to the agent.

Through evaluation, they become the agent’s examiners. We’ve built an evaluation system that uses real business questions as test cases, with human-written “ground truth” SQL and answers that we compare against the agent’s outputs. Every model upgrade or context update has to pass this test suite. The work requires deep domain knowledge; it’s not something just anyone can do.
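A minimal sketch of such an evaluation loop, using an in-memory SQLite database so it runs anywhere. The table, the test cases, and `toy_agent` are hypothetical, and comparing result sets rather than SQL text is an assumption about how the comparison could be done, not a description of our system.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str) -> Counter:
    """Execute SQL and return rows as a multiset, so row order does not matter."""
    return Counter(conn.execute(sql).fetchall())


def evaluate(conn, cases, agent) -> float:
    """Fraction of cases where the agent's SQL returns the same rows as ground truth."""
    passed = 0
    for case in cases:
        expected = run_query(conn, case["ground_truth_sql"])
        try:
            actual = run_query(conn, agent(case["question"]))
        except sqlite3.Error:
            continue  # SQL that fails to run counts as a failed case
        if actual == expected:
            passed += 1
    return passed / len(cases)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plays (user_id INTEGER, device TEXT);
INSERT INTO plays VALUES (1, 'app'), (2, 'app'), (2, 'car'), (3, 'test');
""")

CASES = [
    {"question": "How many distinct users played on the app?",
     "ground_truth_sql":
         "SELECT COUNT(DISTINCT user_id) FROM plays WHERE device = 'app'"},
    {"question": "How many distinct users played on any real device?",
     "ground_truth_sql":
         "SELECT COUNT(DISTINCT user_id) FROM plays WHERE device != 'test'"},
]


def toy_agent(question: str) -> str:
    # Stand-in for the real agent: it gets the first case right and
    # forgets to exclude test devices in the second.
    if "app" in question:
        return "SELECT COUNT(DISTINCT user_id) FROM plays WHERE device = 'app'"
    return "SELECT COUNT(DISTINCT user_id) FROM plays"


accuracy = evaluate(conn, CASES, toy_agent)
print(accuracy)  # 0.5
```

Comparing results instead of query text means two differently written but equivalent queries both pass, while a query that looks plausible but misses a business rule (here, the test-device filter) fails.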

To be honest, there’s been friction. Some colleagues who were used to “take the order, write the SQL” have found it hard to adapt. They feel that “annotating data” isn’t what a data engineer should be doing. What they need to realize is this: every definition you annotate, every error you correct, makes the agent smarter. And a smart enough agent can free you from five to eight repetitive data pulls a day, so you can do the work that actually requires your judgment.

That’s your competitive edge going forward.

A Formula

Something I keep coming back to:

Individual ultimate value = expertise × AI leverage

If expertise is zero, AI leverage multiplied by zero is still zero. If expertise runs deep, AI becomes a 10,000x amplifier.

The formula works for individuals and teams. A data team’s expertise is its understanding of the business, its intuition about data, its sensitivity to definition nuance. None of that gets replaced by AI. But if you’re still expressing that expertise through manually writing SQL, you’re doing 10,000x work with 1x leverage.