The question "can you code?" was never really about whether you could code. It was a test of whether you belonged in tech. "Can you read a balance sheet?" was a test of whether you belonged in finance. "Can you diagnose a heart murmur?" was a test of whether you belonged in medicine. Nominally they were skill tests. In practice they were selection filters for something deeper.
What they were measuring was judgment — the accumulated sense of when something is wrong, when a number doesn't add up, when a patient is sicker than they look, when a system is about to fall over. Skill was the proxy for that, and a good enough one that most people stopped noticing it was a proxy and started treating it as the thing itself. The correlation worked because you couldn't acquire deep coding skill without also spending enough time in the room to learn when things broke. The being-in-the-room was the part that mattered. The skill came along for the ride.
Then AI made the skill cheap.
This is not news. Everyone has figured out by now that AI can draft a memo, write working code, summarize a filing, produce a first pass at a balance-sheet analysis. What most people haven't fully absorbed yet is what that commoditization actually changed. It didn't just make a tool cheaper. It uncoupled skill from judgment. A person with AI can now produce skill-level outputs without ever having been in the room when the thing broke. The tests we built to identify "belongs here" are still sampling skill, but skill no longer proves what it used to prove.
I'm a CPA. My license is a judgment credential. It isn't a tax-code memorization test or a bookkeeping speed test. It's the downstream result of a very expensive, very slow process: four years of supervised practice, an exam that's mostly about reasoning under ambiguity, ethics training that exists to keep the frame in front of you, and the right at the end to sign an opinion that real people stake real decisions on. The signature isn't the skill. The signature is the judgment. That's what gets certified, and that's what can get taken away.
I'm also building an AI accounting firm. The platform is TypeScript, Python, Docker, Azure. I didn't write most of the code. What I wrote is the spec — what "correct" means for a bank reconciliation in this industry, what materiality looks like for this client at this stage, which anomalies to escalate and which to absorb, which journal entry is reasonable and which one would cost me my license. In 2016 nobody would have funded a CPA to build a software company, because the execution gap was too wide for judgment alone to close. In 2026 the execution gap is small. The judgment gap is what the market is paying for. The thing I spent ten years earning is, for the first time, the most valuable thing on the balance sheet.
This is not a story about me. It's about the same insight showing up at three scales at once.
At the individual scale, what you're actually doing when you prompt an AI is encoding intent — telling the model what you think good looks like for this specific task. The AI has no intent of its own. It will execute on whatever intent you supply, including a stupid one. What separates useful output from slop is the judgment embedded in the instructions, not the capability of the model. When a controller prompts an AI for a bank reconciliation, she isn't asking the model to reconcile. She's telling it what "reconciled" means in her company, which G/L accounts tie out and which don't, which reconciling items are normal and which should trigger a call to the bank. All of that is judgment she earned by doing the work. Someone who hasn't done the work can't supply it, because they don't know what they don't know.
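To make that concrete, here is a minimal sketch in TypeScript, the language the platform runs on, of what encoding that judgment can look like when it is written down as data rather than prose. Every name, threshold, and pattern below is invented for illustration; the point is only that the model executes against a policy it did not author.

```typescript
// Hypothetical, illustrative policy: the judgment lives in this object, not in the model.

interface ReconcilingItem {
  account: string;      // G/L account the item hits
  description: string;
  amount: number;       // signed, in the entity's functional currency
  ageInDays: number;    // how long the item has been outstanding
}

interface ReconciliationPolicy {
  materialityThreshold: number;         // below this, routine items are absorbed
  maxAgeInDays: number;                 // anything older is escalated regardless of size
  normalDescriptions: RegExp[];         // patterns the controller knows are routine
  alwaysEscalateAccounts: Set<string>;  // accounts where any difference is a red flag
}

type Disposition = "absorb" | "escalate";

// The model drafts the reconciliation; this encoded judgment decides what a human must see.
function disposition(item: ReconcilingItem, policy: ReconciliationPolicy): Disposition {
  if (policy.alwaysEscalateAccounts.has(item.account)) return "escalate";
  if (item.ageInDays > policy.maxAgeInDays) return "escalate";
  const isRoutine = policy.normalDescriptions.some((p) => p.test(item.description));
  return isRoutine && Math.abs(item.amount) < policy.materialityThreshold ? "absorb" : "escalate";
}

// One controller's policy for one client at one stage. Another client gets different numbers.
const policy: ReconciliationPolicy = {
  materialityThreshold: 500,
  maxAgeInDays: 30,
  normalDescriptions: [/outstanding check/i, /deposit in transit/i, /bank fee/i],
  alwaysEscalateAccounts: new Set(["1010-payroll-clearing"]),
};

const items: ReconcilingItem[] = [
  { account: "1000-operating", description: "Outstanding check #4412", amount: -320.5, ageInDays: 6 },
  { account: "1010-payroll-clearing", description: "Unidentified debit", amount: -75.0, ageInDays: 2 },
];

for (const item of items) {
  console.log(`${item.description}: ${disposition(item, policy)}`);
}
```

Swap in a different client's policy and the same pipeline behaves differently. The model's capability hasn't changed; the judgment it was handed has.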
At the team scale, the winning shape in vertical AI has stabilized into the same pattern everywhere you look. The dominant legal AI platform was founded by a securities litigator paired with a machine learning researcher. The dominant clinical documentation AI was founded by a practicing cardiologist paired with an ML professor. The lawyer and the cardiologist didn't outsource the hard part. They were the hard part. The judgment about what a well-drafted discovery response looks like, or which physician note structures catch readmissions — that is the scarce thing, and that is what the domain expert brings. The execution can come from the researcher, or increasingly, from the model. Agentic CPA is the same pattern: the CPA supplies the judgment, the model supplies the execution.
At the executive scale, what the CEO is supposed to be bringing is judgment — about what to build, who to hire, what to ship, what risks to take, who to trust. When execution was hard, technical founders made sense because they could both supply the judgment and do the work. When execution gets cheap, judgment is all that's left at the top. It is also the most dangerous place to be wrong, because the AI below will execute on whatever intent is supplied. If the intent at the top is bad, every layer below it amplifies the error at scale.
Here is the hard problem. Skill is testable in an afternoon. You can sit someone down with a problem, a rubric, and a clock, and in under an hour know whether they can do the thing. Judgment isn't. Any individual judgment call looks defensible ex ante. Only the track record, accumulated over years against real stakes, reveals whether the person was actually calibrated. The reason every serious profession built licensure is that they absorbed this lesson a century ago. Bar admission, medical residency, CPA supervised practice — these exist because judgment can only be evaluated by putting a person in the job, watching them work under pressure, and noting who holds up. It takes years. There is no shortcut. The professions paid the cost because the alternative was worse.
AI leadership does not have this machinery yet. We are hiring AI CEOs the way we hired tech CEOs in 2015 — on founder mythology and capital allocation track records, neither of which is a judgment test. The skill tests we would have used are now easy to pass, because the AI can supply the skill layer. What we are left with is no test at all. And the gap between "has good judgment" and "sounds like they have good judgment" is wider than it has ever been, because AI has also lowered the cost of sounding credible.
If you are a controller or CFO buying AI this quarter, the question I would ask is not how many ML engineers are on the vendor's team. The question is: who on that team has closed books, signed an opinion, or been personally accountable for being wrong about something that mattered. Who has earned the judgment the AI is being asked to encode. Because the AI is only ever as good as the intent it receives. And the intent is only ever as good as the person who had to live with the consequences last time it was wrong.