The instinct to build everything
When you start an AI company, the gravitational pull is toward building your own model. Fine-tune it. Train it on your data. Differentiate on reasoning quality. Every funded AI startup has a slide about their proprietary model — how it's been trained on domain-specific data, how it outperforms the foundation models on their particular benchmark.
I get it. I felt the same pull. When we started Agentic CPA, the obvious move was to fine-tune a model on accounting data. Build something that "understands" debits and credits at a deeper level than a general-purpose model. Own the intelligence layer.
We didn't do that. Here's why.
The replacement cycle
Think about the last two years of AI models. GPT-4 was the state of the art. Then Claude 3 matched it. Then GPT-4o. Then Claude 3.5 Sonnet, which was meaningfully better than everything before it. Then o1. Then Claude 4. Then GPT-5. Then Claude 4.5. Then GPT-5.3. Then Claude Opus 4.6. Then GPT-5.4. I'm probably forgetting a few — that's the point.
Each one obsoleted the last. Not over years — over months. Sometimes weeks. GPT-4o was retired in February 2026, less than two years after launch. If you shipped a product that depended on being smarter than any specific model, the window before three better models showed up — available through an API for pennies per call — was a few months at best.
If your differentiation is "we have a better model," your moat has a half-life measured in quarters.
Fine-tuning doesn't fix this. You fine-tune on one generation's base model. A better one comes out three months later. Now you need to fine-tune again. And again. You're on a treadmill, spending engineering hours to maintain a marginal advantage that keeps getting eaten by the next foundation model release.
This isn't a theoretical concern. I've watched multiple AI startups go through exactly this cycle. They spend six months fine-tuning a model, ship it, and three months later the base model from Anthropic or OpenAI can do the same thing out of the box. The fine-tuning advantage evaporates; the engineering hours are sunk.
What doesn't change
You know what hasn't changed in two years? The QuickBooks Online API. The Business Central OData layer. Bank feed formats. The way vendor invoices are laid out. Chart of accounts logic. Accrual vs. cash basis rules. The fact that debits equal credits.
The tools that connect AI to real-world accounting systems are stable because the real world is stable. The QuickBooks API doesn't get a major overhaul every six months. Intuit changes it slowly, carefully, with deprecation warnings and migration periods, because millions of businesses depend on it. Same with Business Central. Same with bank feeds. Same with payroll providers.
The reasoning layer is a sprint. The tools layer is a marathon. And marathons reward consistency, not speed.
When we build a tool that reads a vendor invoice, extracts the line items, maps them to the right GL accounts, and posts the entry through the Business Central API — that tool works today, it worked six months ago, and it will work six months from now. The AI model we point at the invoice might change three times in that period. The tool doesn't care. It just needs structured output, and every model worth using can produce structured output.
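That model-agnostic contract is easy to sketch. The following is a minimal illustration, not our actual code: the names (`LineItem`, `validate_entry`) and the account numbers are hypothetical, and a real tool would post through the Business Central API rather than return the entry. The point is that the tool accepts structured data and enforces accounting invariants, so swapping the model underneath changes nothing.

```python
from dataclasses import dataclass

# Hypothetical sketch of the tool-side contract. The model (whichever
# one is current) produces structured line items; the tool validates
# them before anything touches the ledger.

@dataclass
class LineItem:
    description: str
    gl_account: str
    amount_cents: int  # positive = debit, negative = credit


def validate_entry(lines: list[LineItem]) -> list[LineItem]:
    """Reject any journal entry whose debits and credits don't balance."""
    if not lines:
        raise ValueError("empty entry")
    if sum(item.amount_cents for item in lines) != 0:
        raise ValueError("entry does not balance: debits must equal credits")
    return lines


# Any model worth using can emit this shape. Swap the model, keep the tool.
entry = validate_entry([
    LineItem("Office supplies", "6100", 4_250),
    LineItem("Accounts payable", "2000", -4_250),
])
```

Notice that nothing in the sketch mentions a model at all — the invariant (debits equal credits) belongs to accounting, not to AI, which is exactly why this layer doesn't churn.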
The build-vs-leverage decision
The reasoning layer is someone else's R&D problem. Anthropic is spending billions per year on it. OpenAI is spending billions per year on it. Google is spending billions per year on it. You cannot outspend them. You shouldn't try.
What you can do is build the domain-specific tooling they'll never build. Anthropic is never going to build a tool that knows how to reconcile a Business Central subledger against a bank feed. OpenAI is never going to build a tool that understands the specific way construction companies handle retainage accounting. Google is never going to build a tool that maps a cannabis dispensary's POS data to the right 280E-compliant chart of accounts.
They don't know your domain. They don't want to know your domain. They want to build the best general-purpose reasoning engine and let you figure out what to point it at.
So let them. Take their reasoning engine — the best one available today, whichever that is — and spend your time building the stuff they can't build. The domain logic. The system integrations. The operational workflows. The error handling for the seventeen weird edge cases that only show up in real accounting work.
The AI labs are building better brains every quarter. Your job isn't to build a brain. It's to build the hands, eyes, and institutional knowledge that make the brain useful.
The Intuit + Anthropic signal
In case this argument felt abstract, Intuit and Anthropic just made it concrete. In February 2026, they announced a multi-year partnership to bring custom AI agents to the Intuit platform. Businesses will be able to build agents using Anthropic's Claude Agent SDK, and Intuit's tools — QuickBooks, TurboTax, Credit Karma, Mailchimp — will be surfaced directly inside Anthropic products via MCP integrations. Rolling out spring 2026.
This is the commoditization happening in real time. The reasoning layer isn't just getting cheaper. It's being wired directly into the platforms that millions of businesses already use. If the AI brain is a feature embedded in your accounting software, it is definitionally not your moat.
But here's what Intuit can't do. They can build an agent that answers questions about your QuickBooks data. They can build an agent that categorizes transactions. What they can't do is build an agent that understands your specific business, manages your specific workflows, catches the specific errors your specific team makes, and coordinates across the three other systems you use alongside QuickBooks.
Intuit's agent will be good at being QuickBooks. It will not be good at being your accounting department. That's the gap. That's where domain tooling lives.
What to build instead
Build the tools. Build the connectors between systems that don't talk to each other. Build the domain logic that turns raw AI reasoning into correct accounting entries. Build the validation layers that catch the mistakes models make. Build the operational infrastructure that lets one CPA oversee the work of fifteen agents across twenty clients.
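A validation layer can be as simple as a pre-posting check. This is an illustrative sketch only — the function name, the dict shape, and the hard-coded chart of accounts are assumptions; in practice the valid accounts would be loaded per client from the accounting system itself.

```python
# Hypothetical validation layer: before an agent's proposed entry is
# posted, check it against ground truth the model can't hallucinate
# past — the client's actual chart of accounts, and the balance rule.

VALID_ACCOUNTS = {"2000", "6100", "6200"}  # loaded per client in practice


def validate_proposed_entry(entry: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the entry may post."""
    problems = []
    for line in entry:
        if line["gl_account"] not in VALID_ACCOUNTS:
            problems.append(f"unknown account {line['gl_account']!r}")
    if sum(line["amount_cents"] for line in entry) != 0:
        problems.append("entry does not balance")
    return problems
```

The design choice worth noting: the validator returns problems instead of raising, so an orchestration layer can route failures back to the agent for another attempt, or escalate to the CPA overseeing it.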
Let someone else build the brain. They're better at it than you, and they're doing it for functionally zero marginal cost to you. A month of Claude API usage costs less than one day of a senior developer's time. That's the trade.
The companies that win in AI-enabled professional services won't be the ones with the best model. They'll be the ones with the best tools, the deepest domain knowledge, and the most reliable operational infrastructure. The model is the commodity input. Everything around the model is the product.
Don't build the agent. Build what the agent needs to be useful.
This is the second in a series on how accounting principles apply to AI agent architecture. Next: why you should teach agents methods, not answers.