Your AI Vendor's Privacy Policy Is Not a Security Architecture

In March 2023, Zoom quietly updated its Terms of Service to grant itself "a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license" to use customer data for AI training.

Nobody noticed for five months.

Not the Fortune 500 legal teams running board meetings on Zoom. Not the accounting firms discussing client financials. Not the hospitals conducting telehealth. Five months of calls, screen shares, and chats — all covered by terms that let Zoom feed them to AI models.

When a Hacker News post finally flagged it in August, Zoom CEO Eric Yuan called it a "process failure" and reversed the changes within four days. Which raises the obvious question: if it took five months for anyone to notice, what happens when the next company doesn't reverse course?

The parade of policy changes

Zoom wasn't an outlier. It was just the most visible example of a pattern that has now repeated across every major technology company.

Meta has spent the last two years in a cycle with European regulators: announce plans to train AI models on public Facebook and Instagram posts, face pushback, pause, then try again a few months later. Each cycle pushes the boundary a little further. By spring 2025, they were restarting European AI training after updating "safeguards" — the same data, slightly different framing.

LinkedIn turned on AI training for all users by default in November 2025. The opt-out was buried in privacy settings, and it wasn't backdated — everything collected before you opted out stayed in the training data.

And then there's acquisitions. When Facebook bought WhatsApp for $19 billion in 2014, both companies promised nothing would change about WhatsApp's privacy. Two years later, WhatsApp announced it would share user data with Facebook without opt-in consent. The European Commission fined Facebook $122 million for providing misleading information during the acquisition review. The fine didn't undo the data sharing.

Every major technology company that handles your data has either changed its privacy terms in the last three years, is actively trying to, or will. The question isn't whether the policy will change. It's what happens to your data when it does.

Why AI is different from every other tool you use

When your QuickBooks data sits in a database, there's a clear mental model. Your records are rows. If a vendor does something wrong, a regulator can order them to delete those rows. If you leave, you export your data. The records are discrete, identifiable, and removable.

AI training breaks that model completely.

When your data trains an AI model, it becomes part of the model's weights — the mathematical parameters that define how the model behaves. Your revenue figures, your vendor payment terms, your client names — they stop being records and start being patterns that influence how the model responds to every future query, for every future customer. That influence can't be removed. You can't un-train a model on specific data without retraining from scratch. There's no "delete" button for model weights. MIT researchers confirmed in January 2026 that AI models carry significant memorization risk — they can reproduce training data verbatim under the right conditions.

In March 2023, Samsung engineers used ChatGPT — a SOC 2 compliant service — for three tasks over twenty days. They entered proprietary source code, semiconductor yield data, and internal meeting transcripts. All three incidents were accidental. All three sent proprietary information into OpenAI's training pipeline. SOC 2 didn't prevent it, couldn't detect it, and offered no path to recover the data.

With traditional software, a privacy violation means someone saw data they shouldn't have. With AI training, a privacy violation means your data has been permanently absorbed into a system that serves millions of other users, and no one can fully undo it.

SOC 2 doesn't solve this

SOC 2 is the standard answer when a CFO asks about AI security. It shows up on every vendor's trust page, usually with a shield icon nearby. It's a real certification covering real things: access management, encryption, monitoring policies, incident response. For traditional SaaS, it's a reasonable baseline.

For AI, it has three blind spots that matter.

Training data absorption. SOC 2 audits whether a vendor has policies governing who can access your data. It does not audit whether your data enters model weights. A vendor can be fully SOC 2 compliant and still fold your financial statements into a model that serves their other customers.

Inference logging. When you send a prompt to an AI API, the vendor retains that prompt for some period — 7 days, 30 days, sometimes longer. SOC 2 audits whether access to those logs is controlled. It doesn't audit what happens to the content. Are they used for fine-tuning? Aggregated for product improvement? SOC 2 checks the lock on the filing cabinet, not what's being done with the files inside.

Multi-tenant isolation. AI workloads run on shared GPU infrastructure. Academic research has demonstrated side-channel attacks that can extract data across GPU tenants. SOC 2 audits logical access — network segmentation, role-based permissions. It doesn't audit hardware-level isolation on shared compute.

ChatGPT held a SOC 2 certification the day Samsung's engineers pasted in proprietary code. The certification was valid. The security program was real. None of it was designed to prevent, detect, or remediate the actual risk that materialized.

What real security actually looks like

Signal doesn't say "we won't read your messages." Signal says "we can't read your messages."

That distinction is everything. Signal's architecture puts the encryption keys on your device, not on Signal's servers. Even if Signal's CEO decided tomorrow to start reading every message, even if they were acquired by a company that wanted the data, even if they were served a warrant — they can't. The system wasn't designed to allow it. Compliance with privacy isn't a policy. It's a consequence of how the system was built.

In 2024, Signal needed to add contact discovery. Most companies would upload contact lists and match server-side. Signal instead "turned their architecture inside out," building a system where the matching happens without Signal ever learning who's in your address book. That's not a policy decision. That's an engineering project — expensive, slow, and impossible to reverse with a terms-of-service update.

Applied to AI, architectural security means specific things. API calls that are stateless — the vendor processes your request and discards it. Data that never enters a vector database where it could be queried by other customers. Zero-retention options where logs don't just have access controls but literally don't exist. The difference between "our policy says we don't train on your data" and "our system is built so that training on your data would require re-engineering the pipeline."

Anthropic's commercial API is a concrete example of this distinction. Their consumer chat product changed terms in August 2025 — opted-in users now face five-year data retention for training. But the commercial API tier is architecturally separated: data is retained for seven days maximum, never enters a training pipeline, and zero-data-retention is available where nothing persists beyond the active request. The consumer tier and the commercial tier aren't just different policies. They're different systems. If you're evaluating an AI vendor, the question is which tier your data flows through — and whether that separation is a policy line or a pipeline boundary.

Three questions that actually reveal the answer

You don't need a fifty-question vendor assessment to understand how an AI tool handles your financial data. You need three questions. If the vendor's answers are good, the details will follow. If the answers are bad, no amount of additional detail will fix it.

Question one: Where does my data live after the API call completes?

This is the baseline. When you send a prompt containing financial data to an AI service, what happens to that data after you get a response? Is it retained for 7 days? 30 days? Indefinitely? Does it go into a training pipeline? A logging system? A vector database?

A good answer sounds like: "Prompts are retained for 7 days for abuse monitoring, then permanently deleted. They never enter a training pipeline. Zero-retention is available, where no data persists beyond the active request." A bad answer sounds like: "We take data security very seriously and comply with all applicable regulations." That sentence means literally nothing. Push for specifics.

Question two: What would you need to change to start using my data for training?

This is the question most vendors have never been asked, and it's the most revealing one. You're not asking whether they currently train on your data. You're asking what stands between today's promise and tomorrow's policy change. Is the barrier a line in a terms-of-service document? Or is it an engineering project that would take months?

A good answer describes architecture: "We'd need to rebuild our data pipeline, add a new ingestion path from API logs to our training infrastructure, modify our compute isolation, and re-architect our storage layer." A bad answer describes policy: "We'd need to update our privacy policy and notify customers." If the only thing standing between your data and a training pipeline is a document that the vendor's legal team can revise in an afternoon, that's not security. That's a promise.

Question three: Is that change a policy decision or an engineering project?

This is the killer. If the answer to question two was "we'd update our terms of service," then the answer to question three is "policy decision." Your data is one board meeting away from being training data. If the answer to question two involved rebuilding infrastructure, then the answer to question three is "engineering project." That's not a guarantee forever — anything can be re-engineered — but it means the barrier to misuse is measured in months of development work, not minutes of legal review.

Policy changes are retroactive — they apply to every piece of data you've ever sent. Architecture changes are prospective — they can only affect data sent after the change is built and deployed. That asymmetry is your margin of safety.

The risk nobody's talking about: vector databases

Even vendors that don't train on your data may still have a significant exposure if they use retrieval-augmented generation — RAG. This is the architecture where your documents are converted to vector embeddings and stored in a database so the AI can search them when answering questions. It sounds safer than training. In some ways it is. In other ways it creates risks that are less understood and harder to detect.

Vector embeddings were long assumed to be a form of anonymization — the original text is converted to numbers, so it's no longer readable. That assumption is wrong. Research published in 2025 demonstrated that embeddings can be reversed to reconstruct the original text with up to 92% accuracy for short passages. Your financial data stored as vectors isn't anonymized. It's encoded in a format that a motivated attacker can decode.

In multi-tenant environments — where multiple customers' data lives in the same vector database — the risks compound. Security researchers documented a case where a shared vector database inadvertently surfaced one client's proprietary information in response to another client's queries. Red team exercises have found API keys retrieved from markdown files embedded weeks earlier, and non-public board deck content surfaced through general queries about "company priorities."

Then there's data poisoning. A 2024 study called PoisonedRAG demonstrated that inserting just five malicious documents into a corpus of millions was enough to cause a 90% targeted misinformation rate. Five documents. In millions. The AI returned attacker-chosen answers to specific questions with near-perfect reliability. If your financial data lives in a shared RAG system, you're trusting not just the vendor's security but the integrity of every other document in that database.

What CPAs should be asking themselves

AICPA Section 1.700 is unambiguous: a CPA in public practice "shall not disclose any confidential client information without the client's specific consent." If you send client data to an AI vendor whose terms allow training, you may be in violation. The burden is on the CPA to demonstrate adequate safeguards. "They had a privacy policy" is not a safeguard. It's a document.

Specific AI guidance from the AICPA is still emerging, but the principle isn't new. Section 1.700.060 already requires CPAs to enter confidentiality agreements with third-party providers and ensure appropriate controls are in place. The fact that the third party is now an AI model doesn't change the obligation — it makes it harder to fulfill, because the "disclosure" may be permanent and irreversible.

The FTC has weighed in too. In February 2024, they warned that companies quietly changing terms of service to expand data usage could be committing unfair or deceptive practices. They've ordered algorithmic disgorgement — requiring companies to delete AI models trained on improperly collected data. In 2021, Everalbum was forced to destroy its facial recognition models entirely. The remedy existed, but only after the damage was done, and it required destroying the whole model — not surgically removing specific data from it.

Legal enforcement is damage control. It's real, but it's reactive. By the time the FTC orders a model deleted, your client's financial data has already been in the weights for months or years. The right approach is to choose systems where the problem can't occur in the first place.

The architecture test

Here's the principle this all comes down to. The question isn't whether you trust your AI vendor today. You probably do, and you might be right to. The question is: does your trust depend on a document that can change with a board vote, or on a system that can't change without being rebuilt?

Every company in the examples above — Zoom, Meta, LinkedIn, WhatsApp — had privacy policies that users trusted. Some of those policies were sincere when written. All of them changed. The vendors that protected users through those changes weren't the ones with the best policies. They were the ones whose systems made policy changes irrelevant.

When you evaluate an AI tool for your business — especially one that will touch financial data, client information, or anything covered by professional confidentiality obligations — skip the trust page. Skip the shield icons. Ask the three questions. If the answers describe policy, you have a promise. If the answers describe architecture, you have protection. The difference between those two things is the difference between hoping nothing goes wrong and knowing it can't.

* * *

This is the second in a series on AI security for financial teams. Next: "Three Agents, No Trust" — why the multi-agent systems handling your data introduce risks that single-model architectures don't, and what to do about it.