Private Inference vs Cloud AI: What Enterprises Actually Lose When They Send Data to OpenAI
Cloud AI data risks most teams miss: default retention, legal holds overriding deletion, zero-click exploits. Decision framework for private inference.
Here's something most enterprise teams don't know: in June 2025, OpenAI told customers that data they thought was deleted wasn't actually gone.
A court order related to copyright litigation required OpenAI to preserve all data, including content from users who had opted out of retention. Prompts, outputs, conversation histories. All of it, held indefinitely for legal discovery.
This wasn't a breach. Nobody hacked anything. It was a US company complying with a US court order. Completely normal, completely legal, and completely outside what most enterprise customers expected when they signed up.
That incident captures the core issue with cloud AI: the gap between what you assume happens to your data and what actually happens.
The Retention Reality
Most teams never read the fine print on data retention. Here's what it actually says.
OpenAI's standard API tier retains your prompts and outputs for 30 days. That's the default. You don't opt in. It just happens. The stated reason is abuse monitoring, which is legitimate. But 30 days is a long time for sensitive data to sit on someone else's servers.
Want zero retention? You need to be on an enterprise tier, request it specifically, and get approved. It's not a toggle in settings.
Azure OpenAI works similarly. Pay-as-you-go customers get 30-day retention by default. Zero Data Retention requires an Enterprise Agreement or Microsoft Customer Agreement. If you're on a standard subscription, your data sticks around.
ChatGPT Enterprise lets admins configure retention, but the minimum is 90 days. You can't go lower. And conversations are saved indefinitely by default until someone actively deletes them.
The pattern across providers: retention is the default, deletion requires enterprise contracts and explicit configuration, and external factors like litigation can override whatever you configured anyway.
What "Zero Data Retention" Actually Means
ZDR sounds great on a sales call. Your data goes in, gets processed, disappears. Nothing stored, nothing logged, nothing accessible.
That's partially true. With ZDR enabled, providers commit to not writing your data to persistent storage after the request completes. They exclude it from abuse logs. They don't let human reviewers see it. They don't use it for training.
But ZDR is a policy, not architecture.
Your data still exists in memory during processing. It still travels over networks to reach provider infrastructure. Provider employees still have technical ability to access systems where your data flows, even if policy prohibits them from doing so.
And as June 2025 showed, legal obligations can override policy commitments. A court order, a regulatory investigation, a law enforcement request. Your contractual deletion settings don't trump legal requirements in the provider's jurisdiction.
For a lot of use cases, that's fine. General queries, creative tasks, non-sensitive content. ZDR provides reasonable protection.
For genuinely sensitive data, the question is different. Can you prove to auditors that the data never left compliant infrastructure? ZDR doesn't give you that proof. Your own infrastructure does.
The Prompt Injection Problem
OWASP ranked prompt injection as the number one threat to LLM applications in 2025. And unlike traditional security vulnerabilities, this one is hard to patch because it exploits how LLMs work, not bugs in the code.
The basic idea: attackers craft inputs that the model interprets as instructions rather than data. The LLM can't reliably distinguish between "here's content to analyze" and "here are new instructions to follow."
In June 2025, researchers disclosed EchoLeak, a vulnerability in Microsoft 365 Copilot that demonstrated how bad this can get.
An attacker sends you an email. You never open it, never click anything. But Copilot, trying to be helpful, pulls the email into its context when answering your questions. Hidden instructions in the email cause Copilot to grab sensitive data from your session and embed it in an image URL. Microsoft's own servers auto-fetch that image, sending your data to the attacker.
Zero clicks. Zero user interaction. The AI's helpfulness weaponized against its users.
Microsoft patched it before public disclosure, but the vulnerability proved that prompt injection isn't theoretical. Production systems with millions of users can be exploited to exfiltrate corporate data through the AI itself.
When you use cloud AI, your prompts flow through shared infrastructure processing millions of requests. Every document in a RAG pipeline, every email Copilot reads, every file an AI assistant accesses is a potential injection vector. You're trusting that the provider's defenses catch everything. They won't.
The Shadow AI Tax
IBM's 2025 Cost of a Data Breach Report had a striking finding: one in five organizations experienced breaches due to shadow AI. Not authorized enterprise deployments. Employees using consumer AI tools with work data.
These breaches cost $670,000 more than average. They took longer to detect. They hit customer PII and intellectual property harder than other breach types.
The scenario is mundane. Someone pastes customer details into ChatGPT to draft an email. Someone uploads a confidential document to Claude for summarization. Someone uses a free AI tool to analyze financial projections.
Nobody intends to leak anything. The tools are helpful. But consumer tiers don't have enterprise retention policies. That data enters systems governed by consumer terms of service, not your enterprise agreements.
Cloud AI normalizes sending data to external services. When everything looks like a friendly chatbot, employees stop thinking about which data should stay internal. The line between approved tools and personal accounts blurs.
Shadow AI is a cloud AI problem because the pattern of "send data to helpful AI" becomes reflexive. Private infrastructure doesn't eliminate the risk, but it changes the default. Data stays internal unless someone actively moves it out.
What Provider Employees Can Access
Cloud providers implement access controls. They audit employee access. They fire people who violate policies. But technical access exists.
When you send data to a provider, some set of their employees have technical ability to access your prompts, outputs, system instructions, and usage patterns. Access controls reduce who can see what. They don't make access impossible.
The Coinbase breach in May 2025 is instructive. Overseas customer support contractors leaked user data after a $20 million extortion demand. Technical access controls were in place. Human-layer compromise happened anyway.
For most data, provider access controls are sufficient. But for genuinely sensitive material, the question isn't whether the provider means to protect your data. It's whether any external party should have technical access at all.
When Private Actually Makes Sense
Private inference isn't always right. It costs more to operate. It requires infrastructure expertise. For a lot of use cases, cloud APIs win on simplicity and that's fine.
Private inference makes sense in specific situations.
The data would hurt you if it leaked. M&A planning, litigation strategy, customer health records, product roadmaps. Information where exposure to any third party creates real risk.
Auditors need evidence you control. SOC 2, HIPAA, industry-specific compliance. When you need to prove data handling rather than point to vendor attestations.
You're training on proprietary data. If your competitive advantage comes from specialized models trained on internal data, keeping that training data and resulting weights on your infrastructure protects your IP.
You need architectural guarantees. When policy commitments aren't sufficient and you want cryptographic proof that data never left compliant infrastructure.
Cloud AI makes sense when data isn't sensitive, when speed matters more than control, when you lack infrastructure expertise, or when variable workloads make dedicated infrastructure inefficient.
The mistake is defaulting to cloud for everything without asking whether specific use cases warrant private infrastructure.
The Architecture Difference
The difference between private and cloud AI isn't just who owns the servers. It's policy versus architecture.
Cloud AI gives you policy guarantees. The provider commits to certain behaviors. You trust their implementation, their employee compliance, their legal situation.
Private inference gives you architectural guarantees. Data exists in memory during processing, then disappears. There's no retention to configure because retention is impossible by design. Audit trails live in systems you control. Compliance evidence comes from your infrastructure, not vendor attestations.
Self-hosted inference with hardware-signed attestations provides verifiable proof of what happened to your data. Not "we promise we deleted it." Cryptographic evidence that the data was processed on specific hardware running specific code and never persisted.
For enterprise deployments in regulated industries, this changes the compliance conversation. Instead of explaining your vendor's practices to auditors, you show them your own systems.
Swiss jurisdiction adds legal protection on top of technical architecture. The Federal Act on Data Protection limits grounds for data disclosure. Combined with EU adequacy status, Swiss-hosted infrastructure supports cross-border data flows while maintaining defensible compliance.
Making the Switch
Moving from cloud to private doesn't have to be all-or-nothing. Most organizations transition incrementally.
Start by classifying data sensitivity. Which AI use cases involve data that shouldn't leave your infrastructure? Route those to private deployment. Keep general-purpose queries on cloud APIs.
Pilot with a specific workload. Deploy inference infrastructure for one high-sensitivity use case. Validate that latency and throughput meet production requirements before expanding.
Consider fine-tuning. Generic cloud models may underperform on specialized tasks. Fine-tuning on your data improves accuracy while keeping training data private.
Measure systematically. Track performance across private and cloud deployments. Make routing decisions based on actual data, not assumptions about what should be faster or cheaper.
Expand coverage as operational maturity grows. Maintain cloud APIs for genuinely non-sensitive use cases or burst capacity. Private infrastructure for everything that matters.
FAQ
Does OpenAI train on my API data?
Not by default for API, Enterprise, or Business tiers. Consumer ChatGPT is different. But "not training" doesn't mean "not retaining." Default retention is 30 days regardless of training settings.
Is Azure OpenAI safer than OpenAI directly?
Azure adds Microsoft's enterprise security layer. Data stays within Azure infrastructure rather than going to OpenAI's systems. But you're still subject to 30-day retention defaults on pay-as-you-go, and Microsoft has its own legal obligations.
What about Anthropic and Claude?
Anthropic changed policies in 2025. Consumer chats now train models unless you opt out. API and enterprise deployments remain excluded from training. Check current terms for your specific tier.
Can prompt injection be prevented?
Not completely. Microsoft, OpenAI, and others deploy detection systems, but the fundamental vulnerability is how LLMs interpret instructions. Defense-in-depth helps. Elimination isn't currently possible.
How much does private inference cost compared to cloud?
Depends on volume and utilization. High-volume, consistent workloads often cost less on private infrastructure when you factor in per-token pricing, compliance overhead, and retry costs. Low-volume, variable workloads usually favor cloud pay-per-use.
What latency should I expect from self-hosted models?
Sub-100ms is achievable with proper optimization. Comparable to cloud API latency for most use cases. The gap has closed significantly as inference frameworks have matured.
Do I need ML expertise to run private inference?
Some operational knowledge is required, but the bar has lowered. Managed inference platforms handle much of the complexity. You need infrastructure skills more than ML research skills.
What if I'm already locked into a cloud provider?
Start with a hybrid approach. Keep existing workflows on cloud. Add private infrastructure for new sensitive use cases or specific high-risk workloads. Migrate incrementally rather than switching everything at once.
Is this just security theater? Do breaches actually happen through AI?
IBM's 2025 data found 13% of organizations reported AI-related breaches, with 97% of those lacking proper AI access controls. Shadow AI caused 20% of breaches in organizations that experienced them. The risk is real and quantified.
What's the minimum setup for private inference?
A single GPU instance running an open-weight model through vLLM or similar can serve many use cases. Scales from there based on throughput requirements. You don't need a datacenter to start.
The Bottom Line
Cloud AI isn't bad. For most use cases, it works fine. The APIs are convenient, the models are good, and the operational burden is low.
But convenience has tradeoffs. Your data sits on someone else's infrastructure, subject to their policies, their employee access controls, their legal obligations. You're trusting rather than verifying.
For data that would genuinely hurt your organization if exposed, that tradeoff doesn't make sense. Private inference gives you architectural control instead of policy promises. What happens to your data depends on systems you own, not contracts you signed.
The question isn't "cloud or private" for everything. It's "which data requires control I can verify?" Route accordingly.