Amazon Bedrock vs Google Vertex AI vs Azure AI — The Real Architecture Difference

11 min read · By Neuroscale Engineering
Tags: Amazon Bedrock, Vertex AI, Azure AI Foundry, Enterprise AI, Cloud Architecture, AgentCore, Gemini, GPT-5


Three Clouds, Three Bets

Every team that builds anything serious on top of an LLM eventually has to pick a managed AI platform. The shortlist is always the same — Amazon Bedrock, Google Vertex AI, Azure AI Foundry — and the marketing pages make them look interchangeable. They are not. Each platform is the visible surface of a very different architectural bet about where the value in enterprise AI actually sits, and once you commit to one, the gravity is real.

The cleanest way to read these platforms is to stop thinking of them as model APIs and start thinking of them as opinions. Bedrock is an opinion about model choice. Vertex is an opinion about silicon and data. Foundry is an opinion about workflow. The architecture follows from the opinion, the trade-offs follow from the architecture, and the bill follows from your traffic shape. Pick the wrong opinion for your workload and you are not buying a platform — you are buying a multi-year migration.

Bedrock: The Marketplace That Doesn't Pick Sides

Amazon Bedrock was announced in April 2023 and went GA that September as a thin abstraction over a growing roster of foundation models. Three years later it is still the only major platform where Claude, Llama, Mistral, Cohere, Nova, and DeepSeek all sit behind the same API and the same IAM policy. The architectural commitment is "we will not pick the winning model for you," and the entire stack is built around that.

Underneath, Bedrock runs on AWS-built silicon — Trainium for training and Inferentia for inference — with provisioned throughput as an option for workloads that cannot tolerate cold starts. The serverless path is what you actually want for most apps: pay per token, no instance sizing, no autoscaling drama. For inference-heavy traffic patterns, Bedrock's serverless mode tends to come in roughly 25–30% cheaper per token than equivalent provisioned setups elsewhere, which is the closest thing the three platforms have to a clear cost edge.
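
To make the serverless path concrete, here is a minimal sketch of what a pay-per-token call looks like from application code, using boto3's Converse API. The region and model ID are placeholders; swap in whatever your account actually has access to.

```python
import boto3

# Serverless Bedrock inference: no instances to size, billed per token.
# Region and model ID are illustrative placeholders.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize our Q3 churn drivers."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
# The usage block is what actually drives the bill on the serverless path.
print(response["usage"])  # {'inputTokens': ..., 'outputTokens': ..., ...}
```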

The agent story is where Bedrock got serious in 2025. AgentCore went GA in October, AgentCore Evaluations followed in March 2026, and Policy Controls landed in the same release. Policy Controls is the interesting one — it lets you enforce what an agent is allowed to do outside of the agent's own reasoning loop, before any tool gets called. That matters because trusting a model to decide whether it should be allowed to call a tool is exactly the kind of thing that ends up on a postmortem slide. Knowledge Bases now sit on top of S3 Vectors as of December 2025, which trades a bit of recall for a vector storage bill that is roughly 90% cheaper than running OpenSearch or Pinecone alongside.
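
For a sense of what the Knowledge Base layer looks like from the calling side, here is a hedged sketch using the agent runtime's retrieve API. The knowledge base ID is a placeholder, and the S3 Vectors storage underneath is invisible at this level of the API.

```python
import boto3

# Knowledge Base retrieval goes through the agent runtime, not bedrock-runtime.
# The knowledge base ID below is a placeholder.
agent_rt = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

result = agent_rt.retrieve(
    knowledgeBaseId="KB1234EXAMPLE",
    retrievalQuery={"text": "What is our refund policy for enterprise plans?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
)

# Each hit carries the chunk text and a relevance score; the vector store
# backing it (S3 Vectors, OpenSearch, ...) does not change this shape.
for hit in result["retrievalResults"]:
    print(hit["score"], hit["content"]["text"][:120])
```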

The catch with Bedrock is the one you would expect from any AWS service: there is no single bill. Inference, Knowledge Base storage, Guardrails, agent invocations, CloudWatch logs, and cross-region calls all bill separately, and the total never quite matches what you priced on the calculator. None of this is unusual for AWS — it just means you should expect to spend a week with Cost Explorer before you actually understand what your AI feature costs.
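
If you want to shortcut some of that Cost Explorer archaeology, a sketch like the following pulls a month of Bedrock spend grouped by usage type, so the separate line items show up side by side. The dates are placeholders, and the exact SERVICE string is worth verifying against your own account.

```python
import boto3

# Cost Explorer lives in us-east-1 regardless of where your workload runs.
ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-03-01", "End": "2026-04-01"},  # placeholder month
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

# One row per usage type: inference tokens, guardrails, KB storage, and so on.
for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    cost = group["Metrics"]["UnblendedCost"]["Amount"]
    print(f"{usage_type:50s} ${float(cost):>10.2f}")
```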

Vertex AI: Google Owns the Whole Stack

Google's bet is structurally different. Where Bedrock is a marketplace, Vertex AI is the productized exhaust of an integrated stack that goes all the way down to silicon. Google designs the TPU, trains Gemini on the TPU, serves Gemini from the TPU, and stores your grounding data in BigQuery — which already speaks to Vertex over a private fabric. There are seams in this stack, but most of them are inside Google.

The 2026 generation of TPUs is what makes this story credible at scale. The TPU 8t variant targets training, the TPU 8i variant targets serving with 288 GB of HBM and 384 MB of on-chip SRAM, and Google quotes around 80% better performance per dollar versus the previous generation. For batch inference above roughly 10,000 requests per hour, that hardware advantage shows up in your invoice. Below that threshold, the gap closes and you are mostly choosing on developer experience.

Vertex's other architectural moat is the BigQuery integration. You can run an embedding model directly from a SQL query, store the vector back into a BigQuery column, and call Gemini against the row without the data ever leaving Google's network. For teams that already have a serious data warehouse, this collapses the data prep pipeline that would otherwise be three glue jobs and a Lambda. AutoML on Vertex remains the strongest one-click path from raw data to a fine-tuned model, and Google's own benchmarks have it cutting training time by 40–60% versus a hand-rolled equivalent — believable if your data is clean, optimistic if it isn't.
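
As a rough illustration of that pattern, generating embeddings without the rows ever leaving BigQuery looks something like the sketch below. The project, dataset, and model names are all hypothetical, and it assumes a remote embedding model has already been registered with CREATE MODEL ... REMOTE.

```python
from google.cloud import bigquery

# Embedding generation stays inside BigQuery; only the SQL leaves your laptop.
# Project, dataset, table, and model names are hypothetical placeholders.
client = bigquery.Client(project="my-project")

sql = """
SELECT
  doc_id,
  ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `my_dataset.embedding_model`,
  (SELECT doc_id, body AS content FROM `my_dataset.support_docs`),
  STRUCT(TRUE AS flatten_json_output)
)
"""

for row in client.query(sql).result():
    print(row.doc_id, len(row.embedding))
```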

Agentic workloads on Vertex live inside the Gemini Enterprise platform. Google originated the A2A — agent-to-agent — protocol in April 2025, and Vertex was the first platform to ship a managed runtime for multi-agent coordination using it. If you believe agents are the next architectural unit, Vertex has the head start. If you don't, you are paying for it anyway.

The real cost of Vertex is gravitational, not financial. Once your data is in BigQuery and your inference is on TPU, you are not casually moving to another cloud. Google gives you a beautiful integrated stack, and beautiful integrated stacks have a way of becoming permanent.

Azure AI Foundry: The OpenAI Moat in a Microsoft Wrapper

Azure AI Foundry, which Microsoft renamed from Azure AI Studio in 2024, is the strangest of the three because its architecture is downstream of a partnership rather than a pure engineering decision. Azure runs OpenAI's flagship models — GPT-4o, o3, o4-mini, and the GPT-5 family for regulated tenants — under an exclusivity arrangement that no other hyperscaler has been able to match. The real GPT models, with the real Azure compliance boundary around them, only run here.

That partnership is the moat, and the rest of Foundry is built to amplify it. The platform plugs cleanly into Microsoft 365, Teams, Power Platform, GitHub Copilot, and Entra ID, which means an enterprise that is already in Microsoft's gravity well can ship a working AI feature against existing identity, existing data residency policies, and existing audit pipelines. Azure quotes sub-200ms p50 latency for most inference paths and horizontal scale beyond 1,000 requests per second. The numbers are fine. The reason Foundry wins deals is that the AI shows up inside the tool the user already has open.
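
A minimal sketch of what that integration looks like in practice, with auth riding on Entra ID rather than a raw API key, so existing identity and audit policy apply. The endpoint, deployment name, and API version are placeholders.

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Entra ID token auth instead of a static key; endpoint and API version
# are placeholders for whatever your Foundry resource actually exposes.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://my-foundry-resource.openai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21",
)

# "model" is the *deployment* name you created in Foundry, not the raw model ID.
response = client.chat.completions.create(
    model="gpt-4o-prod",
    messages=[{"role": "user", "content": "Draft a status update for the Teams channel."}],
)
print(response.choices[0].message.content)
```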

The cost model is where Foundry diverges. Provisioned Throughput Units (PTUs) are the production primitive — you reserve capacity in advance and Azure guarantees latency and throughput against it. PTUs can cut cost by up to 70% on workloads that actually run flat, and they will quietly destroy your unit economics on workloads that are bursty. The mistake teams make is buying PTUs because predictable billing feels safer, then realizing six weeks later that they are paying for capacity nobody is using between 2 a.m. and 7 a.m.
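
The flat-versus-bursty point is easy to sanity-check with back-of-envelope arithmetic. Every number in this sketch is a hypothetical placeholder; substitute your actual quoted PTU rate and pay-as-you-go prices.

```python
# Back-of-envelope PTU break-even check. All prices are hypothetical.
PTU_MONTHLY_COST = 10_000.00   # hypothetical reserved-capacity cost ($/month)
PAYGO_PER_1K_TOKENS = 0.01     # hypothetical blended pay-as-you-go rate ($)

def paygo_cost(tokens_per_month: float) -> float:
    return tokens_per_month / 1_000 * PAYGO_PER_1K_TOKENS

break_even_tokens = PTU_MONTHLY_COST / PAYGO_PER_1K_TOKENS * 1_000

# A flat workload near the reservation's capacity beats pay-as-you-go;
# a bursty one that idles overnight pays for capacity nobody is using.
for monthly_tokens in (200e6, 1_000e6, 2_000e6):
    print(f"{monthly_tokens / 1e6:>6.0f}M tokens: "
          f"pay-as-you-go ${paygo_cost(monthly_tokens):>10,.0f} "
          f"vs PTU ${PTU_MONTHLY_COST:,.0f}")
print(f"break-even: {break_even_tokens / 1e6:,.0f}M tokens/month")
```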

Microsoft's agent story arrived later than Google's and Amazon's. The Microsoft Agent Framework shipped as an open-source SDK in December 2025, focused on multi-agent orchestration inside the Microsoft tool surface — Copilot Studio, Fabric, and the Power Platform. It is less abstract than A2A and less integrated than AgentCore, and that is roughly the shape of every Microsoft AI product: not the best primitive, but the one that already lives inside the workflow your enterprise is actually paying for.

Where the Bills Actually Land

The three pricing models look comparable on a spec sheet and diverge sharply in production. For a representative workload of 10–50 million tokens per month, Bedrock generally lands 15–25% cheaper than equivalent setups on the other two, mostly because the serverless path doesn't double-charge you for idle capacity. Vertex pulls ahead once batch volume crosses ten thousand requests per hour, where the TPU economics start to dominate, and again once your grounding data is already in BigQuery. Azure wins on flat, predictable, regulated workloads where PTU reservations make sense and the OpenAI exclusivity is a hard requirement.

The number nobody wants to talk about is egress. If your data lives in S3 and you are inferring on Azure, every gigabyte of context is an egress charge — roughly $0.09 per GB on the cross-cloud path. For a moderately active RAG pipeline, that line alone can add 10–20% to your AI bill, and it is the kind of cost that doesn't show up until quarter-end. The cleanest fix is to keep inference inside the cloud where your data already lives, which is also why most of the "what platform should we pick" decisions are really "what cloud are we already on" decisions wearing a hat.
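
A quick sketch of how that egress line adds up, using the ~$0.09/GB figure above; the request volume and per-request context size are hypothetical placeholders.

```python
# Cross-cloud egress on a RAG pipeline. Traffic numbers are hypothetical.
EGRESS_PER_GB = 0.09            # $/GB on the cross-cloud path, per the figure above
requests_per_day = 50_000
context_kb_per_request = 400    # retrieved context shipped to the other cloud

gb_per_month = requests_per_day * 30 * context_kb_per_request / 1_048_576
print(f"{gb_per_month:,.0f} GB/month -> "
      f"${gb_per_month * EGRESS_PER_GB:,.0f}/month in egress alone")
```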

How to Actually Choose

The honest decision tree is short. If you are already an AWS shop and you want optionality across model vendors, Bedrock is the right answer. If your data is in BigQuery, your team trusts Google's silicon roadmap, or you are committed to A2A-style multi-agent systems, Vertex AI is the right answer. If you are a Microsoft enterprise, you need GPT models behind a compliance boundary, or your end users live inside Teams and Office, Foundry is the right answer.

The trap is letting a vendor pick for you because of a benchmark deck. None of these platforms is meaningfully ahead on raw model quality for ninety percent of enterprise workloads. The differences that matter are architectural — where the data sits, where identity is enforced, what shape the bill takes, and which workflow surface your users already trust. Get those four right and the model choice is mostly a footnote.

A growing share of serious deployments in 2026 are deliberately multi-cloud — a primary platform handling 80% of traffic and regulated data, a secondary platform serving as a release valve for capacity, pricing, or model availability. That pattern is more expensive on paper and dramatically cheaper the first time a single-vendor outage takes down a customer-facing surface. Concentration risk is a real cost, and the platforms have noticed.
