Sitecore on Azure + Front Door: nice and fast… until Front Door fails

Frank van Rooijen | 30-11-2025

If you host Sitecore on Azure, Azure Front Door is often the default choice for global routing, WAF, caching, and optimized edge delivery. Makes sense: one entry point, a worldwide PoP network, lots of knobs for performance and security. But it’s still a dependency. And if that dependency falls over, your entire site goes with it.

On October 29, 2025, we got a very loud reminder when Azure Front Door imploded. Not a minor wobble—global trouble with timeouts, DNS issues, and latency. Sitecore services (including XM Cloud) got hit directly: failed builds, deploys that worked sometimes, requests vanishing into timeouts. And no, before anyone starts pointing at partners: this was 100% on Azure’s side.

What happened (short version, because you want to build stuff)

Microsoft pushed a configuration change in Azure Front Door that went sideways. As a result, AFD nodes in multiple regions became unhealthy or routed traffic incorrectly, causing:

• Connection timeouts and DNS resolution failures for AFD/CDN customers
• Increased latency due to traffic rerouting through less healthy paths
• A big chain reaction: Microsoft 365, Azure Portal, and a ton of customer apps went down too

Microsoft eventually rolled the change back and rerouted traffic step by step. By October 30, 2025 at 06:13 UTC, things were stable again.

Impact on Sitecore (aka why your build suddenly broke)

Sitecore’s SaaS products also run on Azure and use Front Door in their delivery chain. So once AFD acts up, you get:

• Service degradation in XM Cloud
• Intermittent build/deploy failures (the worst kind: sometimes yes, sometimes no)
• Request failures/timeouts to delivery and management endpoints
• Latency spikes in regions where rerouting was needed

“But our hosting is solid?” — sure, but your stack is only as strong as the weakest dependency

Front Door is a single global ingress. That’s its power and its Achilles’ heel.
Your availability chain usually looks like:

Client → Azure Front Door → (WAF/caching/routing) → App Service / AKS / VM → Sitecore roles / Edge / APIs

If Front Door is down, you never reach the rest. Doesn’t matter how robust your Sitecore setup behind it is.
So no, this wasn’t a “Sitecore problem” or a “partner problem.”
It’s classic cloud reality: shared responsibility + shared blast radius.
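
To put rough numbers on that chain, here is a minimal sketch in Python. Every availability figure in it is a hypothetical placeholder (not a published SLA), and it assumes the fallback ingress fails independently of Front Door, which is optimistic in practice. It only illustrates the arithmetic: hops in series multiply, so the chain is always worse than its weakest hop, and a redundant ingress takes that one hop out of the equation.

```python
from math import prod

def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components.values())

def with_fallback(primary, fallback):
    """Availability of two redundant paths, assuming they fail independently."""
    return 1 - (1 - primary) * (1 - fallback)

# Hypothetical figures for illustration only, not published SLAs.
chain = {
    "front_door": 0.9999,
    "app_service": 0.9995,
    "sitecore_roles": 0.999,
}

single_path = serial_availability(chain)

# Same chain, but the ingress hop is "Front Door OR an independent fallback".
# The 0.999 for the fallback path is also an assumed number.
redundant_ingress = with_fallback(chain["front_door"], 0.999)
dual_path = serial_availability({**chain, "front_door": redundant_ingress})

HOURS_PER_YEAR = 24 * 365
for label, a in (("single ingress", single_path), ("redundant ingress", dual_path)):
    downtime = (1 - a) * HOURS_PER_YEAR
    print(f"{label}: {a:.5%} available, ~{downtime:.1f} h/year down")
```

The gain from a redundant ingress looks small on paper because the other hops still dominate. The point is the failure mode: without a second path, a Front Door outage takes the site to zero no matter how healthy everything behind it is.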

Two Front Door incidents close together = time to get serious about failover

One hit: bad luck.
Two hits in quick succession: time for architecture action.
Microsoft’s own guidance is pretty clear: build global routing redundancy so you have an alternative path when Front Door isn’t available. Their reference architecture uses Azure Traffic Manager as a fallback router.

In practice, that means two ingress paths:
1. Primary: Azure Front Door
2. Fallback: Traffic Manager (or another edge path) routing directly to origins

Traffic Manager runs health probes and can switch over via DNS if AFD fails.
Important: this only works if your origins are also multi-region/redundant. Otherwise you’re failing over to… the same broken back door.
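
The probe-and-switch idea can be sketched as a tiny simulation. This is a simplified model of priority-based DNS failover, not Traffic Manager's actual implementation: the endpoint names, the failure threshold, and the behaviour when every endpoint is unhealthy are all assumptions made for illustration.

```python
from dataclasses import dataclass

TOLERATED_FAILURES = 3  # assumed threshold: unhealthy once failures exceed this

@dataclass
class Endpoint:
    name: str
    priority: int       # lower number = preferred
    failures: int = 0   # consecutive failed health probes

def record_probe(endpoint, healthy):
    """Reset the failure counter on success, increment it on failure."""
    endpoint.failures = 0 if healthy else endpoint.failures + 1

def is_healthy(endpoint):
    return endpoint.failures <= TOLERATED_FAILURES

def resolve(endpoints):
    """Return the healthy endpoint with the best priority.

    If nothing is healthy, this sketch still returns the preferred endpoint
    rather than nothing at all (a design choice of the sketch).
    """
    candidates = [e for e in endpoints if is_healthy(e)] or endpoints
    return min(candidates, key=lambda e: e.priority)

# Hypothetical endpoint names.
afd = Endpoint("afd-primary", priority=1)
origins = Endpoint("direct-origins", priority=2)
profile = [afd, origins]

for _ in range(TOLERATED_FAILURES + 1):   # Front Door probes start failing
    record_probe(afd, healthy=False)
print("during outage:", resolve(profile).name)

record_probe(afd, healthy=True)           # Front Door recovers
print("after recovery:", resolve(profile).name)
```

Keep in mind that a DNS-based switch only takes effect once cached records expire, so the record TTL plus the probe interval times the failure threshold gives a rough lower bound on how long failover takes.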

Concrete failover options (without a thesis)

Option A — Front Door + Traffic Manager fallback (Microsoft canonical)
• AFD stays your main edge layer
• Traffic Manager switches to origins if AFD is down
• Best for: mission-critical sites with predictable traffic

Option B — Dual Front Door (active/passive)
• Two AFD profiles in different rings/subscriptions
• Failover via DNS/Traffic Manager
• Best for: orgs that need AFD features (WAF, rules engine) but want redundancy

Option C — Multi-CDN / Edge strategy
• AFD primary, another CDN/edge as backup
• More complex config, smaller single-vendor blast radius
• Best for: large environments where edge availability > simplicity

What we’re doing now (and what you should do too)

Start working on a fallback mechanism. Direction is obvious:

• Don’t let Front Door be a single point of failure
• Test fallback routing (don’t just draw it)
• Prepare runbooks for “AFD down” scenarios
• Monitor dependencies: not just “site down?” but “what took it down?”
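
For the "what took it down?" part, one possible approach is to probe the same health endpoint twice, once through the edge and once directly at an origin, and compare the results. The sketch below is an assumption-heavy illustration: the labels and the stubbed probe are made up, and it presumes your origins accept direct probes at all (origins locked down to Front Door traffic would need an exception for the monitor).

```python
import urllib.error
import urllib.request

def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers DNS failures, timeouts, refused connections, HTTP 4xx/5xx
        # and malformed URLs.
        return False

def classify(edge_ok, origin_ok):
    """Turn two probe results into a rough diagnosis label."""
    if edge_ok and origin_ok:
        return "healthy"
    if origin_ok:
        return "edge_down"            # origin fine, edge path broken
    if edge_ok:
        return "origin_probe_failed"  # edge still serving, direct probe blocked or origin degraded
    return "all_down"                 # origin problem or a wider outage

def diagnose(edge_url, origin_url, probe=http_probe):
    """Probe both paths; the probe function is injectable for testing."""
    return classify(probe(edge_url), probe(origin_url))

# Demo with a stubbed probe that simulates "edge unreachable, origin fine".
simulated = {"edge": False, "origin": True}
print(diagnose("edge", "origin", probe=lambda url: simulated[url]))
```

In real use you would drop the stub and pass two URLs of your own, for example a health path on the public hostname and the same path on an origin hostname, and feed the label into your alerting so the "AFD down" runbook gets triggered by the right signal.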

Goal is simple:
if Front Door fails again, we stay online longer than Microsoft does.

TL;DR for busy people

• The Azure Front Door outage on Oct 29, 2025 caused widespread disruption, including Sitecore SaaS degradation and build/deploy issues.
• Root cause was an Azure config screw-up, so no blame for uxbee/Sitecore partners.
• Front Door is powerful, but it is also a central dependency.
• Microsoft explicitly recommends global routing redundancy using a Traffic Manager fallback.
• Two incidents close together = time to build and test failover for real.