Your database will go down. Your external API will time out. Your cache will be invalidated. Your CDN will have an outage. Every production system, no matter how carefully engineered, will experience partial failures. The question is not whether failures will happen; it is whether your application responds to them gracefully or catastrophically.
Graceful degradation is the architectural practice of designing your application to remain partially functional during failures rather than failing completely. Instead of showing an unhandled 500 error when the database is unavailable, a well-designed system serves cached data, shows a meaningful fallback state, and continues to function for the parts that are still operational.
The Failure Spectrum
FAILURE RESPONSE SPECTRUM:

Catastrophic      Partial Failure    Graceful Degradation
     │                  │                     │
     ▼                  ▼                     ▼
┌────────┐        ┌──────────┐       ┌────────────────┐
│  500   │        │  Blank   │       │ Cached content │
│ Error  │        │  Screen  │       │ + status banner│
│ Page   │        │          │       │ + core features│
└────────┘        └──────────┘       └────────────────┘
Users see:        Users see:         Users see:
"Something        Nothing            "Some features may
went wrong"                          be limited. We're
                                     working on it."

Pattern 1: Stale-While-Revalidate Caching
The most effective degradation strategy for read-heavy applications is serving cached data when the live source is unavailable:
// lib/cache.ts
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

interface CacheOptions {
  ttl: number; // How long fresh data is valid (seconds)
  staleTtl?: number; // How long stale data can be served as fallback (seconds)
}

export async function getCachedOrFetch<T>(
  key: string,
  fetchFn: () => Promise<T>,
  options: CacheOptions
): Promise<{ data: T; isStale: boolean }> {
  const { ttl, staleTtl = ttl * 10 } = options;

  // Read the cache entry; it may be fresh, stale, or missing
  const cached = await redis.get<{ data: T; cachedAt: number }>(key);
  if (cached) {
    const ageSeconds = (Date.now() - cached.cachedAt) / 1000;
    const isFresh = ageSeconds < ttl;
    if (isFresh) {
      return { data: cached.data, isStale: false };
    }
  }

  // Cache miss or stale entry: try to fetch fresh data
  try {
    const freshData = await fetchFn();
    // Keep the entry for the full stale window (staleTtl) so it can act as a fallback later
    await redis.set(key, { data: freshData, cachedAt: Date.now() }, { ex: staleTtl });
    return { data: freshData, isStale: false };
  } catch (error) {
    // Fetch failed: return stale cached data if available
    if (cached) {
      console.warn(`[Cache] Returning stale data for key: ${key}. Reason: ${error}`);
      return { data: cached.data, isStale: true };
    }
    // No stale data available: rethrow
    throw error;
  }
}

// Usage in a Next.js API route
import { NextResponse } from 'next/server';
import { getCachedOrFetch } from '@/lib/cache';
import { db } from '@/lib/db'; // Assumed location of your database client

export async function GET(request: Request) {
  const { data: products, isStale } = await getCachedOrFetch(
    'products:featured',
    () => db.query('SELECT * FROM products WHERE featured = true LIMIT 10'),
    { ttl: 300, staleTtl: 3600 } // Fresh for 5min, serve stale for 1hr
  );

  return NextResponse.json(
    { products, dataFreshness: isStale ? 'stale' : 'live' },
    {
      headers: {
        // Let downstream caches hold stale responses for a shorter time
        'Cache-Control': isStale ? 'max-age=60' : 'max-age=300',
        'X-Data-Freshness': isStale ? 'stale' : 'live',
      },
    }
  );
}

Pattern 2: Circuit Breaker
A circuit breaker prevents a degraded downstream service from being hammered with requests, giving it time to recover:
// lib/circuit-breaker.ts
type CircuitState = 'closed' | 'open' | 'half-open';

interface CircuitBreakerConfig {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close from half-open
  timeout: number; // ms before trying again from open state
}

export class CircuitBreaker<T> {
  private state: CircuitState = 'closed';
  private failures = 0;
  private successes = 0;
  private nextAttempt = 0;

  constructor(
    private readonly name: string,
    private readonly fn: () => Promise<T>,
    private readonly config: CircuitBreakerConfig
  ) {}

  async execute(): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) {
        throw new Error(`Circuit breaker '${this.name}' is OPEN. Service unavailable.`);
      }
      // Timeout elapsed: transition to half-open and allow a trial call
      this.state = 'half-open';
    }

    try {
      const result = await this.fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    // Compare with ===; a single = would assign 'half-open' on every success
    if (this.state === 'half-open') {
      this.successes++;
      if (this.successes >= this.config.successThreshold) {
        this.reset();
      }
    } else {
      this.failures = 0;
    }
  }

  private onFailure() {
    this.failures++;
    this.successes = 0; // A failure while half-open restarts the success count
    if (this.failures >= this.config.failureThreshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + this.config.timeout;
      console.error(`[CircuitBreaker] '${this.name}' OPENED after ${this.failures} failures.`);
    }
  }

  private reset() {
    this.state = 'closed';
    this.failures = 0;
    this.successes = 0;
  }

  getState(): CircuitState {
    return this.state;
  }
}
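
// Usage
const dbCircuit = new CircuitBreaker(
  'database',
  () => db.query('SELECT 1'), // Health check function
  { failureThreshold: 5, successThreshold: 2, timeout: 30000 }
);

export async function queryWithCircuitBreaker<T>(queryFn: () => Promise<T>): Promise<T> {
  // execute() throws immediately while the circuit is open. Otherwise it runs the
  // health check and records the result, so repeated failures actually open the
  // circuit. Checking getState() alone would never trip it. The cost is one extra
  // round trip per query; callers should catch the error and serve cached data.
  await dbCircuit.execute();
  return queryFn();
}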
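
The breaker above is bound to one function at construction time. A common variation passes the operation on each call and returns a fallback while the circuit is open, which connects the circuit breaker to the cached-data fallback from Pattern 1. The following is only a sketch under assumptions: `FallbackBreaker`, its constructor parameters, and the injectable `now` clock are hypothetical names, and it uses a simpler rule than the class above (one successful trial call closes the circuit).

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Hypothetical variant for illustration; not the class defined above.
class FallbackBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = () => Date.now() // injectable clock for tests
  ) {}

  /** Returns true if a call may go through right now. */
  allowRequest(): boolean {
    if (this.state !== 'open') return true;
    if (this.now() - this.openedAt < this.cooldownMs) return false;
    this.state = 'half-open'; // cooldown elapsed: allow a trial call
    return true;
  }

  recordSuccess(): void {
    this.state = 'closed';
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures++;
    // A failed trial call reopens immediately; otherwise wait for the threshold.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  /** Runs the operation, or returns the fallback when open or when the call fails. */
  async call<T>(operation: () => Promise<T>, fallback: () => T | Promise<T>): Promise<T> {
    if (!this.allowRequest()) return fallback();
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch {
      this.recordFailure();
      return fallback();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

With this shape, a caller could pass the live query as `operation` and a function that reads the last cached value as `fallback`, so an open circuit degrades to cached data instead of throwing.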
Pattern 3: Fallback Data and Feature Flags
Define explicit fallback states for every data-dependent UI element:
// app/products/page.tsx
import { getCachedOrFetch } from '@/lib/cache';
import { ProductGrid } from '@/components/ProductGrid';
import { DegradedBanner } from '@/components/DegradedBanner';
import { db } from '@/lib/db'; // Assumed location of your database client

export default async function ProductsPage() {
  let products = [];
  let isDegraded = false;

  try {
    const result = await getCachedOrFetch(
      'products:all',
      () => db.query('SELECT * FROM products WHERE active = true'),
      { ttl: 300, staleTtl: 7200 }
    );
    products = result.data;
    isDegraded = result.isStale;
  } catch (error) {
    // Database unavailable and no cached copy: fall back to the empty state
    isDegraded = true;
    console.error('Products page: database unavailable, showing degraded state', error);
  }

  return (
    <main>
      {isDegraded && (
        <DegradedBanner message="Product catalog may be temporarily out of date. Pricing and availability may differ." />
      )}
      {products.length > 0 ? (
        <ProductGrid products={products} />
      ) : (
        <div className="empty-state">
          <h2>Our catalog is temporarily unavailable</h2>
          <p>Please check back in a few minutes. We apologize for the inconvenience.</p>
          <a href="mailto:support@example.com">Contact Support</a>
        </div>
      )}
    </main>
  );
}

Pattern 4: Health Check Endpoint for Load Balancer Integration
// app/api/health/route.ts
// db, redis, and storage are assumed to be your application's existing clients.
export async function GET() {
  const checks = {
    database: false,
    cache: false,
    storage: false,
  };

  // Check database
  try {
    await db.query('SELECT 1', [], { timeout: 2000 });
    checks.database = true;
  } catch { /* check stays false */ }

  // Check Redis cache
  try {
    await redis.ping();
    checks.cache = true;
  } catch { /* check stays false */ }

  // Check file storage
  try {
    await storage.exists('health-check-file.txt');
    checks.storage = true;
  } catch { /* check stays false */ }

  const allHealthy = Object.values(checks).every(Boolean);
  const criticalHealthy = checks.database; // Database is the critical dependency

  return Response.json(
    { status: allHealthy ? 'healthy' : criticalHealthy ? 'degraded' : 'unhealthy', checks },
    { status: criticalHealthy ? 200 : 503 } // 200 for healthy and degraded, 503 for unhealthy
  );
}

Return 200 when the server is degraded but still partially functional: your load balancer should not route traffic away from a degraded server that can still serve cached content. Reserve 503 for completely non-functional states.
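
A health endpoint is only useful if it answers quickly, and a dependency that hangs instead of failing can stall the whole check. One way to bound each probe is a small timeout wrapper. The sketch below is an assumption-laden illustration: `withTimeout` and `isHealthy` are hypothetical helper names, and the 2000 ms default simply mirrors the database timeout used in the endpoint above.

```typescript
// Hypothetical helpers for illustration; not part of Next.js or any library.

/** Rejects if `promise` does not settle within `ms` milliseconds. */
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

/** Runs a probe and reports true only if it succeeds within the time limit. */
async function isHealthy(probe: () => Promise<unknown>, timeoutMs = 2000): Promise<boolean> {
  try {
    await withTimeout(probe(), timeoutMs);
    return true;
  } catch {
    return false; // Thrown error, rejected promise, or timeout
  }
}
```

Each check in the endpoint could then be written as a single line, for example `checks.cache = await isHealthy(() => redis.ping());`. Note that the timeout only stops waiting; it does not cancel the underlying request.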
Graceful Degradation Checklist
- [ ] Cache reads: Every expensive data fetch has a cache layer with stale-fallback.
- [ ] Empty states: Every data-dependent UI component has an explicit empty/error state.
- [ ] Circuit breakers: Repeated failures to downstream services open circuits instead of cascading.
- [ ] User communication: A degraded banner explains limited functionality without alarming users.
- [ ] Health endpoint: /api/health distinguishes between healthy, degraded, and unhealthy.
- [ ] Retry logic: Client-side retries with exponential backoff and idempotency keys.
- [ ] Monitoring: Alerts fire on degraded states before they escalate to complete failures.
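
The retry item in the checklist is the only pattern without an example, so here is a minimal sketch of client-side retries with capped exponential backoff and jitter. Everything named here is an illustrative assumption: `backoffDelayMs`, `jitteredDelayMs`, and `retryWithBackoff` are hypothetical helpers, and the 200 ms base and 5000 ms cap are arbitrary defaults, not recommendations.

```typescript
// Hypothetical helpers for illustration; names and defaults are assumptions.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/** Full jitter: a random delay in [0, backoffDelayMs) so clients do not retry in lockstep. */
function jitteredDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 5000,
  random: () => number = Math.random
): number {
  return Math.floor(random() * backoffDelayMs(attempt, baseMs, capMs));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Runs `operation` up to `maxAttempts` times. The same idempotency key is
 * passed to every attempt so the server can recognize a repeated write.
 */
async function retryWithBackoff<T>(
  operation: (idempotencyKey: string) => Promise<T>,
  idempotencyKey: string,
  maxAttempts = 4
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(idempotencyKey);
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(jitteredDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A caller would typically generate the key once per user action, for example with `crypto.randomUUID()` in the browser, and send it in a request header such as `Idempotency-Key`. The retries are only safe if the server stores and honors that key, so this is a client-side half of a contract rather than a complete solution.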
Conclusion
Systems fail in production — this is not pessimism, it is physics. The engineering investment in graceful degradation pays dividends every time an infrastructure dependency has an outage and your application continues serving cached content while competitors show error pages. Cache reads with stale fallbacks, circuit breakers that prevent cascade failures, explicit empty states, and meaningful user communication are the patterns that separate resilient production systems from fragile ones. Build the failure modes before you need them.