"Shift-left" testing became the dominant quality engineering philosophy of the 2020s: catch bugs earlier by running tests closer to the developer. Write unit tests. Add integration tests. Gate on CI. The logic was sound and the results were real.
But shift-left has a blind spot: production is different from every pre-production environment. Real users hit edge cases no test anticipated. Production data has patterns your seeds never replicated. CDN caching behaviors differ from local runs. Infrastructure degrades in ways staging never simulates.
Shift-everywhere testing completes the quality loop. It extends automated quality verification into the production environment itself, using synthetic monitoring, real-user monitoring (RUM), and controlled chaos engineering to catch what CI cannot.
The Full Quality Lifecycle
SHIFT-EVERYWHERE QUALITY PIPELINE:
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐
│ Unit │ │ Contract │ │ E2E / │ │ Production │
│ Tests │ │ Tests │ │ Perf CI │ │ Monitoring │
│ (local) │ │ (PR) │ │ (merge) │ │ (always-on) │
└─────────┘ └──────────┘ └──────────┘ └──────────────────┘
◄──── Traditional Shift-Left ────► ◄── Shift-Everywhere ──►
The right side of this pipeline, production quality monitoring, is the shift-everywhere addition most teams are missing.
1. Synthetic Monitoring: Automated E2E Tests in Production
Synthetic monitoring runs Playwright (or Puppeteer) tests against your live production environment on a schedule. Unlike real user traffic, synthetic tests run at defined intervals, from specific geographic locations, and always exercise the same flows — giving you a deterministic signal for critical path availability.
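Because every scheduled run yields a plain pass or fail, the results reduce naturally to an availability figure and an alerting decision. The following is a minimal sketch of that reduction, assuming a hypothetical `CheckResult` record stored per run and ordered oldest first; the type and function names are illustrative and are not Playwright APIs.

```typescript
// Hypothetical record for one synthetic check run (not a Playwright type).
interface CheckResult {
  timestamp: number; // epoch milliseconds when the run finished
  passed: boolean;
}

// Fraction of runs that passed, in the range 0..1.
// Returns 1 for an empty history so "no data" is not reported as an outage.
function availability(results: CheckResult[]): number {
  if (results.length === 0) return 1;
  const passed = results.filter((r) => r.passed).length;
  return passed / results.length;
}

// True when the most recent `threshold` runs all failed.
// Assumes `results` is ordered oldest first.
function shouldAlert(results: CheckResult[], threshold = 2): boolean {
  if (threshold <= 0 || results.length < threshold) return false;
  return results.slice(-threshold).every((r) => !r.passed);
}

// Four runs at 5-minute intervals: two passes followed by two failures.
const history: CheckResult[] = [
  { timestamp: 0, passed: true },
  { timestamp: 300000, passed: true },
  { timestamp: 600000, passed: false },
  { timestamp: 900000, passed: false },
];

console.log(availability(history)); // 0.5
console.log(shouldAlert(history)); // true
```

Requiring two consecutive failures is one way to avoid alerting on a single transient timeout; the right threshold depends on your check interval and how quickly you need to know. The checkout monitor that follows is the kind of test that would produce these results.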
// tests/synthetic/checkout-monitor.spec.ts
// This runs against PRODUCTION every 5 minutes via a scheduled workflow
import { test, expect } from '@playwright/test';
test('critical checkout flow is operational', async ({ page }) => {
  // Use a test account with a pre-loaded balance, never a real card
  await page.goto(process.env.PROD_URL + '/checkout');
  await page.getByLabel('Email').fill(process.env.SYNTHETIC_TEST_EMAIL!);
  await page.getByLabel('Password').fill(process.env.SYNTHETIC_TEST_PASSWORD!);
  await page.getByRole('button', { name: 'Continue' }).click();
  // Verify checkout page loads and key elements are present
  await expect(page.getByRole('heading', { name: 'Order Summary' })).toBeVisible({
    timeout: 10000,
  });
  // Verify price calculation is displaying (data integrity check)
  const total = page.locator('[data-testid="order-total"]');
  await expect(total).toBeVisible();
  await expect(total).not.toContainText('NaN');
  await expect(total).not.toContainText('undefined');
});
# .github/workflows/synthetic-monitoring.yml
name: Synthetic Production Monitor
on:
  schedule:
    - cron: '*/5 * * * *' # Every 5 minutes (GitHub's minimum interval; runs can be delayed under load)
jobs:
  synthetic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '22' }
      - run: npm ci
      - run: npx playwright install chromium --with-deps
      - name: Run Synthetic Monitor
        env:
          PROD_URL: ${{ secrets.PROD_URL }}
          SYNTHETIC_TEST_EMAIL: ${{ secrets.SYNTHETIC_TEST_EMAIL }}
          SYNTHETIC_TEST_PASSWORD: ${{ secrets.SYNTHETIC_TEST_PASSWORD }}
        run: npx playwright test tests/synthetic/
      - name: Alert on Failure
        if: failure()
        uses: slackapi/slack-github-action@v1.26.0
        with:
          payload: '{"text":"🚨 Production synthetic test FAILED: checkout flow is broken"}'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK # set when the URL is a Slack incoming webhook
2. Real-User Monitoring (RUM): Measuring Actual Performance
Synthetic tests measure the happy path from a controlled location. RUM measures what actual users experience across all their device types, network speeds, and geographic regions.
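Since RUM produces one sample per page view, the data is usually summarized as a percentile rather than read sample by sample; Google's Core Web Vitals assessment, for instance, looks at the 75th percentile. Below is a minimal sketch of that summary step, assuming a nearest-rank percentile and hypothetical LCP samples in milliseconds. The function names are illustrative, and the 2500/4000 ms boundaries are the LCP thresholds used later in this article.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('cannot take a percentile of no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in the range (0, 100]');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Classify a value against "good" and "poor" boundaries.
function rate(value: number, bounds: { good: number; poor: number }): Rating {
  if (value <= bounds.good) return 'good';
  if (value > bounds.poor) return 'poor';
  return 'needs-improvement';
}

// Hypothetical LCP samples (ms) collected from eight page views.
const lcpSamples = [1200, 1800, 2100, 2400, 2600, 3100, 4500, 900];
const lcpP75 = percentile(lcpSamples, 75);

console.log(lcpP75); // 2600
console.log(rate(lcpP75, { good: 2500, poor: 4000 })); // needs-improvement
```

Summarizing at the 75th percentile means one slow outlier (the 4500 ms sample here) does not dominate the signal, while a broad regression still moves the number.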
Integrate Web Vitals tracking directly into your Next.js application:
// app/web-vitals-reporter.tsx (hypothetical file name; render <WebVitalsReporter /> from app/layout.tsx)
'use client'; // useReportWebVitals is a hook, so it must live in a Client Component
import { useReportWebVitals } from 'next/web-vitals';
export function WebVitalsReporter() {
  useReportWebVitals((metric) => {
    // Send to your analytics platform
    const body = {
      name: metric.name, // e.g. LCP, CLS, INP, TTFB
      value: metric.value,
      rating: metric.rating, // 'good', 'needs-improvement', 'poor'
      url: window.location.pathname,
      timestamp: Date.now(),
    };
    // Use sendBeacon for non-blocking analytics
    navigator.sendBeacon('/api/vitals', JSON.stringify(body));
  });
  return null;
}
Set up alerting when Core Web Vitals degrade past your SLO thresholds:
// app/api/vitals/route.ts
import { NextResponse } from 'next/server';
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 }, // Largest Contentful Paint (ms)
  INP: { good: 200, poor: 500 }, // Interaction to Next Paint (ms)
  CLS: { good: 0.1, poor: 0.25 }, // Cumulative Layout Shift (unitless score)
  TTFB: { good: 800, poor: 1800 }, // Time to First Byte (ms)
};
export async function POST(request: Request) {
  const metric = await request.json();
  const threshold = THRESHOLDS[metric.name as keyof typeof THRESHOLDS];
  if (threshold && metric.value > threshold.poor) {
    // Alert: this user experienced a "poor" web vital
    const unit = metric.name === 'CLS' ? '' : 'ms'; // CLS has no unit
    // sendPagerDutyAlert is a placeholder for your own alerting helper
    await sendPagerDutyAlert({
      summary: `Core Web Vital degraded: ${metric.name} = ${metric.value}${unit} on ${metric.url}`,
      severity: 'warning',
    });
  }
  return NextResponse.json({ received: true });
}
3. Error Rate Monitoring as a Quality Gate
Deploy an error tracking tool (Sentry, Axiom, or Datadog) and configure post-deployment quality gates that automatically roll back if error rates spike:
// scripts/post-deploy-check.ts
// Run this 5 minutes after every production deployment
// queryErrorRate, triggerVercelRollback, and sendSlackAlert are placeholders for your own helpers
async function postDeployQualityCheck() {
  const deployTime = new Date(process.env.DEPLOY_TIMESTAMP!);
  const now = new Date();
  // Query the error rate since the deploy from your observability platform
  const errorRate = await queryErrorRate({
    from: deployTime,
    to: now,
    environment: 'production',
  });
  const BASELINE_ERROR_RATE = 0.01; // 1%, your established normal
  const REGRESSION_THRESHOLD = BASELINE_ERROR_RATE * 3; // 3x increase = rollback
  if (errorRate > REGRESSION_THRESHOLD) {
    console.error(`🚨 Error rate spiked to ${(errorRate * 100).toFixed(2)}% after deploy`);
    console.error('Initiating automatic rollback...');
    await triggerVercelRollback(process.env.DEPLOYMENT_ID!);
    // errorRate is a fraction, so convert it to a percentage before reporting
    await sendSlackAlert(`Production error rate spiked to ${(errorRate * 100).toFixed(2)}%; rolled back deployment`);
    process.exit(1);
  }
  console.log(`✅ Post-deploy quality check passed. Error rate: ${(errorRate * 100).toFixed(2)}%`);
}
// Run the check and fail the job if the check itself errors
postDeployQualityCheck().catch((err) => {
  console.error(err);
  process.exit(1);
});
4. Chaos Engineering: Deliberately Breaking Production Safely
The most advanced form of shift-everywhere testing is controlled chaos: deliberately injecting failures into production to verify your resilience mechanisms actually work.
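At the code level, "injecting failures" can be as simple as wrapping a dependency call so that a configurable fraction of calls fail on purpose. The sketch below is one way to do that, not a prescribed tool: `withFaultInjection`, `FaultOptions`, and `fetchInventory` are hypothetical names, and the `enabled` flag is assumed to be driven by a feature flag or similar kill switch.

```typescript
// Options for the hypothetical fault-injection wrapper.
interface FaultOptions {
  enabled: boolean; // master switch; keep this off outside an experiment
  failureRate: number; // probability in 0..1 that a call is made to fail
  random?: () => number; // injectable for deterministic tests; defaults to Math.random
}

// Wraps an async function so that some calls reject with an injected error
// instead of reaching the real implementation.
function withFaultInjection<Args extends unknown[], R>(
  fn: (...args: Args) => Promise<R>,
  options: FaultOptions,
): (...args: Args) => Promise<R> {
  const random = options.random ?? Math.random;
  return async (...args: Args): Promise<R> => {
    if (options.enabled && random() < options.failureRate) {
      throw new Error('Injected fault (chaos experiment)');
    }
    return fn(...args);
  };
}

// Hypothetical downstream call standing in for a real dependency.
async function fetchInventory(sku: string): Promise<number> {
  return sku.length;
}

// Roughly 10% of calls through this wrapper will reject while enabled.
const flakyFetchInventory = withFaultInjection(fetchInventory, {
  enabled: true,
  failureRate: 0.1,
});
```

Callers of `flakyFetchInventory` should already have retries, timeouts, or fallbacks in place; the experiment is whether those mechanisms keep the user-facing flow working when the injected error appears.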
Start small with Game Days — scheduled periods where your team deliberately kills a service or saturates a queue and measures system behavior. Document the results in a runbook:
# Game Day Runbook: Database Connection Pool Exhaustion
## Hypothesis
If we exhaust all database connections, the application should:
1. Return a 503 with a `Retry-After` header
2. Queue new requests rather than dropping them
3. Auto-recover within 30 seconds when connections free up
## Method
1. Run: `psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'prod_db' AND pid <> pg_backend_pid()"`
2. Monitor error dashboard for 2 minutes
3. Verify auto-recovery
## Expected vs. Actual
| Expectation | Result | Pass? |
|-------------|--------|-------|
| 503 returned | 500 returned — unhandled exception | ❌ |
| Queue requests | Dropped immediately | ❌ |
| Recover in 30s | Recovered in 45s | ⚠️ |
## Action Items
- Add connection exhaustion handling to middleware
- Implement request queuing with timeout
The Shift-Everywhere Maturity Model
| Level | Capability | Implementation |
|---|---|---|
| 1 | CI gates only | Unit + E2E tests on PR merge |
| 2 | Synthetic monitoring | Scheduled Playwright in production |
| 3 | RUM + alerting | Web Vitals + error rate dashboards |
| 4 | Post-deploy gates | Automatic rollback on regression |
| 5 | Chaos engineering | Controlled resilience experiments |
Most teams are at Level 1 or 2. Levels 3 and 4 are often achievable in a single sprint. Level 5 requires cultural maturity and careful tooling.
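To make the Level 4 row concrete, the core of a post-deploy gate can be isolated as a small pure function that is easy to unit test. This is a minimal sketch that reuses the 1% baseline and 3x multiplier from the post-deploy script earlier in the article; the `minRequests` guard, the `GateInput` shape, and the decision names are assumptions added for illustration.

```typescript
// Inputs for one evaluation of the hypothetical post-deploy gate.
interface GateInput {
  errorCount: number; // failed requests since the deploy
  requestCount: number; // total requests since the deploy
  baselineErrorRate: number; // established normal, as a fraction (0.01 = 1%)
  multiplier: number; // how many times the baseline counts as a regression
  minRequests: number; // below this, there is too little traffic to judge
}

type GateDecision = 'pass' | 'rollback' | 'insufficient-data';

// Decide whether a deployment should be rolled back.
function evaluateGate(input: GateInput): GateDecision {
  if (input.requestCount === 0 || input.requestCount < input.minRequests) {
    return 'insufficient-data';
  }
  const errorRate = input.errorCount / input.requestCount;
  const threshold = input.baselineErrorRate * input.multiplier;
  return errorRate > threshold ? 'rollback' : 'pass';
}

const settings = { baselineErrorRate: 0.01, multiplier: 3, minRequests: 100 };

// 4.5% errors against a 3% threshold
console.log(evaluateGate({ ...settings, errorCount: 45, requestCount: 1000 })); // rollback
// 2% errors against a 3% threshold
console.log(evaluateGate({ ...settings, errorCount: 20, requestCount: 1000 })); // pass
// Only 10 requests: two errors would look like 20%, but the sample is too small
console.log(evaluateGate({ ...settings, errorCount: 2, requestCount: 10 })); // insufficient-data
```

The minimum-traffic guard matters for low-volume services, where a handful of errors in the first few minutes can look like a large spike. What to do on `insufficient-data` (wait longer, or fall back to synthetic checks) is a policy choice left open here.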
Conclusion
CI is necessary but not sufficient. Production environments have emergent behaviors that no pre-production setup can fully replicate. Shift-everywhere testing closes the quality loop by monitoring critical user flows synthetically, tracking real user experience with Web Vitals, and using post-deployment error rate gates to catch regressions that bypassed CI. The teams that combine high deployment frequency with low incident rates tend to be the ones that do not stop testing when the code merges.