Platform Engineering: Stop Making Every Team Reinvent the Wheel

Most growing engineering organizations reach a point where teams spend more time fighting infrastructure than building features. Platform engineering addresses this by building an internal developer platform: a paved road that makes the right way the easy way.

The Problem Platform Engineering Solves

Without a platform team, every product team independently figures out:

Team A: "How do we deploy to Kubernetes?"
Team B: "How do we deploy to Kubernetes?"
Team C: "How do we deploy to Kubernetes?"
Team D: "How do we deploy to Kubernetes?"

Result: 4 different deployment pipelines, 4 different patterns,
        4 times the maintenance burden

Each team builds their own:

CI/CD pipelines
Monitoring setup
Secret management
Database provisioning
Service mesh configuration
Logging infrastructure

This is cognitive load — the mental overhead of managing infrastructure that has nothing to do with your product.

What Is an Internal Developer Platform?

An internal developer platform (IDP) is a self-service layer on top of your infrastructure. Developers interact with the platform, not directly with Kubernetes, Terraform, or AWS.

┌─────────────────────────────────────────┐
│           Developer Experience          │
│  (CLI, Portal, API, Templates)          │
├─────────────────────────────────────────┤
│          Platform Services              │
│  (Deploy, Monitor, Scale, Secure)       │
├─────────────────────────────────────────┤
│          Infrastructure                 │
│  (Kubernetes, AWS, Terraform)           │
└─────────────────────────────────────────┘

Developers see the top layer. The platform team manages everything below.

Before vs After

Task	Without Platform	With Platform
Deploy a service	Write Dockerfile, Helm chart, CI pipeline, ingress rules	`platform deploy`
Create a database	File a ticket, wait 3 days, get credentials	`platform db create --type postgres`
Add monitoring	Learn Prometheus, write dashboards, configure alerts	Automatic — comes with every service
Rotate secrets	Manual process, SSH into servers	`platform secrets rotate`
Spin up a new service	2-3 days of boilerplate	`platform service create my-api`

The Golden Path

A golden path is an opinionated, supported way to do something. It's not the only way — it's the recommended way.

# golden-path/service-template/platform.yaml
kind: Service
metadata:
  name: payment-api
  team: payments
  tier: critical
spec:
  language: typescript
  framework: fastify
  runtime:
    replicas: 3
    cpu: "500m"
    memory: "512Mi"
  database:
    type: postgres
    size: small
  monitoring:
    alerts: true
    dashboard: true
    slo: 99.9
  deployment:
    strategy: rolling
    canary: true
    rollback: automatic

From this single file, the platform provisions:

A Kubernetes deployment with 3 replicas
A PostgreSQL database with backups
Prometheus metrics and Grafana dashboards
Alert rules based on SLOs
A CI/CD pipeline with canary deploys
TLS certificates and ingress
Structured logging shipped to your log aggregator

The developer writes one file. The platform handles the rest.
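
To make the fan-out from one file to many resources concrete, here is a minimal Python sketch. It is not how any particular platform implements provisioning: `ServiceSpec`, `plan_resources`, the resource labels, and the 30-day error-budget window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed SLO window: 30 days


@dataclass
class ServiceSpec:
    """Hypothetical in-memory form of the platform.yaml fields used below."""
    name: str
    replicas: int
    database_type: Optional[str]  # e.g. "postgres"; None means no database
    alerts: bool
    dashboard: bool
    slo_percent: float
    canary: bool


def error_budget_minutes(slo_percent: float) -> float:
    """Downtime an availability SLO allows per 30-day window, in minutes."""
    return round((100.0 - slo_percent) / 100.0 * MINUTES_PER_30_DAYS, 1)


def plan_resources(spec: ServiceSpec) -> list[str]:
    """Expand one spec into the list of resources the platform would create."""
    plan = [
        f"deployment/{spec.name} (replicas={spec.replicas})",
        f"ingress/{spec.name} (TLS)",
        f"logging/{spec.name}",
    ]
    if spec.database_type:
        plan.append(f"database/{spec.name} ({spec.database_type}, backups on)")
    if spec.dashboard:
        plan.append(f"dashboard/{spec.name}")
    if spec.alerts:
        budget = error_budget_minutes(spec.slo_percent)
        plan.append(f"alerts/{spec.name} (error budget {budget} min per 30 days)")
    pipeline = "canary" if spec.canary else "rolling"
    plan.append(f"pipeline/{spec.name} ({pipeline})")
    return plan


payment_api = ServiceSpec(
    name="payment-api", replicas=3, database_type="postgres",
    alerts=True, dashboard=True, slo_percent=99.9, canary=True,
)
for resource in plan_resources(payment_api):
    print(resource)
```

A 99.9% SLO leaves roughly 43 minutes of error budget per 30 days, which is the kind of number SLO-based alert rules are derived from.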

Building Blocks

1. Service Catalog (Backstage)

Backstage (created by Spotify, now a CNCF incubating project) is the most widely adopted open-source framework for internal developer portals:

# catalog-info.yaml, registered in Backstage
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-api
  description: Handles payment processing
  annotations:
    github.com/project-slug: org/payment-api
    pagerduty.com/service-id: P12345
    grafana/dashboard-selector: payment-api
  tags:
    - typescript
    - fastify
    - payments
spec:
  type: service
  lifecycle: production
  owner: team-payments
  dependsOn:
    - resource:postgres-payments
    - component:user-api

Backstage gives you:

A searchable catalog of all services, APIs, and infrastructure
Ownership tracking (who owns what?)
Tech docs alongside the service
Software templates for creating new services
Plugin ecosystem (PagerDuty, GitHub, Kubernetes, etc.)
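
The ownership and dependency questions a catalog answers can be sketched in a few lines of Python. This is only an illustration of the data model, not Backstage's API: the entities are plain dicts shaped like `catalog-info.yaml`, and `user-api` with its owner `team-identity` is a made-up second entry.

```python
from collections import defaultdict

# Entities shaped like catalog-info.yaml files (the second one is hypothetical).
CATALOG = [
    {
        "metadata": {"name": "payment-api"},
        "spec": {
            "owner": "team-payments",
            "dependsOn": ["resource:postgres-payments", "component:user-api"],
        },
    },
    {
        "metadata": {"name": "user-api"},
        "spec": {"owner": "team-identity"},
    },
]


def components_by_owner(entities: list[dict]) -> dict[str, list[str]]:
    """Answer "who owns what?" by grouping component names under each owner."""
    index = defaultdict(list)
    for entity in entities:
        index[entity["spec"]["owner"]].append(entity["metadata"]["name"])
    return {owner: sorted(names) for owner, names in index.items()}


def dependents_of(entities: list[dict], ref: str) -> list[str]:
    """List components that declare a dependency on the given entity ref."""
    return sorted(
        entity["metadata"]["name"]
        for entity in entities
        if ref in entity["spec"].get("dependsOn", [])
    )


print(components_by_owner(CATALOG))
print(dependents_of(CATALOG, "component:user-api"))
```

The second query is the one that matters during an incident: before changing `user-api`, you can see which services depend on it and who to notify.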

2. Software Templates

Instead of copying boilerplate from an old project, use templates:

# backstage-template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: typescript-service
  title: TypeScript Microservice
  description: Create a new TypeScript service with all platform integrations
spec:
  type: service
  parameters:
    - title: Service Details
      properties:
        name:
          type: string
          description: Service name
        team:
          type: string
          description: Owning team
        database:
          type: string
          enum: [none, postgres, redis]
          default: none

  steps:
    - id: scaffold
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          team: ${{ parameters.team }}
          database: ${{ parameters.database }}

    - id: create-repo
      action: publish:github
      input:
        repoUrl: github.com?owner=org&repo=${{ parameters.name }}

    - id: register
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}

A developer clicks "Create Service" in Backstage, fills in 3 fields, and gets:

A GitHub repo with boilerplate code
CI/CD pipeline configured
Kubernetes manifests generated
Monitoring pre-configured
Service registered in the catalog

Time to first deploy: roughly 10 minutes instead of 2-3 days.
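
What the scaffolding step does to the skeleton can be approximated with plain string substitution. The sketch below is a simplified stand-in for the templating engine Backstage uses, handling only `${{ values.<name> }}` placeholders; the skeleton contents and the function names `render` and `render_skeleton` are hypothetical.

```python
import re

# Matches placeholders such as ${{ values.name }} inside skeleton files.
PLACEHOLDER = re.compile(r"\$\{\{\s*values\.(\w+)\s*\}\}")


def render(text: str, values: dict[str, str]) -> str:
    """Replace every placeholder in text, failing loudly on unknown keys."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"template references unknown value: {key}")
        return values[key]

    return PLACEHOLDER.sub(substitute, text)


def render_skeleton(skeleton: dict[str, str], values: dict[str, str]) -> dict[str, str]:
    """Render both the file paths and the file bodies of a skeleton."""
    return {render(path, values): render(body, values) for path, body in skeleton.items()}


# A two-file skeleton, held in memory to keep the example self-contained.
skeleton = {
    "catalog-info.yaml": "metadata:\n  name: ${{ values.name }}\nspec:\n  owner: ${{ values.team }}\n",
    "README.md": "# ${{ values.name }}\n",
}
rendered = render_skeleton(skeleton, {"name": "my-api", "team": "team-payments"})
print(rendered["catalog-info.yaml"])
```

Failing on an unknown key is a deliberate choice here: a template that silently leaves a placeholder in a generated repo is harder to debug than one that refuses to run.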

3. Infrastructure as Code

The platform team abstracts infrastructure behind simple interfaces:

# What the platform team manages (Terraform)
module "service" {
  source = "./modules/platform-service"

  name        = var.service_name
  team        = var.team
  tier        = var.tier
  replicas    = var.replicas
  database    = var.database_config
  monitoring  = var.monitoring_config
}

# This module internally handles:
# - EKS namespace
# - IAM roles
# - RDS instance
# - Security groups
# - DNS records
# - Certificate
# - Prometheus ServiceMonitor
# - Grafana dashboard
# - PagerDuty integration

Developers never see Terraform. They interact with the platform abstraction.
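
One plausible way to bridge the two worlds is a small translation layer that turns the developer-facing spec into Terraform variable values. The sketch below assumes the platform feeds the module through a JSON variables file (Terraform accepts `.tfvars.json` files); the default tier, default replica count, and function names are invented for illustration.

```python
import json

# Hypothetical platform defaults applied when a spec omits a field.
DEFAULT_TIER = "standard"
DEFAULT_REPLICAS = 2
DEFAULT_MONITORING = {"alerts": True, "dashboard": True}


def to_tfvars(spec: dict) -> dict:
    """Map a developer-facing service spec onto the module's input variables."""
    return {
        "service_name": spec["name"],
        "team": spec["team"],
        "tier": spec.get("tier", DEFAULT_TIER),
        "replicas": spec.get("replicas", DEFAULT_REPLICAS),
        # None is written as JSON null, which Terraform treats as "not set".
        "database_config": spec.get("database"),
        "monitoring_config": spec.get("monitoring", DEFAULT_MONITORING),
    }


def render_tfvars_json(spec: dict) -> str:
    """Serialize the variables in the format of a .tfvars.json file."""
    return json.dumps(to_tfvars(spec), indent=2, sort_keys=True)


spec = {
    "name": "payment-api",
    "team": "payments",
    "tier": "critical",
    "replicas": 3,
    "database": {"type": "postgres", "size": "small"},
}
print(render_tfvars_json(spec))
```

Keeping the mapping in one function means the platform team can rename or restructure Terraform variables without touching any service's spec file.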

4. Developer CLI

A CLI that wraps platform operations:

# Create a new service
platform service create my-api --lang typescript --db postgres

# Deploy
platform deploy --env staging
platform deploy --env production --canary 10%

# Promote canary to full rollout
platform deploy promote

# Rollback
platform deploy rollback

# Manage databases
platform db create --type postgres --size medium
platform db connect my-api-db
platform db backup my-api-db

# Manage secrets
platform secrets set API_KEY=sk-abc123
platform secrets list
platform secrets rotate --all

# View logs
platform logs my-api --env production --since 1h

# Check service health
platform status my-api

Each command wraps the underlying Kubernetes, AWS, or Terraform operations, so developers rarely need to use those tools directly.
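
A wrapper CLI of this kind can start very small. The Python sketch below covers only the `logs` and `status` commands and translates them into `kubectl` invocations without running them. It assumes one Kubernetes namespace per environment and a Deployment named after the service; both conventions, and the default environments, are invented for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for two of the commands shown above: `logs` and `status`."""
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)

    logs = commands.add_parser("logs", help="show logs for a service")
    logs.add_argument("service")
    logs.add_argument("--env", default="staging")
    logs.add_argument("--since", default="1h")

    status = commands.add_parser("status", help="show rollout health")
    status.add_argument("service")
    status.add_argument("--env", default="production")
    return parser


def to_kubectl(args: argparse.Namespace) -> list[str]:
    """Translate a parsed platform command into the kubectl call it wraps."""
    target = f"deployment/{args.service}"
    if args.command == "logs":
        return ["kubectl", "logs", target, "--namespace", args.env, f"--since={args.since}"]
    if args.command == "status":
        return ["kubectl", "rollout", "status", target, "--namespace", args.env]
    raise ValueError(f"unsupported command: {args.command}")


args = build_parser().parse_args(["logs", "my-api", "--env", "production", "--since", "1h"])
print(" ".join(to_kubectl(args)))
```

Returning the command as a list rather than executing it keeps the translation easy to unit test, and the list can be passed to `subprocess.run` once you trust it.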

5. CI/CD Pipeline

A standardized pipeline that every service uses:

# .github/workflows/platform.yml
# Generated automatically by the platform template
name: Platform CI/CD

on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    uses: org/platform-workflows/.github/workflows/build.yml@v2
    with:
      language: typescript
      node-version: 22

  test:
    needs: build
    uses: org/platform-workflows/.github/workflows/test.yml@v2
    with:
      language: typescript

  security:
    needs: build
    uses: org/platform-workflows/.github/workflows/security.yml@v2

  deploy-staging:
    needs: [test, security]
    if: github.ref == 'refs/heads/main'
    uses: org/platform-workflows/.github/workflows/deploy.yml@v2
    with:
      environment: staging

  deploy-production:
    needs: deploy-staging
    uses: org/platform-workflows/.github/workflows/deploy.yml@v2
    with:
      environment: production
      strategy: canary

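Teams don't write CI/CD pipelines. They inherit them from the platform. Because every job references a shared workflow at the `v2` tag, an improvement the platform team ships under that tag (added security scanning, faster builds) reaches each team on its next run, provided `v2` is maintained as a moving tag.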
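
The `needs` keys in the workflow above form a dependency graph, and the order in which jobs can run falls out of a topological sort. The Python sketch below reproduces that ordering with the standard library's `graphlib`; it models only the dependency structure, not how GitHub Actions schedules runners, and `execution_waves` is a name chosen for this example.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it `needs`, copied from the workflow above.
NEEDS = {
    "build": set(),
    "test": {"build"},
    "security": {"build"},
    "deploy-staging": {"test", "security"},
    "deploy-production": {"deploy-staging"},
}


def execution_waves(needs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if the jobs form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(NEEDS), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The `test` and `security` jobs land in the same wave, which is why adding a security scan to the shared pipeline does not have to lengthen the critical path.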

Measuring Platform Success

DORA Metrics

Metric	Before Platform (example)	After Platform (example)
Deployment frequency	Weekly	Multiple times daily
Lead time for changes	2 weeks	< 1 day
Change failure rate	15%	< 5%
Mean time to recovery	4 hours	< 30 minutes
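
The four metrics can be computed from a simple log of deployments. The Python sketch below is one possible formulation under stated assumptions: the `Deployment` record and its fields are hypothetical, lead time is measured from commit to deploy, and recovery time from a failed deploy to its restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is fixed


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a reporting window."""
    if not deployments:
        raise ValueError("need at least one deployment to compute metrics")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


start = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=1)),
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=3)),
    Deployment(
        start, start + timedelta(hours=4),
        failed=True, restored_at=start + timedelta(hours=4, minutes=30),
    ),
]
print(dora_summary(history, window_days=2))
```

Using the median for lead time keeps one unusually slow change from dominating the number, which matters when you compare before and after.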

Developer Satisfaction

Survey your developers regularly:

"How easy is it to deploy a new service?"
"How much time do you spend on infrastructure tasks?"
"Do you feel productive?"
"What's your biggest pain point?"

If developers are still fighting infrastructure, the platform isn't doing its job.

Adoption Rate

Track how many teams use platform features:

Service template usage:    85% of new services
Standard CI/CD pipeline:   92% of repos
Platform CLI daily users:  73% of developers
Self-service database:     68% of new databases

Low adoption means the platform is too complex or doesn't solve real problems. The best platform is one developers choose to use, not one they're forced to use.
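
Adoption numbers like these are simple ratios, and it helps to compute them the same way every time. A minimal Python sketch follows; the counts, the 70% threshold, and the function names are hypothetical.

```python
def adoption_rate(using: int, eligible: int) -> float:
    """Percentage of eligible teams, repos, or services using a feature."""
    if eligible == 0:
        return 0.0
    return round(100 * using / eligible, 1)


def needs_attention(rates: dict[str, float], threshold: float) -> list[str]:
    """Features whose adoption falls below the threshold, worst first."""
    below = [name for name, rate in rates.items() if rate < threshold]
    return sorted(below, key=lambda name: rates[name])


# Hypothetical counts: (using the feature, eligible to use it).
counts = {
    "service-template": (17, 20),
    "standard-ci": (46, 50),
    "self-service-db": (17, 25),
}
rates = {name: adoption_rate(using, eligible) for name, (using, eligible) in counts.items()}
print(rates)
print(needs_attention(rates, threshold=70.0))
```

A feature that shows up in `needs_attention` is a prompt to go talk to the teams that skipped it, not a reason to mandate it.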

Common Mistakes

1. Building Too Much, Too Early

Bad:  Build a complete platform for 18 months, launch it all at once
Good: Start with the biggest pain point, ship in 2 weeks, iterate

Start with what hurts most. Usually it's one of:

Deploying a service takes too long
Creating a new service is painful
Monitoring is inconsistent

Fix that first. Then expand.

2. Not Treating It as a Product

Your platform is an internal product. Your developers are your users. This means:

Talk to your users (developers) regularly
Prioritize based on their pain points
Write documentation
Provide support
Measure satisfaction

3. Forcing Adoption

Bad:  "All teams must migrate to the platform by Q3"
Good: "The platform makes deployment 10x faster — teams are
       migrating because they want to"

If you have to force adoption, your platform doesn't solve real problems.

4. Ignoring the Developer Experience

Bad:  platform deploy --cluster prod-us-east-1 --namespace payments \
      --image registry.internal/payment-api:sha-abc123 \
      --replicas 3 --strategy rolling --max-surge 1

Good: platform deploy

Sane defaults. Minimal configuration. Progressive disclosure — simple by default, powerful when needed.
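
Progressive disclosure usually comes down to layered configuration: platform defaults first, then the service's own config, then explicit flags. The Python sketch below shows that layering; the option names and default values are assumptions, not the behavior of any real CLI.

```python
# Hypothetical platform-wide defaults for `platform deploy`.
DEFAULTS = {"env": "staging", "replicas": 2, "strategy": "rolling", "max_surge": 1}


def resolve_options(service_config: dict, cli_flags: dict) -> dict:
    """Merge option layers; later layers win: defaults < service config < flags."""
    explicit_flags = {key: value for key, value in cli_flags.items() if value is not None}
    unknown = (set(service_config) | set(explicit_flags)) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {**DEFAULTS, **service_config, **explicit_flags}


# `platform deploy` with nothing else set: every option comes from defaults.
print(resolve_options({}, {}))

# The service pins replicas in its config; the developer overrides only env.
print(resolve_options({"replicas": 3}, {"env": "production", "replicas": None}))
```

Flags left unset arrive as `None` and are ignored, so a developer who types only `platform deploy` gets the defaults, while a power user can still override any single option.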

Platform Team Structure

A typical platform team:

Platform Team (4-8 engineers)
├── Infrastructure (Kubernetes, cloud, networking)
├── Developer Experience (CLI, portal, templates)
├── CI/CD (pipelines, build systems)
└── Observability (monitoring, logging, alerting)

What the Platform Team Does NOT Do

Build product features
Own application code
Make product decisions
Deploy services for other teams (self-service!)

The platform team builds the tools. Product teams use the tools.

Getting Started

Weeks 1-2: Understand the Pain

# Interview 5-10 developers
# Ask:
# - What's your biggest infrastructure pain point?
# - How long does it take to deploy?
# - What do you wish was easier?
# - What do you spend time on that feels wasteful?

Weeks 3-4: Build the First Thing

Pick the highest-impact, lowest-effort improvement. Common starting points:

Option A: Standardized CI/CD pipeline (reusable workflows)
Option B: Service creation template (Backstage or Cookiecutter)
Option C: Developer CLI for common operations

Months 2-3: Expand

- Add monitoring to the golden path
- Create a service catalog
- Automate database provisioning
- Add security scanning to CI

Month 4+: Iterate

- Measure adoption and satisfaction
- Fix what's not working
- Add what's most requested
- Remove what nobody uses

Quick Reference

Component	Tool Options
Developer Portal	Backstage, Port, Cortex
CI/CD	GitHub Actions, GitLab CI, Dagger
Infrastructure	Terraform, Pulumi, Crossplane
Kubernetes Delivery	Argo CD, Flux, Helm
Monitoring	Prometheus + Grafana, Datadog
Logging	Loki, ELK, Datadog
Secrets	Vault, AWS Secrets Manager, SOPS
Service Mesh	Istio, Linkerd, Cilium

Summary

Platform engineering is about removing friction:

Golden paths — Opinionated, supported ways to build and deploy
Self-service — Developers provision what they need without tickets
Abstractions — Hide infrastructure complexity behind simple interfaces
Consistency — Every service gets monitoring, security, and CI/CD by default
Measurement — Track DORA metrics, developer satisfaction, and adoption

The goal isn't to build the most sophisticated platform. It's to build the platform that makes your developers most productive. Start small, solve real pain, and iterate.