Every company reaches a point where teams spend more time fighting infrastructure than building features. Platform engineering fixes this by building an internal developer platform — a paved road that makes the right way the easy way.
The Problem Platform Engineering Solves
Without a platform team, every product team independently figures out:
Team A: "How do we deploy to Kubernetes?"
Team B: "How do we deploy to Kubernetes?"
Team C: "How do we deploy to Kubernetes?"
Team D: "How do we deploy to Kubernetes?"
Result: 4 different deployment pipelines, 4 different patterns,
4 times the maintenance burden
Each team builds their own:
- CI/CD pipelines
- Monitoring setup
- Secret management
- Database provisioning
- Service mesh configuration
- Logging infrastructure
This is cognitive load — the mental overhead of managing infrastructure that has nothing to do with your product.
What Is an Internal Developer Platform?
An IDP is a self-service layer on top of your infrastructure. Developers interact with the platform, not directly with Kubernetes, Terraform, or AWS.
┌─────────────────────────────────────────┐
│ Developer Experience │
│ (CLI, Portal, API, Templates) │
├─────────────────────────────────────────┤
│ Platform Services │
│ (Deploy, Monitor, Scale, Secure) │
├─────────────────────────────────────────┤
│ Infrastructure │
│ (Kubernetes, AWS, Terraform) │
└─────────────────────────────────────────┘
Developers see the top layer. The platform team manages everything below.
Before vs After
| Task | Without Platform | With Platform |
|---|---|---|
| Deploy a service | Write Dockerfile, Helm chart, CI pipeline, ingress rules | platform deploy |
| Create a database | File a ticket, wait 3 days, get credentials | platform db create --type postgres |
| Add monitoring | Learn Prometheus, write dashboards, configure alerts | Automatic — comes with every service |
| Rotate secrets | Manual process, SSH into servers | platform secrets rotate |
| Spin up a new service | 2-3 days of boilerplate | platform create service my-api |
The Golden Path
A golden path is an opinionated, supported way to do something. It's not the only way — it's the recommended way.
# golden-path/service-template/platform.yaml
kind: Service
metadata:
  name: payment-api
  team: payments
  tier: critical
spec:
  language: typescript
  framework: fastify
  runtime:
    replicas: 3
    cpu: "500m"
    memory: "512Mi"
  database:
    type: postgres
    size: small
  monitoring:
    alerts: true
    dashboard: true
    slo: 99.9
  deployment:
    strategy: rolling
    canary: true
    rollback: automatic
From this single file, the platform provisions:
- A Kubernetes deployment with 3 replicas
- A PostgreSQL database with backups
- Prometheus metrics and Grafana dashboards
- Alert rules based on SLOs
- A CI/CD pipeline with canary deploys
- TLS certificates and ingress
- Structured logging shipped to your log aggregator
The developer writes one file. The platform handles the rest.
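To make the expansion from one file to many resources concrete, here is a minimal sketch of a planning step. The function `plan_resources` and the resource labels it emits are hypothetical, chosen for illustration; a real platform would call Kubernetes, cloud, and CI APIs rather than return strings.

```python
from typing import Any


def plan_resources(manifest: dict[str, Any]) -> list[str]:
    """Return the resources a hypothetical platform would provision for a service."""
    name = manifest["metadata"]["name"]
    spec = manifest["spec"]
    replicas = spec.get("runtime", {}).get("replicas", 1)

    # Every service gets a workload, ingress with TLS, a pipeline, and logging.
    plan = [
        f"kubernetes-deployment/{name} (replicas={replicas})",
        f"ingress-with-tls/{name}",
        f"ci-cd-pipeline/{name}",
        f"log-shipping/{name}",
    ]

    # Optional pieces are driven by what the manifest declares.
    database = spec.get("database")
    if database:
        plan.append(
            f"{database['type']}-database/{name} (size={database['size']}, backups=on)"
        )

    monitoring = spec.get("monitoring", {})
    if monitoring.get("dashboard"):
        plan.append(f"grafana-dashboard/{name}")
    if monitoring.get("alerts") and "slo" in monitoring:
        plan.append(f"slo-alert-rules/{name} (target={monitoring['slo']}%)")

    return plan


# The same values as the platform.yaml above, already parsed into a dict.
manifest = {
    "kind": "Service",
    "metadata": {"name": "payment-api", "team": "payments", "tier": "critical"},
    "spec": {
        "runtime": {"replicas": 3, "cpu": "500m", "memory": "512Mi"},
        "database": {"type": "postgres", "size": "small"},
        "monitoring": {"alerts": True, "dashboard": True, "slo": 99.9},
    },
}

for resource in plan_resources(manifest):
    print(resource)
```

Keeping the plan as plain data makes it easy to show developers a preview of what a manifest change will create before anything is applied.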
Building Blocks
1. Service Catalog (Backstage)
Backstage (created by Spotify, now a CNCF project) is a widely used open-source framework for internal developer portals:
# catalog-info.yaml, registered in Backstage
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-api
  description: Handles payment processing
  annotations:
    github.com/project-slug: org/payment-api
    pagerduty.com/service-id: P12345
    grafana/dashboard-selector: payment-api
  tags:
    - typescript
    - fastify
    - payments
spec:
  type: service
  lifecycle: production
  owner: team-payments
  dependsOn:
    - resource:postgres-payments
    - component:user-api
Backstage gives you:
- A searchable catalog of all services, APIs, and infrastructure
- Ownership tracking (who owns what?)
- Tech docs alongside the service
- Software templates for creating new services
- Plugin ecosystem (PagerDuty, GitHub, Kubernetes, etc.)
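The ownership and dependency questions a catalog answers can be sketched in a few lines. This is not the Backstage API: the `catalog` list is a hypothetical, simplified in-memory stand-in for entities like the one above, and `team-identity` is an invented owner used only for the example.

```python
# Hypothetical, simplified catalog entries modeled on catalog-info.yaml fields.
catalog = [
    {"kind": "component", "name": "payment-api", "owner": "team-payments",
     "dependsOn": ["resource:postgres-payments", "component:user-api"]},
    {"kind": "component", "name": "user-api", "owner": "team-identity",
     "dependsOn": []},
    {"kind": "resource", "name": "postgres-payments", "owner": "team-payments",
     "dependsOn": []},
]


def entity_ref(entity):
    """Build a 'kind:name' reference, the same shape used in dependsOn."""
    return f"{entity['kind']}:{entity['name']}"


def owned_by(catalog, team):
    """Answer 'who owns what?' for a single team."""
    return sorted(entity_ref(e) for e in catalog if e["owner"] == team)


def dependents_of(catalog, ref):
    """Entities that list `ref` in dependsOn, i.e. who is affected if `ref` breaks."""
    return sorted(entity_ref(e) for e in catalog if ref in e["dependsOn"])


print(owned_by(catalog, "team-payments"))
print(dependents_of(catalog, "component:user-api"))
```

The reverse lookup in `dependents_of` is the kind of query that is hard to answer from scattered repositories and trivial once every service is registered in one place.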
2. Software Templates
Instead of copying boilerplate from an old project, use templates:
# backstage-template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: typescript-service
  title: TypeScript Microservice
  description: Create a new TypeScript service with all platform integrations
spec:
  type: service
  parameters:
    - title: Service Details
      properties:
        name:
          type: string
          description: Service name
        team:
          type: string
          description: Owning team
        database:
          type: string
          enum: [none, postgres, redis]
          default: none
  steps:
    - id: scaffold
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          team: ${{ parameters.team }}
    - id: create-repo
      action: publish:github
      input:
        repoUrl: github.com?owner=org&repo=${{ parameters.name }}
    - id: register
      action: catalog:register
      input:
        # Bracket notation is needed because the step id contains a hyphen
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
A developer clicks "Create Service" in Backstage, fills in 3 fields, and gets:
- A GitHub repo with boilerplate code
- CI/CD pipeline configured
- Kubernetes manifests generated
- Monitoring pre-configured
- Service registered in the catalog
Time to first deploy: 10 minutes instead of 2 days.
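The scaffold step boils down to substituting form values into a skeleton of files. The sketch below shows that idea with Python's standard `string.Template`; it is an illustration only, since Backstage itself renders skeletons with its own templating engine, and the `SKELETON` contents here are hypothetical.

```python
from string import Template

# Hypothetical in-memory skeleton: file path -> file content, both templated.
SKELETON = {
    "README.md": "# ${name}\n\nOwned by ${team}.\n",
    "catalog-info.yaml": "metadata:\n  name: ${name}\nspec:\n  owner: ${team}\n",
}


def render_skeleton(skeleton: dict[str, str], values: dict[str, str]) -> dict[str, str]:
    """Substitute values into every path and file body.

    Template.substitute raises KeyError when a placeholder has no value,
    so an incomplete form fails loudly instead of producing broken files.
    """
    return {
        Template(path).substitute(values): Template(content).substitute(values)
        for path, content in skeleton.items()
    }


files = render_skeleton(SKELETON, {"name": "payment-api", "team": "team-payments"})
print(files["README.md"])
```

Failing on a missing value is a deliberate choice here: a half-rendered repository is harder to debug than a rejected form.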
3. Infrastructure as Code
The platform team abstracts infrastructure behind simple interfaces:
# What the platform team manages (Terraform)
module "service" {
  source     = "./modules/platform-service"
  name       = var.service_name
  team       = var.team
  tier       = var.tier
  replicas   = var.replicas
  database   = var.database_config
  monitoring = var.monitoring_config
}
# This module internally handles:
# - EKS namespace
# - IAM roles
# - RDS instance
# - Security groups
# - DNS records
# - Certificate
# - Prometheus ServiceMonitor
# - Grafana dashboard
# - PagerDuty integration
Developers never see Terraform. They interact with the platform abstraction.
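One plausible way to bridge the developer-facing inputs and the Terraform module is to generate a `terraform.tfvars.json` file, which Terraform reads as variable values. The sketch below assumes the variable names used by the module call above; the `TIER_DEFAULTS` table and the `multi_az` field are hypothetical policy choices, not part of the original module.

```python
import json
from typing import Optional

# Hypothetical per-tier defaults a platform team might maintain.
TIER_DEFAULTS = {
    "critical": {"replicas": 3, "multi_az": True},
    "standard": {"replicas": 2, "multi_az": False},
}


def to_tfvars(service_name: str, team: str, tier: str,
              database_type: Optional[str] = None) -> str:
    """Render Terraform variable values as JSON for a terraform.tfvars.json file."""
    defaults = TIER_DEFAULTS[tier]  # KeyError on an unknown tier is intentional
    variables = {
        "service_name": service_name,
        "team": team,
        "tier": tier,
        "replicas": defaults["replicas"],
        # None serializes to JSON null, meaning "no database requested".
        "database_config": (
            {"type": database_type, "multi_az": defaults["multi_az"]}
            if database_type else None
        ),
        "monitoring_config": {"alerts": True, "dashboard": True},
    }
    return json.dumps(variables, indent=2, sort_keys=True)


print(to_tfvars("payment-api", "payments", "critical", database_type="postgres"))
```

The developer supplies a name, a team, and a tier; sizing and availability decisions stay with the platform team in one table.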
4. Developer CLI
A CLI that wraps platform operations:
# Create a new service
platform create service my-api --lang typescript --db postgres
# Deploy
platform deploy --env staging
platform deploy --env production --canary 10%
# Promote canary to full rollout
platform deploy promote
# Rollback
platform deploy rollback
# Manage databases
platform db create --type postgres --size medium
platform db connect my-api-db
platform db backup my-api-db
# Manage secrets
platform secrets set API_KEY=sk-abc123
platform secrets list
platform secrets rotate --all
# View logs
platform logs my-api --env production --since 1h
# Check service health
platform status my-api
Every command is a wrapper around Kubernetes, AWS, Terraform, etc. The developer never needs to learn those tools directly.
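A wrapper CLI of this kind can be sketched with `argparse`. The example below covers only two of the commands and translates them into `kubectl` argument lists without running them. The mapping from `--env` to a kubectl context of the same name, and from a service to a Deployment of the same name, are assumptions made for illustration.

```python
import argparse

ENVIRONMENTS = ["staging", "production"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="platform")
    commands = parser.add_subparsers(dest="command", required=True)

    logs = commands.add_parser("logs", help="Show service logs")
    logs.add_argument("service")
    logs.add_argument("--env", default="staging", choices=ENVIRONMENTS)
    logs.add_argument("--since", default="1h")

    status = commands.add_parser("status", help="Check rollout health")
    status.add_argument("service")
    status.add_argument("--env", default="staging", choices=ENVIRONMENTS)
    return parser


def to_kubectl(args: argparse.Namespace) -> list[str]:
    """Translate a parsed platform command into the kubectl call it wraps."""
    # Assumption: each environment is a kubectl context with the same name.
    base = ["kubectl", "--context", args.env]
    if args.command == "logs":
        return base + ["logs", f"deployment/{args.service}", f"--since={args.since}"]
    if args.command == "status":
        return base + ["rollout", "status", f"deployment/{args.service}"]
    raise ValueError(f"unsupported command: {args.command}")


args = build_parser().parse_args(
    ["logs", "my-api", "--env", "production", "--since", "1h"]
)
print(" ".join(to_kubectl(args)))
```

Returning the argument list instead of executing it keeps the translation testable and makes a `--dry-run` flag cheap to add later.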
5. CI/CD Pipeline
A standardized pipeline that every service uses:
# .github/workflows/platform.yml
# Generated automatically by the platform template
name: Platform CI/CD
on:
  push:
    branches: [main]
  pull_request:
jobs:
  build:
    uses: org/platform-workflows/.github/workflows/build.yml@v2
    with:
      language: typescript
      node-version: 22
  test:
    needs: build
    uses: org/platform-workflows/.github/workflows/test.yml@v2
    with:
      language: typescript
  security:
    needs: build
    uses: org/platform-workflows/.github/workflows/security.yml@v2
  deploy-staging:
    needs: [test, security]
    if: github.ref == 'refs/heads/main'
    uses: org/platform-workflows/.github/workflows/deploy.yml@v2
    with:
      environment: staging
  deploy-production:
    needs: deploy-staging
    uses: org/platform-workflows/.github/workflows/deploy.yml@v2
    with:
      environment: production
      strategy: canary
Teams don't write CI/CD pipelines. They inherit them from the platform. When the platform team improves the pipeline (adds security scanning, speeds up builds), every team benefits automatically.
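The `needs` keys in the workflow form a dependency graph, and the order in which jobs run falls out of a topological sort. The sketch below reproduces that ordering with Python's standard `graphlib`; it models only the job graph and is not how GitHub Actions is implemented.

```python
from graphlib import TopologicalSorter

# Job -> jobs it needs, taken from the `needs` keys in the workflow above.
NEEDS = {
    "build": [],
    "test": ["build"],
    "security": ["build"],
    "deploy-staging": ["test", "security"],
    "deploy-production": ["deploy-staging"],
}


def execution_stages(needs):
    """Group jobs into stages; jobs within a stage can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, jobs in enumerate(execution_stages(NEEDS), start=1):
    print(f"stage {number}: {', '.join(jobs)}")
```

Here `test` and `security` land in the same stage, which is why adding a new scan to the shared pipeline does not have to lengthen the critical path.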
Measuring Platform Success
DORA Metrics
| Metric | Before Platform | After Platform |
|---|---|---|
| Deployment frequency | Weekly | Multiple times daily |
| Lead time for changes | 2 weeks | < 1 day |
| Change failure rate | 15% | < 5% |
| Mean time to recovery | 4 hours | < 30 minutes |
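The four metrics can be computed from a simple log of deployments. The sketch below is one way to do it, with assumptions stated up front: the `Deployment` record and its field names are hypothetical, lead time is measured from commit to deploy, and recovery time from a failed deploy to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora_metrics(deployments, period_days):
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when there were no recorded recoveries in the period.
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


def hours(n):
    return timedelta(hours=n)


start = datetime(2024, 1, 1, 9, 0)
day = timedelta(days=1)
deployments = [
    Deployment(start, start + hours(2)),
    Deployment(start + day, start + day + hours(4), failed=True,
               restored_at=start + day + hours(4) + timedelta(minutes=30)),
    Deployment(start + 2 * day, start + 2 * day + hours(6)),
    Deployment(start + 3 * day, start + 3 * day + hours(8)),
]

for metric, value in dora_metrics(deployments, period_days=4).items():
    print(f"{metric}: {value}")
```

The median is used for lead time so that a single slow change does not dominate the number; a mean would be an equally defensible choice if stated consistently.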
Developer Satisfaction
Survey your developers regularly:
"How easy is it to deploy a new service?"
"How much time do you spend on infrastructure tasks?"
"Do you feel productive?"
"What's your biggest pain point?"
If developers are still fighting infrastructure, the platform isn't doing its job.
Adoption Rate
Track how many teams use platform features:
Service template usage: 85% of new services
Standard CI/CD pipeline: 92% of repos
Platform CLI daily users: 73% of developers
Self-service database: 68% of new databases
Low adoption means the platform is too complex or doesn't solve real problems. The best platform is one developers choose to use, not one they're forced to use.
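Adoption percentages like these are just adopters divided by the eligible population for each feature. A minimal sketch follows; the counts are made up to reproduce three of the percentages above and are not real data.

```python
def adoption_rates(usage):
    """Map feature -> whole-number percent, given feature -> (adopters, eligible)."""
    return {
        feature: round(100 * adopters / eligible)
        for feature, (adopters, eligible) in usage.items()
        if eligible  # skip features with nothing eligible yet
    }


# Hypothetical counts for one quarter.
usage = {
    "service-template": (17, 20),        # new services created from the template
    "standard-pipeline": (46, 50),       # repos using the shared workflows
    "self-service-database": (17, 25),   # new databases created via the platform
}

for feature, percent in adoption_rates(usage).items():
    print(f"{feature}: {percent}%")
```

Choosing the denominator matters more than the arithmetic: "new services" and "all repos" answer different questions, so each feature should state which population it is measured against.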
Common Mistakes
1. Building Too Much, Too Early
Bad: Build a complete platform for 18 months, launch it all at once
Good: Start with the biggest pain point, ship in 2 weeks, iterate
Start with what hurts most. Usually it's one of:
- Deploying a service takes too long
- Creating a new service is painful
- Monitoring is inconsistent
Fix that first. Then expand.
2. Not Treating It as a Product
Your platform is an internal product. Your developers are your users. This means:
- Talk to your users (developers) regularly
- Prioritize based on their pain points
- Write documentation
- Provide support
- Measure satisfaction
3. Forcing Adoption
Bad: "All teams must migrate to the platform by Q3"
Good: "The platform makes deployment 10x faster — teams are
migrating because they want to"
If you have to force adoption, your platform doesn't solve real problems.
4. Ignoring the Developer Experience
Bad: platform deploy --cluster prod-us-east-1 --namespace payments \
--image registry.internal/payment-api:sha-abc123 \
--replicas 3 --strategy rolling --max-surge 1
Good: platform deploy
Sane defaults. Minimal configuration. Progressive disclosure: simple by default, powerful when needed.
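Sane defaults with progressive disclosure can be as simple as merging explicit overrides onto a defaults table. In this sketch the option names mirror the flags in the verbose example above, while the `DEFAULTS` values and the function name are hypothetical.

```python
# Hypothetical defaults the platform applies when no flag is given.
DEFAULTS = {
    "env": "staging",
    "replicas": 3,
    "strategy": "rolling",
    "max_surge": 1,
}


def resolve_deploy_options(**overrides):
    """Start from platform defaults and apply only what the developer specified."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject typos early instead of silently ignoring them.
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


print(resolve_deploy_options())             # plain `platform deploy`
print(resolve_deploy_options(replicas=5))   # one knob turned, the rest defaulted
```

The simple path needs no arguments at all, and each additional option is something a developer opts into only when the default stops fitting.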
Platform Team Structure
A typical platform team:
Platform Team (4-8 engineers)
├── Infrastructure (Kubernetes, cloud, networking)
├── Developer Experience (CLI, portal, templates)
├── CI/CD (pipelines, build systems)
└── Observability (monitoring, logging, alerting)
What the Platform Team Does NOT Do
- Build product features
- Own application code
- Make product decisions
- Deploy services for other teams (self-service!)
The platform team builds the tools. Product teams use the tools.
Getting Started
Week 1-2: Understand the Pain
# Interview 5-10 developers
# Ask:
# - What's your biggest infrastructure pain point?
# - How long does it take to deploy?
# - What do you wish was easier?
# - What do you spend time on that feels wasteful?
Week 3-4: Build the First Thing
Pick the highest-impact, lowest-effort improvement. Common starting points:
Option A: Standardized CI/CD pipeline (reusable workflows)
Option B: Service creation template (Backstage or Cookiecutter)
Option C: Developer CLI for common operations
Month 2-3: Expand
- Add monitoring to the golden path
- Create a service catalog
- Automate database provisioning
- Add security scanning to CI
Month 4+: Iterate
- Measure adoption and satisfaction
- Fix what's not working
- Add what's most requested
- Remove what nobody uses
Quick Reference
| Component | Tool Options |
|---|---|
| Developer Portal | Backstage, Port, Cortex |
| CI/CD | GitHub Actions, GitLab CI, Dagger |
| Infrastructure | Terraform, Pulumi, Crossplane |
| Kubernetes deployment | Argo CD, Flux, Helm |
| Monitoring | Prometheus + Grafana, Datadog |
| Logging | Loki, ELK, Datadog |
| Secrets | Vault, AWS Secrets Manager, SOPS |
| Service Mesh | Istio, Linkerd, Cilium |
Summary
Platform engineering is about removing friction:
- Golden paths — Opinionated, supported ways to build and deploy
- Self-service — Developers provision what they need without tickets
- Abstractions — Hide infrastructure complexity behind simple interfaces
- Consistency — Every service gets monitoring, security, and CI/CD by default
- Measurement — Track DORA metrics, developer satisfaction, and adoption
The goal isn't to build the most sophisticated platform. It's to build the platform that makes your developers most productive. Start small, solve real pain, and iterate.