Running applications in production means handling variable traffic and catching problems before users notice them. Kubernetes provides built-in scaling mechanisms and integrates with powerful monitoring tools to keep your applications healthy.
Manual Scaling
The simplest way to scale is to adjust the replica count:
# Scale to 5 replicas
kubectl scale deployment web-app --replicas=5
# Scale down to 2
kubectl scale deployment web-app --replicas=2
# Check the current state
kubectl get deployment web-app

Or update the YAML:
spec:
  replicas: 5

kubectl apply -f deployment.yaml

Horizontal Pod Autoscaler (HPA)
HPA automatically adjusts the number of pod replicas based on CPU, memory, or custom metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
        averageUtilization: 80

This scales between 2 and 10 replicas, adding pods when average CPU exceeds 70% or memory exceeds 80%.
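The scaling decision follows the formula documented for HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to minReplicas and maxReplicas. A minimal shell sketch of that arithmetic:

```shell
#!/bin/sh
# Sketch of the HPA scaling formula:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
hpa_desired() {
  current_replicas=$1
  current_util=$2   # observed average utilization, e.g. 90 (%)
  target_util=$3    # target from the HPA spec, e.g. 70 (%)
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (current_replicas * current_util + target_util - 1) / target_util ))
}

hpa_desired 4 90 70   # 4 pods averaging 90% CPU with a 70% target -> prints 6
```

In the real controller the result is then clamped to the minReplicas/maxReplicas bounds, and when multiple metrics are configured the largest desired count wins.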
Prerequisites: HPA requires the Metrics Server to be installed:
# Install Metrics Server (minikube)
minikube addons enable metrics-server
# Or install manually
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
HPA commands:
# Create HPA from command line
kubectl autoscale deployment web-app --min=2 --max=10 --cpu-percent=70
# Check HPA status
kubectl get hpa
# Watch HPA in real time
kubectl get hpa -w
# Describe HPA (shows scaling events)
kubectl describe hpa web-app-hpa

Load Testing with HPA
Test your autoscaler by generating load:
# Start the application
kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml
# Generate load from a temporary pod
kubectl run load-test --image=busybox --rm -it -- sh -c \
"while true; do wget -q -O- http://web-app-service; done"
# In another terminal, watch the HPA scale up
kubectl get hpa -w
kubectl get pods -w

Vertical Pod Autoscaler (VPA)
VPA adjusts CPU and memory requests/limits for individual pods:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: web-app
minAllowed:
cpu: "50m"
memory: "64Mi"
maxAllowed:
cpu: "2"
        memory: "2Gi"

VPA is useful when you are unsure how much CPU and memory your application needs. It observes actual usage and adjusts requests accordingly. Note that VPA is not built into Kubernetes: it ships separately from the kubernetes/autoscaler project and must be installed in the cluster before this manifest will work.
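In "Auto" mode, VPA applies new resource requests by evicting and recreating pods, which briefly disrupts the workload. If that is too disruptive, a common pattern is to run VPA in recommendation-only mode; a sketch, using the same web-app target as above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; never evict pods
```

With updateMode "Off", the recommended requests appear under Status when you run kubectl describe vpa web-app-vpa, and you can apply them to the Deployment manually.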
Resource Monitoring with kubectl
# View node resource usage
kubectl top nodes
# View pod resource usage
kubectl top pods
# View pods sorted by CPU
kubectl top pods --sort-by=cpu
# View pods sorted by memory
kubectl top pods --sort-by=memory
# View resource usage in a specific namespace
kubectl top pods -n production

Cluster Monitoring with Prometheus and Grafana
For production monitoring, deploy Prometheus (metrics collection) and Grafana (visualization):
# Install using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
  --create-namespace

This installs:
| Component | Purpose |
|---|---|
| Prometheus | Collects and stores metrics |
| Grafana | Dashboards and visualization |
| Alertmanager | Handles alerts and notifications |
| Node Exporter | Exports hardware/OS metrics |
| kube-state-metrics | Exports Kubernetes object metrics |
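To have Prometheus scrape your own application, kube-prometheus-stack watches for ServiceMonitor objects. A sketch, assuming your web-app Service carries the label app: web-app and exposes a /metrics endpoint on a port named http (both are assumptions about your app, not defaults):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app
  namespace: monitoring
  labels:
    release: monitoring   # must match the Helm release name so Prometheus selects it
spec:
  selector:
    matchLabels:
      app: web-app        # assumed label on the Service
  namespaceSelector:
    matchNames:
      - default           # namespace where the Service lives
  endpoints:
    - port: http          # assumed named port serving /metrics
      interval: 30s
```

If the target never appears in Prometheus, the usual culprit is the release label: by default the chart only selects ServiceMonitors whose release label matches the Helm release name.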
Access Grafana:
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Default credentials:
# Username: admin
# Password: prom-operator

Key Metrics to Monitor
| Metric | What It Tells You |
|---|---|
| CPU utilization per pod | Are pods overloaded? |
| Memory usage per pod | Memory leaks, OOM risk |
| Pod restart count | Crashes, stability issues |
| Request latency (p50, p95, p99) | User experience |
| Error rate (5xx responses) | Application health |
| Pod scheduling failures | Resource constraints |
| Node disk usage | Storage issues |
| Network I/O | Bandwidth bottlenecks |
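Several of these map directly to PromQL queries you can paste into a Grafana panel. The sketches below assume your application exports the conventional http_requests_total counter and http_request_duration_seconds histogram; the exact metric names depend on your instrumentation:

```promql
# p95 request latency over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# 5xx error rate as a fraction of all requests
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# container restarts per pod over the last hour (from kube-state-metrics)
sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod)
```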
Logging
Kubernetes logs are available through kubectl logs, but for production you need centralized logging:
# View pod logs
kubectl logs web-app-abc123
# Follow logs
kubectl logs -f web-app-abc123
# View logs from all pods with a label
kubectl logs -l app=web-app --all-containers
# View logs from a previous (crashed) container
kubectl logs web-app-abc123 --previous
# View logs with timestamps
kubectl logs web-app-abc123 --timestamps

Centralized logging stack (EFK):
| Component | Role |
|---|---|
| Elasticsearch | Stores and indexes logs |
| Fluentd/Fluent Bit | Collects logs from all pods |
| Kibana | Search and visualize logs |
Fluent Bit runs as a DaemonSet (one pod per node), collecting logs from all containers:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluent-bit
namespace: logging
spec:
selector:
matchLabels:
app: fluent-bit
template:
metadata:
labels:
app: fluent-bit
spec:
containers:
- name: fluent-bit
image: fluent/fluent-bit:latest
volumeMounts:
- name: varlog
mountPath: /var/log
- name: containers
mountPath: /var/lib/docker/containers
readOnly: true
volumes:
- name: varlog
hostPath:
path: /var/log
- name: containers
hostPath:
          path: /var/lib/docker/containers

Alerting
Set up alerts in Prometheus to notify you before problems become outages:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: app-alerts
namespace: monitoring
spec:
groups:
- name: app
rules:
- alert: HighErrorRate
expr: |
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.service }}"
description: "Error rate is above 5% for the last 5 minutes."
- alert: PodCrashLooping
expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Pod {{ $labels.pod }} is crash looping"
- alert: HighMemoryUsage
expr: |
container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
for: 10m
labels:
severity: warning
annotations:
        summary: "Pod {{ $labels.pod }} memory usage above 90%"

Troubleshooting Common Issues
Pods stuck in Pending:
kubectl describe pod <pod-name>
# Look for: Insufficient cpu, Insufficient memory, or no matching node
# Fix: Add nodes, reduce resource requests, or remove pod affinity rules
Pods in CrashLoopBackOff:
kubectl logs <pod-name> --previous
# Look for: application errors, missing config, failed health checks
# Fix: Check environment variables, ConfigMaps, Secrets, and application code
Service not routing traffic:
kubectl get endpoints <service-name>
# If empty: labels don't match between Service selector and Pod labels
kubectl describe service <service-name>
kubectl get pods --show-labels
High resource usage:
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
# Fix: Adjust resource limits, optimize code, or scale out

Summary
Kubernetes scales applications horizontally (HPA) and vertically (VPA) based on real-time metrics. You learned how to set up autoscaling, monitor clusters with Prometheus and Grafana, centralize logs, configure alerts, and troubleshoot common issues. These operational skills are essential for running reliable production workloads.