Kubernetes Essentials·Lesson 5 of 5

Scaling & Monitoring

Running applications in production means handling variable traffic and catching problems before users notice them. Kubernetes provides built-in scaling mechanisms and integrates with powerful monitoring tools to keep your applications healthy.

Manual Scaling

The simplest way to scale is to adjust the replica count:

# Scale to 5 replicas
kubectl scale deployment web-app --replicas=5

# Scale down to 2
kubectl scale deployment web-app --replicas=2

# Check the current state
kubectl get deployment web-app

Or set the replica count in deployment.yaml and reapply:

spec:
  replicas: 5

kubectl apply -f deployment.yaml

Horizontal Pod Autoscaler (HPA)

HPA automatically adjusts the number of pod replicas based on CPU, memory, or custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

This scales between 2 and 10 replicas. When multiple metrics are configured, the HPA computes a desired replica count for each metric and uses the largest, so pods are added when average CPU exceeds 70% or average memory exceeds 80% of the amount requested by the pods.
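The scaling decision follows the formula documented for the HPA controller: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the min/max bounds. A minimal sketch of that arithmetic (the function name and example values are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Sketch of the HPA scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# 3 replicas at 90% average CPU against a 70% target -> scale to 4
print(desired_replicas(3, 90, 70, 2, 10))  # 4
```

The clamping is why minReplicas/maxReplicas matter: even a huge utilization spike never pushes the deployment past maxReplicas.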

Prerequisites: HPA requires the Metrics Server to be installed, and the target pods must declare CPU/memory resource requests, since utilization is measured relative to requests:

# Install Metrics Server (minikube)
minikube addons enable metrics-server

# Or install manually
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

HPA commands:

# Create HPA from command line
kubectl autoscale deployment web-app --min=2 --max=10 --cpu-percent=70

# Check HPA status
kubectl get hpa

# Watch HPA in real time
kubectl get hpa -w

# Describe HPA (shows scaling events)
kubectl describe hpa web-app-hpa

Load Testing with HPA

Test your autoscaler by generating load:

# Start the application
kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml

# Generate load from a temporary pod
kubectl run load-test --image=busybox --rm -it -- sh -c \
  "while true; do wget -q -O- http://web-app-service; done"

# In another terminal, watch the HPA scale up
kubectl get hpa -w
kubectl get pods -w

Vertical Pod Autoscaler (VPA)

VPA, an add-on installed separately rather than built into Kubernetes, adjusts CPU and memory requests/limits for individual pods:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: web-app
        minAllowed:
          cpu: "50m"
          memory: "64Mi"
        maxAllowed:
          cpu: "2"
          memory: "2Gi"

VPA is useful when you are unsure how much CPU and memory your application needs: it observes actual usage and adjusts the requests accordingly. Note that in "Auto" mode it applies new values by evicting and recreating pods, and it should not manage the same CPU/memory metrics as an HPA targeting the same workload.
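Whatever VPA recommends is capped by the resourcePolicy bounds above: conceptually, the recommendation is clamped into [minAllowed, maxAllowed]. A sketch of that clamping in CPU millicores (the function name is illustrative; bounds match the manifest above):

```python
def clamp_recommendation(recommended_m: int, min_allowed_m: int, max_allowed_m: int) -> int:
    """Clamp a VPA CPU recommendation (in millicores) to the
    minAllowed/maxAllowed bounds from containerPolicies."""
    return max(min_allowed_m, min(recommended_m, max_allowed_m))

# minAllowed is 50m and maxAllowed is 2 CPUs (2000m), as in the manifest above
print(clamp_recommendation(30, 50, 2000))    # below the floor -> 50
print(clamp_recommendation(3500, 50, 2000))  # above the ceiling -> 2000
```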

Resource Monitoring with kubectl

# View node resource usage
kubectl top nodes

# View pod resource usage
kubectl top pods

# View pods sorted by CPU
kubectl top pods --sort-by=cpu

# View pods sorted by memory
kubectl top pods --sort-by=memory

# View resource usage in a specific namespace
kubectl top pods -n production

Cluster Monitoring with Prometheus and Grafana

For production monitoring, deploy Prometheus (metrics collection) and Grafana (visualization):

# Install using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

This installs:

Component           Purpose
Prometheus          Collects and stores metrics
Grafana             Dashboards and visualization
Alertmanager        Handles alerts and notifications
Node Exporter       Exports hardware/OS metrics
kube-state-metrics  Exports Kubernetes object metrics

Access Grafana:

kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Default credentials:
# Username: admin
# Password: prom-operator

Key Metrics to Monitor

Metric                            What It Tells You
CPU utilization per pod           Are pods overloaded?
Memory usage per pod              Memory leaks, OOM risk
Pod restart count                 Crashes, stability issues
Request latency (p50, p95, p99)   User experience
Error rate (5xx responses)        Application health
Pod scheduling failures           Resource constraints
Node disk usage                   Storage issues
Network I/O                       Bandwidth bottlenecks
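The latency percentiles in the table summarize a distribution of request times: p95, for example, is the value below which 95% of requests fall. In practice these come from Prometheus histogram_quantile, but the underlying idea can be sketched with the nearest-rank method (sample values are illustrative):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the value at or below which
    p% of the sorted samples fall."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

latencies_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 900]
print(percentile(latencies_ms, 50))  # 14
print(percentile(latencies_ms, 95))  # 900
```

Note how a few slow outliers leave the median untouched but dominate p95, which is why the high percentiles, not the average, reflect worst-case user experience.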

Logging

Kubernetes logs are available through kubectl logs, but for production you need centralized logging:

# View pod logs
kubectl logs web-app-abc123

# Follow logs
kubectl logs -f web-app-abc123

# View logs from all pods with a label
kubectl logs -l app=web-app --all-containers

# View logs from a previous (crashed) container
kubectl logs web-app-abc123 --previous

# View logs with timestamps
kubectl logs web-app-abc123 --timestamps
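What log collectors actually read on each node are the container runtime's per-line JSON records. A sketch of that parsing, assuming the Docker json-file format (one JSON object per line with log, stream, and time fields):

```python
import json

# One line from a container log file under /var/lib/docker/containers
# (Docker json-file driver: one JSON object per line).
raw = '{"log":"GET /healthz 200\\n","stream":"stdout","time":"2024-01-01T12:00:00.0Z"}'

entry = json.loads(raw)
print(entry["stream"], entry["log"].strip())  # stdout GET /healthz 200
```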

Centralized logging stack (EFK):

Component           Role
Elasticsearch       Stores and indexes logs
Fluentd/Fluent Bit  Collects logs from all pods
Kibana              Search and visualize logs

Fluent Bit runs as a DaemonSet (one pod per node), collecting logs from all containers:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest  # pin a specific version in production
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: containers
              mountPath: /var/lib/docker/containers  # Docker runtime path; containerd clusters log under /var/log/pods
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers

Alerting

Set up alerts in Prometheus to notify you before problems become outages:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
spec:
  groups:
    - name: app
      rules:
        - alert: HighErrorRate
          expr: |
            rate(http_requests_total{status=~"5.."}[5m])
            / rate(http_requests_total[5m]) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on {{ $labels.service }}"
            description: "Error rate is above 5% for the last 5 minutes."

        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"

        - alert: HighMemoryUsage
          expr: |
            container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} memory usage above 90%"

Troubleshooting Common Issues

Pods stuck in Pending:

kubectl describe pod <pod-name>
# Look for: Insufficient cpu, Insufficient memory, or no matching node
# Fix: Add nodes, reduce resource requests, or remove pod affinity rules

Pods in CrashLoopBackOff:

kubectl logs <pod-name> --previous
# Look for: application errors, missing config, failed health checks
# Fix: Check environment variables, ConfigMaps, Secrets, and application code

Service not routing traffic:

kubectl get endpoints <service-name>
# If empty: labels don't match between Service selector and Pod labels
kubectl describe service <service-name>
kubectl get pods --show-labels

High resource usage:

kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory
# Fix: Adjust resource limits, optimize code, or scale out

Summary

Kubernetes scales applications horizontally (HPA) and vertically (VPA) based on real-time metrics. You learned how to set up autoscaling, monitor clusters with Prometheus and Grafana, centralize logs, configure alerts, and troubleshoot common issues. These operational skills are essential for running reliable production workloads.