Dockerfiles — Docker Fundamentals | Sabaoon Academy

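A Dockerfile is a text file containing instructions for building a Docker image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer to the final image, while the others record metadata such as the default command or exposed ports. Writing efficient Dockerfiles is a core Docker skill.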

Basic Dockerfile Structure

Here is a Dockerfile for a Node.js application:

# Use an official base image
FROM node:20-alpine

# Set the working directory inside the container
WORKDIR /app

# Copy dependency files first (for better caching)
COPY package.json pnpm-lock.yaml ./

# Install dependencies
RUN npm install -g pnpm && pnpm install --frozen-lockfile

# Copy the rest of the application
COPY . .

# Expose the port the app runs on
EXPOSE 3000

# Define the command to run the app
CMD ["node", "server.js"]

Build and run it:

docker build -t my-node-app .
docker run -d -p 3000:3000 my-node-app

Dockerfile Instructions Reference

Instruction	Purpose
`FROM`	Set the base image
`WORKDIR`	Set the working directory
`COPY`	Copy files from host to image
`ADD`	Copy files (also handles URLs and tar extraction)
`RUN`	Execute a command during build
`CMD`	Default command when container starts
`ENTRYPOINT`	Fixed command that always runs
`EXPOSE`	Document which ports the app uses
`ENV`	Set environment variables
`ARG`	Build-time variables
`VOLUME`	Define mount points for persistent data
`USER`	Set the user for subsequent commands
`LABEL`	Add metadata to the image

FROM — Choosing a Base Image

Every Dockerfile needs a FROM instruction, and only comments, parser directives, and ARG may come before it. Choose the smallest base image that meets your needs:

# Full Debian-based (largest, roughly 350MB compressed)
FROM node:20

# Slim variant (smaller, roughly 80MB compressed)
FROM node:20-slim

# Alpine-based (smallest, roughly 50MB compressed)
FROM node:20-alpine

# Start from scratch (for compiled binaries)
FROM scratch

Alpine images are popular because they are tiny, but they use musl instead of glibc, which can break packages that ship prebuilt native binaries compiled against glibc.
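The musl caveat is often handled in one of two ways, sketched below. The `gcompat` package is Alpine's glibc compatibility shim. Whether it is enough depends on the specific native module, so treat this as a starting point rather than a guaranteed fix.

```dockerfile
# Option A: stay on Alpine and add a glibc compatibility layer.
# This helps some prebuilt glibc binaries, but not all of them.
FROM node:20-alpine
RUN apk add --no-cache gcompat

# Option B (use instead of the lines above): switch to the Debian-based
# slim variant, which ships glibc, at the cost of a larger image.
# FROM node:20-slim
```

If a dependency still fails to load on Alpine with the shim installed, switching to the slim variant is usually the less fragile choice.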

COPY vs ADD

Use COPY for straightforward file copying. Use ADD only when you need its extra features:

# Copy a single file
COPY server.js /app/server.js

# Copy a directory
COPY src/ /app/src/

# ADD can extract tar archives automatically
ADD archive.tar.gz /app/

# ADD can fetch URLs (but curl in RUN is preferred)
ADD https://example.com/file.txt /app/
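One reason curl in RUN is often preferred is that the download can be verified and the tooling cleaned up within the same layer. The sketch below assumes a Debian-based image. `FILE_URL` and `FILE_SHA256` are hypothetical build arguments introduced for illustration, and the URL is the placeholder from the example above.

```dockerfile
FROM debian:bookworm-slim

# Hypothetical build arguments: supply the real checksum with
# --build-arg FILE_SHA256=<sha256 of the file> at build time.
ARG FILE_URL=https://example.com/file.txt
ARG FILE_SHA256

# Download, verify, and clean up in a single layer.
# The build fails if the checksum does not match (or was not provided).
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    curl -fsSL "$FILE_URL" -o /tmp/file.txt && \
    echo "${FILE_SHA256}  /tmp/file.txt" | sha256sum -c - && \
    mkdir -p /app && mv /tmp/file.txt /app/file.txt && \
    rm -rf /var/lib/apt/lists/*
```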

RUN — Executing Build Commands

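Combine related commands into a single RUN instruction. This produces fewer layers and, more importantly, keeps cleanup in the same layer as the files it removes:

# Bad: three separate layers. The cleanup runs too late to shrink the
# earlier layers, and a cached "apt-get update" layer can go stale.
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean

# Good: single layer, includes cleanup
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*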

CMD vs ENTRYPOINT

CMD sets the default command, which can be overridden:

CMD ["node", "server.js"]

# Runs: node server.js
docker run my-app

# Overrides CMD: runs bash instead
docker run my-app bash

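ENTRYPOINT sets a fixed command. Arguments passed to docker run replace CMD and are appended to the entrypoint (the entrypoint itself changes only if you pass --entrypoint):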

ENTRYPOINT ["node"]
CMD ["server.js"]

# Runs: node server.js
docker run my-app

# Runs: node repl.js
docker run my-app repl.js

Prefer the exec form (JSON array) over the shell form so that your process receives signals correctly:

# Good: exec form (node runs as PID 1 and receives signals such as SIGTERM directly)
CMD ["node", "server.js"]

# Avoid: shell form (runs under /bin/sh -c, which may not forward signals to node)
CMD node server.js
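The exec form does not expand environment variables, which is a common reason people fall back to the shell form. A minimal sketch of one workaround is to start a shell explicitly and have it `exec` the real process. The `--port` flag here is hypothetical, and `server.js` would need to parse it.

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY server.js ./
ENV PORT=3000

# A shell is started explicitly so that $PORT is expanded at container start.
# "exec" replaces that shell with node, so node becomes PID 1 and receives
# SIGTERM from "docker stop" directly.
# The --port flag is an assumption for illustration only.
CMD ["sh", "-c", "exec node server.js --port \"$PORT\""]
```

Without `exec`, the shell may stay in place as PID 1 with node as its child, which brings back the signal problem described above.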

ENV and ARG

ENV sets variables available at build time and runtime:

ENV NODE_ENV=production
ENV PORT=3000

ARG sets variables available only at build time:

ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine

ARG BUILD_DATE
LABEL build-date=$BUILD_DATE

Pass build args with --build-arg:

docker build --build-arg BUILD_DATE=$(date -u +%Y-%m-%d) -t my-app .
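ARG scoping is easy to get wrong. An ARG declared before FROM is usable only in FROM lines, and must be redeclared inside the stage to be read there. The sketch below shows that, along with promoting a build argument to a runtime variable through ENV. `APP_VERSION` is a hypothetical name chosen for illustration.

```dockerfile
ARG NODE_VERSION=20
FROM node:${NODE_VERSION}-alpine

# Redeclare without a value to bring the pre-FROM default into this stage.
ARG NODE_VERSION

# Hypothetical build argument, copied into ENV so it also exists at runtime.
ARG APP_VERSION=dev
ENV APP_VERSION=${APP_VERSION}

# Both values are visible here during the build.
RUN echo "node ${NODE_VERSION}, app ${APP_VERSION}"
```

Build argument values remain visible in the image history, so they are a poor fit for passwords or tokens.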

Layer Caching and Build Optimization

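Docker caches the result of each instruction. If an instruction and its inputs have not changed, Docker reuses the cached layer. Once one layer is invalidated, every layer after it is rebuilt, so the order of instructions matters: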

FROM node:20-alpine
WORKDIR /app

# Step 1: Copy dependency files (changes rarely)
COPY package.json pnpm-lock.yaml ./

# Step 2: Install dependencies (cached unless package.json or pnpm-lock.yaml changed)
RUN npm install -g pnpm && pnpm install --frozen-lockfile

# Step 3: Copy application code (changes frequently)
COPY . .

# Step 4: Build
RUN pnpm build

CMD ["node", "dist/server.js"]

If you copied everything first and then installed dependencies, every code change would invalidate the dependency cache and force a full reinstall.
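Even with good instruction ordering, a lockfile change reruns the whole install. With BuildKit, a cache mount can keep the package store between builds so that only new packages are downloaded. The sketch below makes two assumptions: `/pnpm/store` is an arbitrary path chosen for illustration, and BuildKit is enabled.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app

COPY package.json pnpm-lock.yaml ./

# The cache mount persists across builds but is not part of the image.
# /pnpm/store is an assumed location, passed to pnpm with --store-dir.
RUN --mount=type=cache,target=/pnpm/store \
    npm install -g pnpm && \
    pnpm install --frozen-lockfile --store-dir /pnpm/store

COPY . .
RUN pnpm build

CMD ["node", "dist/server.js"]
```

The layer cache still skips the install step entirely when the dependency files are unchanged. The cache mount only helps when that step does have to run again.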

The .dockerignore File

Create a .dockerignore file to exclude files from the build context:

node_modules
.git
.env
*.log
dist
.next
coverage
.DS_Store
Dockerfile
docker-compose.yml
README.md

This speeds up builds and prevents sensitive files from ending up in the image.

Multi-Stage Builds

Multi-stage builds let you use multiple FROM instructions in one Dockerfile. Each FROM starts a new stage, and only the last stage ends up in the final image, which keeps it small:

# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json pnpm-lock.yaml ./
RUN npm install -g pnpm && pnpm install --frozen-lockfile
COPY . .
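RUN pnpm build
# Remove dev dependencies so the node_modules copied below is production-only
RUN pnpm prune --prod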

# Stage 2: Production
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

# Copy only what we need from the builder
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

EXPOSE 3000
CMD ["node", "dist/server.js"]

The final image only contains the production files — no source code, no dev dependencies, no build tools.

Security Best Practices

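Containers run as root by default, so a compromised process has root privileges inside the container. Run your application as a non-root user: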

FROM node:20-alpine

# Create a non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --chown=appuser:appgroup . .
RUN npm install --omit=dev

# Switch to non-root user
USER appuser

EXPOSE 3000
CMD ["node", "server.js"]
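The official Node.js images already ship with an unprivileged user named `node`, so creating one by hand is optional. A minimal sketch using that user follows. It also installs dependencies before copying the code to keep the cache-friendly ordering, and it assumes a .dockerignore that excludes node_modules.

```dockerfile
FROM node:20-alpine
WORKDIR /app

# Install dependencies as root first so this layer is cached independently
# of application code. Copy package-lock.json too if the project has one.
COPY package.json ./
RUN npm install --omit=dev

# Copy the code owned by the built-in "node" user, then drop privileges.
COPY --chown=node:node . .
USER node

EXPOSE 3000
CMD ["node", "server.js"]
```

Dependencies installed as root stay root-owned and read-only for the `node` user, which is usually what you want at runtime.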

Practical Example: Python Flask App

FROM python:3.12-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser
USER appuser

EXPOSE 5000

CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Build, tag, and run:

docker build -t flask-app:1.0 .
docker run -d -p 5000:5000 flask-app:1.0
curl http://localhost:5000

Build Commands

# Build with a tag
docker build -t my-app:v1 .

# Build from a specific Dockerfile
docker build -f Dockerfile.prod -t my-app:prod .

# Build with no cache (force rebuild all layers)
docker build --no-cache -t my-app .

# Build with build arguments (the Dockerfile must declare ARG NODE_ENV)
docker build --build-arg NODE_ENV=production -t my-app .

# Show full, plain-text build output instead of the collapsed view (BuildKit)
docker build --progress=plain -t my-app .

Summary

Dockerfiles define how images are built, layer by layer. You learned the key instructions, how layer caching works, how to optimize builds with multi-stage patterns, and security best practices. In the next lesson, you will learn about volumes for persistent data and networks for container communication.