SEO Fundamentals·Lesson 1 of 5

How Search Engines Work

Before you can optimize for search engines, you need to understand what they do. Search engines have three jobs: discover pages, understand them, and rank them.

The Three Stages

1. Crawling     Discover pages by following links
2. Indexing     Read and store page content
3. Ranking      Decide which pages answer a query

Crawling

Google uses automated programs called crawlers that follow links from page to page across the web. Google's crawler is named Googlebot.

When Googlebot visits your site, it:

  1. Fetches your page's HTML
  2. Finds all the links on that page
  3. Adds those links to a queue
  4. Visits the next URL in the queue
  5. Repeats billions of times per day
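The loop above can be sketched as a simple breadth-first crawl. This is a toy illustration, not Googlebot's actual implementation; `fetch_links` is a hypothetical callback standing in for "fetch the page's HTML and extract its links":

```python
from collections import deque

def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first crawl: fetch a page, queue its links, repeat."""
    queue = deque([start_url])
    seen = {start_url}
    crawled = []
    while queue and len(crawled) < max_pages:
        url = queue.popleft()          # visit the next URL in the queue
        crawled.append(url)            # (a real crawler fetches the HTML here)
        for link in fetch_links(url):  # find all the links on that page
            if link not in seen:       # skip URLs already seen, to avoid loops
                seen.add(link)
                queue.append(link)     # add new links to the queue
    return crawled

# Toy "web" for the example: each page maps to the pages it links to.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/about": ["/"],
    "/blog/post-1": ["/blog"],
}
print(crawl("/", lambda u: site.get(u, [])))
# → ['/', '/blog', '/about', '/blog/post-1']
```

Note the `seen` set: without it, the loop between `/` and `/blog` would be crawled forever, which is exactly why duplicate URLs waste crawl budget.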

Crawl budget is the number of pages Google will crawl on your site in a given timeframe. For most small-to-medium sites, crawl budget isn't a concern. But for large sites (100k+ pages), you need to make sure Google spends its budget on your important pages.

Things that waste crawl budget:

Problem                   Example
Duplicate pages           Same content at multiple URLs
Infinite URL parameters   /products?sort=price&page=1&color=red&...
Redirect chains           A → B → C → D (fix to A → D)
Broken links (404s)       Links to pages that don't exist
Blocked resources         CSS/JS blocked by robots.txt
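The redirect-chain fix (A → D instead of A → B → C → D) amounts to pointing every redirect straight at its final target. A minimal sketch, with the redirects modeled as a plain dict for illustration:

```python
def collapse_redirects(redirects):
    """Rewrite each redirect to point straight at its final target,
    so A -> B -> C -> D becomes A -> D (and B -> D, C -> D)."""
    def final(url):
        hops = 0
        while url in redirects and hops < 20:  # hop cap guards against loops
            url = redirects[url]
            hops += 1
        return url
    return {src: final(dst) for src, dst in redirects.items()}

chain = {"/a": "/b", "/b": "/c", "/c": "/d"}
print(collapse_redirects(chain))
# → {'/a': '/d', '/b': '/d', '/c': '/d'}
```

On a real site the dict would come from your server's redirect rules, and the fix is applied there, not in code like this; the point is only that every source should map to the final destination in a single hop.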

Indexing

After crawling, Google processes the page:

  1. Renders the page — executes JavaScript to see the final HTML
  2. Extracts content — reads text, headings, images, links
  3. Understands meaning — determines what the page is about
  4. Stores it — adds the page to Google's index (a massive database)
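The "extracts content" step can be sketched with Python's built-in html.parser. This toy extractor pulls out headings and link targets, a tiny fraction of what Google's pipeline actually reads, but the same basic idea:

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Toy version of the 'extract content' step: collect headings
    and link targets from a page's rendered HTML."""
    def __init__(self):
        super().__init__()
        self.headings = []
        self.links = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href":       # record where each link points
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())

parser = ContentExtractor()
parser.feed('<h1>How Search Engines Work</h1><a href="/blog">Blog</a>')
print(parser.headings, parser.links)
# → ['How Search Engines Work'] ['/blog']
```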

Not every crawled page gets indexed. Google may skip pages that are:

  • Too similar to other pages (duplicate content)
  • Too thin (very little useful content)
  • Blocked by noindex tags
  • Low quality or spammy
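A noindex tag is a meta tag in the page's head, e.g. `<meta name="robots" content="noindex">`. A small sketch of how software might detect it, again using Python's standard html.parser:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Detect <meta name="robots" content="noindex"> in a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            # content can hold several comma-separated directives
            directives = [d.strip() for d in a.get("content", "").lower().split(",")]
            if "noindex" in directives:
                self.noindex = True

def has_noindex(html):
    detector = NoindexDetector()
    detector.feed(html)
    return detector.noindex

print(has_noindex('<meta name="robots" content="noindex, nofollow">'))  # → True
print(has_noindex('<meta name="robots" content="index, follow">'))      # → False
```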

Ranking

When someone searches, Google:

  1. Finds all indexed pages that match the query
  2. Scores each page on hundreds of ranking factors
  3. Returns results in order of relevance and quality

The main ranking factors:

Factor            What It Means
Relevance         Does the content match the search query?
Quality           Is the content comprehensive, accurate, and useful?
Authority         Do other reputable sites link to this page?
User experience   Is the page fast, mobile-friendly, and easy to use?
Freshness         Is the content up to date?
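To make "scores each page, then orders by score" concrete, here is a deliberately toy sketch. Google's real signals and weights are proprietary and number in the hundreds, so every weight and factor score below is invented purely for illustration:

```python
# Invented weights for illustration only; real ranking signals are secret.
WEIGHTS = {"relevance": 0.4, "quality": 0.25, "authority": 0.2,
           "ux": 0.1, "freshness": 0.05}

def score(page):
    """Combine the toy factor scores into one number per page."""
    return sum(WEIGHTS[factor] * page[factor] for factor in WEIGHTS)

pages = [  # hypothetical pages with made-up 0-1 factor scores
    {"url": "/deep-guide", "relevance": 0.9, "quality": 0.9,
     "authority": 0.6, "ux": 0.8, "freshness": 0.5},
    {"url": "/thin-page", "relevance": 0.9, "quality": 0.3,
     "authority": 0.2, "ux": 0.7, "freshness": 0.9},
]
ranked = sorted(pages, key=score, reverse=True)  # best score first
print([p["url"] for p in ranked])
# → ['/deep-guide', '/thin-page']
```

Both toy pages are equally relevant, but the comprehensive, well-linked one outranks the thin one, which matches the factors in the table.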

How Google Finds Your Pages

Google discovers pages through:

  1. Links from other sites — the primary discovery method
  2. Your sitemap — an XML file listing all your pages
  3. Google Search Console — you can manually submit URLs
  4. Internal links — links between pages on your own site

Sitemaps

A sitemap tells Google about all the pages on your site:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-20</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/my-post</loc>
    <lastmod>2026-03-18</lastmod>
    <priority>0.7</priority>
  </url>
</urlset>

Submit your sitemap in Google Search Console at Sitemaps > Add a new sitemap.
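If your CMS or framework doesn't generate a sitemap for you, producing XML like the example above is straightforward. A sketch using Python's standard xml.etree.ElementTree (the xml_declaration argument needs Python 3.8+):

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Build sitemap XML from (loc, lastmod) pairs."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # the page's URL
        ET.SubElement(url, "lastmod").text = lastmod  # last modified date
    return ET.tostring(urlset, encoding="unicode", xml_declaration=True)

xml = build_sitemap([
    ("https://example.com/", "2026-03-20"),
    ("https://example.com/blog/my-post", "2026-03-18"),
])
print(xml)
```

Serve the result at a stable URL (commonly /sitemap.xml) and regenerate it whenever pages are added or updated.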

robots.txt

The robots.txt file tells crawlers which pages they can and cannot access:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml

  • Allow — pages crawlers can visit
  • Disallow — pages crawlers should skip
  • This is a suggestion, not enforcement — well-behaved bots follow it, malicious ones ignore it
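You can test robots.txt rules with Python's standard urllib.robotparser. One caveat: this parser applies rules in file order (first match wins) rather than using Googlebot's longest-match logic, so the blanket `Allow: /` line is left out below; crawling is allowed by default anyway:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
# Allow: / is omitted: Python's parser checks rules in order, and a
# leading Allow: / would match every path before the Disallow lines.
rp.parse("""\
User-agent: *
Disallow: /admin/
Disallow: /api/
""".splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/blog/my-post"))  # → True
print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))   # → False
```

In production you'd point the parser at the live file with `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()`.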

Google Search Console

Google Search Console (GSC) is your direct line to Google. It shows you:

  • Which pages are indexed (and which aren't)
  • What queries bring traffic to your site
  • Crawl errors and issues
  • Mobile usability problems
  • Core Web Vitals scores

Every website owner should set up GSC. It's free and essential for understanding how Google sees your site.

Key Reports

Report            What It Shows
Performance       Clicks, impressions, CTR, and average position
Coverage          Which pages are indexed, excluded, or have errors
Sitemaps          Sitemap submission status
URL Inspection    How Google sees a specific URL
Core Web Vitals   Page speed and experience scores

Summary

  • Search engines crawl (discover), index (understand), and rank (order) pages
  • Googlebot follows links to discover new pages — internal linking matters
  • Not every crawled page gets indexed — content quality and uniqueness matter
  • Ranking depends on relevance, quality, authority, and user experience
  • Use Google Search Console to monitor how Google sees your site
  • Submit a sitemap to help Google find all your pages