Before you can optimize for search engines, you need to understand what they do. Search engines have three jobs: discover pages, understand them, and rank them.
The Three Stages
1. Crawling → Discover pages by following links
2. Indexing → Read and store page content
3. Ranking → Decide which pages answer a query
Crawling
Google uses automated programs called crawlers (or Googlebot) that follow links from page to page across the web.
When Googlebot visits your site, it:
- Fetches your page's HTML
- Finds all the links on that page
- Adds those links to a queue
- Visits the next URL in the queue
- Repeats billions of times per day
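The loop above is essentially a breadth-first traversal of the web's link graph. A minimal sketch in Python, using a made-up in-memory link graph in place of real HTTP fetches:

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/blog", "/about"],
}

def crawl(start):
    """Breadth-first crawl: fetch a page, queue its links, repeat."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        url = queue.popleft()            # visit the next URL in the queue
        order.append(url)                # "fetch" the page
        for link in LINKS.get(url, []):  # find all the links on that page
            if link not in seen:         # skip already-discovered URLs
                seen.add(link)
                queue.append(link)       # add new links to the queue
    return order

print(crawl("/"))  # → ['/', '/about', '/blog', '/blog/post-1', '/blog/post-2']
```

A real crawler adds politeness delays, robots.txt checks, and URL normalization on top of this loop, but the queue-and-visit structure is the same.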
Crawl budget is the number of pages Google will crawl on your site in a given timeframe. For most small-to-medium sites, crawl budget isn't a concern. But for large sites (100k+ pages), you need to make sure Google spends its budget on your important pages.
Things that waste crawl budget:
| Problem | Example |
|---|---|
| Duplicate pages | Same content at multiple URLs |
| Infinite URL parameters | /products?sort=price&page=1&color=red&... |
| Redirect chains | A → B → C → D (fix to A → D) |
| Broken links (404s) | Links to pages that don't exist |
| Blocked resources | CSS/JS blocked by robots.txt |
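The redirect-chain fix in the table (rewriting A → B → C → D as A → D) amounts to flattening a redirect map so every source points straight at its final destination. A small sketch, with made-up URLs:

```python
# Hypothetical redirect map: source URL -> the URL it redirects to.
REDIRECTS = {"/a": "/b", "/b": "/c", "/c": "/d"}

def final_target(url, redirects, max_hops=10):
    """Follow redirects until we reach a URL that doesn't redirect."""
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if hops > max_hops:
            raise RuntimeError("possible redirect loop at " + url)
    return url

def flatten(redirects):
    """Point every source directly at its final destination (A -> D)."""
    return {src: final_target(src, redirects) for src in redirects}

print(flatten(REDIRECTS))  # → {'/a': '/d', '/b': '/d', '/c': '/d'}
```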
Indexing
After crawling, Google processes the page:
- Renders the page — executes JavaScript to see the final HTML
- Extracts content — reads text, headings, images, links
- Understands meaning — determines what the page is about
- Stores it — adds the page to Google's index (a massive database)
Not every crawled page gets indexed. Google may skip pages that are:
- Too similar to other pages (duplicate content)
- Too thin (very little useful content)
- Blocked by noindex tags
- Low quality or spammy
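One way to check a page for a robots noindex meta tag is with Python's standard-library HTML parser. This is a minimal sketch; it does not cover the X-Robots-Tag HTTP header, which Google also honors:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Flags pages containing <meta name="robots" content="...noindex...">."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.noindex = True

def has_noindex(html):
    detector = NoindexDetector()
    detector.feed(html)
    return detector.noindex

print(has_noindex('<meta name="robots" content="noindex, nofollow">'))  # → True
print(has_noindex('<meta name="robots" content="index, follow">'))      # → False
```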
Ranking
When someone searches, Google:
- Finds all indexed pages that match the query
- Scores each page on hundreds of ranking factors
- Returns results in order of relevance and quality
The main ranking factors:
| Factor | What It Means |
|---|---|
| Relevance | Does the content match the search query? |
| Quality | Is the content comprehensive, accurate, and useful? |
| Authority | Do other reputable sites link to this page? |
| User experience | Is the page fast, mobile-friendly, and easy to use? |
| Freshness | Is the content up to date? |
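To make "scores each page on hundreds of factors" concrete, here is a toy weighted scorer. The factor names mirror the table above, but the weights and page scores are invented for illustration; Google's actual weighting is not public:

```python
# Invented weights for illustration only -- not Google's real weighting.
WEIGHTS = {"relevance": 0.35, "quality": 0.25, "authority": 0.20,
           "user_experience": 0.10, "freshness": 0.10}

def score(page):
    """Weighted sum of per-factor scores (each assumed to be 0.0-1.0)."""
    return sum(WEIGHTS[f] * page.get(f, 0.0) for f in WEIGHTS)

# Two hypothetical pages competing for the same query.
pages = {
    "/deep-guide": {"relevance": 0.9, "quality": 0.9, "authority": 0.7,
                    "user_experience": 0.8, "freshness": 0.6},
    "/thin-page":  {"relevance": 0.8, "quality": 0.3, "authority": 0.2,
                    "user_experience": 0.5, "freshness": 0.9},
}

ranked = sorted(pages, key=lambda p: score(pages[p]), reverse=True)
print(ranked)  # → ['/deep-guide', '/thin-page']
```

Note that the thin page is fresher yet still loses: no single factor dominates, which is why chasing one signal rarely works.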
How Google Finds Your Pages
Google discovers pages through:
- Links from other sites — the primary discovery method
- Your sitemap — an XML file listing all your pages
- Google Search Console — you can manually submit URLs
- Internal links — links between pages on your own site
Sitemaps
A sitemap tells Google about all the pages on your site:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-20</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/my-post</loc>
    <lastmod>2026-03-18</lastmod>
    <priority>0.7</priority>
  </url>
</urlset>
```
Submit your sitemap in Google Search Console at Sitemaps > Add a new sitemap.
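If you generate your sitemap programmatically rather than by hand, Python's standard xml.etree module is enough. A sketch that produces a sitemap in the format shown above (the page list is made up):

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: list of (loc, lastmod, priority) tuples -> sitemap XML string."""
    ET.register_namespace("", NS)  # emit the sitemap namespace as the default
    urlset = ET.Element(f"{{{NS}}}urlset")
    for loc, lastmod, priority in pages:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = loc
        ET.SubElement(url, f"{{{NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{NS}}}priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

sitemap_xml = build_sitemap([
    ("https://example.com/", "2026-03-20", "1.0"),
    ("https://example.com/blog/my-post", "2026-03-18", "0.7"),
])
print(sitemap_xml)
```

In practice you would write this to a file such as sitemap.xml in your site root and regenerate it whenever pages are added or updated.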
robots.txt
The robots.txt file tells crawlers which pages they can and cannot access:
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml
```
- Allow — pages crawlers can visit
- Disallow — pages crawlers should skip
- This is a suggestion, not enforcement — well-behaved bots follow it, malicious ones ignore it
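You can check rules like these against specific URLs with Python's standard urllib.robotparser. One caveat: Python's parser applies rules in file order, so the blanket Allow: / line is omitted in this sketch (crawling is allowed by default when no rule matches):

```python
from urllib.robotparser import RobotFileParser

# Inline robots.txt content for the sketch; normally this is fetched
# from https://example.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # → True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))  # → False
```

This is handy for verifying that a robots.txt change doesn't accidentally block pages you want crawled, before you deploy it.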
Google Search Console
Google Search Console (GSC) is your direct line to Google. It shows you:
- Which pages are indexed (and which aren't)
- What queries bring traffic to your site
- Crawl errors and issues
- Mobile usability problems
- Core Web Vitals scores
Every website owner should set up GSC. It's free and essential for understanding how Google sees your site.
Key Reports
| Report | What It Shows |
|---|---|
| Performance | Clicks, impressions, CTR, and average position |
| Coverage | Which pages are indexed, excluded, or errored |
| Sitemaps | Sitemap submission status |
| URL Inspection | How Google sees a specific URL |
| Core Web Vitals | Page speed and experience scores |
Summary
- Search engines crawl (discover), index (understand), and rank (order) pages
- Googlebot follows links to discover new pages — internal linking matters
- Not every crawled page gets indexed — content quality and uniqueness matter
- Ranking depends on relevance, quality, authority, and user experience
- Use Google Search Console to monitor how Google sees your site
- Submit a sitemap to help Google find all your pages