How LLMs Actually Work — AI for Developers | Sabaoon Academy

You do not need a PhD to build useful products on top of language models. You do need a clear mental model. This lesson strips the magic away and replaces it with mechanics you can reason about when things go wrong.

Prediction machines, not databases

A large language model is trained to predict the next token (a piece of text — often a word or subword) given everything that came before. Repeat that millions of times and you get fluent text.

Implications:

The model does not "look up" your proprietary README in a secret vault unless that text was in its training data or you provide it in the prompt (or via retrieval — a later course).
It will happily sound authoritative about facts it only plausibly completes. Always verify externally for anything that matters.

Tokens and the context window

Input and output are measured in tokens, not characters. Roughly, 1 token ≈ 4 characters in English, but code and symbols can tokenize differently.

The context window is how much text the model can attend to at once — system prompt, your message, tools output, prior turns, and the reply. If you exceed it, older material is dropped or summarization is applied (depending on the product).

Practical rule: Keep critical instructions and schemas close to where they are used, and avoid stuffing giant logs unless you truly need them.

Temperature and sampling

When the API exposes temperature, you are controlling randomness:

Lower (e.g. 0–0.3): more deterministic, better for codegen and structured outputs.
Higher (e.g. 0.8–1.2): more creative, riskier for anything that must compile.

For production features that call models automatically, prefer lower temperatures and strict formats (JSON schema, function calling) over prose.

Pre-training vs fine-tuning vs RLHF

Pre-training teaches broad language and code patterns from huge corpora.
Fine-tuning adapts behavior toward a domain or format (expensive, needs data discipline).
RLHF / preference tuning shapes helpfulness and safety from human or model feedback — why different vendors feel different even at similar sizes.

You can usually ship a first version with good prompting plus retrieval before touching custom training.

Failure modes to expect

Hallucination: plausible false specifics — mitigate with citations, tool calls, and tests.
Instruction drift: long chats forget early rules — restate constraints or use short sessions.
Jailbreak-style misuse: treat user input as untrusted if your app forwards it to a model that can act on systems.

Key takeaways

LLMs complete text; they do not magically know your private repo without you supplying context.
Token budgets are real — design prompts and architecture around the window.
Tune temperature and format to the task; reserve high creativity for brainstorming only.