Common terms in the AI/LLM world, alphabetically ordered and updated regularly.
Context Window
The maximum number of tokens a model can process in one request — input + output combined.
Example: Claude 3.5 has a 200K-token window — roughly 150 pages of a book.
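Because input and output share one window, a request must budget them together. A minimal sketch of that check (the 200K figure matches the example above; the helper name is illustrative):

```python
CONTEXT_WINDOW = 200_000  # total token budget, input + output combined

def fits(input_tokens: int, max_output_tokens: int) -> bool:
    # Input and the reserved output budget draw from the same window.
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits(150_000, 4_096))  # True: 154,096 <= 200,000
print(fits(199_000, 4_096))  # False: no room left for the output
```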
Embedding
Converting text to a numeric vector to measure semantic similarity.
Used in: RAG, semantic search, clustering.
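"Semantic similarity" between embeddings is usually measured with cosine similarity. A toy sketch with hand-made 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — illustrative values, not from a real model.
cat = [0.9, 0.1, 0.0]
dog = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, dog) > cosine_similarity(cat, car))  # True
```

Semantically related words end up pointing in similar directions, so "cat" scores closer to "dog" than to "car".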
Fine-tuning
Training a base model further on your own data to adjust style, tone, or domain knowledge.
Best for: tasks requiring a specific voice or specialized knowledge not in pre-training.
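Fine-tuning data is typically prepared as prompt/response pairs. One common shape is chat-style JSONL — one JSON record per line; the exact schema varies by provider, and the record below is illustrative:

```python
import json

# One training record pairing a prompt with the desired response.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise legal assistant."},
        {"role": "user", "content": "What is a force majeure clause?"},
        {"role": "assistant", "content": "A clause excusing performance during events beyond a party's control."},
    ]
}

line = json.dumps(example)  # one record = one line of the JSONL file
print(json.loads(line)["messages"][2]["role"])  # assistant
```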
Hallucination
When a model confidently generates factually incorrect information.
Mitigated by: RAG, grounding with real sources, prompting to say “I don’t know.”
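The last two mitigations can be combined in the prompt itself: supply sources and give the model explicit permission to decline. A minimal sketch — the wording is illustrative, not a canonical template:

```python
def grounded_prompt(question: str, sources: list[str]) -> str:
    # Restrict the model to the provided sources and allow "I don't know".
    source_block = "\n".join(f"- {s}" for s in sources)
    return (
        "Answer using ONLY the sources below. "
        'If the answer is not in the sources, say "I don\'t know."\n\n'
        f"Sources:\n{source_block}\n\nQuestion: {question}"
    )

print(grounded_prompt("When was the policy updated?",
                      ["Policy v2, updated 2023-04-01."]))
```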
Prompt Engineering
Designing inputs that elicit better outputs without fine-tuning.
Techniques: few-shot examples, chain-of-thought, role assignment, structured output.
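The first technique, few-shot examples, means showing the model a couple of worked input/output pairs so it infers the task format. A sketch with made-up sentiment examples:

```python
# Illustrative labeled examples the model sees before the real input.
examples = [
    ("The movie was fantastic!", "positive"),
    ("Terrible service, never again.", "negative"),
]

def few_shot_prompt(text: str) -> str:
    # Worked examples first, then the real input with the answer left blank.
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return f"{shots}\nReview: {text}\nSentiment:"

print(few_shot_prompt("Pretty good overall."))
```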
RAG (Retrieval-Augmented Generation)
Fetching relevant documents from a knowledge base and inserting them into the prompt before generation.
Why it works: the model doesn’t need to memorize everything — it just reads and summarizes what’s retrieved.
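A toy end-to-end sketch of the idea: score documents against the query, then paste the best match into the prompt. Real systems retrieve with embeddings and a vector index; the word-overlap scoring here is a stand-in:

```python
# A tiny "knowledge base" — illustrative documents.
docs = [
    "Refunds are processed within 5 business days.",
    "Shipping is free on orders over $50.",
]

def retrieve(query: str) -> str:
    # Stand-in for embedding search: pick the doc sharing the most words.
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    # Insert the retrieved document into the prompt before generation.
    return f"Context: {retrieve(query)}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```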
Temperature
Controls output randomness.
- 0.0 = deterministic, same answer every time
- 1.0 = creative, varied
- Code/fact tasks: low | Creative tasks: higher
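Under the hood, temperature divides the model's logits before sampling: a small value sharpens the distribution toward one answer, a large value flattens it. A self-contained sketch of that rescaling:

```python
import math

def softmax_with_temperature(logits, t):
    # Divide logits by the temperature, then normalize to probabilities.
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]               # illustrative next-token scores
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter
print(round(cold[0], 3), round(hot[0], 3))
```

At t = 0.1 nearly all probability lands on the top token; at t = 2.0 the alternatives stay live, which is where the "creative, varied" behavior comes from.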
Token
The smallest unit a model processes — not a character, not a word.
Rough rule: 1 English word ≈ 1–2 tokens | 1 Thai word ≈ 2–4 tokens.
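For quick estimates before calling a real tokenizer, a common rule of thumb for English is about 4 characters per token. A sketch of that heuristic — exact counts require the model's own BPE tokenizer:

```python
def estimate_tokens_english(text: str) -> int:
    # Rough heuristic only: English averages ~4 characters per token.
    return max(1, round(len(text) / 4))

print(estimate_tokens_english("Hello, world!"))  # 13 chars -> ~3 tokens
```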