Interactive · Guide

How does an LLM work?

Four steps, from raw text to output.

Tokenization

Split text into small pieces called "tokens", each with a numeric ID

The model does not read text one character at a time. Instead, it splits the text into tokens, which can be whole words, parts of words, or punctuation marks. Each token is assigned a numeric ID, which is later used to look up that token's vector in the embedding table.

Example: the number below each token is its ID in the vocabulary.

💡

"tokenization" is split into token + ization. This subword approach also lets the model cope with words it has never seen: "ChatGPT", for instance, can be broken into subwords that are already in the vocabulary.

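A minimal sketch of subword tokenization, assuming a tiny made-up vocabulary with made-up IDs. It uses greedy longest-match segmentation, which is closer to WordPiece than to the byte-level BPE that GPT models use, so a real tokenizer will produce different splits and IDs. The names `VOCAB`, `split_words`, `segment`, and `encode` are hypothetical.

```python
import re

# Hypothetical toy vocabulary: token -> ID. Real vocabularies hold tens of thousands of entries.
VOCAB = {
    "<unk>": 0, "token": 1, "ization": 2, "is": 3, "fun": 4,
    "!": 5, "Chat": 6, "G": 7, "PT": 8,
}

def split_words(text):
    """Pre-split into runs of word characters and single punctuation marks (whitespace dropped)."""
    return re.findall(r"\w+|[^\w\s]", text)

def segment(word, vocab):
    """Greedy longest-match: repeatedly take the longest prefix that exists in the vocabulary."""
    pieces = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # No prefix matched: emit an unknown token for this character and move on
            pieces.append("<unk>")
            start += 1
    return pieces

def encode(text, vocab):
    """Return (tokens, token_ids) for the given text."""
    tokens = [piece for word in split_words(text) for piece in segment(word, vocab)]
    return tokens, [vocab[t] for t in tokens]

print(encode("tokenization is fun!", VOCAB))  # (['token', 'ization', 'is', 'fun', '!'], [1, 2, 3, 4, 5])
print(encode("ChatGPT", VOCAB))               # (['Chat', 'G', 'PT'], [6, 7, 8])
```

"ChatGPT" is not in the toy vocabulary as a whole word, yet it still maps to known pieces; only characters with no matching piece at all fall back to `<unk>`.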
Embeddings

Convert token IDs into multi-dimensional vectors of numbers

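Each token ID is used to look up a row in the embedding table, giving a vector with hundreds to thousands of dimensions (768 in GPT-2 small, for example). Words with similar meanings end up with vectors that are "close" to each other in this space, which is how the model "knows" that king and queen are related.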
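A minimal sketch of the lookup, assuming a made-up three-token vocabulary and hand-picked 3-dimensional vectors (real tables are learned and far wider). The lookup itself is just row indexing by token ID; "closeness" is measured here with cosine similarity, a common choice for comparing embeddings. `VOCAB`, `EMBEDDINGS`, `lookup`, and `cosine` are hypothetical names.

```python
import math

# Hypothetical toy vocabulary (token -> ID) and embedding table (row i = vector for ID i).
# The numbers are hand-picked for illustration, not learned.
VOCAB = {"king": 0, "queen": 1, "apple": 2}
EMBEDDINGS = [
    [0.9, 0.8, 0.1],   # king
    [0.8, 0.9, 0.1],   # queen
    [0.1, 0.0, 0.95],  # apple
]

def lookup(token_ids):
    """Embedding lookup is plain row indexing: one vector per token ID."""
    return [EMBEDDINGS[i] for i in token_ids]

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

king, queen, apple = lookup([VOCAB["king"], VOCAB["queen"], VOCAB["apple"]])
print(f"king vs queen: {cosine(king, queen):.3f}")
print(f"king vs apple: {cosine(king, apple):.3f}")
```

With these toy vectors, king and queen score far higher than king and apple, mirroring the "close in the space" idea above.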

Vector space (2D projection)

💡

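Vector arithmetic works, approximately: king − man + woman ≈ queen, a result first popularized with word2vec word embeddings. The dashed lines in the figure show this relationship. The model learns it from data on its own; it is not hard-coded.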
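A minimal sketch of the analogy trick, assuming hand-picked 3-D vectors whose axes loosely mean (royalty, gender, fruit). Learned embeddings are not this tidy, and the result only holds approximately. As is common in word2vec-style analogy tests, the three input words are excluded from the nearest-neighbor search. `EMBEDDINGS` and `analogy` are hypothetical names.

```python
import math

# Hypothetical hand-picked vectors; axes loosely mean (royalty, gender, fruit).
# Learned embeddings have many more dimensions that are not individually interpretable like this.
EMBEDDINGS = {
    "king":  [0.9,  0.9, 0.0],
    "queen": [0.9, -0.9, 0.0],
    "man":   [0.1,  0.9, 0.0],
    "woman": [0.1, -0.9, 0.0],
    "apple": [0.0,  0.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(base, minus, plus, embeddings):
    """Compute base - minus + plus, then return the nearest other word by cosine similarity."""
    target = [b - m + p for b, m, p in zip(embeddings[base], embeddings[minus], embeddings[plus])]
    candidates = (w for w in embeddings if w not in {base, minus, plus})
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))

print(analogy("king", "man", "woman", EMBEDDINGS))  # queen
```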

Self-Attention

Each token "looks for" other tokens that are relevant to it

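Attention is what lets the model understand context. When processing a token, the model "looks at" the other tokens (in GPT-style models, only the tokens that come before it). A high attention weight means that token contributes more strongly to the current token's updated representation, and in turn to predicting what comes next.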
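Written as a formula, this is the standard scaled dot-product attention from the original Transformer, shown here with the causal mask used by GPT-style models. Q, K, and V are the query, key, and value matrices computed from the token vectors, d_k is the key dimension, and M is the mask; these symbols are introduced here and are not defined elsewhere in the guide.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} = \begin{cases} 0 & \text{if } j \le i \\ -\infty & \text{if } j > i \end{cases}
```

Row i of the softmax output holds the attention weights of token i: they are non-negative, sum to 1, and are zero for every later token.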

Figure: attention weights for a selected token, shaded from low weight to high weight.

💡

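A Transformer runs several attention heads in parallel. Heads tend to specialize: some appear to track syntax, some coreference (pronoun → noun), some semantic similarity. Their outputs are concatenated, projected back to the model dimension, and then passed through the layer's feed-forward network before moving on to the next layer.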
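A minimal pure-Python sketch of multi-head causal self-attention, assuming tiny hypothetical sizes (4 tokens, d_model = 8, 2 heads) and random numbers in place of learned weights. Each head gets its own query/key/value matrices here for readability (real implementations usually fuse them into one large matrix and split it, which is equivalent); the heads' outputs are concatenated per token and multiplied by an output matrix `w_o`. The feed-forward part of the layer is left out. All function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    # Random values stand in for learned weights in this sketch
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention; token i only attends to tokens 0..i."""
    d_k = len(q[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_k) for j in range(i + 1)]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(i + 1)) for c in range(len(v[0]))])
    return outputs, weights

def multi_head_attention(x, n_heads, rng):
    """Run n_heads heads in parallel, concatenate their outputs, then apply w_o."""
    d_model = len(x[0])
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    head_outputs, head_weights = [], []
    for _ in range(n_heads):
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        out, w = causal_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        head_outputs.append(out)
        head_weights.append(w)
    # Concatenate per token: n_heads * d_head == d_model values
    concat = [[val for out in head_outputs for val in out[i]] for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o), head_weights

rng = random.Random(0)
x = random_matrix(4, 8, rng)  # 4 token vectors, d_model = 8 (hypothetical sizes)
out, weights = multi_head_attention(x, n_heads=2, rng=rng)
print(len(out), len(out[0]))  # 4 8
```

The output keeps one d_model-wide vector per token, so it can be fed straight into the next stage of the layer.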

Token Generation

Pick the next token from a probability distribution

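After the final layer, the model produces a score (logit) for every token in the vocabulary (about 50,000 tokens for GPT-2; other models range from roughly 32,000 to over 100,000), turns these scores into a probability distribution with softmax, and picks the next token from it. The Temperature parameter controls how sharp or flat that distribution is before the pick.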
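A minimal sketch of this final step, assuming a hypothetical list of five candidate tokens with made-up logits (a real model scores its entire vocabulary). The logits are divided by the temperature, converted to probabilities with a numerically stable softmax, and one token is sampled with Python's `random.choices`. `softmax_with_temperature` and `sample_next_token` are hypothetical names.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(candidates, logits, temperature, rng):
    """Sample one candidate according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0], probs

# Hypothetical candidates and logits for the context "The weather today is"
candidates = ["sunny", "cloudy", "nice", "cold", "rainy"]
logits = [3.0, 2.2, 2.0, 1.5, 1.2]

rng = random.Random(42)  # fixed seed so runs are repeatable
for t in (0.1, 1.0, 2.0):
    token, probs = sample_next_token(candidates, logits, t, rng)
    print(t, token, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates nearly all probability on "sunny"; raising it spreads probability toward the other candidates.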

Figure: top 5 candidate next tokens for the context "The weather today is", with a Temperature slider from 0.1 (deterministic) to 2.0 (random), set to 1.0.

💡

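Low temperature (e.g. 0.1) → almost always picks the highest-probability token; in the limit of temperature approaching 0 this becomes greedy decoding. Good for code and factual answers.
High temperature (1.5+) → spreads probability more evenly across tokens. Good for creative writing and brainstorming.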
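Both behaviors follow from the softmax-with-temperature formula. Here z_i is the model's logit for token i, T is the temperature, and V is the vocabulary; this notation is introduced for the formula and is not used elsewhere in the guide.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j \in V} \exp(z_j / T)}
\qquad
\lim_{T \to 0^{+}} p_i = \begin{cases} 1 & \text{if } i = \arg\max_j z_j \\ 0 & \text{otherwise} \end{cases}
\qquad
\lim_{T \to \infty} p_i = \frac{1}{|V|}
```

T = 1 leaves the model's distribution unchanged. The first limit assumes a single highest logit; the second shows why very high temperatures approach a uniform, essentially random pick.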