Install
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# macOS via Homebrew
brew install ollama
Common commands
ollama pull llama3.2 # download a model
ollama run llama3.2 # start interactive chat
ollama list # list downloaded models
ollama rm llama3.2 # remove a model
ollama ps # show currently loaded models
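The commands above are easy to script. As a sketch, the helper below pulls a model only if it is not already downloaded; it assumes `ollama` is on PATH and takes the first whitespace-separated token of each `ollama list` row as the model name (the exact column layout may vary by version):

```python
import subprocess

def installed_models(list_output: str) -> set[str]:
    """Parse `ollama list` output: skip the header row and take the
    first whitespace-separated token of each line as the model name."""
    lines = list_output.strip().splitlines()
    return {line.split()[0] for line in lines[1:] if line.strip()}

def ensure_model(name: str) -> None:
    """Pull `name` only if it is not already downloaded."""
    out = subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout
    if name not in installed_models(out):
        subprocess.run(["ollama", "pull", name], check=True)
```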
Serve the API (background)
ollama serve # start API server at localhost:11434 (runs in the foreground; append & or use a service manager to background it)
# Quick test
curl http://localhost:11434/api/generate \
-d '{"model":"llama3.2","prompt":"Hello","stream":false}'
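The same quick test from Python's standard library, as a sketch (assumes the server started by `ollama serve` is reachable at localhost:11434; `build_payload` and `generate` are illustrative helpers, not part of any library):

```python
import json
import urllib.request

def build_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """JSON body for POST /api/generate, matching the curl example above."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a non-streaming generate request and return the reply text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("llama3.2", "Hello"))
```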
Recommended models by RAM
| RAM | Model | Size |
|---|---|---|
| 8 GB | llama3.2:3b, qwen2.5:3b | ~2 GB |
| 16 GB | llama3.1:8b, qwen2.5:7b | ~5 GB |
| 64 GB+ | llama3.1:70b (Q4) | ~40 GB |
| Any | nomic-embed-text | ~274 MB |
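The table above, expressed as a small helper (a sketch: the thresholds roughly mirror the rows, but real headroom depends on context length, quantization, and whatever else is using RAM):

```python
def pick_model(ram_gb: int) -> str:
    """Suggest a chat model for the given amount of system RAM,
    following the RAM/model table above."""
    if ram_gb >= 64:
        return "llama3.1:70b"   # ~40 GB at Q4
    if ram_gb >= 16:
        return "llama3.1:8b"    # ~5 GB
    return "llama3.2:3b"        # ~2 GB
```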
Useful environment variables
OLLAMA_HOST=0.0.0.0:11434 # expose to other machines
OLLAMA_NUM_PARALLEL=2 # concurrent requests
OLLAMA_MAX_LOADED_MODELS=1 # cap RAM usage
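Client code can honor `OLLAMA_HOST` the same way the CLI does. A sketch (`base_url` is a hypothetical helper; the `127.0.0.1:11434` fallback matches the server's default address):

```python
import os

def base_url() -> str:
    """API base URL: respect OLLAMA_HOST if set, else the default address."""
    host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host

# e.g. with OLLAMA_HOST=0.0.0.0:11434 set, base_url() gives http://0.0.0.0:11434
```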
Python usage
import ollama
response = ollama.chat(
model='llama3.2',
messages=[{'role': 'user', 'content': 'Hello'}]
)
print(response['message']['content'])
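For incremental output, the same `ollama.chat` call accepts `stream=True` and yields chunks instead of one response. A sketch (requires the `ollama` package and a running server; `join_stream` is a hypothetical helper included to show the chunk shape):

```python
def join_stream(chunks) -> str:
    """Concatenate the text pieces from a stream of chat chunks,
    where each chunk looks like {'message': {'content': '...'}}."""
    return "".join(chunk["message"]["content"] for chunk in chunks)

if __name__ == "__main__":
    import ollama  # assumes `pip install ollama` and a running server
    stream = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    for chunk in stream:
        print(chunk["message"]["content"], end="", flush=True)
```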