Ollama Quick Reference — Run LLMs Locally

The Ollama commands you'll actually use: pulling and running models, serving the API, listing what's installed, and tuning memory, all in one place.

Nat
#ollama #local-ai #llm #linux #macos

Install

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS via Homebrew
brew install ollama
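
To confirm the install worked, check the CLI and the API server (the installer usually starts the server as a background service; if not, run ollama serve first):

```shell
# CLI on PATH?
ollama --version

# API server responding? Returns a small JSON version object.
curl -s http://localhost:11434/api/version
```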

Common commands

ollama pull llama3.2          # download a model
ollama run llama3.2           # start interactive chat
ollama list                   # list downloaded models
ollama rm llama3.2            # remove a model
ollama ps                     # show currently loaded models
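
run also works non-interactively: pass the prompt as an argument or pipe text on stdin, and it prints the reply and exits (assumes llama3.2 is already pulled; file name is just an example):

```shell
# One-shot prompt
ollama run llama3.2 "Explain mmap in one sentence"

# Pipe a file in as context
cat notes.txt | ollama run llama3.2 "Summarize this:"
```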

Serve the API (background)

ollama serve                  # serves the API at localhost:11434 (may already run as a system service after install)
# Quick test
curl http://localhost:11434/api/generate \
  -d '{"model":"llama3.2","prompt":"Hello","stream":false}'
Which model fits your RAM?

RAM      Model                      Size
8 GB     llama3.2:3b, qwen2.5:3b    ~2 GB
16 GB    llama3.1:8b, qwen2.5:7b    ~5 GB
32 GB+   llama3.1:70b (Q4)          ~40 GB
Any      nomic-embed-text           ~274 MB
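
As a rule of thumb, the table can be encoded in a small helper (the thresholds and model names are just the rows above, not an official mapping):

```python
def pick_model(ram_gb: float) -> str:
    """Suggest a chat model that fits in the given RAM, per the table."""
    if ram_gb >= 32:
        return "llama3.1:70b"   # ~40 GB at Q4
    if ram_gb >= 16:
        return "llama3.1:8b"    # ~5 GB
    return "llama3.2:3b"        # ~2 GB

print(pick_model(8))    # llama3.2:3b
print(pick_model(64))   # llama3.1:70b
```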

Useful environment variables

OLLAMA_HOST=0.0.0.0:11434     # bind address; expose the API to other machines
OLLAMA_NUM_PARALLEL=2         # concurrent requests handled per model
OLLAMA_MAX_LOADED_MODELS=1    # keep at most one model loaded (caps RAM)
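
To apply them for a single run, set them inline or export them before starting the server (on Linux systemd installs, the service's environment is edited via systemctl edit ollama instead; the values here are examples):

```shell
# Inline, for one invocation
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MAX_LOADED_MODELS=1 ollama serve

# Or export for the whole shell session
export OLLAMA_NUM_PARALLEL=2
ollama serve
```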

Python usage

import ollama  # pip install ollama

response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Hello'}]
)
print(response['message']['content'])
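
ollama.chat is stateless: to hold a multi-turn conversation you re-send the full message history each call. A sketch of that bookkeeping, with the chat backend injected so the history logic is visible without a running server (the stub below is purely illustrative):

```python
def ask(history, prompt, chat):
    """Append the user turn, call chat() on the full history,
    and record the assistant reply. chat: list[dict] -> str."""
    history.append({'role': 'user', 'content': prompt})
    reply = chat(history)
    history.append({'role': 'assistant', 'content': reply})
    return reply

# With the real client (needs a running Ollama server):
# import ollama
# chat = lambda msgs: ollama.chat(model='llama3.2', messages=msgs)['message']['content']

# Demo with a stub backend so the logic runs standalone:
echo = lambda msgs: f"({len(msgs)} messages seen)"
history = []
ask(history, 'Who wrote Dune?', echo)
ask(history, 'When was it published?', echo)  # "it" resolves via history
print(len(history))  # 4: two user turns, two assistant replies
```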