If you want a fast, private, and cost-effective AI assistant without sending data to third parties, you can self-host one with Ollama and Open WebUI. Ollama runs large language models locally, while Open WebUI gives you a friendly chat interface with features like chat history, prompt templates, and model management. This guide shows how to deploy both using Docker, with optional GPU acceleration for NVIDIA or AMD.
Why this stack
Ollama simplifies running modern models such as Llama 3.1, Mistral, Phi, and more with a single command. Open WebUI connects to Ollama and adds a browser-based chat app, multiple users, and extras like RAG, files, and tools. Docker keeps everything consistent, easy to update, and portable across servers and clouds.
Prerequisites
- A 64-bit Linux host (Ubuntu 22.04/24.04 recommended), macOS, or Windows with WSL2. For production, a Linux VM or server is ideal.
- Docker Engine 24+ and Docker Compose plugin.
- 16 GB RAM minimum (24–32 GB recommended for 8B models; bigger models need more).
- 25–50 GB of free disk space; a quantized 7–8B model takes roughly 4–5 GB, larger models considerably more (see the quick checks after this list).
- Optional GPU:
• NVIDIA: recent driver + nvidia-container-toolkit.
• AMD: ROCm-capable GPU and kernel/drivers.
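A few standard commands give a quick read on what the host can handle before you pick a model (Linux; use the equivalents on macOS or WSL2):
free -h          # total and available RAM
df -h /          # free disk space (Docker stores models under /var/lib/docker by default)
nproc            # CPU cores available for inference
lspci | grep -iE 'vga|3d'    # lists any discrete GPU (requires pciutils)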
Step 1 — Install Docker and (optional) drivers
On Ubuntu, install Docker quickly:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
If you have an NVIDIA GPU, install drivers and container toolkit, then restart Docker:
sudo apt update
sudo apt install -y nvidia-driver-535
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi
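As a quick smoke test, the container toolkit should let an ordinary container see the GPU; it injects nvidia-smi into the plain ubuntu image at run time, so if this prints your GPU, GPU containers will work:
docker run --rm --gpus all ubuntu nvidia-smi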
Step 2 — Create a Docker Compose file
Create a project folder, then a docker-compose.yml that runs Ollama and Open WebUI. This setup persists models and app data in Docker volumes and exposes ports 11434 (Ollama) and 3000 (WebUI).
mkdir -p ~/ai-chat && cd ~/ai-chat
cat > docker-compose.yml << 'YAML'
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # For NVIDIA GPU support, uncomment the next line (requires nvidia-container-toolkit)
    # gpus: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_this_long_random_string
      - ENABLE_SIGNUP=true
      - DEFAULT_MODELS=llama3.1:8b
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
YAML
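Before launching, it is worth letting Compose parse the file: docker compose config prints the resolved configuration and fails loudly on YAML indentation or schema mistakes.
docker compose config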
Step 3 — Launch the stack
Start both services in the background:
docker compose up -d
docker compose ps
Open a browser and visit http://SERVER_IP:3000. On first visit, create an admin account (the first account registered becomes the administrator). In the admin settings, confirm the Ollama connection points to http://ollama:11434 and that llama3.1:8b shows up as the default model once it has been pulled in Step 4.
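If the page does not load or no models appear, a few quick checks against the containers usually narrow the problem down; the Ollama HTTP API is mapped to port 11434 on the host:
curl http://localhost:11434/api/version      # Ollama answers with its version if it is up
curl http://localhost:11434/api/tags         # lists models already pulled (empty until Step 4)
docker logs open-webui --tail 50             # Open WebUI startup and connection errors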
Step 4 — Pull a model
You can pull models in the WebUI, or via CLI inside the Ollama container:
docker exec -it ollama ollama pull llama3.1:8b
After the download, start chatting in Open WebUI. If the model is large or your server is low on RAM, start with a smaller one like mistral:7b-instruct or phi3:mini.
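To confirm the model is in place and responding, list it and send a one-off prompt straight to Ollama's /api/generate endpoint:
docker exec -it ollama ollama list
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'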
Optional — Enable NVIDIA GPU acceleration
If nvidia-smi works on the host and you installed nvidia-container-toolkit, edit docker-compose.yml, uncomment the gpus: all line in the ollama service, and redeploy:
docker compose down
docker compose up -d
docker logs -f ollama
When a model runs, Ollama should log CUDA usage. You can also watch GPU load with nvidia-smi.
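Two quick ways to confirm the GPU is actually doing the work while a chat is running; ollama ps reports whether a loaded model sits on the GPU or the CPU:
docker exec -it ollama ollama ps       # loaded models and the processor they run on
watch -n 1 nvidia-smi                  # utilization and VRAM usage should climb during generation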
Optional — Enable AMD GPU (ROCm)
For AMD GPUs supported by ROCm, use the ROCm image and pass GPU devices into the container. Replace the ollama service with:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
Then redeploy with docker compose up -d. If you see ROCm capability errors, verify your kernel/driver versions and that your user belongs to the video group.
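Before digging into driver versions, a few generic checks confirm the devices are visible at all; they do not prove full ROCm support for your particular card:
ls -l /dev/kfd /dev/dri                            # device nodes must exist on the host
groups                                             # your user should include video (and often render)
docker logs ollama 2>&1 | grep -iE 'rocm|amdgpu'   # look for messages about which GPU backend Ollama loaded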
Secure and expose your WebUI
For public access, put a reverse proxy in front with HTTPS. Caddy makes this easy:
your-domain.example {
    reverse_proxy 127.0.0.1:3000
}
Point DNS to your server, install Caddy, and it will fetch certificates automatically. In Open WebUI, set strong passwords, disable open signup if you do not need it (ENABLE_SIGNUP=false), and consider enabling rate limits at the proxy.
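A minimal flow on Ubuntu, assuming Caddy runs as a system service installed from its official package repository (see caddyserver.com/docs/install) and reads /etc/caddy/Caddyfile:
sudo nano /etc/caddy/Caddyfile                       # add the site block shown above
sudo caddy validate --config /etc/caddy/Caddyfile    # check the syntax
sudo systemctl reload caddy                          # apply the change without downtime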
Backups and updates
Your important data lives in two named volumes: ollama (models) and openwebui (app data and chat history). Compose prefixes volume names with the project name, which defaults to the folder name, so with ~/ai-chat they appear as ai-chat_ollama and ai-chat_openwebui (confirm with docker volume ls). To back them up:
docker compose stop
docker run --rm -v ai-chat_ollama:/src -v "$PWD":/backup alpine tar czf /backup/ollama-vol.tgz -C /src .
docker run --rm -v ai-chat_openwebui:/src -v "$PWD":/backup alpine tar czf /backup/openwebui-vol.tgz -C /src .
docker compose start
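To restore on a fresh host, bring the stack up once so Compose creates the volumes, stop it, and unpack the archives back in; the volume names below assume the same ai-chat project folder:
docker compose stop
docker run --rm -v ai-chat_ollama:/dst -v "$PWD":/backup alpine tar xzf /backup/ollama-vol.tgz -C /dst
docker run --rm -v ai-chat_openwebui:/dst -v "$PWD":/backup alpine tar xzf /backup/openwebui-vol.tgz -C /dst
docker compose start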
To update images and get the latest features:
docker compose pull
docker compose up -d
Models remain unless you explicitly remove the ollama volume.
Troubleshooting
- Open WebUI cannot connect to Ollama: ensure OLLAMA_BASE_URL points to http://ollama:11434 and both containers share the same Docker network (default in Compose).
- CUDA driver not found: confirm nvidia-smi works on the host; re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker; make sure gpus: all is uncommented and the stack was redeployed.
- AMD permissions error: check /dev/kfd and /dev/dri are present; add group_add: video; ensure your kernel/ROCm version supports your GPU.
- Out of memory or slow responses: choose a smaller model, or reduce threads and context in the model settings; increase swap as a temporary measure.
- No space left on device: models are large; prune unused images with docker image prune and remove models you no longer need with ollama list / ollama rm (see the commands after this list).
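When disk pressure is the problem, these commands show where the space went and free it up; the model name in the last line is only an example:
docker system df                              # disk used by images, containers, and volumes
docker image prune -f                         # remove dangling images
docker exec -it ollama ollama list            # installed models and their sizes
docker exec -it ollama ollama rm phi3:mini    # example: delete a model you no longer need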
Uninstall cleanly
Stop and remove containers and volumes (this also deletes downloaded models and chat data):
cd ~/ai-chat
docker compose down -v
You now have a private AI chatbot that runs entirely on your hardware. Expand it with more models, plug in document retrieval, or publish it behind a secure HTTPS domain for your team.