Local AI is now practical: with Ollama you can run large language models (LLMs) on your machine, and OpenWebUI gives you a clean, chat-style interface. In this tutorial, you will deploy both on Ubuntu 22.04/24.04 using Docker Compose, with optional NVIDIA GPU acceleration for much faster inference.
Why OpenWebUI + Ollama?
Ollama manages model downloads and provides an OpenAI-compatible API at /v1. OpenWebUI is a lightweight, self-hosted web frontend that connects to Ollama and adds chat history, prompt templates, and simple administration. Together, they create a private, zero-cost alternative to cloud AI for development, prototyping, and offline use.
Prerequisites
- Ubuntu 22.04 LTS or 24.04 LTS with sudo access.
- A stable internet connection; at least 16 GB of RAM is recommended for medium-sized models.
- Optional but recommended: an NVIDIA GPU (Turing or newer) with recent drivers for CUDA acceleration (quick hardware checks below).
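To confirm the hardware side before installing anything, a few standard commands report available memory, CPU cores, and whether an NVIDIA GPU is visible on the PCI bus:

free -h
nproc
lspci | grep -i nvidia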
Step 1 — Install Docker and Docker Compose
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
Log out and back in (or reboot) to apply the new group membership so you can run Docker without sudo.
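Once you are back in, a quick sanity check confirms that the Engine and the Compose plugin are installed and that your user can reach the daemon without sudo:

docker --version
docker compose version
docker run --rm hello-world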
Step 2 — Enable GPU support (NVIDIA Container Toolkit)
If you do not have an NVIDIA GPU, skip to Step 3. If you do, install the proprietary driver first:
sudo ubuntu-drivers install
sudo reboot
After reboot, verify the driver:
nvidia-smi
Install the NVIDIA Container Toolkit so Docker can access the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test Docker GPU access:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Step 3 — Create the Docker Compose file
Create a project directory and the Compose file:
mkdir -p ~/ai-stack && cd ~/ai-stack
nano compose.yml
Paste the following contents and save:
version: "3.9"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=5m
      - OLLAMA_MAX_LOADED_MODELS=2
    gpus: all   # Remove this line if you do not have an NVIDIA GPU

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=True   # Require login
    volumes:
      - openwebui:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama:
  openwebui:
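The service-level gpus: all attribute requires a fairly recent Docker Compose release. If your version rejects it, the older device-reservation syntax is equivalent; shown here as a sketch for the ollama service only, with the rest of its settings unchanged:

  ollama:
    # ...image, ports, volumes, environment as above, with the gpus line removed
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]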
Step 4 — Launch the stack
docker compose up -d
Open OpenWebUI in your browser at http://<your-server-ip>:8080. On first run, create an admin account when prompted. The backend (Ollama) is reachable at http://ollama:11434 inside the Docker network and at http://<your-server-ip>:11434 from your LAN.
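If the page does not load, check that both containers are running and that the Ollama API answers locally (the version number in the response will differ on your system):

docker compose ps
curl http://localhost:11434/api/version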
Step 5 — Download a model and test
Pull a model with the Ollama CLI (inside the container) or use OpenWebUI’s “Models” tab:
docker exec -it ollama ollama pull llama3.1:8b
Try a quick prompt:
docker exec -it ollama ollama run llama3.1:8b "Explain what a vector database is in one paragraph."
Return to OpenWebUI and start chatting with the downloaded model. If you have a GPU configured, latency will drop significantly compared to CPU-only mode.
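The same model is also reachable over Ollama's native HTTP API, which is convenient for scripting. A minimal non-streaming request, reusing the prompt above, looks like this:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain what a vector database is in one paragraph.",
  "stream": false
}'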
Optional — Secure and expose the UI
- Keep OpenWebUI private on your LAN and leave authentication enabled (WEBUI_AUTH=True) as shown.
- For public access, place a reverse proxy such as Caddy, Nginx Proxy Manager, or Traefik in front and obtain Let's Encrypt certificates. Bind OpenWebUI to 127.0.0.1:8080 and publish only the proxy, as sketched below.
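As a minimal sketch, assuming Caddy runs on the same host and chat.example.com is a placeholder domain you control: change the OpenWebUI port mapping in compose.yml so it listens on localhost only, then let Caddy terminate TLS and request the Let's Encrypt certificate automatically.

  openwebui:
    ports:
      - "127.0.0.1:8080:8080"   # publish on localhost only

# Caddyfile
chat.example.com {
    reverse_proxy 127.0.0.1:8080
}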
Troubleshooting
- GPU not detected: verify nvidia-smi works on the host. Re-run sudo nvidia-ctk runtime configure --runtime=docker and sudo systemctl restart docker. Ensure the gpus: all line is present for the ollama service and restart with docker compose up -d.
- Slow or out-of-memory errors: try a smaller model (for example, llama3.2:3b), or lower the context length (num_ctx, e.g. to 2048) in OpenWebUI's model settings to reduce memory use.
- Port conflicts: change the host ports in the Compose file (e.g., "8081:8080").
- Persistence: models are stored in the ollama volume; UI data (prompts, chats) lives in the openwebui volume. Back them up with docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama.tar.gz -C / data; a matching restore command follows this list.
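Assuming the archive was created with the backup command above, the matching restore into a (possibly freshly created) ollama volume is:

docker run --rm -v ollama:/data -v $(pwd):/backup alpine tar xzf /backup/ollama.tar.gz -C /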
Maintenance tips
- Update images: docker compose pull && docker compose up -d.
- View logs: docker compose logs -f ollama and docker compose logs -f openwebui.
- Use the OpenAI-compatible API: point your apps at http://<server-ip>:11434/v1 with the model name you downloaded. Most SDKs accept a custom base URL and a dummy API key; see the example below.
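For example, a raw request against the chat completions endpoint looks like the following; the model name assumes the llama3.1:8b pull from Step 5, and the Authorization value can be any placeholder:

curl http://<server-ip>:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ollama" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'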
You now have a modern, private AI stack running locally. Iterate on prompts, fine-tune your workflow, and scale up to larger models as your hardware allows—all while keeping your data on your own machine.