Deploy OpenWebUI and Ollama with NVIDIA GPU on Ubuntu using Docker Compose

Local AI is now practical: with Ollama you can run large language models (LLMs) on your machine, and OpenWebUI gives you a clean, chat-style interface. In this tutorial, you will deploy both on Ubuntu 22.04/24.04 using Docker Compose, with optional NVIDIA GPU acceleration for much faster inference.

Why OpenWebUI + Ollama?

Ollama manages model downloads and provides an OpenAI-compatible API at /v1. OpenWebUI is a lightweight, self-hosted web frontend that connects to Ollama and adds chat history, prompt templates, and simple administration. Together, they create a private, subscription-free alternative to cloud AI for development, prototyping, and offline use.

Prerequisites

- Ubuntu 22.04 LTS or 24.04 LTS with sudo access.
- A stable internet connection and at least 16 GB of RAM; 16 GB is a comfortable baseline for medium-sized (7–8B parameter) models.
- Optional but recommended: an NVIDIA GPU (Turing or newer) with recent drivers for CUDA acceleration.
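To quickly check whether a machine meets these requirements, you can run the following (the lspci check only matters if you plan to use GPU acceleration):

free -h                  # total and available RAM
lspci | grep -i nvidia   # lists NVIDIA devices; empty output means no NVIDIA GPU
lsb_release -ds          # confirms the Ubuntu release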

Step 1 — Install Docker and Docker Compose

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER

Log out and back in (or reboot) to apply the new group membership so you can run Docker without sudo.
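Before moving on, you can confirm that Docker and the Compose plugin work without sudo:

docker --version
docker compose version
docker run --rm hello-world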

Step 2 — Enable GPU support (NVIDIA Container Toolkit)

If you do not have an NVIDIA GPU, skip to Step 3. If you do, install the proprietary driver first:

sudo ubuntu-drivers install
sudo reboot

After reboot, verify the driver:

nvidia-smi

Install the NVIDIA Container Toolkit so Docker can access the GPU:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Test Docker GPU access:

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

Step 3 — Create the Docker Compose file

Create a project directory and the Compose file:

mkdir -p ~/ai-stack && cd ~/ai-stack
nano compose.yml

Paste the following contents and save:

version: "3.9"
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
ports:
- "11434:11434"
volumes:
- ollama:/root/.ollama
environment:
- OLLAMA_KEEP_ALIVE=5m
- OLLAMA_MAX_LOADED_MODELS=2
gpus: all # Remove this line if you do not have an NVIDIA GPU

openwebui:
image: ghcr.io/open-webui/open-webui:main
container_name: openwebui
restart: unless-stopped
ports:
- "8080:8080"
environment:
- OLLAMA_BASE_URL=http://ollama:11434
- WEBUI_AUTH=True # Require login
volumes:
- openwebui:/app/backend/data
depends_on:
- ollama

volumes:
ollama:
openwebui:
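Before launching, it is worth validating the file. docker compose config parses compose.yml and prints the fully resolved configuration, or an error if the YAML is malformed:

docker compose config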

Step 4 — Launch the stack

docker compose up -d

Open OpenWebUI in your browser: http://<your-server-ip>:8080. On first run, create an admin account when prompted. The backend (Ollama) will be reachable at http://ollama:11434 inside the Docker network and at http://<your-server-ip>:11434 from your LAN.
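If the page does not load, confirm that both containers are up and that Ollama answers on its API port (it replies with a short "Ollama is running" message):

docker compose ps
curl http://localhost:11434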

Step 5 — Download a model and test

Pull a model with the Ollama CLI (inside the container) or use OpenWebUI’s “Models” tab:

docker exec -it ollama ollama pull llama3.1:8b

Try a quick prompt:

docker exec -it ollama ollama run llama3.1:8b "Explain what a vector database is in one paragraph."

Return to OpenWebUI and start chatting with the downloaded model. If you have a GPU configured, latency will drop significantly compared to CPU-only mode.
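You can also check which models are available locally, either via the CLI or Ollama's REST API:

docker exec -it ollama ollama list
curl http://localhost:11434/api/tags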

Optional — Secure and expose the UI

- Keep OpenWebUI private on your LAN and enable authentication (WEBUI_AUTH=True) as shown.
- For public access, place a reverse proxy like Caddy, Nginx Proxy Manager, or Traefik in front, and obtain Let’s Encrypt certificates. Bind OpenWebUI to 127.0.0.1:8080 and publish the proxy instead.
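As a rough sketch of the reverse-proxy approach with Caddy (assuming Caddy is installed on the host and ai.example.com is a placeholder domain that resolves to your server), first restrict the published port in compose.yml so only the proxy can reach OpenWebUI:

    ports:
      - "127.0.0.1:8080:8080"

Then a minimal /etc/caddy/Caddyfile is enough; Caddy requests and renews the Let's Encrypt certificate automatically:

ai.example.com {
    reverse_proxy 127.0.0.1:8080
}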

Troubleshooting

- GPU not detected: verify nvidia-smi works on the host. Re-run sudo nvidia-ctk runtime configure --runtime=docker and sudo systemctl restart docker. Ensure the gpus: all line is present for the ollama service and restart with docker compose up -d.
- Slow or out-of-memory errors: try a smaller model (for example, llama3.2:3b), or lower the context length (the num_ctx parameter) in OpenWebUI’s model settings, for example to 2048, to reduce memory use.
- Port conflicts: change the host ports in the Compose file (e.g., "8081:8080").
- Persistence: models are stored in the ollama volume; UI data (prompts, chats) in the openwebui volume. Compose prefixes volume names with the project name, so on disk they are ai-stack_ollama and ai-stack_openwebui. Back up the model volume with docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup alpine tar czf /backup/ollama.tar.gz -C / data.
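To restore such a backup, the reverse works. This sketch assumes the archive is named ollama.tar.gz in the current directory and that the stack has been brought up at least once (so the volume exists) but is currently stopped:

docker compose down
docker run --rm -v ai-stack_ollama:/data -v $(pwd):/backup alpine tar xzf /backup/ollama.tar.gz -C /
docker compose up -d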

Maintenance tips

- Update images: docker compose pull && docker compose up -d.
- View logs: docker compose logs -f ollama and docker compose logs -f openwebui.
- Use the OpenAI-compatible API: your apps can point to http://<server-ip>:11434/v1 with the model name you downloaded. Most SDKs accept a custom base URL and a dummy API key.
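For example, a quick test of the chat completions endpoint with curl (the API key value is arbitrary; Ollama ignores it, but some SDKs insist that one be set):

curl http://<server-ip>:11434/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer dummy-key" -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Say hello in five words."}]}'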

You now have a modern, private AI stack running locally. Iterate on prompts, fine-tune your workflow, and scale up to larger models as your hardware allows—all while keeping your data on your own machine.
