Overview
This guide shows you how to deploy Ollama and Open WebUI on Ubuntu using Docker Compose, with optional NVIDIA GPU acceleration and automatic HTTPS. You will get a clean, reproducible setup suitable for a home lab, a developer VM, or a small on-prem server. The steps are focused on Ubuntu 22.04/24.04 LTS, but will work on other modern distributions with minor changes.
What You Will Build
You will run three containers: Ollama (LLM runtime), Open WebUI (a friendly web front end), and Caddy (a reverse proxy that issues and renews free TLS certificates). Data will persist in Docker volumes so updates and restarts do not wipe your models or chat history.
Prerequisites
1) An Ubuntu server; 16 GB of RAM or more is recommended for medium-sized models (more is better).
2) A domain or subdomain (e.g., ai.example.com) pointed to your server’s public IP (A/AAAA record).
3) Ports 80 and 443 open to the Internet.
4) Optional: an NVIDIA GPU with recent drivers for acceleration.
5) A non-root user with sudo privileges.
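A quick, optional sanity check before you begin (substitute your own domain; dig ships in the dnsutils package if it is not already installed):
free -h                                # available RAM
dig +short ai.example.com              # should print this server’s public IP
sudo ss -tlnp | grep -E ':(80|443) '   # nothing else should already be listening on 80/443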
Step 1 — Install Docker and Compose
Update the OS and install Docker Engine and the Compose plugin from Docker’s repository:
sudo apt update && sudo apt upgrade -y
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER && newgrp docker
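Log out and back in (or rely on newgrp above), then confirm that Docker and the Compose plugin work without sudo:
docker --version
docker compose version
docker run --rm hello-world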
Step 2 — (Optional) Enable NVIDIA GPU for Containers
If you have an NVIDIA GPU, install the driver and the NVIDIA Container Toolkit so Ollama can use CUDA.
Install the driver with sudo ubuntu-drivers autoinstall, then reboot. Verify it with nvidia-smi.
Install the container toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && \
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
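To confirm that containers can now see the GPU, run nvidia-smi inside a throwaway container (a plain Ubuntu image is enough, since the toolkit injects the driver utilities):
docker run --rm --gpus all ubuntu nvidia-smi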
Step 3 — Prepare the Project
Create a directory for your stack and move into it:
mkdir -p ~/ollama-stack && cd ~/ollama-stack
We will create two files, docker-compose.yml and a Caddyfile. Replace ai.example.com and the email address with your own values wherever they appear.
Step 4 — Docker Compose File
Create docker-compose.yml with the content below. If you have a GPU, keep the deploy.resources.reservations.devices section; otherwise remove the entire deploy: block from the ollama service.
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      # Bind to localhost only; Open WebUI reaches Ollama over the internal Docker network.
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # GPU access (remove this deploy block on CPU-only hosts)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - openwebui_data:/app/backend/data

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    depends_on:
      - openwebui

volumes:
  ollama_data:
  openwebui_data:
  caddy_data:
  caddy_config:
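Before starting anything, you can ask Compose to parse and print the resolved configuration, which catches YAML and indentation mistakes early:
docker compose config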
Step 5 — Caddy Reverse Proxy
Create Caddyfile with your domain and a contact email for Let’s Encrypt. Caddy will automatically issue and renew the certificate and proxy traffic to Open WebUI.
{
    email you@example.com
}

ai.example.com {
    encode gzip
    reverse_proxy openwebui:8080
}
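Optionally, you can have Caddy check the file for syntax errors before bringing up the stack, using the same image:
docker run --rm -v $(pwd)/Caddyfile:/etc/caddy/Caddyfile:ro caddy:latest caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile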
Ensure your DNS A/AAAA record points to the server before continuing. If you only need local access, you can skip Caddy and reach Open WebUI directly at http://SERVER_IP:8080 by publishing that port (see the snippet below); however, TLS is strongly recommended.
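A minimal sketch of that local-only variant: drop the caddy service and publish Open WebUI’s internal port 8080 on the host (the service is otherwise unchanged):
  openwebui:
    # ...same as above, plus:
    ports:
      - "8080:8080"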
Step 6 — Launch the Stack
Start everything with Docker Compose:
docker compose up -d
Watch the logs for any errors, especially domain or certificate issues:
docker compose logs -f caddy
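You can also confirm that all three containers came up and check the web UI’s own logs:
docker compose ps
docker compose logs --tail=50 openwebui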
After a minute, visit https://ai.example.com and complete the initial Open WebUI setup. In Settings, verify that the Ollama endpoint is http://ollama:11434 (it should be pre-set from the environment variable).
Step 7 — Pull a Model and Test
You can pull and manage models via the Open WebUI interface, or via the CLI inside the Ollama container:
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1
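If you also want to hit the API directly from the host (possible because the Compose file binds Ollama’s port to 127.0.0.1), a quick smoke test with curl:
curl http://127.0.0.1:11434/api/version
curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3.1", "prompt": "Say hello in one sentence.", "stream": false}'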
If you enabled GPU support, Ollama should automatically use CUDA. You can confirm GPU usage with nvidia-smi on the host while running a prompt.
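For example, watch utilization in one terminal while chatting in another; ollama ps also reports whether a loaded model is running on the GPU or the CPU:
watch -n 1 nvidia-smi
docker exec -it ollama ollama ps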
Security and Hardening Tips
- Create an admin user in Open WebUI and do not expose the Ollama API (port 11434) to the Internet unless you really need it externally. In the Compose file above, only Caddy is published publicly on 80/443, and Ollama’s port is bound to 127.0.0.1 so the API is reachable only from the server itself.
- Restrict access by IP or add basic auth in Caddy if you want a quick gate. Example inside your site block: basicauth { user JDJhJDEw$... } (generate hashes with caddy hash-password); a fuller sketch follows this list.
- Keep images updated: docker compose pull && docker compose up -d. Consider automating re-deploys on a schedule.
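A minimal sketch of that basic-auth gate, assuming a placeholder username of admin. Generate the bcrypt hash interactively with the Caddy image, then paste it into the site block in your Caddyfile:
docker run --rm -it caddy:latest caddy hash-password    # prompts for a password and prints the hash

ai.example.com {
    basicauth {
        admin <paste-the-hash-here>
    }
    encode gzip
    reverse_proxy openwebui:8080
}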
Performance Hints
- Use models that fit your VRAM/RAM. Quantized variants such as q4_K_M work well on modest GPUs and CPUs. For CPU-only servers, prefer models of 7B parameters or smaller.
- Set up swap if RAM is tight: sudo fallocate -l 16G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile. Add an entry to /etc/fstab so it persists across reboots (see the example after this list).
- Place Docker volumes on fast storage (NVMe) for quicker model load times. You can bind-mount a directory like ./ollama:/root/.ollama if you prefer easy backups.
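To make the swapfile persistent, append the corresponding line to /etc/fstab once:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab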
Backup and Restore
Back up the volumes for Ollama and Open WebUI to keep models and chat history. Note that Compose prefixes volume names with the project (directory) name, so with the ~/ollama-stack directory from Step 3 they are named ollama-stack_ollama_data and ollama-stack_openwebui_data (confirm with docker volume ls). Example quick backup of the models volume:
docker run --rm -v ollama-stack_ollama_data:/data -v $(pwd):/backup alpine tar czf /backup/ollama-data.tgz -C /data .
Repeat the same command for the Open WebUI volume. To restore, reverse the process by untarring the archive into an identically named volume.
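A matching restore sketch, assuming the stack has been brought up at least once on the target host so the volume already exists:
docker compose stop ollama
docker run --rm -v ollama-stack_ollama_data:/data -v $(pwd):/backup alpine sh -c "cd /data && tar xzf /backup/ollama-data.tgz"
docker compose start ollama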
Troubleshooting
- If Caddy fails to get a certificate, verify your DNS record, that ports 80/443 are reachable, and no other service (like another web server) is binding them.
- If the GPU is not detected, confirm that nvidia-smi works on the host and that the nvidia-container-toolkit package is installed. Restart Docker and the containers after changes.
- If Open WebUI cannot reach Ollama, ensure the OLLAMA_BASE_URL environment variable points to http://ollama:11434 and that both containers share the same default network (they do in this Compose file).
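A few commands that help narrow down most of these issues:
docker compose ps                          # are all three containers running?
docker compose logs --tail=100 caddy       # certificate / ACME errors
docker compose logs --tail=100 openwebui   # connection errors toward Ollama
docker exec -it ollama ollama list         # models currently available to Ollama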
Conclusion
You now have a clean, self-hosted LLM stack with Ollama and Open WebUI, managed by Docker Compose and protected by automatic HTTPS via Caddy. This setup is easy to maintain, portable across servers, and ready for experimentation or internal use. With GPU acceleration, you can serve sophisticated models efficiently; without a GPU, you can still run smaller quantized models for private inference. Keep your containers updated, monitor resource usage, and iterate on models that best fit your hardware and use cases.