Overview
This step-by-step guide shows you how to deploy an AI chatbot stack with Ollama (for running local LLMs) and Open WebUI (a clean, browser-based interface) on Ubuntu 22.04 or 24.04. We will run both apps in Docker, enable NVIDIA GPU acceleration, and put Nginx in front with a free TLS certificate from Let's Encrypt. You will get a production-friendly setup with persistent storage, HTTPS, and simple maintenance commands.
What you'll build
You will end up with two containers on a private Docker network: ollama (listening on 11434) and openwebui (listening on 8080, mapped to localhost:3000). Nginx will reverse proxy a public domain (for example, ai.example.com) to Open WebUI and handle SSL. Models and chat data will be stored on the host so updates don't wipe them.
Prerequisites
- Ubuntu 22.04/24.04 with sudo access
- An NVIDIA GPU (Turing or newer recommended) and a supported driver
- A DNS A record pointing your domain (e.g., ai.example.com) to your server's public IP
- Outbound internet access to pull images and models
1) Install NVIDIA driver and container toolkit
Update the system and install the proprietary driver. If you don't already have the correct driver, Ubuntu can choose one for you:
sudo apt update && sudo apt -y upgrade
sudo ubuntu-drivers autoinstall
Reboot:
sudo reboot
After reboot, confirm the GPU is visible:
nvidia-smi
Install the NVIDIA Container Toolkit so Docker can use the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt -y install nvidia-container-toolkit
Configure Docker to use the toolkit and restart Docker (if Docker Engine is not installed yet, complete step 2 first, then come back and run these two commands):
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
2) Install Docker Engine and Docker Compose plugin
Install the official Docker packages:
sudo apt -y install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Verify:
docker --version
docker compose version
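Optionally, confirm that a container can actually reach the GPU before moving on. A minimal smoke test, assuming the public nvidia/cuda base image (the 12.4.1 tag below is just an example; any recent base tag works):
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
You should see the same table that nvidia-smi prints on the host. If this fails, revisit the driver and container toolkit setup from step 1.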
3) Create persistent folders and a Docker Compose file
Create directories for persistent data:
sudo mkdir -p /opt/ollama /opt/openwebui
sudo chown -R $USER:$USER /opt/ollama /opt/openwebui
Now create a compose.yml in a new project folder (for example, /opt/ai-stack/compose.yml) with the following content:
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - /opt/ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    networks:
      - ai
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - /opt/openwebui:/app/backend/data
    networks:
      - ai

networks:
  ai:
Notes: We bind Open WebUI to 127.0.0.1:3000 so it is not exposed directly; Nginx will handle public traffic. The NVIDIA device reservation under deploy passes the GPU into the Ollama container. If your Docker Compose version supports it, you may instead use gpus: all on the ollama service and drop the deploy block (see the sketch below).
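For reference, that shorter form looks roughly like this (a sketch only; it requires a Compose release recent enough to understand the gpus attribute):
services:
  ollama:
    # ...same image, volumes, environment, and network as above...
    gpus: all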
4) Start the stack and test
From the folder with compose.yml, bring the stack up:
docker compose up -d
Watch the logs until both services are healthy:
docker compose logs -f
Pull a model and perform a quick GPU test (you should see GPU usage spike in nvidia-smi):
docker exec -it ollama ollama pull llama3:8b
docker exec -it ollama ollama run llama3:8b
Locally, you can visit Open WebUI at http://127.0.0.1:3000. Next, we'll put it behind HTTPS.
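Two quick checks before moving on, assuming the container names from the compose.yml above: list the models Ollama has stored, and confirm Open WebUI answers on the loopback port:
docker exec -it ollama ollama list
curl -I http://127.0.0.1:3000
The curl call should come back with an HTTP 200 response from Open WebUI.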
5) Install Nginx and obtain a Let's Encrypt certificate
Install Nginx and Certbot:
sudo apt -y install nginx certbot python3-certbot-nginx
Create an Nginx server block (replace ai.example.com with your domain):
sudo nano /etc/nginx/sites-available/ai.conf
Paste:
server {
    listen 80;
    server_name ai.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    client_max_body_size 20m;
}
Enable the site and test the configuration:
sudo ln -s /etc/nginx/sites-available/ai.conf /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Issue a certificate and force HTTPS:
sudo certbot --nginx -d ai.example.com --redirect --agree-tos -m you@example.com
Now browse to https://ai.example.com and you should see Open WebUI served over TLS.
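Certbot also sets up automatic renewal (via a systemd timer or cron entry, depending on how it was installed). You can confirm that renewal will work long before the certificate nears expiry:
sudo certbot renew --dry-run
If the dry run succeeds, renewals happen unattended and Nginx keeps serving the refreshed certificate.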
6) Backups, updates, and security tips
Back up data by archiving the two directories we created:
sudo tar -czf /root/ollama-backup.tgz /opt/ollama
sudo tar -czf /root/openwebui-backup.tgz /opt/openwebui
Open WebUI stores conversations and settings in /opt/openwebui; Ollama stores models and blobs in /opt/ollama.
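If you prefer scheduled backups over ad-hoc archives, a simple nightly cron entry works; the /root/backups destination below is just an example, so adjust the path and retention to suit your setup:
sudo mkdir -p /root/backups
sudo crontab -e
# add a line like this to the root crontab (runs at 03:00 every night):
0 3 * * * tar -czf /root/backups/ai-stack-$(date +\%F).tgz /opt/ollama /opt/openwebui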
To update images with minimal downtime:
docker compose pull
docker compose up -d
Old images can be cleaned with docker image prune when you're done testing. For security, keep ports private (we only published 3000 to localhost) and use your firewall to allow only 80/443 from the internet. If you need extra protection, add HTTP Basic Auth to Nginx and restrict access by IP where possible (see the sketch below).
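As a sketch of those two hardening steps: UFW can limit inbound traffic to SSH and the web ports, and the apache2-utils package provides htpasswd for creating a Basic Auth user (the username below is only a placeholder):
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo apt -y install apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd youruser
To enforce the login, add these two lines inside the location / block of /etc/nginx/sites-available/ai.conf and reload Nginx:
auth_basic "Restricted";
auth_basic_user_file /etc/nginx/.htpasswd;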
Troubleshooting
- GPU not used: watch nvidia-smi while running a model. If it stays idle, recheck the NVIDIA driver, the container toolkit, and the GPU reservation in compose.yml.
- Models fail due to VRAM limits: try smaller variants (e.g., llama3:8b instead of 70B) or quantized builds (such as q4_K_M).
- Port conflicts: change the host port mapping in compose.yml if 3000 is taken (e.g., use 127.0.0.1:3100:8080 and update the Nginx proxy_pass accordingly); the checks after this list help find what holds the port.
- Certbot issues: make sure your domain points to the server's public IP and TCP/80 is reachable from the internet during certificate issuance.
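Two quick diagnostics for the last two items: see which process holds port 3000, and confirm that your domain resolves to this server's public IP (replace ai.example.com with your own; dig comes from the dnsutils package, and ifconfig.me is just one example of a service that echoes your public address):
sudo ss -ltnp | grep ':3000'
dig +short ai.example.com
curl -4 -s https://ifconfig.me
The last two outputs should match; if they don't, fix DNS before rerunning Certbot.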
What's next
From here, you can connect more tools to the Ollama API, add multiple models, fine-tune your Nginx headers, or place the stack behind Cloudflare. The setup is simple to maintain: pull updates, restart the stack, and your data persists. With GPU acceleration and HTTPS in place, you have a fast, private AI assistant ready for everyday use.
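Note that this guide does not publish the Ollama API to the host; Open WebUI reaches it over the private Docker network. If another local tool needs the API directly, one option (a sketch, not part of the stack above) is to add a loopback-only port mapping to the ollama service in compose.yml and re-run docker compose up -d:
ports:
  - "127.0.0.1:11434:11434"
Keeping the binding on 127.0.0.1 avoids exposing the API to the internet.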