How to Deploy Ollama and Open WebUI with NVIDIA GPU on Ubuntu using Docker Compose

Overview

This tutorial shows how to deploy Ollama (a lightweight local LLM runner) and Open WebUI (a clean chat interface) on Ubuntu using Docker Compose, with optional NVIDIA GPU acceleration. You will get a private, browser-based interface to run models like Llama 3 locally without sending data to the cloud. The steps work on Ubuntu 22.04/24.04. If you do not have a compatible NVIDIA GPU, you can still run everything on CPU.

Prerequisites

- Ubuntu 22.04 or newer (server or desktop)

- Sudo privileges

- For GPU acceleration: an NVIDIA GPU with recent drivers and CUDA support
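
If you are not sure whether a compatible NVIDIA GPU is present, a quick check on the host helps before continuing (nvidia-smi only works once drivers are installed, which step 2 covers):

# Look for an NVIDIA device on the PCI bus
lspci | grep -i nvidia
# Reports driver version and GPU status if drivers are already installed
nvidia-smi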

1) Install Docker and Docker Compose

Install Docker Engine and the Compose plugin with Docker's official convenience script. The script also sets up Docker's apt repository, so future updates arrive through apt.

# Install Docker Engine and the Compose plugin via Docker's convenience script
curl -fsSL https://get.docker.com | sudo sh
# Let your user run docker without sudo (takes effect in new shells; newgrp applies it now)
sudo usermod -aG docker $USER
newgrp docker
# Make sure the Compose plugin is present, then verify
sudo apt-get install -y docker-compose-plugin
docker compose version
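
As an optional sanity check, confirm that your user can talk to the daemon without sudo and that containers actually run:

docker run --rm hello-world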

2) Install NVIDIA Container Toolkit (GPU only)

If you have an NVIDIA GPU, install drivers (if you have not already), then add the NVIDIA Container Toolkit so Docker can access the GPU.

# Install recommended NVIDIA drivers (reboot may be required)
sudo apt-get update
sudo apt-get install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
# After this finishes, reboot if drivers were installed/updated
# sudo reboot

# Add NVIDIA Container Toolkit repository and install
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker runtime and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU access from Docker:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

You should see your GPU listed. If not, ensure drivers are installed correctly and Docker has restarted.
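
If the container runs but no GPU appears, it can also help to confirm that Docker registered the NVIDIA runtime:

docker info | grep -i runtimes
# The output should include "nvidia" alongside the default runc runtime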

3) Create the Docker Compose project

Create a working directory and a Compose file that runs two services: Ollama (backend API) and Open WebUI (frontend). The configuration below binds Ollama to localhost for safety and exposes Open WebUI on port 3000.

mkdir -p ~/ollama-openwebui
cd ~/ollama-openwebui
nano docker-compose.yml

Paste the following content. If you do not have a GPU, remove the runtime: nvidia line and the NVIDIA_VISIBLE_DEVICES environment variable from the ollama service.

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
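
Before starting anything, it is worth letting Compose validate the file; docker compose config prints the fully resolved configuration, or an error pointing at the malformed line:

cd ~/ollama-openwebui
docker compose config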

4) Start the stack

Bring everything up in the background and follow the logs to ensure both services are healthy.

docker compose up -d
docker compose ps
docker compose logs -f ollama
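
Once both containers show as running, a quick request to Ollama's local port confirms the API is reachable; the root endpoint simply replies that Ollama is running, and /api/tags lists any models already downloaded (an empty list at this point):

curl http://127.0.0.1:11434/
curl http://127.0.0.1:11434/api/tags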

5) Pull and run a model

Use Ollama to pull a model such as llama3 (Meta’s Llama 3 8B). This downloads the model into the ollama volume. The first pull may take a while.

docker exec -it ollama ollama pull llama3
docker exec -it ollama ollama run llama3
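
To exercise the API directly over HTTP (for example, from a script), a single-shot generate request works once the pull above has completed:

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Summarize what Docker Compose does in one sentence.",
  "stream": false
}'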

You can also open the browser interface at http://<your-server-ip>:3000; Open WebUI detects the Ollama connection automatically, so you can select or download a model and start chatting. When a supported GPU is available, Ollama uses it automatically.

6) Secure and tune

Network access: By default, Ollama is bound to localhost and Open WebUI is exposed on port 3000. If this is a public server, restrict access with a firewall (UFW, security groups) or place Open WebUI behind a reverse proxy (Caddy, Nginx) with HTTPS.
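
As a minimal UFW sketch, assuming only hosts on a 192.168.1.0/24 LAN should reach the UI (substitute your own subnet):

# Keep SSH reachable before turning the firewall on
sudo ufw allow OpenSSH
# Allow Open WebUI (port 3000) from the local subnet only (example subnet)
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw enable
sudo ufw status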

Authentication: Open WebUI supports user accounts. Open the UI, create an admin, and disable open signups in Settings if you do not want others to register. For single-user setups, keep the service bound to a private network.

Model storage: Models are stored in the ollama named volume. To reclaim space, remove unused models with docker exec -it ollama ollama rm <model>.
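
To see which models are installed and how much space each one takes before removing anything:

docker exec -it ollama ollama list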

Updates: Update to the latest images and restart:

docker compose pull
docker compose up -d
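
The old image layers stay on disk after an update; if you want the space back, prune them afterwards:

docker image prune -f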

Troubleshooting

Docker cannot see the GPU: If --gpus all fails, confirm the NVIDIA driver is installed and loaded (nvidia-smi on host). Re-run sudo nvidia-ctk runtime configure --runtime=docker and sudo systemctl restart docker. Some older setups require a reboot after driver installation.
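
Two checks usually narrow this down: verify the driver works on the host, and verify that nvidia-ctk registered the runtime in Docker's daemon configuration:

nvidia-smi
cat /etc/docker/daemon.json
# Expect a "runtimes" section containing an "nvidia" entry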

Address already in use: Change ports in the Compose file (for example, map Open WebUI to "127.0.0.1:3001:8080") if port 3000 is occupied.
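
To see which process currently holds the port before changing the mapping:

sudo ss -ltnp | grep ':3000'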

CPU-only mode: If you have no GPU, remove the runtime: nvidia line and NVIDIA_VISIBLE_DEVICES environment variable from the ollama service. Performance will be lower but still usable for smaller models.

Slow inference: Use quantized models (e.g., llama3:8b-instruct-q4_0) and increase RAM/swap. On GPU, ensure you have sufficient VRAM; otherwise, use a smaller or more aggressively quantized model.
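
For example, to pull a 4-bit quantized build of Llama 3 and check where the loaded model ended up (on recent Ollama releases, the PROCESSOR column of ollama ps shows GPU vs. CPU placement):

docker exec -it ollama ollama pull llama3:8b-instruct-q4_0
docker exec -it ollama ollama run llama3:8b-instruct-q4_0
# In a second terminal, check how the model is scheduled and how much VRAM is in use
docker exec -it ollama ollama ps
nvidia-smi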

What you built

You now have a maintainable, containerized local AI stack with Ollama and Open WebUI on Ubuntu. This setup is easy to update, safe to run offline, and flexible enough to add more services later (embeddings, vector databases, or reverse proxies). Enjoy private, low-latency LLM chat on your own hardware.
