Running large language models locally is easier than ever. In this guide, you will deploy Ollama and Open WebUI on Ubuntu 22.04 or 24.04 using Docker Compose, with optional NVIDIA GPU acceleration for faster inference. Ollama handles model management and inference, while Open WebUI gives you a clean, browser-based interface. By the end, you will have a persistent, secure setup ready for daily use.
Prerequisites
You need an Ubuntu 22.04 or 24.04 server with at least 16 GB RAM for 7B–8B models (more is better), 30+ GB of free disk space, and internet access. GPU acceleration is optional but recommended: an NVIDIA GPU with drivers installed significantly speeds up responses. You will also need sudo privileges. If UFW or another firewall is enabled, plan to allow TCP 3000 (Open WebUI) and, if you want direct API access, 11434 (Ollama) from your local network; an example rule set follows.
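If UFW is active, rules along the following lines restrict access to your own network. The 192.168.1.0/24 subnet is only an example; substitute your actual LAN range.
# Allow Open WebUI and (optionally) the Ollama API from the local network only
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw status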
Step 1: Update Ubuntu
Make sure your system is current. This reduces dependency conflicts and ensures you get the latest Docker packages.
sudo apt update && sudo apt -y upgrade
sudo reboot
Step 2: Install Docker Engine and Docker Compose Plugin
Install the official Docker repository, Docker Engine, and the Compose plugin. This is the most reliable way to run both Ollama and Open WebUI containers with persistent volumes.
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
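As a quick sanity check before moving on, confirm that Docker and the Compose plugin respond and that containers can run (version numbers will differ on your system):
docker --version
docker compose version
docker run --rm hello-world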
Step 3 (Optional but Recommended): Enable NVIDIA GPU for Containers
If you have an NVIDIA GPU, install the proprietary driver and the NVIDIA Container Toolkit so Docker can access the GPU. If you are CPU-only, skip to Step 4.
# Install NVIDIA driver (reboot after)
sudo ubuntu-drivers autoinstall
sudo reboot
# After reboot, verify the GPU
nvidia-smi
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Optional sanity test
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
Step 4: Create a Docker Compose File
Use Docker Compose to orchestrate both services. The configuration below persists model data, restarts on failures, and binds Open WebUI to port 3000. GPU access is configured using device reservations. Save this as docker-compose.yml in an empty directory (for example, ~/ai-stack).
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    depends_on:
      - ollama
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
If you do not have an NVIDIA GPU or do not want to use it, remove the entire deploy.resources block from the ollama service, as in the sketch below. Ollama will fall back to CPU automatically, though inference will be noticeably slower.
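For reference, the CPU-only ollama service looks like this (identical to the one above, minus the GPU reservation; keep it nested under services:):
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h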
Step 5: Start the Stack
Bring the services online in detached mode. The first start will pull images, which may take a few minutes depending on your connection speed.
docker compose up -d
docker compose ps
Open your browser and go to http://<server-ip>:3000. The Open WebUI interface should load. The first time you access it, you will be prompted to create an account. This account is stored in the openwebui volume for persistence.
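On a headless server you can check that the UI is listening before opening a browser; a 200 or 3xx status from this quick probe suggests the container is up:
curl -sI http://localhost:3000 | head -n 1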
Step 6: Pull a Model and Run Your First Chat
You can pull a model either from the Open WebUI interface or via the Ollama CLI inside the container. Popular choices include llama3, llama3.1, mistral, and qwen. The example below pulls Llama 3 8B. Adjust the model size to fit your RAM/GPU VRAM.
# Pull from inside the Ollama container
docker exec -it ollama ollama pull llama3:8b
# Verify Ollama is responding
curl http://localhost:11434/api/tags
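You can also test generation straight from the terminal via the Ollama HTTP API. This example assumes the llama3:8b model pulled above; swap in whatever model you downloaded.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Explain Docker volumes in one sentence.",
  "stream": false
}'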
In Open WebUI, select the pulled model from the dropdown and start chatting. If responses seem slow on a GPU machine, confirm the GPU is actually being used by watching nvidia-smi while text is generating; if it stays idle, revisit Step 3.
Step 7: Secure and Tune Your Deployment
By default, Open WebUI exposes port 3000. If you only use it locally, bind to localhost by editing the compose file port mapping to "127.0.0.1:3000:8080". For remote access, place a reverse proxy like Nginx or Caddy in front with HTTPS. Inside Open WebUI settings, disable open sign-ups after creating your admin account to restrict access.
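As one possible approach for remote access, a minimal Caddyfile like the sketch below proxies HTTPS traffic to Open WebUI. The hostname chat.example.com is a placeholder that must resolve to your server, and ports 80/443 must be reachable so Caddy can obtain certificates automatically.
# Caddyfile — Caddy provisions and renews TLS for this hostname on its own
chat.example.com {
    reverse_proxy 127.0.0.1:3000
}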
Consider setting model-specific parameters in Open WebUI such as temperature, top_p, and context length. Ollama supports model-level configuration via Modelfiles if you want reproducible prompts and system messages. You can also set OLLAMA_NUM_PARALLEL to control concurrency for multiple users.
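As an illustration of the Modelfile approach, the sketch below builds a variant of Llama 3 with a fixed system prompt and sampling parameters. The name llama3-concise and the parameter values are arbitrary examples; it assumes llama3:8b was pulled in Step 6.
# Modelfile
FROM llama3:8b
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
SYSTEM "You are a concise technical assistant."
# Copy the Modelfile into the container and build the variant
docker cp Modelfile ollama:/tmp/Modelfile
docker exec -it ollama ollama create llama3-concise -f /tmp/Modelfile
The new model then appears in the Open WebUI model dropdown alongside the base model.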
Updating, Backups, and Uninstall
To update, pull the latest images and recreate containers without losing data, since volumes persist your models and settings.
docker compose pull
docker compose up -d
For backups, snapshot the ollama and openwebui volumes or back up the corresponding directories under /var/lib/docker/volumes/ (a backup sketch follows the commands below). To remove the stack without deleting data, run docker compose down. To fully remove everything, include the -v flag to delete volumes.
docker compose down # stops and removes containers
docker compose down -v # also removes volumes (data loss)
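For the backup mentioned above, one common pattern is to archive a named volume from a throwaway container. Compose prefixes volume names with the project name (the directory name by default), so check docker volume ls first; ai-stack_ollama below assumes the ~/ai-stack directory from Step 4, and the openwebui volume can be archived the same way.
docker volume ls   # note the exact volume names
docker run --rm -v ai-stack_ollama:/source:ro -v "$(pwd)":/backup alpine \
tar czf /backup/ollama-backup.tar.gz -C /source .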
Troubleshooting
If Open WebUI cannot connect to Ollama, ensure OLLAMA_BASE_URL is set to http://ollama:11434 and that both containers belong to the same Compose project. If port 3000 or 11434 is already in use, change the host-side port in the compose file. For GPU issues such as “no CUDA devices found,” verify that nvidia-smi works on the host, that the NVIDIA Container Toolkit is installed, and that Docker was restarted afterwards. If you hit permission errors when using Docker, confirm your user is in the docker group and re-open your shell or run newgrp docker.
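A few commands that often narrow these problems down; run the compose commands from the directory containing docker-compose.yml, and adjust container names if you changed them:
docker compose logs --tail=50 ollama
docker compose logs --tail=50 open-webui
sudo ss -ltnp | grep -E '3000|11434'   # what is holding the ports?
docker exec -it ollama nvidia-smi      # is the GPU visible inside the container?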
With this setup, you now have a modern, self-hosted AI stack on Ubuntu that is fast, secure, and easy to maintain. Enjoy experimenting with different models, fine-tuning settings, and integrating Open WebUI into your daily workflow.