Deploy a Private AI Chat Server with Ollama and Open WebUI on Ubuntu using Docker Compose (GPU Optional)

Overview

This step-by-step guide shows you how to deploy a private AI chat server on Ubuntu using Ollama and Open WebUI with Docker Compose. Ollama runs large language models (LLMs) locally, while Open WebUI gives you a clean web interface for chat, prompts, and model management. The setup works on CPUs and can optionally use an NVIDIA GPU for much faster inference. You will learn installation, configuration, GPU enablement, security basics, updates, and backup tips.

Prerequisites

Before you start, make sure you have: (1) Ubuntu 22.04/24.04 or another recent Linux distribution, (2) sudo access, (3) at least 8 GB of RAM (more is better for larger models), (4) 20+ GB of free disk space for models, (5) Docker Engine and the Docker Compose plugin (Step 1 covers installation if you don't have them), and optionally (6) an NVIDIA GPU with drivers and the NVIDIA Container Toolkit if you want acceleration (Step 4 covers this).
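
A quick way to check RAM, disk, and CPU from the shell before you begin:

free -h        # total and available RAM
df -h /        # free disk space on the root filesystem
nproc          # number of CPU cores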

Step 1: Install Docker and Compose

Install Docker Engine and Compose using the official repository. If you already have Docker, you can skip to the next step.

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the group change in this shell; log out and back in for other sessions
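
Before moving on, confirm that Docker and the Compose plugin respond without sudo (log out and back in first if the group change has not taken effect):

docker --version
docker compose version
docker run --rm hello-world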

Step 2: Create the Docker Compose project

Create a working directory and a Docker Compose file that launches two services: ollama (the model runtime and API) and open-webui (the frontend). This configuration stores models and user data in named volumes and exposes the web UI on port 3000. A GPU reservation for NVIDIA hardware is included; note that on a CPU-only host without the NVIDIA runtime, Docker will fail to start the container rather than ignore the reservation, so comment out or remove the entire deploy: block if you have no GPU.

mkdir -p ~/ollama-openwebui
cd ~/ollama-openwebui
nano docker-compose.yml

Paste the following configuration:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_NAME=Private AI Chat
      - ENABLE_SIGNUP=true
    ports:
      - "3000:8080"
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
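
After saving, you can ask Compose to validate the file before starting anything; it prints nothing when the syntax is correct:

docker compose config --quiet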

Step 3: Start the stack and pull a model

Bring the services up in the background, then open the web UI at http://SERVER_IP:3000 (or http://localhost:3000 on the machine itself). The first load may take a moment, and the first account you create becomes the admin.

docker compose up -d
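
To confirm both containers are running and to follow the startup logs:

docker compose ps
docker compose logs -f open-webui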

You can pull models from the UI (Models menu) or via the CLI. For example, to fetch a good general model:

docker exec -it ollama ollama pull llama3.1
# Other options: mistral, phi3, qwen2, codellama, llama3.1:8b-instruct-q4_K_M
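
You can also confirm the Ollama API is reachable and list the installed models from the host (assumes curl is installed):

curl http://localhost:11434/api/tags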

In Open WebUI, select your model from the dropdown, then start chatting. You can also adjust system prompts, temperature, and context length from the settings.

Step 4: Enable GPU acceleration (optional)

To use an NVIDIA GPU, install the driver and the NVIDIA Container Toolkit, then restart Docker. The Compose file above already includes a GPU reservation under deploy:; if you removed it for a CPU-only start, restore it before recreating the containers.

# Install NVIDIA driver (check your GPU support docs)
sudo apt-get install -y nvidia-driver-535
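
# Reboot if the driver was just installed, then confirm it loads on the host
nvidia-smi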

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Recreate containers
docker compose up -d --force-recreate

Verify that the GPU is visible from inside the container:

docker exec -it ollama nvidia-smi
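
In recent Ollama releases you can also check where a loaded model is running; after sending a prompt, the PROCESSOR column should report GPU rather than CPU:

docker exec -it ollama ollama ps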

Step 5: Secure access

By default, the web UI is open to anyone who can reach the server. The first account created becomes the admin, so register yourself before exposing the service to others. For small teams, keep the service bound to your private network and, once everyone has registered, set ENABLE_SIGNUP=false in the open-webui environment and re-run docker compose up -d to close registration. For internet exposure, place NGINX or Caddy in front with HTTPS and basic auth or OIDC. A quick alternative is to keep port 3000 closed publicly and use an SSH tunnel: ssh -L 3000:localhost:3000 user@server.
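
As a minimal sketch of the private-network approach, bind the published port to 127.0.0.1 in docker-compose.yml so the UI is only reachable locally (for example through the SSH tunnel above), then re-run docker compose up -d:

    ports:
      - "127.0.0.1:3000:8080"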

Step 6: Update and backup

To update to the latest versions, pull new images and recreate containers without losing data:

docker compose pull
docker compose up -d
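
Old image layers accumulate after repeated updates; once the new containers are running you can reclaim the space:

docker image prune -f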

Back up your volumes regularly; they contain downloaded models and user data. For a consistent snapshot, stop the stack first with docker compose stop, then archive each volume to a tar file (run docker compose start afterwards):

docker run --rm -v ollama-openwebui_ollama-data:/data -v $PWD:/backup alpine \
  tar -czf /backup/ollama-data.tgz -C /data .
docker run --rm -v ollama-openwebui_openwebui-data:/data -v $PWD:/backup alpine \
  tar -czf /backup/openwebui-data.tgz -C /data .
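
Restoring works the same way in reverse. A minimal sketch for the Open WebUI volume, assuming the archive created above and the default ollama-openwebui project prefix (check docker volume ls if your directory name differs):

docker compose down
docker run --rm -v ollama-openwebui_openwebui-data:/data -v $PWD:/backup alpine \
  sh -c "rm -rf /data/* && tar -xzf /backup/openwebui-data.tgz -C /data"
docker compose up -d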

Troubleshooting tips

If models do not load, check the logs: docker logs -f ollama and docker logs -f open-webui. For out-of-memory errors, choose a smaller model variant (e.g., a quantized 7B/8B build). If the GPU is not detected, ensure the driver and toolkit versions match, verify nvidia-smi works on the host, and recreate the containers. Slow responses on CPU are normal; quantized models (such as Q4_K_M) give better speed and lower RAM use. To change the web UI name, edit WEBUI_NAME and run docker compose up -d to recreate the container.
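
When chasing out-of-memory errors, it also helps to watch container memory usage live while you send a prompt:

docker stats ollama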

What you achieved

You now have a private AI chat server running locally with Docker. Ollama hosts your LLMs, Open WebUI provides a friendly interface, and optional NVIDIA acceleration boosts performance. With updates and backups in place, you can safely iterate, add specialized models for code or documents, and keep your AI workflows under your control.
