Deploy a Private AI Chat on Ubuntu 24.04 with Ollama and Open WebUI (Docker, CPU/GPU)

Overview

This step-by-step guide shows how to self-host a private AI chat on Ubuntu 24.04 using Ollama and Open WebUI with Docker. You will be able to run modern large language models (LLMs) on your own server, keep your data local, and optionally accelerate inference with an NVIDIA GPU. The whole setup typically takes under 30 minutes and also works on CPU-only hosts.

Prerequisites

- Ubuntu 24.04 LTS (fresh or existing server)

- 8 GB RAM minimum (16 GB recommended), at least 20 GB free disk space

- Optional: CUDA-capable NVIDIA GPU with a recent driver (version 535 or newer)

- A user with sudo privileges
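
Before you start, you can quickly confirm the host meets these requirements (the last command only applies if a GPU driver is already installed):

# Check OS release, memory, and free disk space
lsb_release -ds
free -h
df -h /
# GPU hosts only: lists the GPU if a driver is present
nvidia-smi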

Step 1 — Install Docker and Docker Compose plugin

Install the Docker Engine from the official repository to get the current stable version and the Compose plugin. Run:

sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $VERSION_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Allow your user to run Docker without sudo (log out and back in, or use newgrp, for the group change to apply)
sudo usermod -aG docker $USER
newgrp docker

Verify installation:

docker version
docker compose version
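
Optionally, run a throwaway container to confirm the daemon is running and that your user can use it without sudo:

docker run --rm hello-world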

Step 2 — Enable GPU acceleration (optional)

If your server has an NVIDIA GPU, install a compatible driver and the NVIDIA Container Toolkit to expose the GPU to containers. Skip this section if you are running CPU-only.

# Install or update the recommended NVIDIA driver (reboot required)
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot
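
Once the server is back up, confirm the driver is active and sees your GPU:

nvidia-smi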

Then install the NVIDIA Container Toolkit:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list >/dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Test GPU visibility (should show your GPU):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Step 3 — Create the Docker Compose file

Create a project folder and a Compose file with two services: Ollama (the model runtime) and Open WebUI (the chat interface). The ollama service requests the GPU via the gpus attribute; leave it in place on GPU hosts and comment it out on CPU-only machines.

mkdir -p ~/ai-chat && cd ~/ai-chat
cat > docker-compose.yml <<'YAML'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    restart: unless-stopped
    # Comment out the next line if you are CPU-only
    gpus: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
      - ENABLE_SIGNUP=true
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    restart: unless-stopped

volumes:
  ollama:
  open-webui:
YAML
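
Before starting anything, you can ask Compose to validate the file and print the resolved configuration:

docker compose config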

Step 4 — Start the stack

Launch both services in the background:

docker compose up -d
docker compose ps

Open your browser at http://SERVER_IP:3000 to reach Open WebUI. Because WEBUI_AUTH is enabled, the first visit asks you to create an account; the first account created becomes the administrator.
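
If the page does not load, check the container logs and confirm Ollama answers on its API port (the version string in the response will vary):

docker compose logs --tail=50 open-webui
curl http://localhost:11434/api/version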

Step 5 — Pull a model and run your first chat

Ollama hosts models locally. Pull a model you want to try, such as Llama 3 8B (balanced size and performance) or a smaller model if resources are tight.

# Pull a model (examples)
docker exec -it ollama ollama pull llama3:8b
# Optional smaller models:
# docker exec -it ollama ollama pull llama3.2:3b
# docker exec -it ollama ollama pull mistral:7b

# Quick API test
curl http://localhost:11434/api/generate -d '{"model":"llama3:8b","prompt":"Hello, world!"}'
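
The generate endpoint streams its answer as one JSON object per line by default. If you prefer a single JSON response, disable streaming in the request body:

# Same test, but return one JSON object instead of a stream
curl http://localhost:11434/api/generate -d '{"model":"llama3:8b","prompt":"Hello, world!","stream":false}'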

Back in Open WebUI, pick the pulled model from the dropdown and start chatting. If you enabled GPU, you should notice faster responses and lower CPU usage.
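
To confirm what is installed and whether a loaded model is actually running on the GPU, you can ask Ollama's CLI inside the container:

# List locally available models
docker exec -it ollama ollama list
# Show loaded models and whether they run on GPU or CPU
docker exec -it ollama ollama ps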

Step 6 — Secure and expose (optional)

If you plan to access the chat over the internet, protect it with HTTPS and strong authentication. At a minimum, restrict ports to trusted IPs:

# These allow rules only restrict access once UFW is enabled with a default-deny incoming policy.
# Allow SSH first so you do not lock yourself out, then allow your local subnet (adjust to your LAN).
sudo ufw allow OpenSSH
sudo ufw allow from 192.168.0.0/16 to any port 3000 proto tcp
sudo ufw allow from 192.168.0.0/16 to any port 11434 proto tcp
sudo ufw enable

For public access, put Open WebUI behind a reverse proxy like Caddy or Nginx with a domain and TLS certificates, and keep WEBUI_AUTH enabled. You can also disable sign-ups after creating your admin account by setting ENABLE_SIGNUP=false in the Compose file and running docker compose up -d again.
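
For Caddy, a single site block is usually enough. The sketch below assumes a placeholder domain chat.example.com that points at this server and that ports 80 and 443 are reachable from the internet; Caddy then obtains and renews TLS certificates automatically:

# /etc/caddy/Caddyfile (chat.example.com is a placeholder for your own domain)
chat.example.com {
    reverse_proxy localhost:3000
}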

Step 7 — Update, backup, and remove

To update to the latest images:

cd ~/ai-chat
docker compose pull
docker compose up -d

To back up data, archive the named volumes. Stop the stack first for a consistent snapshot:

docker compose down
sudo tar -czf ollama-vol.tgz /var/lib/docker/volumes/ai-chat_ollama/_data
sudo tar -czf webui-vol.tgz /var/lib/docker/volumes/ai-chat_open-webui/_data
docker compose up -d
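
Restoring works in reverse, assuming the archives above and the same project directory (so the named volumes already exist): stop the stack, extract the archives back to their original paths, and start it again:

docker compose down
sudo tar -xzf ollama-vol.tgz -C /
sudo tar -xzf webui-vol.tgz -C /
docker compose up -d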

To remove everything:

docker compose down -v

Troubleshooting

- GPU not detected: confirm nvidia-smi works on the host and that the test container command prints your GPU. Ensure you restarted Docker after nvidia-ctk runtime configure and that the gpus: all line in docker-compose.yml is not commented out.

- Out-of-memory or slow responses: try a smaller model (e.g., llama3.2:3b) or add swap (see the sketch after this list). Avoid running multiple large models at once.

- Port conflicts: change the host ports in docker-compose.yml (for example, map 3001:8080) and re-run docker compose up -d.
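
If you need more memory headroom, a swap file is a quick stopgap. The sketch below adds 8 GB of swap; pick a size that fits your disk:

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Keep the swap file across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab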

You now have a fast, private AI chat running on your own server with full control over models, updates, and data.
