Overview
This tutorial shows you how to deploy a fast, private, local AI stack on Ubuntu 24.04 using Docker Compose. We will run Ollama (which downloads and serves LLMs like Llama 3.1) together with Open WebUI (a friendly web interface) on port 3000. You will also learn how to enable optional NVIDIA GPU acceleration, set up persistence, and manage the stack with systemd for reliable startup at boot.
What you will build
You will create a two-container setup: Ollama provides the model runtime API on an internal network, and Open WebUI connects to it and exposes a browser UI at http://&lt;server-ip&gt;:3000. Data (models and chat history) is stored on Docker volumes, so updates do not erase your content.
Prerequisites
- Ubuntu 24.04 LTS server or VM with at least 4 GB RAM (8 GB or more is recommended for 7B–8B models), 20+ GB of free disk, and a user with sudo rights.
- Internet access to pull Docker images and models.
- Optional: An NVIDIA GPU with drivers if you want hardware acceleration.
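You can confirm these basics up front with a few standard commands:
free -h                              # total and available RAM
df -h /                              # free disk space on the root filesystem
grep PRETTY_NAME /etc/os-release     # should report Ubuntu 24.04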
Step 1 — Install Docker Engine and Docker Compose
Install Docker from the official repository (includes the compose plugin):
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu noble stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the new group membership in this shell; alternatively, log out and back in
docker --version
docker compose version
The last two commands verify that Docker and Docker Compose are installed correctly.
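As an extra smoke test, Docker's official hello-world image exercises the full pull-and-run path; if it prints its greeting, the daemon and your group membership are both working:
docker run --rm hello-world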
Step 2 — Optional: Enable NVIDIA GPU for containers
If your host has an NVIDIA GPU, install the driver and the NVIDIA container toolkit so Docker can access the GPU. Reboot after installing the driver if required.
# Install the recommended NVIDIA driver (reboot if prompted)
sudo ubuntu-drivers install
# Add NVIDIA Container Toolkit repository and install
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
To confirm that containers can see the GPU, you can later run:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
Step 3 — Create the Docker Compose project
Create a project directory and a compose file that defines two services, persistent volumes, and a port mapping for the web interface.
sudo mkdir -p /opt/ollama-webui
sudo chown -R $USER:$USER /opt/ollama-webui
cd /opt/ollama-webui
cat > compose.yaml <<'YAML'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # Uncomment the next line if you've installed the NVIDIA container toolkit
    # and want GPU acceleration:
    # gpus: all
  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
YAML
This configuration keeps Ollama’s model files under the ollama volume and Open WebUI data (users, chats, settings) under openwebui. If you enabled GPU, remove the comment character in front of gpus: all.
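Before starting the stack, it is worth validating the file. docker compose config parses and resolves the YAML, so indentation or syntax mistakes surface immediately:
docker compose config --quiet && echo "compose.yaml is valid"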
Step 4 — Start the stack
Bring up both services in the background:
docker compose up -d
docker compose ps
Open a browser to http://<server-ip>:3000. The first load may take a moment while the UI initializes.
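If you prefer to verify from the shell first, the checks below should confirm both containers are responding. Note that the Ollama API is published only on the internal Compose network, so query it from inside its container:
curl -sI http://localhost:3000 | head -n 1   # expect an HTTP 200 response from the UI
docker exec ollama ollama list               # prints an empty model table until you pull one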
Step 5 — Pull a model and chat
You can download models from the web UI, or pull them directly via the Ollama container. For example, to grab a compact and capable model:
docker exec -it ollama ollama pull llama3.1:8b
Return to Open WebUI, select llama3.1:8b in the model picker, and start chatting. For better performance, use GPU acceleration if available, or choose a smaller model for CPU-only machines.
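You can also exercise the model entirely from the shell; ollama run accepts a one-shot prompt, prints the reply, and exits, which is handy for scripted sanity checks:
docker exec -it ollama ollama run llama3.1:8b "Explain Docker volumes in one sentence."
docker exec ollama ollama list   # verify the model shows up as downloaded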
Step 6 — Start at boot with systemd (optional)
Use a systemd unit so your AI stack starts automatically after reboots.
sudo tee /etc/systemd/system/ollama-webui.service >/dev/null <<'UNIT'
[Unit]
Description=Ollama + Open WebUI (Docker Compose)
After=network-online.target docker.service
Wants=network-online.target
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/ollama-webui
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
UNIT
sudo systemctl daemon-reload
sudo systemctl enable --now ollama-webui
systemctl status ollama-webui --no-pager
The service will run docker compose up -d on boot and cleanly stop the stack on shutdown.
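To confirm the unit behaves as expected, restart it manually and inspect the output that systemd captured:
sudo systemctl restart ollama-webui
journalctl -u ollama-webui -n 20 --no-pager
docker compose ps   # run from /opt/ollama-webui; both services should show as running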
Updating and maintenance
To update to the latest images without losing data, run:
cd /opt/ollama-webui
docker compose pull
docker compose up -d
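Pulled updates leave the old image layers behind; once the new containers are healthy, you can reclaim that disk space (this removes only dangling images, never your named volumes):
docker image prune -f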
For logs and troubleshooting, use:
docker compose logs -f
docker logs ollama -f
docker logs open-webui -f
Troubleshooting tips
- If the UI does not load, confirm that port 3000 is open in your server’s firewall and that no other service is bound to it (see the commands after this list).
- If models fail to load due to disk space, expand your storage or prune unused images with docker image prune.
- If GPU acceleration does not work, verify the host’s nvidia-smi, then test GPU inside a container. Ensure gpus: all is enabled in compose.yaml and restart the stack.
- For remote exposure over HTTPS, place Open WebUI behind a reverse proxy (Caddy, Nginx, or a cloud tunnel) and restrict access with authentication.
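For the firewall check in the first bullet, assuming you use ufw (Ubuntu's default firewall frontend), the following opens port 3000 and shows what is currently bound to it:
sudo ufw allow 3000/tcp
sudo ufw status
sudo ss -ltnp | grep ':3000'   # identifies any process already listening on port 3000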
You are done
You now have a modern, private AI chat environment running locally on Ubuntu 24.04. With Docker volumes for persistence, optional GPU acceleration, and systemd for auto-start, this setup is reliable and easy to maintain. Add or switch models anytime using Ollama, and enjoy a clean, fast interface with Open WebUI.