Deploy a Private AI Chat Server with Ollama and Open WebUI on Ubuntu using Docker Compose (GPU Optional)
Overview
This step-by-step guide shows you how to deploy a private AI chat server on Ubuntu using Ollama and Open WebUI with Docker Compose. Ollama runs large language models (LLMs) locally, while Open WebUI gives you a clean web interface for chat, prompts, and model management. The setup works on CPUs and can optionally use an NVIDIA GPU for much faster inference. You will learn installation, configuration, GPU enablement, security basics, updates, and backup tips.
Prerequisites
Before you start, make sure you have:
- Ubuntu 22.04/24.04 or another recent Linux distribution
- sudo access
- At least 8 GB of RAM (more is better)
- 20+ GB of free disk space for models
- Docker Engine and the Docker Compose plugin
- Optionally, an NVIDIA GPU with drivers and the NVIDIA Container Toolkit, if you want acceleration
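If you want to confirm the RAM and disk requirements first, standard tools will do:
free -h    # total and available memory
df -h /    # free disk space (Docker volumes live under /var/lib/docker by default)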
Step 1: Install Docker and Compose
Install Docker Engine and Compose using the official repository. If you already have Docker, you can skip to the next step.
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the group change in this shell; alternatively, log out and back in
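Verify the installation before moving on. Both version commands should print output, and the hello-world container confirms the daemon works without sudo:
docker --version
docker compose version
docker run --rm hello-world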
Step 2: Create the Docker Compose project
Create a working directory and a Docker Compose file that launches two services: ollama (the model runtime and API) and open-webui (the frontend). This configuration stores models and user data in named volumes and exposes the web UI on port 3000. A GPU reservation is included for Step 4; note that on a CPU-only host Docker cannot satisfy the reservation and will refuse to start the container, so comment out or remove the deploy block if you have no NVIDIA setup.
mkdir -p ~/ollama-openwebui
cd ~/ollama-openwebui
nano docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    # Comment out or remove this deploy block on CPU-only hosts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_NAME=Private AI Chat
      - ENABLE_SIGNUP=true
    ports:
      - "3000:8080"
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
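Before starting anything, you can ask Compose to parse and normalize the file; any indentation or syntax mistake will surface here:
docker compose config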
Step 3: Start the stack and pull a model
Bring the services up in the background and open the web UI at http://SERVER_IP:3000. The first load may take a moment.
docker compose up -d
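Both containers should report as running; if the UI does not come up, the logs usually point at the problem:
docker compose ps
docker compose logs -f open-webui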
You can pull models from the UI (Models menu) or via the CLI. For example, to fetch a good general model:
docker exec -it ollama ollama pull llama3.1
# Other options: mistral, phi3, qwen2, codellama, llama3.1:8b-instruct-q4_K_M
In Open WebUI, select your model from the dropdown, then start chatting. You can also adjust system prompts, temperature, and context length from the settings.
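Because port 11434 is published, you can also query the Ollama API directly, which is handy for scripting and sanity checks. Both endpoints below are part of Ollama's documented REST API:
curl http://localhost:11434/api/tags    # list installed models
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'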
Step 4: Enable GPU acceleration (optional)
To use an NVIDIA GPU, install the driver and the NVIDIA Container Toolkit, then restart Docker. Make sure the deploy block from Step 2 is in place (restore it if you commented it out for CPU-only use); Docker will then attach the GPUs to the ollama container when it is recreated.
# Install NVIDIA driver (check your GPU support docs)
sudo apt-get install -y nvidia-driver-535
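# A reboot may be required before the new driver loads; verify it on the host
nvidia-smi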
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list > /dev/null
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Recreate containers
docker compose up -d --force-recreate
Verify GPU is visible:
docker exec -it ollama nvidia-smi
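Recent Ollama releases also report where a loaded model is running; after sending a prompt, the PROCESSOR column should read GPU rather than CPU (assuming your Ollama version includes the ps subcommand):
docker exec -it ollama ollama ps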
Step 5: Secure access
By default, the web UI is open to anyone who can reach the server. For small teams, keep the service bound to your private network (or change the port mapping to "127.0.0.1:3000:8080" so it only listens locally), and note that the first account you create becomes the administrator; once trusted users have registered, set ENABLE_SIGNUP=false and recreate the container. For internet exposure, place NGINX or Caddy in front with HTTPS and basic auth or OIDC. A quick alternative is to keep port 3000 closed publicly and use an SSH tunnel: ssh -L 3000:localhost:3000 user@server.
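As an illustrative sketch of the reverse-proxy approach (chat.example.com is a placeholder; substitute a domain that points at your server), a minimal Caddyfile terminates HTTPS with automatic certificates and forwards traffic to the UI:
chat.example.com {
    reverse_proxy localhost:3000
}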
Step 6: Update and backup
To update to the latest versions, pull new images and recreate containers without losing data:
docker compose pull
docker compose up -d
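Old image layers accumulate after repeated pulls; reclaiming the space is optional but harmless:
docker image prune -f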
Back up your volumes regularly. They contain downloaded models and user data. You can snapshot them to a tar archive:
docker run --rm -v ollama-openwebui_ollama-data:/data -v $PWD:/backup alpine \
tar -czf /backup/ollama-data.tgz -C /data .
docker run --rm -v ollama-openwebui_openwebui-data:/data -v $PWD:/backup alpine \
tar -czf /backup/openwebui-data.tgz -C /data .
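To restore on a fresh host (a sketch, assuming the same project directory name so the volume names match), create the stack, stop it, unpack the archives into the empty volumes, and start it again:
docker compose up -d && docker compose stop
docker run --rm -v ollama-openwebui_ollama-data:/data -v $PWD:/backup alpine \
tar -xzf /backup/ollama-data.tgz -C /data
docker run --rm -v ollama-openwebui_openwebui-data:/data -v $PWD:/backup alpine \
tar -xzf /backup/openwebui-data.tgz -C /data
docker compose start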
Troubleshooting tips
If models do not load, check the logs with docker logs -f ollama and docker logs -f open-webui. For out-of-memory errors, choose a smaller model variant (e.g., 7B/8B quantized). If the GPU is not detected, ensure the driver and toolkit versions match, verify that nvidia-smi works on the host, and recreate the containers. Slow responses on CPU are normal; try quantized models (like Q4_K_M) for better speed and lower RAM use. To change the web UI name, edit WEBUI_NAME and run docker compose up -d.
What you achieved
You now have a private AI chat server running locally with Docker. Ollama hosts your LLMs, Open WebUI provides a friendly interface, and optional NVIDIA acceleration boosts performance. With updates and backups in place, you can safely iterate, add specialized models for code or documents, and keep your AI workflows under your control.