Running a private, GPU-accelerated AI assistant is now easier than ever. In this step-by-step guide, you will deploy Ollama (for running LLMs locally) and Open WebUI (a clean browser interface) on Ubuntu 22.04/24.04 using Docker Compose with NVIDIA GPU support. This setup is fast, reproducible, and ideal for teams or power users who want a self-hosted ChatGPT-like experience with full control.
Prerequisites
Before you begin, you will need:
- An Ubuntu 22.04 or 24.04 server or workstation.
- An NVIDIA GPU with recent drivers (e.g., RTX 20/30/40 series or A-series).
- Docker Engine, Docker Compose (v2), and the NVIDIA Container Toolkit.
- At least 20 GB free disk space and adequate RAM/VRAM (8–24 GB VRAM recommended for larger models).
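A quick way to sanity-check the GPU driver, free disk space, and memory before you start (these commands assume the NVIDIA driver is already installed on the host):
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
df -h /
free -h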
Step 1: Install Docker Engine and Compose
Install Docker using the official repository for reliability and updates:
sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
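For an end-to-end check that the daemon is running and your user can reach it without sudo, a throwaway test container works well:
docker run --rm hello-world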
Step 2: Install NVIDIA Container Toolkit
The NVIDIA Container Toolkit enables GPU access from containers. Install and verify it as follows:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi
If nvidia-smi shows your GPU and driver information, you are ready.
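To confirm that containers (not just the host) can see the GPU, you can also run nvidia-smi inside a CUDA base image. The image tag below is only an example; pick one compatible with your driver version:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi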
Step 3: Create a Docker Compose file
We will run two services: ollama (the model server) and open-webui (the frontend). The configuration below binds ports to localhost for security, so the services are not accessible from the public internet by default.
mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=8h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
EOF
If you do not have an NVIDIA GPU or want a CPU-only setup, remove the deploy.resources block from the ollama service and optionally set OLLAMA_NUM_THREADS in its environment.
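For reference, a rough sketch of a CPU-only ollama service (same image, ports, and volume, just without the GPU reservation):
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=8h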
Step 4: Start the stack
Pull the images and launch the services in the background:
docker compose pull
docker compose up -d
docker compose ps
Open WebUI will be available at http://127.0.0.1:3000 on the host. The first load may take a moment.
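You can also confirm that the Ollama API is responding before opening the browser; its /api/version endpoint returns the server version:
curl http://127.0.0.1:11434/api/version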
Step 5: Pull a model and run your first chat
Ollama manages models. Use the following commands to pull a model (for example, llama3.1) and verify it works:
docker exec -it ollama ollama list
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1 "Write a haiku about GPUs."
Now visit Open WebUI at http://127.0.0.1:3000. In Settings > Models, you should see the pulled model. Start chatting. If you need a faster or lighter option, try mistral, qwen2, or a more heavily quantized llama3.1:8b tag. For larger models, ensure your GPU memory is sufficient.
Optional: Access from your LAN or the Internet
- For LAN access, change the port binding in docker-compose.yml from 127.0.0.1:3000:8080 to 0.0.0.0:3000:8080 (and repeat for port 11434 if needed), then run docker compose up -d.
- For internet exposure, use a reverse proxy (Nginx, Caddy, Traefik) with HTTPS (Let’s Encrypt) and keep 11434 internal; a minimal Caddy example follows this list. Never expose Ollama’s API publicly without authentication.
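As one illustration, a minimal Caddyfile for a placeholder domain chat.example.com could be as short as the following (Caddy obtains and renews the certificate automatically; Open WebUI stays bound to localhost and only Caddy is exposed):
chat.example.com {
    reverse_proxy 127.0.0.1:3000
}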
Resource tuning and tips
- Set OLLAMA_KEEP_ALIVE to control how long models stay loaded before unloading (e.g., 8h).
- For CPU installs, set OLLAMA_NUM_THREADS=$(nproc).
- Use quantized models (e.g., llama3.1:8b-instruct-q4_K_M) if VRAM is limited.
- Persist data: models live in the ollama volume; Open WebUI settings and chats live in the open-webui volume.
- Backups: snapshot /var/lib/docker/volumes/<name>/_data or use docker run --rm -v <volume>:/data -v $(pwd):/backup alpine tar czf /backup/volume.tgz -C / data. Note that Compose prefixes volume names with the project directory name (e.g., ollama-openwebui_ollama); a matching restore is sketched after this list.
Troubleshooting
- GPU not detected: Ensure drivers are installed and nvidia-smi works on the host. Re-run sudo nvidia-ctk runtime configure --runtime=docker, restart Docker, and verify the Compose GPU reservations are present (see also the in-container check after this list).
- Out of memory (VRAM): Choose a smaller or quantized model. Watch the container logs with docker logs -f ollama.
- Slow performance on CPU: Reduce the context window, use smaller models, and set the thread count to the number of CPU cores.
- Port conflicts: Adjust the host ports in the Compose file (e.g., 127.0.0.1:13000:8080).
Updating and maintenance
Keep images fresh and stable with a simple routine:
cd ~/ollama-openwebui
docker compose pull
docker compose up -d
docker image prune -f
Model files are cached in the ollama volume. Removing the container will not delete models unless you remove the volume explicitly.
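If you want updates to run on a schedule, a simple cron entry is one option (a sketch only; /home/youruser is a placeholder path and the weekly timing is arbitrary):
0 4 * * 1 cd /home/youruser/ollama-openwebui && docker compose pull && docker compose up -d && docker image prune -f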
Conclusion
You have deployed a modern, private AI chat stack powered by Ollama and Open WebUI with GPU acceleration on Ubuntu. With Docker Compose, the setup is reproducible and easy to maintain. You can now experiment with state-of-the-art open models, keep data on your hardware, and scale up or down by swapping models or hardware. If you want advanced features like multi-user support, role-based access, or external tools, explore Open WebUI’s settings and plug-ins—and enjoy your self-hosted AI assistant.