Overview
This tutorial shows how to run large language models locally using Ollama and Open WebUI with Docker on Ubuntu 22.04 or 24.04. You will get a private, fast AI chat interface in your browser with optional NVIDIA GPU acceleration. We will cover prerequisites, Docker setup, GPU configuration, a ready-to-use docker-compose.yml, updates, backups, and troubleshooting. The steps also work for CPU-only machines.
What You Will Need
Before you begin, make sure you have the following:
- Ubuntu 22.04 or 24.04 (freshly updated)
- Docker Engine and Docker Compose plugin
- Optional: NVIDIA GPU with recent drivers (e.g., 535+), CUDA-capable
- At least 16 GB RAM recommended; more VRAM helps with larger models
- An open TCP port for the web UI (3000 by default)
Step 1: Install Docker and Compose
Install Docker from the official repository and enable it on boot. If you already have Docker, ensure it is up to date.
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
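To confirm the installation, check the Docker and Compose versions and run a quick smoke test with the hello-world image:
docker --version
docker compose version
docker run --rm hello-world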
Step 2 (Optional): Enable NVIDIA GPU for Containers
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can access the GPU. First verify the GPU is detected:
nvidia-smi
Then install the container toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
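To verify that containers can actually see the GPU, run nvidia-smi inside a throwaway CUDA container. The image tag below is only an example; any recent CUDA base image works:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi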
If you are on a CPU-only system, skip this step. The stack will still work, just slower.
Step 3: Create the Docker Compose File
Create a working directory and a docker-compose.yml. This configuration runs two services: Ollama (model runtime) and Open WebUI (browser UI). It includes a GPU-enabled section that you can remove if you are running on CPU.
mkdir -p ~/ai-stack && cd ~/ai-stack
nano docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=12h
      - OLLAMA_NUM_PARALLEL=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]  # Remove this deploy block on CPU-only hosts

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
For CPU-only systems, delete the deploy.resources.reservations.devices block under the ollama service to avoid GPU scheduling errors.
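Before starting the stack, you can let Compose validate the file and print the merged configuration to catch indentation or syntax mistakes:
docker compose config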
Step 4: Start the Stack and Pull a Model
Launch the containers in the background:
docker compose up -d
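Check that both containers are running, and follow the logs if something looks off:
docker compose ps
docker compose logs -f ollama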
Pull a model with Ollama. The following example downloads a compact, general-purpose model:
docker exec -it ollama ollama pull llama3.1:8b
Browse the official Ollama model library for other options; popular choices include llama3.1:8b, mistral:7b, and neural-chat. Larger models require more RAM and VRAM.
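To confirm Ollama is answering requests, you can call its HTTP API directly from the host. This is a minimal check using the generate endpoint and the model pulled above:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'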
Step 5: Access the Web Interface
Open your browser and visit http://SERVER_IP:3000 to access Open WebUI. On first launch, create an admin user. In Settings, confirm the Ollama base URL is http://ollama:11434. Choose your default model and start chatting locally.
Useful Tips
Switch or Add Models: Use the Models section in Open WebUI or run docker exec -it ollama ollama pull MODEL:TAG. You can host multiple models and select them per chat.
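For example, to see which models are installed or to remove one you no longer need:
docker exec -it ollama ollama list
docker exec -it ollama ollama rm mistral:7b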
Performance Tuning: On GPU hosts, keep drivers current. In low-VRAM scenarios, choose quantized models (e.g., Q4_K_M variants). Adjust OLLAMA_NUM_PARALLEL and context window settings to balance speed and quality.
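Many models in the Ollama library publish quantized tags. The tag below is only an illustration; check the library page for the variants that actually exist for your model:
docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_K_M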
Storage Paths: Models are stored in the ollama-data volume; Open WebUI data lives in openwebui-data. Back up both volumes regularly.
Updating and Maintenance
To update to the latest images without losing data, pull and recreate:
docker compose pull
docker compose up -d
To back up the volumes, stop the stack, then export them from a temporary container or bind-mount them to a backup path. A quick export looks like this:
docker run --rm -v ollama-data:/data -v $(pwd):/backup alpine tar czf /backup/ollama-data.tar.gz -C /data .
docker run --rm -v openwebui-data:/data -v $(pwd):/backup alpine tar czf /backup/openwebui-data.tar.gz -C /data .
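To restore, reverse the process into volumes with the same names. This assumes the archives created above are in the current directory:
docker run --rm -v ollama-data:/data -v $(pwd):/backup alpine tar xzf /backup/ollama-data.tar.gz -C /data
docker run --rm -v openwebui-data:/data -v $(pwd):/backup alpine tar xzf /backup/openwebui-data.tar.gz -C /data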
Security Considerations
Do not expose port 3000 or 11434 directly to the internet. If remote access is required, use a reverse proxy (Caddy, Nginx, or Traefik) with HTTPS and authentication, or place the service behind a VPN like WireGuard or Tailscale. Limit container memory/CPU if sharing the host.
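If the reverse proxy runs on the same host, one simple hardening step is to publish the ports on the loopback interface only, so nothing reaches the containers except through the proxy. A sketch of the changed lines in docker-compose.yml:
    ports:
      - "127.0.0.1:3000:8080"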
Troubleshooting
Permission denied on Docker: Run newgrp docker or log out/in after adding your user to the docker group.
GPU not detected in container: Ensure nvidia-smi works on the host, the NVIDIA Container Toolkit is installed, and you did not remove the GPU reservation block in compose. Restart Docker after changes.
Port already in use: Change ports in docker-compose.yml (e.g., 3001:8080) and recreate the stack.
Models fail to load due to memory: Choose smaller or quantized models, reduce context length, or add swap on the host.
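If you need swap, a quick way to add it on Ubuntu looks like this (8 GB is just an example size; the fstab line makes it persistent across reboots):
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab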
Uninstall or Remove
To stop and remove the stack while keeping volumes:
docker compose down
To remove everything including data volumes:
docker compose down -v
Conclusion
With Docker, Ollama, and Open WebUI, you can run private AI models on your own hardware in minutes. This setup scales from a simple laptop to a GPU workstation and is easy to update and back up. Start with a lightweight model, then experiment with larger options as your resources allow.