Overview
This guide shows you how to self-host a private ChatGPT-like interface on Ubuntu 24.04 using Ollama and Open WebUI. You will deploy both apps with Docker, enable optional NVIDIA GPU acceleration, and connect them securely. The result is a fast, local AI stack that can run popular open-source models like Llama 3 and Mistral without sending your data to the cloud.
Prerequisites
- A server or VM running Ubuntu 24.04 LTS with a sudo user
- A stable internet connection and at least 10 GB of free disk space (models are several gigabytes each, so more is better)
- Optional: a CUDA-capable NVIDIA GPU (Turing or newer recommended) with the proprietary driver installed
1) Install Docker and prepare the host
Update the system and install Docker from Docker's official APT repository, which ships more current packages than Ubuntu's own:
sudo apt update && sudo apt upgrade -y
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release; echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
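Before moving on, it is worth a quick sanity check that the Docker Engine and your new group membership work (hello-world is just a throwaway test image):
docker --version
docker run --rm hello-world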
2) Optional: Enable NVIDIA GPU support for Docker
If you have an NVIDIA GPU, install the proprietary driver and the NVIDIA Container Toolkit to let containers access the GPU.
Install the driver (if it is not already installed):
sudo ubuntu-drivers install
sudo reboot
Verify after the reboot:
nvidia-smi
The output should list your GPU.
Install the NVIDIA Container Toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
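To confirm the toolkit registered itself with Docker, you can check that an nvidia runtime shows up in the daemon info (exact output varies by Docker version); a full in-container GPU test is shown in the Troubleshooting section below:
docker info | grep -i runtimes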
3) Create persistent volumes
Create directories for model files and the Open WebUI database so your data survives container updates:
mkdir -p ~/ollama-data ~/openwebui-data
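Models weigh in at several gigabytes each, so it can help to confirm free space on the filesystem holding these directories before pulling anything:
df -h ~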
4) Run the Ollama container (CPU or GPU)
Ollama serves models over an HTTP API on port 11434. Use one of the following commands:
CPU-only:
docker run -d --name ollama --restart unless-stopped -p 11434:11434 -v ~/ollama-data:/root/.ollama ollama/ollama:latest
GPU-enabled:
docker run -d --name ollama --gpus all --restart unless-stopped -p 11434:11434 -v ~/ollama-data:/root/.ollama ollama/ollama:latest
Pull a model to test the setup (choose one):
docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama pull mistral:7b
docker exec -it ollama ollama pull phi3:mini
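To confirm the API answers before wiring up the UI, you can send a quick request to Ollama's generate endpoint (adjust the model name to whichever one you pulled; the first call may take a while because the model is loaded into memory):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one short sentence.", "stream": false}'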
5) Deploy Open WebUI and connect it to Ollama
Open WebUI is a fast, modern interface that talks to Ollama via API. Put them on the same Docker network so the UI can resolve the Ollama container by name.
docker network create ai
docker network connect ai ollama
docker run -d --name openwebui --network ai --restart unless-stopped -p 3000:8080 -v ~/openwebui-data:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:latest
Open your browser and navigate to http://SERVER_IP:3000. Create the first admin user when prompted. In Settings, choose the default model you pulled earlier (for example, llama3.1:8b), then start chatting.
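If the model list in the UI stays empty, two quick checks usually narrow it down: the Open WebUI logs, and the models Ollama actually has on disk (the UI can only show what ollama list reports):
docker logs --tail 20 openwebui
docker exec ollama ollama list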
6) Optional: HTTPS with a free certificate (Caddy)
If the server is reachable from the internet with a DNS name, you can place Caddy in front to get automatic HTTPS via Let’s Encrypt.
Create a simple Caddyfile:
mkdir -p ~/caddy && nano ~/caddy/Caddyfile
Example Caddyfile (replace ai.example.com with your domain):
ai.example.com {
reverse_proxy openwebui:8080
}
Run Caddy on the same ai network so it can reach the Open WebUI container by name (a proxy target of 127.0.0.1 inside the Caddy container would point at Caddy itself, not at the host):
docker run -d --name caddy --network ai --restart unless-stopped -p 80:80 -p 443:443 -v ~/caddy/Caddyfile:/etc/caddy/Caddyfile -v caddy_data:/data -v caddy_config:/config caddy:latest
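Before relying on automatic HTTPS, it can help to confirm that the Caddyfile parses cleanly and to watch the first certificate issuance in the logs (Let's Encrypt only succeeds if the DNS name already resolves to this server and ports 80/443 are reachable):
docker exec caddy caddy validate --config /etc/caddy/Caddyfile
docker logs -f caddy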
7) Updating, backups, and management
Update containers:
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop openwebui ollama && docker rm openwebui ollama
# Re-run the same docker run commands used earlier
Logs and health:
docker logs -f ollama
docker logs -f openwebui
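A quick liveness check for both services (the version endpoint is part of Ollama's HTTP API; the second command just confirms the UI answers on its host port):
curl -s http://localhost:11434/api/version
curl -sI http://localhost:3000 | head -n 1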
Backups: the important data lives in ~/ollama-data (models, manifests) and ~/openwebui-data (users, settings, chats). Back up these folders with your usual tool (rsync, restic, borg, etc.).
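As a minimal example, a copy to a second disk or host could look like this (the destination /backup/ai-stack is only a placeholder and must already exist; stopping Open WebUI first avoids copying its database mid-write):
docker stop openwebui
rsync -a ~/ollama-data ~/openwebui-data /backup/ai-stack/
docker start openwebui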
Troubleshooting
- If the GPU is not used: verify with docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi. If it fails, confirm the driver, the toolkit, and that Docker was restarted after the nvidia-ctk configuration.
- If Open WebUI cannot see models: ensure both containers share the same network (ai) and that OLLAMA_BASE_URL=http://ollama:11434 is set.
- If port conflicts occur: change the host ports (for example, -p 3001:8080) or stop the service using the port.
What you built
You now have a private, fast AI stack on Ubuntu 24.04: Ollama handles model execution with optional GPU acceleration, and Open WebUI provides a clean, multi-user chat interface. The setup is easy to maintain with Docker, simple to back up, and flexible enough to add new models or move to stronger GPUs later.