Overview
This step-by-step guide shows you how to self-host a modern AI chat interface by combining Ollama (for running local large language models) with Open WebUI (a friendly web front end). We will deploy everything on Ubuntu 22.04 using Docker, enable optional NVIDIA GPU acceleration, and secure access with HTTPS via Caddy. The result is a fast, private, and maintainable AI setup for your lab, team, or home server.
Prerequisites
You will need: (1) An Ubuntu 22.04+ 64-bit server with at least 8 GB RAM; (2) Optional NVIDIA GPU for acceleration; (3) A domain name pointing to your server’s public IP if you want HTTPS; (4) A sudo-enabled user; (5) Basic firewall access to ports 22, 80, and 443.
Update the system
Run the following to update packages:
sudo apt-get update && sudo apt-get -y upgrade
Install Docker (and let your user run it)
Install Docker using the official convenience script, then add your user to the docker group so you can run Docker without sudo (log out and back in, or use newgrp, for the group change to take effect):
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
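To confirm Docker is installed and that your user can reach the daemon without sudo, a quick sanity check (the hello-world image just prints a confirmation message and exits):
docker run --rm hello-world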
Enable NVIDIA GPU support (optional but recommended)
If your server has an NVIDIA GPU, install drivers and the container toolkit so Docker can access the GPU:
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, verify:
nvidia-smi
Install the NVIDIA Container Toolkit and wire it to Docker. The nvidia-container-toolkit package ships from NVIDIA's own apt repository rather than Ubuntu's, so that repository has to be added first.
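The two commands below are adapted from NVIDIA's Container Toolkit install guide (double-check them against the current docs in case the URLs or keyring path have changed); they add the signing key and the apt source, after which the install and configuration commands that follow will work:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list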
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test GPU passthrough (optional):
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
Create a dedicated Docker network
A user-defined network makes service-to-service communication simpler:
docker network create ai
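To confirm the network exists (and, later on, to see which containers have joined it):
docker network inspect ai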
Run Ollama (the local model runtime)
Start Ollama as a background service and persist its model data in a named volume:
docker volume create ollama
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama ollama/ollama:latest
If you have a GPU, add --gpus all:
docker run -d --name ollama --restart unless-stopped --network ai -p 11434:11434 -v ollama:/root/.ollama --gpus all ollama/ollama:latest
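Either way, you can confirm the API is responding from the host before pulling any models; the root endpoint on the published port returns a short status message:
curl http://localhost:11434
It should print "Ollama is running".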
Pull at least one model (examples include llama3.1, mistral, qwen2, phi3). For a balanced start, try an 8B parameter model:
docker exec -it ollama ollama pull llama3.1:8b
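To verify the download and make sure the model actually loads, list the installed models and run a one-off prompt (the prompt text here is just an example):
docker exec -it ollama ollama list
docker exec -it ollama ollama run llama3.1:8b "Say hello in one short sentence."
The first generation is slower because the model has to be loaded into memory.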
Run Open WebUI (the chat interface)
Deploy Open WebUI and point it at the Ollama API over the same Docker network:
docker volume create open-webui
docker run -d --name open-webui --restart unless-stopped --network ai -p 3000:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main
Open a browser to http://SERVER_IP:3000 to complete the initial setup. Create the first admin user, then disable new sign-ups in the admin settings if this is a private deployment.
Add HTTPS with Caddy (automatic certificates)
Caddy can obtain and renew Let’s Encrypt certificates for you. Because Caddy will run on the same ai Docker network, it can reach Open WebUI directly by container name. Create a simple Caddyfile in your home directory with this content (replace yourdomain.com):
yourdomain.com {
    reverse_proxy open-webui:8080
}
Run Caddy in Docker on the same ai network and bind ports 80 and 443:
docker volume create caddy-data
docker volume create caddy-config
docker run -d --name caddy --restart unless-stopped --network ai -p 80:80 -p 443:443 -v $PWD/Caddyfile:/etc/caddy/Caddyfile -v caddy-data:/data -v caddy-config:/config caddy:latest
Point your domain’s DNS A record to the server’s IP, wait for propagation, and then visit https://yourdomain.com to use Open WebUI securely.
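If you want to watch the certificate being issued in real time, follow Caddy's logs:
docker logs -f caddy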
Useful Open WebUI and Ollama tips
Inside Open WebUI, go to Models and set your default model to the one you pulled. You can pull more models anytime with:
docker exec -it ollama ollama pull mistral:7b
docker exec -it ollama ollama pull qwen2:7b
docker exec -it ollama ollama pull phi3:mini
For faster responses and lower memory use, prefer quantized model builds (for example, tags ending in q4_K_M). On a low-RAM VPS, pick smaller models such as phi3:mini, or a 4-bit quantized llama3.1:8b instruct variant.
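For example, a 4-bit instruct build of Llama 3.1 can be pulled by tag; tag names occasionally change, so check the model's page in the Ollama library if this exact tag is not found:
docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_K_M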
Update and maintenance
To update containers without losing data:
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:main
docker pull caddy:latest
docker stop open-webui ollama caddy
docker rm open-webui ollama caddy
Repeat the docker run commands from earlier to recreate the containers; the named volumes preserve your data and models.
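If you would rather manage the whole stack from one file, here is a minimal Docker Compose sketch that mirrors the docker run commands above (the file name compose.yaml, the localhost-only port binding, and the external volume declarations are my choices; adjust as needed). Compose services resolve each other by service name, so the Caddyfile and OLLAMA_BASE_URL values stay the same. Stop and remove the containers created with docker run first, as shown above, since the published ports would otherwise conflict.
# compose.yaml -- assumes the named volumes created earlier already exist
# (hence external: true) and that this file sits next to your Caddyfile.
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    # For GPU access, add a device reservation here
    # (deploy.resources.reservations.devices in the Compose spec).
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
  caddy:
    image: caddy:latest
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy-data:/data
      - caddy-config:/config
volumes:
  ollama:
    external: true
  open-webui:
    external: true
  caddy-data:
    external: true
  caddy-config:
    external: true
With this file in place, docker compose pull && docker compose up -d updates the whole stack, and docker compose exec ollama ollama pull ... replaces the docker exec commands used above.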
To back up important data:
docker run --rm -v ollama:/data -v $PWD:/backup alpine tar czf /backup/ollama-backup.tgz -C / data
docker run --rm -v open-webui:/data -v $PWD:/backup alpine tar czf /backup/openwebui-backup.tgz -C / data
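To restore, reverse the operation into the (new or existing) volumes; this sketch assumes the backup archives created above sit in the current directory:
docker run --rm -v ollama:/data -v $PWD:/backup alpine tar xzf /backup/ollama-backup.tgz -C /
docker run --rm -v open-webui:/data -v $PWD:/backup alpine tar xzf /backup/openwebui-backup.tgz -C /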
Firewall and security tips
If using UFW, allow only needed ports:
sudo ufw allow 22/tcp
sudo ufw allow 80,443/tcp
sudo ufw enable
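To confirm the rules took effect:
sudo ufw status verbose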
Harden Open WebUI by turning off public sign-ups, using strong admin passwords, and keeping the service behind HTTPS. For additional isolation, restrict Open WebUI so it is reachable only through Caddy: since Caddy talks to it over the ai network, you can bind the published port to localhost (change -p 3000:8080 to -p 127.0.0.1:3000:8080) or drop the -p flag entirely.
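For example, a localhost-only variant of the earlier Open WebUI command (identical apart from the port binding) would be:
docker run -d --name open-webui --restart unless-stopped --network ai -p 127.0.0.1:3000:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://ollama:11434 ghcr.io/open-webui/open-webui:main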
Troubleshooting
If Open WebUI shows “Cannot reach Ollama,” verify the network and base URL:
docker logs open-webui
docker logs ollama
docker exec -it open-webui wget -qO- http://ollama:11434/api/tags
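If wget is not available inside the container, you can test the Ollama API from the host instead, since port 11434 is published there:
curl http://localhost:11434/api/tags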
If GPU is not detected, confirm drivers and toolkit:
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
If that fails, re-run:
sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
Certificate issues? Ensure ports 80 and 443 are reachable from the internet and that your DNS A record is correct. Check Caddy logs:
docker logs caddy
What you built
You now have a private AI chat platform that runs on your hardware, speaks to high-quality local models via Ollama, provides a clean web interface with Open WebUI, and is secured with HTTPS. This stack is simple to update, performs well with GPUs, and is flexible enough to scale with new models and plugins as your needs evolve.