Overview
This tutorial shows you how to deploy a private AI chatbot using Ollama and Open WebUI with Docker on Linux (Ubuntu 22.04/24.04). You will get a secure, local, and fast setup that can run on CPU or use your NVIDIA GPU for acceleration. We will cover installation, model downloads, persistence, updates, and security hardening.
What You Will Build
- Ollama container serving large language models (LLMs) on port 11434.
- Open WebUI container providing a modern chat interface on port 3000.
- Optional NVIDIA GPU pass-through for faster inference.
- Persistent volumes so your models and settings survive reboots and updates.
- Basic authentication and reverse proxy tips for safe remote access.
Prerequisites
- A 64-bit Linux host with Docker Engine installed. On Ubuntu, install Docker with the commands below (a quick verification follows this list):
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
- Optional: an NVIDIA GPU with drivers installed (530+ recommended) if you want GPU acceleration.
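Before moving on, you can verify the Docker installation from the first step; a quick sanity check, assuming the packages above installed cleanly:
# Confirm the daemon responds and can start a container
sudo docker --version
sudo docker run --rm hello-world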
Optional: Enable NVIDIA GPU for Docker
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can access the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test with:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Create Folders and a Dedicated Network
mkdir -p ~/ai/ollama ~/ai/openwebui
docker network create ai-net
Start the Ollama Container (CPU or GPU)
CPU-only (works everywhere):
docker run -d --name ollama --restart unless-stopped \
-p 11434:11434 \
-v ~/ai/ollama:/root/.ollama \
--network ai-net \
ollama/ollama:latest
GPU-enabled (if you completed the NVIDIA step):
docker run -d --name ollama --restart unless-stopped \
-p 11434:11434 \
-v ~/ai/ollama:/root/.ollama \
--gpus all \
--network ai-net \
ollama/ollama:latest
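Whichever variant you started, it is worth confirming the API answers before pulling models; a small check, assuming the default port mapping above:
# A plain GET against the API port returns a short status message
curl -s http://localhost:11434
# If the request fails, check the container logs
docker logs --tail 20 ollama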
Pull a Model with Ollama
Ollama hosts many models (Llama 3, Phi-3, Mistral, Gemma, etc.). Pull one that fits your hardware. For a good balance, try Llama 3 8B:
docker exec -it ollama ollama pull llama3:8b
On low-memory machines, use a smaller or quantized model (for example llama3:8b-instruct-q4_K_M). You can list models with:
docker exec -it ollama ollama list
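You can also exercise the model from the command line before opening the web interface; a minimal sketch using the model pulled above:
# Send a one-off prompt and print the reply (the model loads on first use, so expect a short delay)
docker exec -it ollama ollama run llama3:8b "Explain in one sentence what a Docker volume is."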
Start Open WebUI and Connect It to Ollama
Run Open WebUI with persistent storage and authentication. Replace the admin email and password before running:
docker run -d --name openwebui --restart unless-stopped \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://ollama:11434 \
-e WEBUI_AUTH=True \
-e ADMIN_EMAIL=admin@example.com \
-e ADMIN_PASSWORD='ChangeThisStrongPass!2025' \
-v ~/ai/openwebui:/app/backend/data \
--network ai-net \
ghcr.io/open-webui/open-webui:latest
Now open your browser and go to http://SERVER_IP:3000. Log in with the admin account. In the model dropdown, select the model you pulled (for example llama3:8b) and start chatting.
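If the page does not load, first confirm the container is running and answering locally; a quick check, assuming the port mapping above:
# The container should be listed as Up, and the request should return an HTTP success status
docker ps --filter name=openwebui
curl -sI http://localhost:3000 | head -n 1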
Persist and Back Up Your Data
All models and settings are stored in the bind mounts we created:
- Models and Ollama config: ~/ai/ollama
- Web interface data (users, chats): ~/ai/openwebui
To back them up, stop the containers and archive the folders:
docker stop openwebui ollama
tar -czf ai-backup-$(date +%F).tar.gz -C ~/ ai
docker start ollama openwebui
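To restore, unpack the archive back into your home directory while the containers are stopped; a sketch assuming the archive was created as above:
# Replace the filename with your actual backup archive
docker stop openwebui ollama
tar -xzf ai-backup-YYYY-MM-DD.tar.gz -C ~/
docker start ollama openwebui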
Update Containers and Models
To update to the latest versions safely:
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop openwebui ollama
docker rm openwebui ollama
Recreate with the same docker run commands (volumes keep your data). To update a model:
docker exec -it ollama ollama pull llama3:8b
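Old image layers from previous versions remain on disk after an update; pruning them is optional but frees space (this only removes images no container is using):
# Remove dangling image layers left behind by the old versions
docker image prune -f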
Secure Remote Access
- Keep WEBUI_AUTH=True and use a strong admin password.
- Restrict the firewall: allow only your IP and the necessary ports (11434, 3000, or the reverse proxy port). On Ubuntu with UFW:
sudo ufw allow 22/tcp
sudo ufw allow from YOUR.IP.ADDR.0/24 to any port 3000 proto tcp
sudo ufw enable
- For HTTPS, place a reverse proxy in front. Example Caddyfile (replace domain):
ai.example.com {
    reverse_proxy 127.0.0.1:3000
}
Caddy will auto-issue TLS certificates via Let’s Encrypt.
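If only Open WebUI needs to talk to Ollama (it reaches it by name over the ai-net network), you can also avoid exposing port 11434 to the outside entirely; a sketch of the earlier Ollama run command with the publish flag bound to loopback:
# Publishing on 127.0.0.1 keeps the API reachable only from the host itself;
# Open WebUI still reaches it at http://ollama:11434 over ai-net
docker run -d --name ollama --restart unless-stopped \
  -p 127.0.0.1:11434:11434 \
  -v ~/ai/ollama:/root/.ollama \
  --network ai-net \
  ollama/ollama:latest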
Performance Tips
- Prefer GPU for large models. Use smaller or quantized models on CPU-only hosts.
- Set the context window and temperature in Open WebUI for faster, more focused responses.
- Avoid swapping: ensure available RAM; 8–16 GB is reasonable for 7–8B quantized models, more for FP16 and larger models.
- Pin container CPU/RAM if needed using the --cpus and -m flags in docker run.
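A minimal example of such limits, based on the CPU-only run command from earlier (the numbers are placeholders to adjust for your hardware):
# Cap the Ollama container at 4 CPU cores and 12 GB of RAM
docker run -d --name ollama --restart unless-stopped \
  --cpus 4 -m 12g \
  -p 11434:11434 \
  -v ~/ai/ollama:/root/.ollama \
  --network ai-net \
  ollama/ollama:latest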
Troubleshooting
- Port already in use: change -p 3000:8080 or stop the conflicting service.
- GPU not detected: confirm nvidia-smi works on the host, re-run nvidia-ctk runtime configure --runtime=docker, restart Docker, and run the CUDA test container.
- Model download slow: this is normal on the first pull because model files are several gigabytes; try a smaller model or check your network.
- Open WebUI cannot reach Ollama: ensure both containers are on ai-net and OLLAMA_BASE_URL=http://ollama:11434 is set correctly.
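A quick way to check the last point is to list the containers attached to the network and reattach any that are missing:
# Show which containers are on ai-net
docker network inspect ai-net --format '{{range .Containers}}{{.Name}} {{end}}'
# Reattach a container if it is not listed
docker network connect ai-net openwebui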
Conclusion
With Docker, Ollama, and Open WebUI, you can run a private AI chat system that is fast, flexible, and secure. This stack supports many modern open models and can scale from a small home server to a GPU workstation. Keep your containers updated, back up the volumes, and tune the model choice to your hardware for the best experience.