Overview
This tutorial shows you how to deploy a private AI chatbot using Ollama and Open WebUI with Docker on Linux (Ubuntu 22.04/24.04). You will get a secure, local, and fast setup that can run on CPU or use your NVIDIA GPU for acceleration. We will cover installation, model downloads, persistence, updates, and security hardening.
What You Will Build
- Ollama container serving large language models (LLMs) on port 11434.
- Open WebUI container providing a modern chat interface on port 3000.
- Optional NVIDIA GPU pass-through for faster inference.
- Persistent volumes so your models and settings survive reboots and updates.
- Basic authentication and reverse proxy tips for safe remote access.
Prerequisites
- A 64-bit Linux host with Docker Engine installed. On Ubuntu, install Docker with the commands below (a quick verification follows this list):
sudo apt update && sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
- Optional: an NVIDIA GPU with drivers installed (530+ recommended) if you want GPU acceleration.
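Before moving on, you can verify the Docker installation from the first step; a quick sanity check, assuming the packages above installed cleanly:
# Confirm the daemon responds and can start a container
sudo docker --version
sudo docker run --rm hello-world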
Optional: Enable NVIDIA GPU for Docker
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can access the GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test with:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Create Folders and a Dedicated Network
mkdir -p ~/ai/ollama ~/ai/openwebui
docker network create ai-net
Start the Ollama Container (CPU or GPU)
CPU-only (works everywhere):
docker run -d --name ollama --restart unless-stopped \
-p 11434:11434 \
-v ~/ai/ollama:/root/.ollama \
--network ai-net \
ollama/ollama:latest
GPU-enabled (if you completed the NVIDIA step):
docker run -d --name ollama --restart unless-stopped \
-p 11434:11434 \
-v ~/ai/ollama:/root/.ollama \
--gpus all \
--network ai-net \
ollama/ollama:latest
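Whichever variant you started, it is worth confirming the API answers before pulling models; a small check, assuming the default port mapping above:
# A plain GET against the API port returns a short status message
curl -s http://localhost:11434
# If the request fails, check the container logs
docker logs --tail 20 ollama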
Pull a Model with Ollama
Ollama hosts many models (Llama 3, Phi-3, Mistral, Gemma, etc.). Pull one that fits your hardware. For a good balance, try Llama 3 8B:
docker exec -it ollama ollama pull llama3:8b
On low-memory machines, use a smaller or quantized model (for example llama3:8b-instruct-q4_K_M). You can list models with:
docker exec -it ollama ollama list
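You can also exercise the model from the command line before opening the web interface; a minimal sketch using the model pulled above:
# Send a one-off prompt and print the reply (the model loads on first use, so expect a short delay)
docker exec -it ollama ollama run llama3:8b "Explain in one sentence what a Docker volume is."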
Start Open WebUI and Connect It to Ollama
Run Open WebUI with persistent storage and authentication. Replace the admin email and password before running:
docker run -d --name openwebui --restart unless-stopped \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://ollama:11434 \
-e WEBUI_AUTH=True \
-e ADMIN_EMAIL=admin@example.com \
-e ADMIN_PASSWORD='ChangeThisStrongPass!2025' \
-v ~/ai/openwebui:/app/backend/data \
--network ai-net \
ghcr.io/open-webui/open-webui:latest
Now open your browser and go to http://SERVER_IP:3000. Log in with the admin account. In the model dropdown, select the model you pulled (for example llama3:8b) and start chatting.
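If the page does not load, first confirm the container is running and answering locally; a quick check, assuming the port mapping above:
# The container should be listed as Up, and the request should return an HTTP success status
docker ps --filter name=openwebui
curl -sI http://localhost:3000 | head -n 1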
Persist and Back Up Your Data
All models and settings are stored in the bind mounts we created:
- Models and Ollama config: ~/ai/ollama
- Web interface data (users, chats): ~/ai/openwebui
To back them up, stop the containers and archive the folders:
docker stop openwebui ollama
tar -czf ai-backup-$(date +%F).tar.gz -C ~/ ai
docker start ollama openwebui
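To restore, unpack the archive back into your home directory while the containers are stopped; a sketch assuming the archive was created as above:
# Replace the filename with your actual backup archive
docker stop openwebui ollama
tar -xzf ai-backup-YYYY-MM-DD.tar.gz -C ~/
docker start ollama openwebui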
Update Containers and Models
To update to the latest versions safely:
docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop openwebui ollama
docker rm openwebui ollama
Recreate with the same docker run commands (volumes keep your data). To update a model:
docker exec -it ollama ollama pull llama3:8b
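Old image layers from previous versions remain on disk after an update; pruning them is optional but frees space (this only removes images no container is using):
# Remove dangling image layers left behind by the old versions
docker image prune -f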
Secure Remote Access
- Keep WEBUI_AUTH=True and use a strong admin password.
- Restrict the firewall: allow only your IP and the necessary ports (11434, 3000, or the reverse proxy port). On Ubuntu with UFW:
sudo ufw allow 22/tcp
sudo ufw allow from YOUR.IP.ADDR.0/24 to any port 3000 proto tcp
sudo ufw enable
- For HTTPS, place a reverse proxy in front. Example Caddyfile (replace domain):
ai.example.com {
    reverse_proxy 127.0.0.1:3000
}
Caddy will auto-issue TLS certificates via Let’s Encrypt.
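If only Open WebUI needs to talk to Ollama (it reaches it by name over the ai-net network), you can also avoid exposing port 11434 to the outside entirely; a sketch of the earlier Ollama run command with the publish flag bound to loopback:
# Publishing on 127.0.0.1 keeps the API reachable only from the host itself;
# Open WebUI still reaches it at http://ollama:11434 over ai-net
docker run -d --name ollama --restart unless-stopped \
  -p 127.0.0.1:11434:11434 \
  -v ~/ai/ollama:/root/.ollama \
  --network ai-net \
  ollama/ollama:latest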
Performance Tips
- Prefer GPU for large models. Use smaller or quantized models on CPU-only hosts.
- Set the context window and temperature in Open WebUI for faster, more focused responses.
- Avoid swapping: ensure available RAM; 8–16 GB is reasonable for 7–8B quantized models, more for FP16 and larger models.
- Pin container CPU/RAM if needed using the --cpus and -m flags in docker run.
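A minimal example of such limits, based on the CPU-only run command from earlier (the numbers are placeholders to adjust for your hardware):
# Cap the Ollama container at 4 CPU cores and 12 GB of RAM
docker run -d --name ollama --restart unless-stopped \
  --cpus 4 -m 12g \
  -p 11434:11434 \
  -v ~/ai/ollama:/root/.ollama \
  --network ai-net \
  ollama/ollama:latest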
Troubleshooting
- Port already in use: change -p 3000:8080 or stop the conflicting service.
- GPU not detected: confirm nvidia-smi works on the host, re-run nvidia-ctk runtime configure --runtime=docker, restart Docker, and run the CUDA test container.
- Model download slow: this is normal on the first pull because model files are several gigabytes; try a smaller model or check your network.
- Open WebUI cannot reach Ollama: ensure both containers are on ai-net and OLLAMA_BASE_URL=http://ollama:11434 is set correctly.
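A quick way to check the last point is to list the containers attached to the network and reattach any that are missing:
# Show which containers are on ai-net
docker network inspect ai-net --format '{{range .Containers}}{{.Name}} {{end}}'
# Reattach a container if it is not listed
docker network connect ai-net openwebui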
Conclusion
With Docker, Ollama, and Open WebUI, you can run a private AI chat system that is fast, flexible, and secure. This stack supports many modern open models and can scale from a small home server to a GPU workstation. Keep your containers updated, back up the volumes, and tune the model choice to your hardware for the best experience.