Deploy a Self-Hosted AI Chatbot with Ollama and Open WebUI on Docker (CPU/GPU)

If you want a fast, private, and cost-effective AI assistant without sending data to third parties, you can self-host one with Ollama and Open WebUI. Ollama runs large language models locally, while Open WebUI gives you a friendly chat interface with features like chat history, prompt templates, and model management. This guide shows how to deploy both using Docker, with optional GPU acceleration for NVIDIA or AMD.

Why this stack

Ollama simplifies running modern models such as Llama 3.1, Mistral, Phi, and more with a single command. Open WebUI connects to Ollama and adds a browser-based chat app, multiple users, and extras like RAG, files, and tools. Docker keeps everything consistent, easy to update, and portable across servers and clouds.

Prerequisites

- A 64-bit Linux host (Ubuntu 22.04/24.04 recommended), macOS, or Windows with WSL2. For production, a Linux VM or server is ideal.
- Docker Engine 24+ and Docker Compose plugin.
- 16 GB RAM minimum (24–32 GB recommended for 8B models; bigger models need more).
- 25–50 GB of free disk space; quantized 7–8B models take roughly 4–5 GB each, while larger models can take tens of GB.
- Optional GPU:
  • NVIDIA: recent driver + nvidia-container-toolkit.
  • AMD: ROCm-capable GPU and kernel/drivers.
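
A quick sanity check that the host meets these requirements (the GPU line only matters if you plan to use one):

free -h                         # total and available RAM
df -h /                         # free disk space
lspci | grep -iE 'nvidia|amd'   # installed GPUs, if any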

Step 1 — Install Docker and (optional) drivers

On Ubuntu, install Docker quickly:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version

If you have an NVIDIA GPU, install the driver and the NVIDIA Container Toolkit (the toolkit package usually requires adding NVIDIA's apt repository first; see NVIDIA's install docs), then restart Docker:

sudo apt update
sudo apt install -y nvidia-driver-535   # or a newer driver series supported by your GPU
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi
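
To confirm containers can actually see the GPU, a common quick test is to run a throwaway container with GPU access; the toolkit injects nvidia-smi into it:

docker run --rm --gpus all ubuntu nvidia-smi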

Step 2 — Create a Docker Compose file

Create a project folder, then a docker-compose.yml that runs Ollama and Open WebUI. This setup persists models and app data in Docker volumes and exposes ports 11434 (Ollama) and 3000 (WebUI).

mkdir -p ~/ai-chat && cd ~/ai-chat
cat > docker-compose.yml << 'YAML'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # For NVIDIA GPU support, uncomment the next line (requires nvidia-container-toolkit)
    # gpus: all

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=change_this_long_random_string
      - ENABLE_SIGNUP=true
      - DEFAULT_MODELS=llama3.1:8b
    ports:
      - "3000:8080"
    volumes:
      - openwebui:/app/backend/data

volumes:
  ollama:
  openwebui:
YAML
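
Before launching, replace WEBUI_SECRET_KEY with a real random value (it signs Open WebUI sessions). One simple way to generate one and paste it into docker-compose.yml:

openssl rand -hex 32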

Step 3 — Launch the stack

Start both services in the background:

docker compose up -d
docker compose ps

Open a browser and visit http://SERVER_IP:3000. On first visit, create an admin account. In Settings, confirm the Ollama endpoint shows http://ollama:11434; the default llama3.1:8b model appears in the model list once you pull it in the next step.
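
You can also verify the Ollama API directly from the host, since port 11434 is published in the Compose file above:

curl http://localhost:11434/            # should respond with "Ollama is running"
curl http://localhost:11434/api/tags    # lists installed models (empty until Step 4)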

Step 4 — Pull a model

You can pull models in the WebUI, or via CLI inside the Ollama container:

docker exec -it ollama ollama pull llama3.1:8b

After the download, start chatting in Open WebUI. If the model is large or your server is low on RAM, start with a smaller one like mistral:7b-instruct or phi3:mini.
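
Once a model is downloaded, a quick way to sanity-check it without the UI is Ollama's generate endpoint; a minimal example, assuming you pulled llama3.1:8b:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'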

Optional — Enable NVIDIA GPU acceleration

If nvidia-smi works on the host and you installed nvidia-container-toolkit, uncomment gpus: all for the ollama service in docker-compose.yml and redeploy:

docker compose down
nano docker-compose.yml   # uncomment "gpus: all" under the ollama service
docker compose up -d
docker logs -f ollama

When a model runs, Ollama should log CUDA usage. You can also watch GPU load with nvidia-smi.
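
If your Docker Compose version does not recognize the service-level gpus attribute, the older and widely supported alternative is a deploy reservation; a sketch of the fragment as it would sit under the ollama service in docker-compose.yml:

  ollama:
    # ...existing settings...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]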

Optional — Enable AMD GPU (ROCm)

For AMD GPUs supported by ROCm, use the ROCm image and pass GPU devices into the container. Replace the ollama service with:

  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video

Then redeploy with docker compose up -d. If you see ROCm capability errors, verify your kernel/driver versions and that your user belongs to the video group.
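
Before redeploying, it helps to confirm on the host that the device nodes and group the container relies on actually exist:

ls -l /dev/kfd /dev/dri
groups $USER    # should include "video" (and often "render")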

Secure and expose your WebUI

For public access, put a reverse proxy in front with HTTPS. Caddy makes this easy:

your-domain.example {
  reverse_proxy 127.0.0.1:3000
}

Point DNS to your server, install Caddy, and it will fetch certificates automatically. In Open WebUI, set strong passwords, disable open signup if you do not need it (ENABLE_SIGNUP=false), and consider enabling rate limits at the proxy.
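
If you prefer to run Caddy as another container in the same Compose project, a minimal sketch of an extra service looks like this (it assumes a Caddyfile next to docker-compose.yml; inside the Compose network, point reverse_proxy at open-webui:8080 instead of 127.0.0.1:3000):

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data

Add caddy_data to the top-level volumes: section so certificates survive container recreation.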

Backups and updates

Your important data lives in two volumes: ollama (models) and openwebui (app data, history). To back them up:

docker compose stop
# Compose prefixes volume names with the project (folder) name, here "ai-chat"; confirm the exact names with: docker volume ls
docker run --rm -v ai-chat_ollama:/src -v $PWD:/backup alpine tar czf /backup/ollama-vol.tgz -C /src .
docker run --rm -v ai-chat_openwebui:/src -v $PWD:/backup alpine tar czf /backup/openwebui-vol.tgz -C /src .
docker compose start
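
To restore on a fresh host, bring the stack up once so the volumes exist, stop it, then extract the archives back into the volumes; a sketch for the models volume (same caveat about the volume name prefix):

docker run --rm -v ai-chat_ollama:/dst -v $PWD:/backup alpine sh -c "cd /dst && tar xzf /backup/ollama-vol.tgz"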

To update images and get the latest features:

docker compose pull
docker compose up -d

Models remain unless you explicitly remove the ollama volume.

Troubleshooting

- Open WebUI cannot connect to Ollama: ensure OLLAMA_BASE_URL points to http://ollama:11434 and both containers share the same Docker network (default in Compose).
- CUDA driver not found: confirm nvidia-smi works on the host; re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker; ensure gpus: all is uncommented for the ollama service.
- AMD permissions error: check /dev/kfd and /dev/dri are present; add group_add: video; ensure your kernel/ROCm version supports your GPU.
- Out of memory or slow responses: choose a smaller model, or reduce threads and context in the model settings; increase swap as a temporary measure.
- No space left on device: models are large; prune unused images and models with docker image prune and ollama list / ollama rm.
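
A few commands that help narrow these issues down (container names as defined in the Compose file above):

docker logs open-webui                 # WebUI errors, including failed Ollama connections
docker logs ollama                     # model load errors, CUDA/ROCm messages
docker exec -it ollama ollama list     # models actually installed
docker system df                       # disk used by images, containers, and volumes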

Uninstall cleanly

Stop and remove containers and volumes (this also deletes downloaded models and chat data):

cd ~/ai-chat
docker compose down -v

You now have a private AI chatbot that runs entirely on your hardware. Expand it with more models, plug in document retrieval, or publish it behind a secure HTTPS domain for your team.
