Overview
This tutorial shows how to run large language models locally using Ollama and Open WebUI with Docker on Ubuntu 22.04 or 24.04. You will get a private, fast AI chat interface in your browser with optional NVIDIA GPU acceleration. We will cover prerequisites, Docker setup, GPU configuration, a ready-to-use docker-compose.yml, updates, backups, and troubleshooting. The steps also work for CPU-only machines.
What You Will Need
Before you begin, make sure you have the following:
- Ubuntu 22.04 or 24.04 (freshly updated)
- Docker Engine and Docker Compose plugin
- Optional: NVIDIA GPU with recent drivers (e.g., 535+), CUDA-capable
- At least 16 GB RAM recommended; more VRAM helps with larger models
- An open TCP port for the web UI (3000 by default)
Step 1: Install Docker and Compose
Install Docker from the official repository and enable it on boot. If you already have Docker, ensure it is up to date.
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
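To confirm the installation, check the Docker and Compose versions and run a quick smoke test with the hello-world image:
docker --version
docker compose version
docker run --rm hello-world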
Step 2 (Optional): Enable NVIDIA GPU for Containers
If you have an NVIDIA GPU, install the NVIDIA Container Toolkit so Docker can access the GPU. First verify the GPU is detected:
nvidia-smi
Then install the container toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
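To verify that containers can actually see the GPU, run nvidia-smi inside a throwaway CUDA container. The image tag below is only an example; any recent CUDA base image works:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi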
If you are on a CPU-only system, skip this step. The stack will still work, just slower.
Step 3: Create the Docker Compose File
Create a working directory and a docker-compose.yml. This configuration runs two services: Ollama (model runtime) and Open WebUI (browser UI). It includes a GPU-enabled section that you can remove if you are running on CPU.
mkdir -p ~/ai-stack && cd ~/ai-stack
nano docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=12h
      - OLLAMA_NUM_PARALLEL=1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]  # Remove this deploy block on CPU-only hosts

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - openwebui-data:/app/backend/data

volumes:
  ollama-data:
  openwebui-data:
For CPU-only systems, delete the deploy.resources.reservations.devices block under the ollama service to avoid GPU scheduling errors.
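Before starting the stack, you can let Compose validate the file and print the merged configuration to catch indentation or syntax mistakes:
docker compose config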
Step 4: Start the Stack and Pull a Model
Launch the containers in the background:
docker compose up -d
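Check that both containers are running, and follow the logs if something looks off:
docker compose ps
docker compose logs -f ollama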
Pull a model with Ollama. The following example downloads a compact, general-purpose model:
docker exec -it ollama ollama pull llama3.1:8b
Browse the official Ollama model library for other options; popular choices include llama3.1:8b, mistral:7b, and neural-chat. Larger models require more RAM and VRAM.
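To confirm Ollama is answering requests, you can call its HTTP API directly from the host. This is a minimal check using the generate endpoint and the model pulled above:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'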
Step 5: Access the Web Interface
Open your browser and visit http://SERVER_IP:3000 to access Open WebUI. On first launch, create an admin user. In Settings, confirm the Ollama base URL is http://ollama:11434. Choose your default model and start chatting locally.
Useful Tips
Switch or Add Models: Use the Models section in Open WebUI or run docker exec -it ollama ollama pull MODEL:TAG. You can host multiple models and select them per chat.
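For example, to see which models are installed or to remove one you no longer need:
docker exec -it ollama ollama list
docker exec -it ollama ollama rm mistral:7b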
Performance Tuning: On GPU hosts, keep drivers current. In low-VRAM scenarios, choose quantized models (e.g., Q4_K_M variants). Adjust OLLAMA_NUM_PARALLEL and context window settings to balance speed and quality.
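Many models in the Ollama library publish quantized tags. The tag below is only an illustration; check the library page for the variants that actually exist for your model:
docker exec -it ollama ollama pull llama3.1:8b-instruct-q4_K_M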
Storage Paths: Models are stored in the ollama-data volume; Open WebUI data lives in openwebui-data. Back up both volumes regularly.
Updating and Maintenance
To update to the latest images without losing data, pull and recreate:
docker compose pull
docker compose up -d
To back up the volumes, stop the stack, then export them from a temporary container or bind-mount them to a backup path. A quick export looks like this:
docker run --rm -v ollama-data:/data -v $(pwd):/backup alpine tar czf /backup/ollama-data.tar.gz -C /data .
docker run --rm -v openwebui-data:/data -v $(pwd):/backup alpine tar czf /backup/openwebui-data.tar.gz -C /data .
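To restore, reverse the process into volumes with the same names. This assumes the archives created above are in the current directory:
docker run --rm -v ollama-data:/data -v $(pwd):/backup alpine tar xzf /backup/ollama-data.tar.gz -C /data
docker run --rm -v openwebui-data:/data -v $(pwd):/backup alpine tar xzf /backup/openwebui-data.tar.gz -C /data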
Security Considerations
Do not expose port 3000 or 11434 directly to the internet. If remote access is required, use a reverse proxy (Caddy, Nginx, or Traefik) with HTTPS and authentication, or place the service behind a VPN like WireGuard or Tailscale. Limit container memory/CPU if sharing the host.
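If the reverse proxy runs on the same host, one simple hardening step is to publish the ports on the loopback interface only, so nothing reaches the containers except through the proxy. A sketch of the changed lines in docker-compose.yml:
    ports:
      - "127.0.0.1:3000:8080"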
Troubleshooting
Permission denied on Docker: Run newgrp docker or log out/in after adding your user to the docker group.
GPU not detected in container: Ensure nvidia-smi works on the host, the NVIDIA Container Toolkit is installed, and you did not remove the GPU reservation block in compose. Restart Docker after changes.
Port already in use: Change ports in docker-compose.yml (e.g., 3001:8080) and recreate the stack.
Models fail to load due to memory: Choose smaller or quantized models, reduce context length, or add swap on the host.
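If you need swap, a quick way to add it on Ubuntu looks like this (8 GB is just an example size; the fstab line makes it persistent across reboots):
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab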
Uninstall or Remove
To stop and remove the stack while keeping volumes:
docker compose down
To remove everything including data volumes:
docker compose down -v
Conclusion
With Docker, Ollama, and Open WebUI, you can run private AI models on your own hardware in minutes. This setup scales from a simple laptop to a GPU workstation and is easy to update and back up. Start with a lightweight model, then experiment with larger options as your resources allow.