Deploy Ollama and Open WebUI with NVIDIA GPU on Ubuntu using Docker Compose (2025 Guide)

Running a private, GPU-accelerated AI assistant is now easier than ever. In this step-by-step guide, you will deploy Ollama (for running LLMs locally) and Open WebUI (a clean browser interface) on Ubuntu 22.04/24.04 using Docker Compose with NVIDIA GPU support. This setup is fast, reproducible, and ideal for teams or power users who want a self-hosted ChatGPT-like experience with full control.

Prerequisites

Before you begin, you will need:

- An Ubuntu 22.04 or 24.04 server or workstation.
- An NVIDIA GPU with recent drivers (e.g., RTX 20/30/40 series or A-series).
- Docker Engine, Docker Compose (v2), and the NVIDIA Container Toolkit.
- At least 20 GB of free disk space and adequate RAM/VRAM (8–24 GB of VRAM recommended for larger models); a few quick verification commands are shown below.
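
You can quickly confirm most of these prerequisites before continuing. The commands below are a minimal sanity check; exact output will vary by system:

nvidia-smi        # driver version and GPU model; fails if no NVIDIA driver is installed
df -h /           # free disk space on the root filesystem
free -h           # available system RAM
lsb_release -ds   # Ubuntu release (should report 22.04 or 24.04)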

Step 1: Install Docker Engine and Compose

Install Docker using the official repository for reliability and updates:

sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
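
As an optional sanity check, confirm that the daemon is running and that your user can start containers without sudo (hello-world is Docker's official test image):

docker run --rm hello-world   # prints a short confirmation message and exits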

Step 2: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit enables GPU access from containers. Install and verify it as follows:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
nvidia-smi

If nvidia-smi shows your GPU and driver information, the driver side is working.
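
To confirm that containers (not just the host) can see the GPU, a common check is to run nvidia-smi inside a throwaway container; the toolkit makes the NVIDIA utilities available inside the container, so a plain Ubuntu image is enough:

docker run --rm --gpus all ubuntu nvidia-smi   # should print the same GPU table as on the host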

Step 3: Create a Docker Compose file

We will run two services: ollama (the model server) and open-webui (the frontend). The configuration below binds ports to localhost for security, so the services are not accessible from the public internet by default.

mkdir -p ~/ollama-openwebui && cd ~/ollama-openwebui
cat > docker-compose.yml << 'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"
    volumes:
      - ollama:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=8h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    restart: unless-stopped
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - open-webui:/app/backend/data

volumes:
  ollama:
  open-webui:
EOF

If you do not have an NVIDIA GPU or want a CPU-only setup, remove the deploy.resources block from the ollama service; Ollama falls back to the CPU automatically, and CPU thread usage can be tuned per request with the num_thread option if needed.
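
Before starting anything, you can ask Compose to validate and render the final configuration; this catches YAML indentation mistakes early:

docker compose config   # parses docker-compose.yml and prints the resolved configuration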

Step 4: Start the stack

Pull the images and launch the services in the background:

docker compose pull
docker compose up -d
docker compose ps

Open WebUI will be available at http://127.0.0.1:3000 on the host. The first load may take a moment.
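
You can also confirm that the Ollama API is answering on the loopback port before opening the browser; /api/version and /api/tags are simple endpoints for this:

curl http://127.0.0.1:11434/api/version   # returns the running Ollama version as JSON
curl http://127.0.0.1:11434/api/tags      # lists locally available models (empty until you pull one)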

Step 5: Pull a model and run your first chat

Ollama manages models. Use the following commands to pull a model (for example, llama3.1) and verify it works:

docker exec -it ollama ollama list
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1 "Write a haiku about GPUs."

Now visit Open WebUI on http://127.0.0.1:3000. In Settings > Models, you should see the pulled model. Start chatting. If you need a smaller or faster model, try llama3.1:8b, mistral, or qwen2. For larger models, ensure your GPU memory is sufficient.
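
If you prefer to script against the model instead of using the UI, the same model is reachable through Ollama's HTTP API. A minimal example, assuming llama3.1 has been pulled as above:

curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku about GPUs.",
  "stream": false
}'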

Optional: Access from your LAN or the Internet

- For LAN access, change the port binding in docker-compose.yml from 127.0.0.1:3000:8080 to 0.0.0.0:3000:8080 (and repeat for port 11434 only if other machines need direct API access, since the Ollama API has no built-in authentication), then run docker compose up -d.
- For internet exposure, put Open WebUI behind a reverse proxy (Nginx, Caddy, Traefik) with HTTPS (Let’s Encrypt) and keep 11434 internal. Never expose Ollama’s API publicly without authentication; a minimal Caddy-based sketch follows below.
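
As a rough sketch only, a Caddy service can be added to the same Compose file to terminate HTTPS in front of Open WebUI. Here chat.example.com is a placeholder domain you would replace with your own, and DNS must already point at this host. First create a Caddyfile next to docker-compose.yml:

cat > Caddyfile << 'EOF'
# chat.example.com is a placeholder; replace it with a domain that resolves to this host
chat.example.com {
    reverse_proxy open-webui:8080
}
EOF

Then add a caddy service alongside the existing ones (and caddy-data / caddy-config under the top-level volumes: key):

  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config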

Resource tuning and tips

- Set OLLAMA_KEEP_ALIVE to control how long a model stays loaded in memory after the last request (e.g., 8h, as in the Compose file above).
- For CPU-only installs, tune the num_thread model option (for example, to your number of physical cores) if generation is slow.
- Use quantized models (e.g., llama3.1:8b-instruct-q4_K_M) if VRAM is limited.
- Persist data: models live in the ollama volume; Open WebUI settings and chats live in the open-webui volume.
- Backups: snapshot /var/lib/docker/volumes/<name>/_data (Compose prefixes volume names with the project directory, e.g., ollama-openwebui_ollama) or archive a volume with docker run --rm -v <name>:/data -v $(pwd):/backup alpine tar czf /backup/volume.tgz -C / data; a restore sketch is shown below.
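
To restore such an archive into an empty volume later, the reverse operation works. The volume name below (ollama-openwebui_ollama) is the name Compose is expected to generate for this project directory; confirm yours first:

docker volume ls   # confirm the exact volume name
# restore: unpack the archived data/ directory back into the (empty) volume
docker run --rm -v ollama-openwebui_ollama:/data -v $(pwd):/backup alpine \
  tar xzf /backup/volume.tgz -C /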

Troubleshooting

- GPU not detected: Ensure drivers are installed and nvidia-smi works on the host. Re-run sudo nvidia-ctk runtime configure --runtime=docker, restart Docker, and verify the Compose GPU reservation is present; an in-container check is shown after this list.
- Out of memory (VRAM): Choose a smaller/quantized model. Watch container logs: docker logs -f ollama.
- Slow performance on CPU: Reduce the context window, use smaller models, and set num_thread to the number of physical CPU cores.
- Port conflicts: Adjust the host ports in the Compose file (e.g., 127.0.0.1:13000:8080).
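
For the GPU case specifically, it is worth checking that the running ollama container itself can see the device; nvidia-smi is normally injected into GPU-enabled containers, so if the first command fails while nvidia-smi works on the host, the container runtime configuration (not the driver) is the likely culprit:

docker exec -it ollama nvidia-smi   # should list the GPU from inside the container
docker logs -f ollama               # look for lines showing CUDA/GPU initialization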

Updating and maintenance

Keep images fresh and stable with a simple routine:

cd ~/ollama-openwebui
docker compose pull
docker compose up -d
docker image prune -f

Model files are cached in the ollama volume. Removing the container will not delete models unless you remove the volume explicitly.
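
To reclaim disk space from models you no longer use, remove them through Ollama rather than deleting volume files by hand:

docker exec -it ollama ollama list            # show installed models and their sizes
docker exec -it ollama ollama rm llama3.1     # remove a specific model from the ollama volume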

Conclusion

You have deployed a modern, private AI chat stack powered by Ollama and Open WebUI with GPU acceleration on Ubuntu. With Docker Compose, the setup is reproducible and easy to maintain. You can now experiment with state-of-the-art open models, keep data on your hardware, and scale up or down by swapping models or hardware. If you want advanced features like multi-user support, role-based access, or external tools, explore Open WebUI’s settings and plug-ins—and enjoy your self-hosted AI assistant.
