Install Open WebUI and Ollama with GPU: Run Local LLMs on Windows and Linux Using Docker

Overview

Want to run modern large language models (LLMs) like Llama 3 locally, with a clean web interface and optional GPU acceleration? This tutorial shows how to deploy Ollama (model runtime) together with Open WebUI (browser UI) using Docker on Windows or Linux. You will get a stable setup that is easy to update, secure by default, and fast on NVIDIA or AMD GPUs. No cloud required.

Prerequisites

- Windows 10/11 (with WSL2) or any recent Linux distribution.
- Docker Desktop on Windows, or Docker Engine on Linux.
- At least 16 GB RAM recommended; SSD storage preferred.
- Optional GPU acceleration: NVIDIA (CUDA) or AMD (ROCm on Linux). CPU-only also works, just slower.

Step 1 — Install Docker

Windows: Install Docker Desktop, enable WSL2, and turn on “Use the WSL 2 based engine.” In Settings → Resources → WSL Integration, enable your Linux distro. If you have an NVIDIA GPU, install the latest NVIDIA driver; Docker Desktop passes the GPU through to WSL2 containers automatically, so no extra toolkit is needed on Windows.
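
To confirm that the WSL2 backend and Docker itself are working before moving on, you can run these two checks from PowerShell (output will vary by machine):
wsl --status
docker version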

Linux: Install Docker Engine from your distro’s repository or Docker’s official repo. Add your user to the docker group, then log out and back in. Verify with:
docker version
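
On Debian/Ubuntu-style systems, one common path (shown only as a sketch; your distro may package Docker differently) is Docker's convenience script followed by adding yourself to the docker group:
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER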

Step 2 — Prepare GPU Support (Optional)

NVIDIA on Windows: Update the NVIDIA driver. Docker Desktop with WSL2 will expose the GPU automatically to containers that request it.
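
Assuming at least one WSL distro is installed, a quick sanity check is to run nvidia-smi through WSL from PowerShell; it should list your GPU if the driver is in place:
wsl nvidia-smi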

NVIDIA on Linux: Install the NVIDIA driver and the NVIDIA Container Toolkit. Verify with:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi
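
On Ubuntu or Debian, installing and wiring up the NVIDIA Container Toolkit typically looks like this (the repository setup step is omitted; follow NVIDIA's instructions for your distro):
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker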

AMD on Linux (ROCm): Install ROCm per your distro and ensure /dev/kfd and /dev/dri are present. AMD GPU acceleration is supported with the rocm-tagged Ollama image.
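
Before starting the stack, you can verify the ROCm device nodes exist and check your group memberships (group names vary by distro; video and render are common):
ls -l /dev/kfd /dev/dri
groups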

Step 3 — Create a Docker Compose file

Create a project folder (for example, C:\llm or ~/llm) and in it create a file named docker-compose.yml. Choose the variant that fits your hardware. All variants bind Open WebUI to localhost only for security; if you don't need to reach the Ollama API from other machines, you can bind its port to 127.0.0.1 in the same way (e.g., "127.0.0.1:11434:11434").

CPU-only (works everywhere):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
volumes:
  ollama:

NVIDIA GPU (Windows or Linux):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
volumes:
  ollama:
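
If your Docker Compose version rejects the service-level gpus: all shorthand, the older reservation syntax on the ollama service achieves the same thing:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]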

AMD GPU on Linux (ROCm):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    group_add:
      - "video"
    ipc: host
    security_opt:
      - seccomp=unconfined
    cap_add:
      - SYS_PTRACE
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
volumes:
  ollama:

Step 4 — Start the stack

In the project folder, run:
docker compose up -d
This pulls the images and starts both containers. Open WebUI will be available at http://127.0.0.1:3000 and Ollama’s API at http://localhost:11434.
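
To confirm both services came up, check the container status and query Ollama's version endpoint (container names as defined in the compose file):
docker compose ps
curl http://127.0.0.1:11434/api/version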

Step 5 — Download a model

Use the Web UI to add a model, or pull one via CLI. For example, to pull Llama 3.1 8B:
docker exec -it ollama ollama pull llama3.1:8b
Then test it:
docker exec -it ollama ollama run llama3.1:8b "Say hello in one sentence."
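
Once the model is pulled, you can also call the Ollama API directly from a Linux shell or WSL; this example asks for a single non-streamed completion (the prompt is arbitrary):
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'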

Step 6 — First login and basic security

Open http://127.0.0.1:3000 in your browser. Create your account and log in (the first account becomes the admin). By default, this guide binds the UI to localhost, so it is not exposed to your network. If you need remote access, publish it through a reverse proxy with HTTPS or a zero-trust tunnel, and keep authentication enabled in Open WebUI. Keep your Docker host patched and restrict ports with a firewall.
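
As one example of remote access, a reverse proxy such as Caddy can terminate HTTPS in front of the UI. A minimal Caddyfile sketch, assuming a hypothetical domain chat.example.com pointing at this host:
chat.example.com {
    reverse_proxy 127.0.0.1:3000
}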

Updating and Maintenance

- Update to the latest images:
docker compose pull && docker compose up -d
- List installed models:
docker exec -it ollama ollama list
- Remove unused models to free space:
docker exec -it ollama ollama rm model-name

Troubleshooting

- GPU not detected: for NVIDIA, run docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi. If that fails, update the driver or NVIDIA Container Toolkit. For AMD, ensure /dev/kfd and /dev/dri are present and you used the rocm image variant.
- Slow performance: confirm you pulled a quantized model (e.g., Q4_K_M) or enable GPU acceleration. Add swap space if you run out of RAM.
- Ports in use: change the host ports in the compose file (e.g., 127.0.0.1:4000:8080 for the UI).
- Logs: check issues with docker compose logs -f ollama and docker compose logs -f openwebui.
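- GPU usage check: to see whether a loaded model is actually running on the GPU, ollama ps inside the container reports the processor in use:
docker exec -it ollama ollama ps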

Uninstall (Optional)

To stop and remove containers, run:
docker compose down
To remove models and data, also remove the volume:
docker volume rm llm_ollama (the volume name is prefixed with your project folder name; check the exact name with docker volume ls)
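
Alternatively, docker compose down -v removes the containers and the named volume declared in the compose file in one step:
docker compose down -v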

What you achieved

You now have a local, private, and fast LLM environment with a friendly web UI. Thanks to Docker, the stack is reproducible and easy to update. With GPU acceleration, even 7B–13B models become highly responsive for chat, coding help, and offline experimentation—without sending your data to the cloud.
