Overview
Want to run modern large language models (LLMs) like Llama 3 locally, with a clean web interface and optional GPU acceleration? This tutorial shows how to deploy Ollama (model runtime) together with Open WebUI (browser UI) using Docker on Windows or Linux. You will get a stable setup that is easy to update, secure by default, and fast on NVIDIA or AMD GPUs. No cloud required.
Prerequisites
- Windows 10/11 (with WSL2) or any recent Linux distribution.
- Docker Desktop on Windows, or Docker Engine on Linux.
- At least 16 GB RAM recommended; SSD storage preferred.
- Optional GPU acceleration: NVIDIA (CUDA) or AMD (ROCm on Linux). CPU-only also works, just slower.
Step 1 — Install Docker
Windows: Install Docker Desktop, enable WSL2, and turn on “Use the WSL 2 based engine.” In Settings → Resources → WSL Integration, enable your Linux distro. If you have an NVIDIA GPU, install the latest NVIDIA driver for Windows; Docker Desktop passes the GPU through to WSL2 containers automatically.
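If you are not sure WSL2 is active, a quick check from PowerShell is:
wsl --status
wsl -l -v
The second command should list the distro you enabled in Docker Desktop with VERSION 2.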
Linux: Install Docker Engine from your distro’s repository or Docker’s official repo. Add your user to the docker group, then log out and back in. Verify with:
docker version
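For reference, adding yourself to the docker group and confirming the daemon is reachable without sudo typically looks like this (log out and back in between the two commands):
sudo usermod -aG docker $USER
docker run --rm hello-world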
Step 2 — Prepare GPU Support (Optional)
NVIDIA on Windows: Update the NVIDIA driver. Docker Desktop with WSL2 will expose the GPU automatically to containers that request it.
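A quick sanity check before involving Docker is to run nvidia-smi inside WSL2; it uses the Windows driver, so no additional driver needs to be installed in the distro:
wsl nvidia-smi
If the GPU table prints, containers started by Docker Desktop should be able to see the GPU as well.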
NVIDIA on Linux: Install the NVIDIA driver and the NVIDIA Container Toolkit. Verify with:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi
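If that check fails even though the driver and toolkit are installed, the toolkit may not be registered with Docker yet; on most distros that is done with:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker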
AMD on Linux (ROCm): Install ROCm per your distro and ensure /dev/kfd and /dev/dri are present. AMD GPU acceleration is supported with the rocm-tagged Ollama image.
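A quick way to confirm the kernel side is ready is to check the device nodes and your group membership (group names can vary slightly by distro):
ls -l /dev/kfd /dev/dri
groups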
Step 3 — Create a Docker Compose file
Create a project folder (for example, C:\llm or ~/llm) and create a file named docker-compose.yml inside it. Choose the variant that fits your hardware. All variants bind Open WebUI to localhost only for security and keep models and Open WebUI data in named Docker volumes, so they survive container updates.
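On Linux or inside WSL, for example:
mkdir -p ~/llm
cd ~/llm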
CPU-only (works everywhere):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - openwebui:/app/backend/data
volumes:
  ollama:
  openwebui:
NVIDIA GPU (Windows or Linux):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    gpus: all
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - openwebui:/app/backend/data
volumes:
  ollama:
  openwebui:
AMD GPU on Linux (ROCm):
version: "3.9"
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    devices:
      - "/dev/kfd:/dev/kfd"
      - "/dev/dri:/dev/dri"
    group_add:
      - "video"
    ipc: host
    security_opt:
      - seccomp=unconfined
    cap_add:
      - SYS_PTRACE
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
  openwebui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: openwebui
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    ports:
      - "127.0.0.1:3000:8080"
    volumes:
      - openwebui:/app/backend/data
volumes:
  ollama:
  openwebui:
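Whichever variant you pick, you can check the file for syntax and indentation mistakes before starting anything; docker compose config prints the resolved configuration, or an error pointing at the offending line:
docker compose config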
Step 4 — Start the stack
In the project folder, run:
docker compose up -d
This pulls the images and starts both containers. Open WebUI will be available at http://127.0.0.1:3000 and Ollama’s API at http://localhost:11434.
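To confirm that both services are up, list the containers and query Ollama’s version endpoint:
docker compose ps
curl http://localhost:11434/api/version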
Step 5 — Download a model
Use the Web UI to add a model, or pull one via CLI. For example, to pull Llama 3.1 8B:
docker exec -it ollama ollama pull llama3.1:8b
Then test it:
docker exec -it ollama ollama run llama3.1:8b "Say hello in one sentence."
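You can also query the model directly through Ollama’s REST API, which is handy for scripting; for example, from a Linux shell or WSL (Windows shells quote JSON differently), using the model pulled above:
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello in one sentence.", "stream": false}'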
Step 6 — First login and basic security
Open http://127.0.0.1:3000 in your browser. Create your account and log in. By default, this guide binds the UI to localhost, so it is not exposed to your network. If you need remote access, publish through a reverse proxy with HTTPS or a zero-trust tunnel, and enable authentication in Open WebUI. Keep your Docker host patched and restrict ports with a firewall.
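As one low-effort option for remote access, a zero-trust tunnel such as Cloudflare’s cloudflared can expose the localhost-bound UI without opening any ports; the quick-tunnel form below is meant for temporary use and relies on Open WebUI’s own login for protection:
cloudflared tunnel --url http://127.0.0.1:3000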
Updating and Maintenance
- Update to the latest images:
docker compose pull && docker compose up -d
- List installed models:
docker exec -it ollama ollama list
- Remove unused models to free space:
docker exec -it ollama ollama rm model-name
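The docker compose pull above leaves superseded image layers on disk; reclaim the space with:
docker image prune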
Troubleshooting
- GPU not detected: for NVIDIA, run docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi. If that fails, update the driver or NVIDIA Container Toolkit. For AMD, ensure /dev/kfd and /dev/dri are present and you used the rocm image variant.
- Slow performance: confirm you pulled a quantized model (e.g., a Q4_K_M variant) or enable GPU acceleration; a quick way to check whether the GPU is actually in use is shown after this list. Increase swap space if the system runs out of memory.
- Ports in use: change the host ports in the compose file (e.g., 127.0.0.1:4000:8080 for the UI).
- Logs: check issues with docker compose logs -f ollama and docker compose logs -f openwebui.
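For the GPU and performance items above, it also helps to check whether a loaded model is actually running on the GPU; ollama ps reports the processor in use for each loaded model:
docker exec -it ollama ollama ps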
Uninstall (Optional)
To stop and remove containers, run:
docker compose down
To remove models and chat data, also remove the named volumes:
docker volume rm llm_ollama llm_openwebui
(Adjust the names with docker volume ls; the prefix comes from your project folder name.)
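Alternatively, a single command removes the containers together with all named volumes declared in the compose file:
docker compose down -v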
What you achieved
You now have a local, private, and fast LLM environment with a friendly web UI. Thanks to Docker, the stack is reproducible and easy to update. With GPU acceleration, even 7B–13B models become highly responsive for chat, coding help, and offline experimentation—without sending your data to the cloud.