Run Local LLMs with Ollama and Open WebUI on Docker (GPU-Ready Guide for Ubuntu 22.04/24.04)

Overview

Running large language models locally is easier than ever thanks to Ollama and Open WebUI. This tutorial shows you how to deploy both on Docker with optional NVIDIA GPU acceleration on Ubuntu 22.04/24.04. You will get a clean, repeatable setup, persistent storage, and a modern web interface to chat with models like Llama 3.1, Gemma, and Phi-3. If you do not have a GPU, you can still run smaller models on CPU.

What You Will Build

We will launch two containers: one for Ollama (the model runtime and API) and one for Open WebUI (the interface). They will be connected on a Docker network, with volumes for persistence. When done, you can open your browser, select a model, and start chatting locally—no cloud required.

Prerequisites

- Ubuntu 22.04 or 24.04, sudo access, and a stable internet connection.

- Docker installed (we will cover a quick install).

- Optional NVIDIA GPU with drivers for acceleration. CPU-only instructions are included.

1) Install Docker (if not installed)

Install the latest Docker CE from the official repo for better compatibility and security updates.

sudo apt update
sudo apt install -y ca-certificates curl gnupg lsb-release
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker   # applies the docker group to the current shell only; log out and back in for new sessions
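
To confirm Docker works without sudo, you can run the standard hello-world test image (it pulls a tiny image from Docker Hub and prints a confirmation message):

docker --version
docker run --rm hello-world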

2) Enable NVIDIA GPU Support (optional but recommended)

If you have an NVIDIA GPU, install the proprietary driver and NVIDIA Container Toolkit so Docker can access the GPU.

# Install recommended NVIDIA driver
sudo apt update
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot

After reboot, add NVIDIA's apt repository (the NVIDIA Container Toolkit is not in Ubuntu's default repos), install the toolkit, and integrate it with Docker.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list >/dev/null
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPUs are visible to Docker:

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

3) Create a Network and Volumes

We will keep Ollama models and WebUI data persistent across container restarts.

docker network create ai-net
docker volume create ollama-data
docker volume create openwebui-data

4) Run the Ollama Container

Start Ollama with GPU acceleration if available. The container exposes port 11434 for the API.

# GPU-enabled
docker run -d --name ollama --restart unless-stopped \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  --network ai-net \
  ollama/ollama:latest

# CPU-only (no GPU flag)
# docker run -d --name ollama --restart unless-stopped \
#   -p 11434:11434 \
#   -v ollama-data:/root/.ollama \
#   --network ai-net \
#   ollama/ollama:latest
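
Before moving on, you can do a quick liveness check: a plain GET on the API root should return a short status message.

curl http://localhost:11434
# expected: "Ollama is running"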

5) Pull a Model

Use the Ollama CLI inside the container to download a model. Start with a balanced choice like Llama 3.1 8B or pick a smaller one if you are on CPU.

# Enter the container and pull a model
docker exec -it ollama bash -lc "ollama pull llama3.1:8b"

# Alternative smaller models:
# docker exec -it ollama bash -lc "ollama pull phi3:mini"
# docker exec -it ollama bash -lc "ollama pull mistral:7b"
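
To confirm what is downloaded (and how much disk each model uses), list the local models:

docker exec -it ollama ollama list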

6) Run Open WebUI

Open WebUI connects to the Ollama API. We will publish it on port 3000 and persist its configuration.

docker run -d --name openwebui --restart unless-stopped \
  -p 3000:8080 \
  -v openwebui-data:/app/backend/data \
  -e OLLAMA_BASE_URL=http://ollama:11434 \
  --network ai-net \
  ghcr.io/open-webui/open-webui:latest

Open your browser to http://localhost:3000, create an admin account, and pick your default model (e.g., llama3.1:8b). You can switch models anytime in the interface.
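
If the page does not load, first confirm both containers are running and their ports are published:

docker ps --filter name=ollama --filter name=openwebui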

7) Test the Setup

Send a quick API test to confirm Ollama is serving responses:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "In one sentence, explain what a container is."
}'

If you get a streamed JSON response with text tokens, the backend is working. In Open WebUI, start a new chat and ask a question to validate the full stack.
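
If you prefer a single JSON object instead of a token stream, the Ollama API also accepts "stream": false:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "In one sentence, explain what a container is.",
  "stream": false
}'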

8) Performance Tips

- Prefer GPU for 7B–14B models; CPU can be slow or memory-constrained. Smaller models like Phi-3 Mini run decently on modern CPUs.

- Use quantized variants such as q4_0 or q5_1 when available to reduce VRAM/RAM usage. Example: llama3.1:8b-instruct-q4_0 (see the pull command after this list).

- Limit GPU layers or context length if you see out-of-memory errors. In Open WebUI, lower max tokens and system prompt size.
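
As a concrete example of the quantization tip, the commands below pull a 4-bit variant of Llama 3.1 8B (check the model's page in the Ollama library for the exact tags available) and then compare on-disk sizes:

docker exec -it ollama bash -lc "ollama pull llama3.1:8b-instruct-q4_0"
docker exec -it ollama ollama list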

9) Troubleshooting

Docker can’t see the GPU: Ensure the NVIDIA driver is installed, run nvidia-smi on the host, and confirm nvidia-ctk runtime configure was applied. Restart Docker and try the CUDA test container again.
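
A quick diagnostic sequence for GPU issues (host driver first, then the Docker runtime registration, then a containerized check):

nvidia-smi                        # does the driver see the GPU on the host?
docker info | grep -i runtimes    # is the "nvidia" runtime registered with Docker?
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi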

Model download is slow or fails: Check DNS and firewall rules, or set proxy environment variables on the Ollama container if your network requires a proxy. You can also prefetch models on a faster connection and copy the volume.
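
If you are behind a corporate proxy, one option is to recreate the Ollama container with the standard proxy variables set (the proxy URL below is only a placeholder for your own; drop --gpus all on a CPU-only host):

docker rm -f ollama
docker run -d --name ollama --restart unless-stopped \
  --gpus all \
  -p 11434:11434 \
  -v ollama-data:/root/.ollama \
  -e HTTPS_PROXY=http://proxy.example.com:3128 \
  --network ai-net \
  ollama/ollama:latest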

Open WebUI cannot reach Ollama: Confirm both containers are on the ai-net network and OLLAMA_BASE_URL points to http://ollama:11434. Check logs with docker logs openwebui and docker logs ollama.
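
Two useful checks here: list which containers are attached to the network, and verify the environment variable inside the WebUI container:

docker network inspect ai-net --format '{{range .Containers}}{{.Name}} {{end}}'
docker inspect openwebui --format '{{range .Config.Env}}{{println .}}{{end}}' | grep OLLAMA_BASE_URL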

Out of memory: Choose a smaller model, use a stronger quantization, or on GPU, close other VRAM-heavy apps. For CPU, add swap if RAM is limited.
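
On a CPU-only host with limited RAM, a swap file can keep larger models from being killed by the OOM killer (8G below is just an example size; swap is slower than RAM, but better than a crash):

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots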

10) Security and Maintenance

Do not expose ports 11434 or 3000 publicly without authentication and TLS. If you need remote access, put the stack behind a reverse proxy with HTTPS and basic auth or OAuth (see the loopback-binding note below). Keep images updated:

docker pull ollama/ollama:latest
docker pull ghcr.io/open-webui/open-webui:latest
docker stop openwebui ollama
docker rm openwebui ollama
# re-run the docker run commands from above (volumes preserve your data)
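
If the host itself must stay reachable on your LAN but the AI stack should only be visible to a local reverse proxy, one option is to publish the ports on the loopback interface instead of all interfaces. A minimal sketch of the changed port flags (the rest of each run command stays the same):

# in the ollama run command
-p 127.0.0.1:11434:11434
# in the openwebui run command
-p 127.0.0.1:3000:8080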

Wrap-Up

You now have a robust, GPU-capable local AI stack: Ollama serving models and Open WebUI providing a polished chat interface. Because everything runs in Docker with named volumes, you can upgrade, back up, and migrate with minimal friction. Experiment with different models, tune prompts, and enjoy private, offline AI on your own hardware.
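
Backups follow the usual Docker volume pattern: run a throwaway container that mounts the volume and archives it to the current directory. A minimal sketch:

docker run --rm -v ollama-data:/data -v "$PWD":/backup alpine \
  tar czf /backup/ollama-data.tar.gz -C /data .
docker run --rm -v openwebui-data:/data -v "$PWD":/backup alpine \
  tar czf /backup/openwebui-data.tar.gz -C /data .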
