Overview
This guide shows you how to install Ollama and Open WebUI on Ubuntu 24.04 with NVIDIA GPU acceleration. You will get a fast local AI stack that can run modern large language models (LLMs) like Llama 3 on your own hardware, secured and accessible in a web browser. We will cover GPU driver setup, Docker with the NVIDIA Container Toolkit, Ollama installation, Open WebUI deployment, basic security, and troubleshooting.
Prerequisites
- A machine running Ubuntu 24.04 (fresh or updated).
- An NVIDIA GPU; at least 8 GB of VRAM is recommended for 8B models (more is better).
- Internet access and sudo privileges.
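If you are not sure which GPU the machine has, a quick check (lspci comes with Ubuntu's pciutils package):
lspci | grep -i nvidia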
Step 1: Install the NVIDIA GPU Driver
Install the recommended proprietary driver so CUDA becomes available to both Ollama and containers.
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, verify the GPU is recognized:
nvidia-smi
You should see a driver version and your GPU model. If not, check “Troubleshooting.”
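If you prefer to review or pin a specific driver instead of using autoinstall, you can list the candidates first and install one explicitly (the package name depends on your GPU; nvidia-driver-550 is only an example):
ubuntu-drivers devices
sudo apt install -y nvidia-driver-550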
Step 2: Install Docker and the NVIDIA Container Toolkit
We will run Open WebUI in Docker and connect it to the host’s Ollama API. First, install Docker Engine from the official repository:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
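Optional sanity check that Docker itself works before adding GPU support:
docker run --rm hello-world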
Now install the NVIDIA Container Toolkit so containers can access your GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Quick GPU-in-container test (optional):
docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
Step 3: Install Ollama (GPU-Accelerated LLM Runtime)
Ollama downloads, runs, and serves models locally. The installer sets up a system service by default.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Verify the API is listening on port 11434:
curl http://127.0.0.1:11434/api/tags
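On a fresh install with no models pulled yet, this typically returns an empty list, e.g. {"models":[]}.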
Confirm the CLI can reach the service (the model list will be empty until Step 5):
ollama list
Note: On Ubuntu, the Ollama service runs as the “ollama” user, and models are typically stored under /usr/share/ollama/.ollama (when run as the service) or ~/.ollama (when you run Ollama as your own user). To confirm the GPU was detected, check the service logs with journalctl -u ollama.
Step 4: Deploy Open WebUI (Docker)
Open WebUI provides a friendly browser UI that connects to Ollama’s API. We’ll run it in host networking mode so it can reach 127.0.0.1:11434 directly.
docker run -d --name open-webui --restart unless-stopped --network=host -e OLLAMA_BASE_URL=http://127.0.0.1:11434 ghcr.io/open-webui/open-webui:latest
Open your browser and visit http://<server-ip>:8080. On first launch, create the admin account. In Settings, confirm the Ollama endpoint is http://127.0.0.1:11434.
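If you would rather not use host networking, a published-port variant is possible (a sketch; host port 3000 is an arbitrary choice). Note that Ollama listens only on 127.0.0.1 by default, so this variant also requires configuring the Ollama service with OLLAMA_HOST=0.0.0.0 so the container can reach it across the Docker bridge, which is why this guide sticks with host networking:
docker run -d --name open-webui --restart unless-stopped -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:latest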
Step 5: Pull a Model and Test
Download a model with a good balance of quality and speed. The 8B variants work well on many consumer GPUs.
ollama pull llama3.1:8b
Quick API test:
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one short sentence."}'
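The generate endpoint streams its output as a series of JSON objects by default. To get a single consolidated response instead, add "stream": false:
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one short sentence.","stream":false}'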
Or try the CLI:
ollama run llama3.1:8b "Explain the difference between VRAM and system RAM in one paragraph."
You should see GPU utilization in nvidia-smi while the model is generating.
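To watch utilization live in a second terminal:
watch -n 1 nvidia-smi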
Step 6: Basic Security and Management
If your server is exposed to a network, restrict Open WebUI to trusted IPs with UFW (replace the subnet with your LAN):
sudo apt install -y ufw
sudo ufw allow OpenSSH
sudo ufw allow from 192.168.0.0/16 to any port 8080 proto tcp
sudo ufw enable
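Verify the resulting rules:
sudo ufw status verbose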
Keep your stack up to date:
sudo systemctl stop ollama && curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl start ollama
docker pull ghcr.io/open-webui/open-webui:latest && docker restart open-webui
To persist Open WebUI data, mount a volume (recommended for production):
docker run -d --name open-webui --restart unless-stopped --network=host -v /opt/open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 ghcr.io/open-webui/open-webui:latest
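If you originally started the container without the volume, remove it first so the new mount takes effect, then re-run the command above:
docker rm -f open-webui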
Troubleshooting
nvidia-smi: command not found or no devices were found
- Reinstall drivers with sudo ubuntu-drivers autoinstall and reboot.
- If Secure Boot is enabled, you may need to enroll the NVIDIA kernel module (MOK) or temporarily disable Secure Boot in firmware.
- Use a supported driver (typically 535+ on newer GPUs).
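If you need to check whether Secure Boot is currently enabled, mokutil reports its state (install the mokutil package if the command is missing):
mokutil --sb-state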
Docker can’t see the GPU
- Test: docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
- Reconfigure the toolkit: sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
- Ensure you’re using the NVIDIA proprietary driver, not Nouveau.
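To confirm the NVIDIA runtime was registered with Docker, check for an nvidia entry in the runtimes list:
docker info | grep -i runtimes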
Open WebUI cannot connect to Ollama
- Confirm the Ollama API is running: curl http://127.0.0.1:11434/api/tags
- Check logs: docker logs -f open-webui
- Ensure --network=host is used and OLLAMA_BASE_URL is set to http://127.0.0.1:11434.
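To double-check that something is listening on port 11434 on the host:
sudo ss -ltnp | grep 11434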
Models consume too much VRAM
- Try a smaller quantization (the default llama3.1:8b pull is Q4_K_M) or use a 7B/8B model instead of 13B/70B.
- Limit parallel requests in Open WebUI settings and in Ollama (see the example below).
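One way to cap concurrency on the Ollama side is a systemd override (a sketch; OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS are Ollama environment variables that control how many requests and models are served at once):
sudo systemctl edit ollama
Add Environment="OLLAMA_NUM_PARALLEL=1" and Environment="OLLAMA_MAX_LOADED_MODELS=1" under the [Service] section, save, then run sudo systemctl restart ollama.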
What You Built
You now have a local, GPU-accelerated AI environment with Ollama serving LLMs and Open WebUI providing a clean chat interface. This setup is fast, private, and easy to maintain, giving you full control over your models and data. You can add more models with ollama pull, create prompt presets in Open WebUI, and secure access with firewalls or a reverse proxy if you plan to expose it beyond your LAN.