Overview
This guide shows you how to install Ollama and Open WebUI on Ubuntu 24.04 with NVIDIA GPU acceleration. You will get a fast local AI stack that can run modern large language models (LLMs) like Llama 3 on your own hardware, secured and accessible in a web browser. We will cover GPU driver setup, Docker with the NVIDIA Container Toolkit, Ollama installation, Open WebUI deployment, basic security, and troubleshooting.
Prerequisites
- A machine running Ubuntu 24.04 (fresh or updated).
- An NVIDIA GPU; at least 8 GB of VRAM is recommended for 8B models (more is better).
- Internet access and sudo privileges.
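If you are not sure which GPU the machine has, a quick check (lspci comes with Ubuntu's pciutils package):
lspci | grep -i nvidia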
Step 1: Install the NVIDIA GPU Driver
Install the recommended proprietary driver so CUDA becomes available to both Ollama and containers.
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
sudo reboot
After reboot, verify the GPU is recognized:
nvidia-smi
You should see a driver version and your GPU model. If not, check “Troubleshooting.”
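If you prefer to review or pin a specific driver instead of using autoinstall, you can list the candidates first and install one explicitly (the package name depends on your GPU; nvidia-driver-550 is only an example):
ubuntu-drivers devices
sudo apt install -y nvidia-driver-550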
Step 2: Install Docker and the NVIDIA Container Toolkit
We will run Open WebUI in Docker and connect it to the host’s Ollama API. First, install Docker Engine from the official repository:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
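Optional sanity check that Docker itself works before adding GPU support:
docker run --rm hello-world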
Now install the NVIDIA Container Toolkit so containers can access your GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Quick GPU-in-container test (optional):
docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
Step 3: Install Ollama (GPU-Accelerated LLM Runtime)
Ollama downloads, runs, and serves models locally. The installer sets up a system service by default.
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Verify the API is listening on port 11434:
curl http://127.0.0.1:11434/api/tags
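On a fresh install with no models pulled yet, this typically returns an empty list, e.g. {"models":[]}.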
Confirm the CLI can reach the service (the model list will be empty until Step 5):
ollama list
Note: On Ubuntu, the Ollama service runs as the “ollama” user, and models are typically stored under /usr/share/ollama/.ollama (when run as the service) or ~/.ollama (when you run Ollama as your own user). To confirm the GPU was detected, check the service logs with journalctl -u ollama.
Step 4: Deploy Open WebUI (Docker)
Open WebUI provides a friendly browser UI that connects to Ollama’s API. We’ll run it in host networking mode so it can reach 127.0.0.1:11434 directly.
docker run -d --name open-webui --restart unless-stopped --network=host -e OLLAMA_BASE_URL=http://127.0.0.1:11434 ghcr.io/open-webui/open-webui:latest
Open your browser and visit http://<server-ip>:8080. On first launch, create the admin account. In Settings, confirm the Ollama endpoint is http://127.0.0.1:11434.
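If you would rather not use host networking, a published-port variant is possible (a sketch; host port 3000 is an arbitrary choice). Note that Ollama listens only on 127.0.0.1 by default, so this variant also requires configuring the Ollama service with OLLAMA_HOST=0.0.0.0 so the container can reach it across the Docker bridge, which is why this guide sticks with host networking:
docker run -d --name open-webui --restart unless-stopped -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:latest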
Step 5: Pull a Model and Test
Download a model with a good balance of quality and speed. The 8B variants work well on many consumer GPUs.
ollama pull llama3.1:8b
Quick API test:
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one short sentence."}'
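The generate endpoint streams its output as a series of JSON objects by default. To get a single consolidated response instead, add "stream": false:
curl http://127.0.0.1:11434/api/generate -d '{"model":"llama3.1:8b","prompt":"Say hello in one short sentence.","stream":false}'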
Or try the CLI:
ollama run llama3.1:8b "Explain the difference between VRAM and system RAM in one paragraph."
You should see GPU utilization in nvidia-smi while the model is generating.
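To watch utilization live in a second terminal:
watch -n 1 nvidia-smi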
Step 6: Basic Security and Management
If your server is exposed to a network, restrict Open WebUI to trusted IPs with UFW (replace the subnet with your LAN):
sudo apt install -y ufw
sudo ufw allow OpenSSH
sudo ufw allow from 192.168.0.0/16 to any port 8080 proto tcp
sudo ufw enable
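Verify the resulting rules:
sudo ufw status verbose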
Keep your stack up to date:
sudo systemctl stop ollama && curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl start ollama
docker pull ghcr.io/open-webui/open-webui:latest && docker restart open-webui
To persist Open WebUI data, mount a volume (recommended for production):
docker run -d --name open-webui --restart unless-stopped --network=host -v /opt/open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 ghcr.io/open-webui/open-webui:latest
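If you originally started the container without the volume, remove it first so the new mount takes effect, then re-run the command above:
docker rm -f open-webui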
Troubleshooting
nvidia-smi: command not found or no devices were found
- Reinstall drivers with sudo ubuntu-drivers autoinstall and reboot.
- If Secure Boot is enabled, you may need to enroll the NVIDIA kernel module (MOK) or temporarily disable Secure Boot in firmware.
- Use a supported driver (typically 535+ on newer GPUs).
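If you need to check whether Secure Boot is currently enabled, mokutil reports its state (install the mokutil package if the command is missing):
mokutil --sb-state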
Docker can’t see the GPU
- Test: docker run --rm --gpus all nvidia/cuda:12.6.2-base-ubuntu24.04 nvidia-smi
- Reconfigure the toolkit: sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
- Ensure you’re using the NVIDIA proprietary driver, not Nouveau.
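To confirm the NVIDIA runtime was registered with Docker, check for an nvidia entry in the runtimes list:
docker info | grep -i runtimes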
Open WebUI cannot connect to Ollama
- Confirm the Ollama API is running: curl http://127.0.0.1:11434/api/tags
- Check logs: docker logs -f open-webui
- Ensure --network=host is used and OLLAMA_BASE_URL is set to http://127.0.0.1:11434.
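To double-check that something is listening on port 11434 on the host:
sudo ss -ltnp | grep 11434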
Models consume too much VRAM
- Try a smaller quantization (the default llama3.1:8b pull is Q4_K_M) or use a 7B/8B model instead of 13B/70B.
- Limit parallel requests in Open WebUI settings and in Ollama (see the example below).
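One way to cap concurrency on the Ollama side is a systemd override (a sketch; OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS are Ollama environment variables that control how many requests and models are served at once):
sudo systemctl edit ollama
Add Environment="OLLAMA_NUM_PARALLEL=1" and Environment="OLLAMA_MAX_LOADED_MODELS=1" under the [Service] section, save, then run sudo systemctl restart ollama.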
What You Built
You now have a local, GPU-accelerated AI environment with Ollama serving LLMs and Open WebUI providing a clean chat interface. This setup is fast, private, and easy to maintain, giving you full control over your models and data. You can add more models with ollama pull, create prompt presets in Open WebUI, and secure access with firewalls or a reverse proxy if you plan to expose it beyond your LAN.