Self-Host a Private AI Chat with Open WebUI and Ollama on Ubuntu 24.04 (CPU or NVIDIA GPU)

Summary: This hands-on guide shows you how to self-host a privacy-friendly AI chat using Open WebUI and Ollama on Ubuntu 24.04. You’ll install Ollama (CPU or NVIDIA GPU), pull a modern open-source model (like Llama 3), and run Open WebUI as a container so you can chat in your browser. The steps are simple, repeatable, and production-friendly.

Why self-host a ChatGPT alternative?

Self-hosting gives you control, privacy, and predictable costs. Models run locally, so prompts and outputs stay on your server. With a capable CPU or an NVIDIA GPU, you can achieve low-latency responses and customize the setup to match your workflow.

What you need

- Ubuntu 24.04 LTS with a non-root user that can run sudo.
- Internet access and at least 16 GB of RAM for 7–8B models (more is better); an NVIDIA GPU speeds things up.
- Ports: 8080 open for Open WebUI; 11434 used by Ollama on localhost only (keep it closed to the internet).
- Basic terminal familiarity.
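
To double-check the machine against this list, the following commands report the OS release, CPU count, and available memory:

lsb_release -a
nproc
free -h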

Step 1 — Prepare Ubuntu 24.04

sudo apt update && sudo apt -y upgrade
sudo apt install -y curl git ufw

Enable the firewall and allow SSH and the Open WebUI port:

sudo ufw allow OpenSSH
sudo ufw allow 8080/tcp
sudo ufw enable
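
Confirm the rules are in place before moving on:

sudo ufw status verbose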

Step 2 — Install Ollama (CPU or NVIDIA GPU)

Ollama is a lightweight runtime for local LLMs with an OpenAI-compatible API. Install it with the official script and enable the system service:

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
ollama --version

By default, Ollama listens on 127.0.0.1:11434 (local only), which is what we want for a secure setup.
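
You can confirm the service is reachable by querying the local API; the root endpoint returns a short status message and /api/tags lists the models you have pulled so far:

curl http://127.0.0.1:11434
curl http://127.0.0.1:11434/api/tags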

GPU acceleration (optional but recommended)

If you have an NVIDIA GPU, install the recommended driver so Ollama can use CUDA automatically:

sudo ubuntu-drivers install
sudo reboot
nvidia-smi

Ollama will use the GPU if the driver and CUDA libraries are present. During generation, watch nvidia-smi to confirm it is being used. If you do not have a GPU, Ollama still runs well on the CPU alone; just expect slower responses.
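
An easy way to check is to run a prompt in one terminal and watch utilization in a second one; ollama ps also reports whether a loaded model is running on the GPU or the CPU:

watch -n 1 nvidia-smi
ollama ps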

Step 3 — Pull a model and test locally

Pull a modern, general-purpose model. Llama 3 8B is a great starting point:

ollama pull llama3:8b

Run a quick prompt from the terminal to verify generation works:

ollama run llama3:8b "Write a haiku about Ubuntu servers."

Tip: You can pull other models such as mistral:7b, phi3:latest, or code-focused models depending on your use case.
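
Ollama also exposes an HTTP API on 127.0.0.1:11434, so you can run the same test with curl; this sketch calls the native /api/generate endpoint with streaming disabled:

curl http://127.0.0.1:11434/api/generate -d '{"model": "llama3:8b", "prompt": "Write a haiku about Ubuntu servers.", "stream": false}'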

Step 4 — Install Open WebUI (Docker)

Open WebUI provides a clean, fast chat interface that can talk to Ollama. The easiest way to run it is via Docker. If Docker is not installed yet, set it up:

sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu noble stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
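
Verify that the engine works before continuing:

sudo docker --version
sudo docker run --rm hello-world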

Run Open WebUI on the host network so it can reach Ollama at 127.0.0.1:

sudo docker run -d --name open-webui --restart=always --network host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 ghcr.io/open-webui/open-webui:latest

If you prefer not to use --network host, publish a port instead and point OLLAMA_BASE_URL at an address the container can actually reach. On Linux, host.docker.internal only resolves if you add a host-gateway mapping, and Ollama’s default bind address (127.0.0.1) is not reachable from a bridged container, so you would also need to make Ollama listen on another interface (for example via OLLAMA_HOST in a systemd override) or use your host’s IP.
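
A minimal sketch of that variant, assuming Ollama has been reconfigured as described:

sudo docker run -d --name open-webui --restart=always -p 8080:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:latest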

Step 5 — First run and basic security

Open your browser and visit http://SERVER_IP:8080. Create the first admin user. In Settings → Security, enable authentication and disable public sign-ups to prevent unauthorized access. Keep Ollama bound to localhost and avoid exposing port 11434 to the internet.

In the Models section, you can pull models directly from the UI or reuse models previously pulled with the Ollama CLI. Start a new chat and select your model (e.g., llama3:8b). You now have a private AI chat running on your own hardware.

Optional — TLS with a reverse proxy

For secure remote access, place Open WebUI behind a reverse proxy like Nginx or Caddy with a Let’s Encrypt certificate. Point the proxy to 127.0.0.1:8080, enable HTTPS, and restrict access with authentication. If you use Cloudflare, consider an Origin Certificate and firewall rules for extra protection.
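
As one example, a minimal Caddyfile for a hypothetical chat.example.com looks like this; Caddy obtains and renews the Let’s Encrypt certificate automatically:

chat.example.com {
    reverse_proxy 127.0.0.1:8080
}

Allow ports 80 and 443 through ufw (sudo ufw allow 80,443/tcp); once the proxy is working, you can remove the 8080 rule so the UI is only reachable over HTTPS.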

Troubleshooting

- Port in use: If 8080 is taken, change Open WebUI’s bind port or stop the conflicting service. Check with sudo ss -ltnp | grep 8080.
- GPU not used: Ensure nvidia-smi works, use a recent driver (e.g., 535+), and watch GPU utilization during generation. Restart Ollama after driver changes: sudo systemctl restart ollama.
- Model not found: Pull the model again (ollama pull model:tag) and verify the spelling. Some models ship multiple quantizations; pick one that fits your RAM/VRAM.
- Slow responses: Use a smaller or more aggressively quantized model, lower the context window, or run on a GPU for better throughput.
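
For anything not covered above, the service logs are the quickest diagnostic; journalctl follows Ollama’s output and docker logs does the same for the Open WebUI container:

sudo journalctl -u ollama -f
sudo docker logs -f open-webui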

Next steps

- Add embeddings and RAG with document loaders inside Open WebUI for private knowledge search.
- Tune concurrency (OLLAMA_NUM_PARALLEL) and the model context size to match your CPU/GPU and memory budget; see the sketch after this list.
- Back up Open WebUI’s volume (open-webui) and the Ollama models directory for quick disaster recovery.
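
Ollama reads most runtime settings from environment variables, and on Ubuntu a systemd drop-in is the natural place to set them. A minimal sketch, assuming two parallel requests is a sensible value for your hardware:

sudo systemctl edit ollama

In the editor, add the following and save:

[Service]
Environment="OLLAMA_NUM_PARALLEL=2"

Then restart the service:

sudo systemctl restart ollama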

You now have a fast, private, and flexible AI stack on Ubuntu 24.04. With Ollama and Open WebUI, you control your data and costs while enjoying a modern chat experience powered by open models.
