# Host a local model (maximum privacy) This guide shows how to run `gmuse` against a **local** LLM so your prompts and responses stay on your machine. - `gmuse` uses [LiteLLM](https://docs.litellm.ai/docs/) under the hood. - Local hosting means **you** operate the model runtime (no gmuse backend servers). If you haven't already, read the privacy overview: [Privacy & Security](../explanation/privacy.md). ## When to use a local model Local hosting is a good fit when you want: - Maximum privacy (no third-party API calls) - Offline usage (after the model is downloaded) - Predictable costs (no per-request billing) It may be a poor fit when you need: - The highest quality for complex reasoning (local models may be weaker than frontier hosted models) - Very large context windows - Zero maintenance (you are responsible for updates and security) ## Security checklist (read first) If you follow only one section, follow this one. - **Bind locally by default**: keep the inference server on `127.0.0.1` (localhost). - **Do not expose the port publicly** unless you understand the risks. - **If you must access it remotely**: - Put it behind authentication. - Use TLS. - Restrict access (VPN, firewall rules, allowlists). - **Disable debug logging** in sensitive environments: - Avoid `GMUSE_DEBUG=1` and DEBUG-level logs because prompts/diffs may be written to logs. - **Treat model downloads like binaries**: - Prefer official sources. - Keep the runtime and models updated. - Review licensing before redistribution or commercial use. ## Recommended “golden path”: Ollama + LiteLLM Ollama is a simple local LLM runtime. LiteLLM supports Ollama models via the `ollama/` (or `ollama_chat/`) model prefix. ### 1) Install and run Ollama Follow Ollama’s official install instructions for your OS: - https://ollama.com/download Ensure the Ollama server is running and listening on localhost (default: `http://localhost:11434`). ### 2) Pull a model Pick a model that’s strong at short, structured text generation. Examples (choose one): ```console $ ollama pull qwen2.5-coder ``` ```console $ ollama pull llama3.1 ``` Notes: - Smaller models (around 7B–8B) are typically a good speed/quality balance for commit messages. - Model names vary by runtime; use `ollama list` to see what you have locally. ### 3) Point gmuse at the local model You can configure a local model either temporarily (environment) or persistently (config file). **Option A — Environment variable (one shell session):** ```console $ export GMUSE_MODEL="ollama/qwen2.5-coder" $ gmuse msg ``` **Option B — Config file (persistent):** Add to `~/.config/gmuse/config.toml`: ```toml model = "ollama/qwen2.5-coder" ``` Then run: ```console $ gmuse msg ``` ### 4) Verify configuration `gmuse info` prints the resolved model and provider heuristics: ```console $ gmuse info ``` If your provider is detected as `ollama`, your `GMUSE_MODEL` is being interpreted as a local runtime model. ## Generalize to other local backends Ollama is the simplest path, but it’s not the only one. Two common patterns work well with `gmuse`: ### Pattern A — Use a LiteLLM provider prefix If LiteLLM supports your local backend directly, set `GMUSE_MODEL` to the appropriate provider-prefixed model name. Examples: - `ollama/` - `ollama_chat/` (often better chat-style responses) See the LiteLLM providers list: - https://docs.litellm.ai/docs/providers ### Pattern B — Use an OpenAI-compatible endpoint (advanced) Some local servers expose an OpenAI-compatible API. Best practices for this approach: - Prefer a LiteLLM provider that matches your backend (when available) instead of the generic OpenAI-compatible route. - If your OpenAI-compatible client requires an API key even locally, use a non-sensitive placeholder and enable authentication on the server if it’s reachable beyond localhost. LiteLLM reference: - https://docs.litellm.ai/docs/providers/openai_compatible ## Troubleshooting ### gmuse says no provider is configured Run: ```console $ gmuse info ``` Common causes: - `GMUSE_MODEL` isn’t set and no provider API key env var is set. - `GMUSE_MODEL` is set to a value LiteLLM doesn’t recognize. ### Connection errors If you see errors connecting to Ollama: - Verify the server is running (default address `http://localhost:11434`). - Confirm the model exists locally: `ollama list`. - Ensure you can reach the server from the same environment you run `gmuse` from. ### Output is too long or too random Tune these settings: - Lower temperature for more deterministic messages - Reduce max tokens for shorter outputs Example config: ```toml temperature = 0.2 max_tokens = 200 ``` See also: [Configure gmuse](configuration.md) and the [Configuration Reference](../reference/configuration.md).