Postil

Blog

Self-hosted AI code review without the 500-seat enterprise gate

June 2026 · Postil team

If your code cannot leave the network, the AI code review market has a short, frustrating answer for you. Self-hosting exists, but for a small or regulated team it is usually either an enterprise sales motion with a seat minimum, or one open-source project you assemble yourself. The sharpest example is CodeRabbit: its own documentation states that "The self-hosted option is only available for CodeRabbit Enterprise customers with 500 user seats or more." The gate is a seat count, not a capability. This piece walks through who actually lets you self-host and on what terms, and then runs the concrete path to a working local review with Postil and Ollama in about 15 minutes, at any team size, with no sales call.

Who actually lets you self-host, and the fine print

Self-hosting is real in this category, but the terms vary widely. The table below is the honest landscape as of June 2026; vendor policies change often, so verify before you commit.

ToolSelf-host?
CodeRabbitYes
GreptileYes
Qodo PR-AgentYes
MacroscopeNo
GitHub CopilotNo
Cursor BugbotNo
PostilYes

Two patterns stand out. First, where a hosted product offers real self-hosting (CodeRabbit, Greptile), it is reserved for the enterprise tier, which for a five-person team blocked from sending code to an external API is the same as not offering it. Second, the tools that abolished seats in favor of usage pricing did so for their cloud product; running the model on your own hardware is a different axis, and for several of them it simply is not on offer. The Bugbot nuance is the one most likely to be mis-stated elsewhere: it can review pull requests on a self-hosted forge, but the reviewer itself executes in Cursor's cloud, so your diff still leaves your network.

The real open-source alternative: Qodo PR-Agent

There is one genuine open-source option for "bring your own key plus a local model," and it deserves credit rather than a dismissal. Qodo PR-Agent is Apache-2.0 licensed, community-owned, with roughly 11.6k stars, and it supports multiple models through an OpenAI-compatible / LiteLLM layer. Air-gapped setups that put LiteLLM in front of Ollama are documented by the community. If you want a self-hosted reviewer and a project to maintain, that is a legitimate path.

The trade-off is the one any self-assembled stack carries: you own the integration. The honest framing is not that PR-Agent is bad; it is that "self-host with a local model" means budgeting for the glue, the model wiring, and the day a request silently goes to the wrong endpoint. That last failure mode is exactly what the rest of this article is about avoiding.

The 15-minute path with Postil and Ollama

Postil self-hosts the same stack we run hosted: Postgres, the web app, and the worker, via Docker Compose. It is free forever with no seat limit. The concrete path:

git clone https://github.com/postil-dev/postil
cd postil
cp .env.example .env
# fill in: GitHub App credentials, webhook secret, a sealing key,
#          a session secret, and your LLM key. Each line in
#          .env.example explains its variable.

docker compose up -d
docker compose exec web bun run db:migrate

Two things make the 15-minute budget realistic rather than aspirational. First, both the web app and the worker validate their configuration at boot: a missing or malformed variable stops the process with the variable name, what it is for, and an example value, not a stack trace from the first request that happened to need it. Second, before you ever open a test PR, postil doctor runs a live probe that proves the whole chain in one shot. Point it at Ollama with a one-line block:

POSTIL_API_BASE=http://ollama:11434/v1
POSTIL_API_KEY=ollama        # any non-empty value
REVIEW_MODEL=qwen3-coder:30b

Then run the doctor inside the worker container. The output shape:

docker compose exec worker postil doctor

  endpoint  http://ollama:11434/v1 ... ok (142ms)
  auth      key accepted ............ ok
  model     qwen3-coder:30b ......... ok (1.2s first token)

One caveat carried over from our docs page: those latencies are illustrative example values, not a captured benchmark. The checks and their pass/fail behavior are real; the millisecond numbers are there to show the shape of the output.

Why "OpenAI-compatible" is the whole trick

The structural reason there is no seat gate is that there is no hosted inference to meter. The Postil worker speaks plain OpenAI-compatible chat completions, against POST {base}/chat/completions. The same binary points at Ollama, vLLM, LiteLLM, TGI, Azure OpenAI, or OpenRouter by changing one base URL. There is no proxy in the middle and no per-review billing: your inference goes to your endpoint at your provider's rates. Postil never proxies or marks up the model call. Bring your own key is not a feature bolted on for the enterprise tier; it is the only way the tool talks to a model at all.

# OpenRouter (default)
POSTIL_API_BASE=https://openrouter.ai/api/v1
POSTIL_API_KEY=sk-or-v1-...

# Azure OpenAI
POSTIL_API_BASE=https://<resource>.openai.azure.com/openai/v1

# Ollama, vLLM, LiteLLM, TGI: same shape, different base URL

The doctor is the differentiator for self-hosters

The anti-goal is named explicitly in the source: the silently-misconfigured self-hosted reviewer, with the wrong environment variable, an unreachable endpoint, or a model name typo, discovered only when a review silently does nothing. The doctor checks each link in the chain and says exactly what to fix, including a live one-token completion that proves the base URL, key, and model together. The hints are real, in-binary behavior: a 401 or 403 reads "key rejected: wrong key for this endpoint?", a 404 reads "wrong apiBase path or unknown model name?", and a connection failure names the Ollama URL to try, http://localhost:11434/v1. For a self-hoster, the gap between "it works" and "it silently does nothing" is the entire job, and the doctor is built to close it before your first PR rather than after a confusing week of quiet output.

Air-gapped and regulated

Self-hosted plus Ollama means code never leaves your network. If you run in hosted or CLI mode with your own key instead, it goes only to the provider you chose, under your own data processing agreement, with no Postil-operated hop in between. Either way you control the data flow, which is the property procurement actually screens for. The forge coverage matters here too, because regulated buyers tend to run self-managed Git: GitHub including GitHub Enterprise Server, GitLab including self-managed, Bitbucket including Data Center, and Azure DevOps including Server, each reached through a base-URL environment variable rather than a separate build.

Operations, briefly

A few signals that this is operable rather than a toy. /api/health is a database ping suitable for liveness probes. /api/metrics emits Prometheus text, including the silence rate, protected by a METRICS_TOKEN bearer. The worker's watchdog fails any review running longer than 10 minutes and completes its check runs as failed, so a stuck review cannot hold a PR hostage as eternally in progress. And the CLI binary is baked into the worker image at a pinned commit, so upgrading the reviewer is an image upgrade, not a runtime download from a network you may have deliberately cut off.

First review now, scale later

The wedge is simple. Self-hosting in this category is real but mostly locked behind an enterprise contract with a seat minimum, or left to a DIY open-source project. Postil self-hosts for free, at any team size, with bring-your-own-key inference and a doctor that catches the misconfiguration that would otherwise make a local reviewer silently useless. No claim here that Postil detects more or better than CodeRabbit, Greptile, or PR-Agent; there is no comparative data and we will not pretend there is. The claim is about availability and the deployment model: a full AI code reviewer you can run on your own hardware, first review in about 15 minutes, no 500-seat gate, no sales call. The detailed how-to lives on the self-hosted docs page.

Sources

Run it on your own hardware.

Self-hosted is free, no seat limit, BYO key. First review in about 15 minutes with Ollama.