Blog

Self-hosted AI code review without the 500-seat enterprise gate

July 8, 2026 · Postil team

If your code cannot leave the network, the AI code review market has a short, frustrating answer for you. Self-hosting exists, but for a small or regulated team it is usually either an enterprise sales motion with a seat minimum, or one open-source project you assemble yourself. The sharpest example is CodeRabbit: its public AWS Marketplace listing describes self-hosted delivery with list pricing for 500 users, and its usage instructions set a 500-user minimum for developer seats. The AWS Marketplace listing sets the self-hosted minimum at 500 seats. This piece walks through who actually lets you self-host and on what terms, then follows Postil's scripted path from clone to a local review with Ollama, at any team size, with no sales call.

Who actually lets you self-host, and the fine print

Self-hosting is real in this category, but the terms vary widely. The table below is the honest landscape; vendor policies change often, so verify before you commit.

Tool	Self-host?	Terms
CodeRabbit	Yes	Enterprise self-hosted listing, 500-user minimum
Greptile	Yes	Docker/K8s, air-gapped, BYOK LLM endpoint, Enterprise tier
Qodo PR-Agent	Yes	Open source (Apache-2.0), BYOK, Ollama supported
Macroscope	No	Hosted only
GitHub Copilot	No	Runs in GitHub's cloud
Cursor Bugbot	No	Connects to self-hosted forges but runs in Cursor's cloud
Postil	Yes	Apache-2.0, no seat fees or license cost, BYOK

Two patterns stand out. First, where a hosted product offers real self-hosting (CodeRabbit, Greptile), it is reserved for the enterprise tier, which for a five-person team blocked from sending code to an external API is the same as not offering it. Second, the tools that abolished seats in favor of usage pricing did so for their cloud product; running the model on your own hardware is a different axis, and for several of them it is not on offer. The Bugbot nuance is the one most likely to be mis-stated elsewhere: it can review pull requests on a self-hosted forge, but the reviewer itself executes in Cursor's cloud, so your diff still leaves your network.

The real open-source alternative: Qodo PR-Agent

There is one genuine open-source option for "bring your own key plus a local model," and it deserves credit rather than a dismissal. Qodo PR-Agent is Apache-2.0 licensed, community-owned, with roughly 11.6k stars, and it supports multiple models through an OpenAI-compatible / LiteLLM layer. Air-gapped setups that put LiteLLM in front of Ollama are documented by the community. If you want a self-hosted reviewer and a project to maintain, that is a legitimate path.

The trade-off is the one any self-assembled stack carries: you own the integration. PR-Agent itself is not the problem; "self-host with a local model" means budgeting for the glue, the model wiring, and the day a request silently goes to the wrong endpoint. That last failure mode is exactly what the rest of this article is about avoiding.

The Compose path with Postil and Ollama

Postil self-hosts the same stack we run hosted: Postgres, the web app, and the worker, through one Docker Compose file. The stack is Apache-2.0 with no seat fees or license cost; you pay for your own inference and infrastructure. The concrete path:

git clone https://github.com/postil-dev/postil
cd postil
cp .env.example .env
# fill in: GitHub App credentials, webhook secret, a sealing key,
#          a session secret, and your LLM key. Each line in
#          .env.example explains its variable.

docker compose up -d
docker compose exec web bun run db:migrate

Both the web app and the worker validate their configuration at boot: a missing or malformed variable stops the process with the variable name, what it is for, and an example value. Before you open a test PR,postil doctor checks the configured chain. Point it at Ollama with this block:

POSTIL_API_BASE=http://ollama:11434/v1
MODEL_API_KEY=ollama        # any non-empty value
POSTIL_API_KEY=ollama
REVIEW_MODEL=qwen3-coder:30b

Then run the doctor inside the worker container. It checks endpoint reachability separately from whether the configured model is ready to answer a request. This successful doctor transcript shows the checks it reports:

docker compose exec worker postil doctor

[ok  ] config           loaded from defaults (model: local-doctor-probe, gate failOn: error, minConfidence: 0.6)
[ok  ] git              inside a git work tree
[ok  ] api key          POSTIL_API_KEY, OPENROUTER_API_KEY, MODEL_API_KEY, LLM_API_KEY is set (value not shown)
[ok  ] model endpoint   http://127.0.0.1:3117/v1 answered for model local-doctor-probe
[ok  ] forge tokens     presence only: GITHUB_TOKEN unset, GITLAB_TOKEN unset (only needed for remote review)

postil doctor: ready.

This transcript was captured from the CLI against a loopback OpenAI-compatible endpoint. The URL and model name change for OpenRouter, Azure OpenAI, Ollama, or another compatible server, and key values are not printed.

Two provider interfaces, one worker

The Postil worker supports OpenAI-compatible chat completions and the Anthropic Messages API. The same binary points at Ollama, vLLM, LiteLLM, TGI, Azure OpenAI, OpenRouter, or Anthropic by selecting the request format and base URL. In CLI and self-hosted modes inference goes directly to your endpoint. Hosted BYOK uses the organization's configured provider and credentials.

# OpenRouter (default)
POSTIL_API_BASE=https://openrouter.ai/api/v1
MODEL_API_KEY=sk-or-v1-...
POSTIL_API_KEY=sk-or-v1-...

# Azure OpenAI
POSTIL_API_BASE=https://azure-resource.openai.azure.com/openai/v1
MODEL_API_KEY=azure-api-key
POSTIL_API_KEY=azure-api-key

# Ollama, vLLM, LiteLLM, TGI: same shape, different base URL

Models worth trying first

Start with one cheap model and one stronger model, then promote the cheapest one that preserves detection rate and silence on clean PRs. On OpenRouter, try DeepSeek V4 Pro or Kimi K2.6 as stronger defaults, and Qwen3 32B, Mistral Small 3.2 24B, or Gemma 3 27B for lower-cost or local-friendly runs. The maintained shortlist lives in the model catalog. Locally, use the largest coder model your hardware can serve reliably and verify it with postil doctor plus the live benchmark harness.

The maintained model table and live benchmark commands are in the models guide.

The doctor is the differentiator for self-hosters

The anti-goal is named explicitly in the source: the silently-misconfigured self-hosted reviewer, with the wrong environment variable, an unreachable endpoint, or a model name typo, discovered only when a review silently does nothing. The doctor checks each link in the chain and says exactly what to fix, including a live one-token completion that proves the base URL, key, and model together. The hints are real, in-binary behavior: a 401 or 403 reads "key rejected: wrong key for this endpoint?", a 404 reads "wrong apiBase path or unknown model name?", and a connection failure names the Ollama URL to try, http://localhost:11434/v1. For a self-hoster, the gap between "it works" and "it silently does nothing" is the entire job, and the doctor is built to close it before your first PR rather than after a confusing week of quiet output.

Air-gapped and regulated

Self-hosted plus Ollama means code never leaves your network. CLI mode with your own key sends code directly to the provider you chose, under your own data processing agreement. Hosted BYOK sends the diff through the Postil worker to your configured provider, and hosted default uses Postil's configured provider path. The forge coverage matters here too, because regulated buyers tend to run self-managed Git: GitHub including GitHub Enterprise Server, GitLab including self-managed, Bitbucket including Data Center, and Azure DevOps including Server, each reached through a base-URL environment variable rather than a separate build.

Operations, briefly

A few signals that this is operable rather than a toy. /api/health is a cheap web-process liveness check, while /api/health/dependencies checks Postgres readiness. /api/metrics emits Prometheus text, including the silence rate and database-up signal, protected by a METRICS_TOKEN bearer. The worker's watchdog fails any review running longer than 10 minutes and completes its check runs as failed, so a stuck review never leaves a PR stuck in progress indefinitely. And the CLI binary is baked into the worker image at a pinned commit, so upgrading the reviewer is an image upgrade, not a runtime download from a network you may have deliberately cut off.

First review now, scale later

The wedge is simple. Self-hosting in this category is real but mostly locked behind an enterprise contract with a seat minimum, or left to a DIY open-source project. Postil's self-hosted stack is Apache-2.0 with no seat fees or license cost, at any team size, with bring-your-own-key inference and a doctor that catches the misconfiguration that would otherwise make a local reviewer silently useless. No claim here that Postil detects more or better than CodeRabbit, Greptile, or PR-Agent: there is no comparative data to support one. The claim is about availability and the deployment model: a full AI code reviewer you can run on your own hardware through a scripted Compose path, with no 500-seat gate or sales call. The detailed how-to lives on the self-hosted docs page.

Sources

CodeRabbit AWS Marketplace listing (self-hosted delivery, 500-user list price and minimum)
Qodo PR-Agent (GitHub) (Apache-2.0, multi-model)
Community walkthrough: air-gapped PR-Agent with Ollama + LiteLLM
Postil self-hosting, the doctor, and OpenAI-compatible model wiring, grounded in the self-hosted docs and the open-source CLI.

Run it on your own hardware.

Apache-2.0, no seat fees or license cost, BYOK, with a scripted Compose path for Ollama.

Self-hosting guide Install the CLI