Postil

Blog

Where does your code actually go? A data-flow audit of AI code review tools

June 2026 · Postil team

To review a pull request, a tool has to read the diff, and often the surrounding files. That is unavoidable. What is not fixed is what happens next: whether your code is held on the vendor's servers after the review, whether it is used to train or improve a model, and whose infrastructure the inference call runs on. Those three questions, retention, training, and inference location, are what a security review of an AI reviewer actually turns on, and they vary far more between products than catch rates do.

This is a comparative explainer, not a scorecard. It groups the tools by how they handle code rather than ranking them, states each vendor's posture with the source and the date we read it, and ends with the questions to put to any vendor, including us. For Postil's own controls, structural detail lives on the security page; this piece is about the category, so we keep our part short and point there.

Why this is the question procurement asks first

Security is a gate, not a tiebreaker. As one CTO checklist circulating in 2026 puts it, "a tool scoring 0 or 1 on security will not survive procurement regardless of its capabilities elsewhere" (Augment CTO checklist). The category also has a concrete reason for the scrutiny. In an August 2025 disclosure, security researchers at Kudelski described achieving remote code execution inside CodeRabbit's review pipeline via a malicious linter config in a pull request, exfiltrating environment variables that included a GitHub App private key carrying write access across roughly a million repositories (HN discussion). CodeRabbit reported the issue fixed; the durable lesson is not about one vendor but about blast radius. The same incident drove a search pattern still live in Google autocomplete as of June 12, 2026: is coderabbit safe. People are asking, and the honest answer requires reading the data-flow, not the marketing page.

Three questions that decide everything

Strip away the feature lists and an AI reviewer's data posture reduces to three independent questions:

  • Retention. After a review finishes, does your source (or an embedding of it) persist on the vendor's servers, and for how long? Ephemeral-then-deleted, a few-day troubleshooting window, and indefinite storage are three very different answers.
  • Training. Is your code, or telemetry derived from it, used to train or improve a model? If so, is that on by default with an opt-out, or off unless you opt in?
  • Inference location. Whose account makes the call to the model, and under whose data-processing agreement? A vendor calling its own model account is a different exposure from a tool that calls a model endpoint you control under your own contract.

A tool can be excellent on one and weak on another. Greptile, for example, has a genuine self-hosted, air-gapped deployment for enterprise buyers and a default hosted posture that is the most retentive among the majors. The questions are orthogonal, so audit them separately.

The hosted majors, by stated posture

What follows is each vendor's own published position for its default hosted product, as read on the dates given. Postures change; re-verify against the linked page and, for anything that matters, the contract rather than the marketing copy. Enterprise tiers often differ from defaults, which is exactly why the default is worth stating.

ToolCode retention (default hosted)
CodeRabbitEphemeral review environments; SOC 2 Type II
QodoZero-retention model account; troubleshooting data deleted within 48 hours
GreptileStores code and embeddings on its servers until access is revoked
GitHub CopilotHosted on GitHub infrastructure
Postil (hosted)Stores the review envelope only; source code never persisted

The detail behind each row:

CodeRabbit publishes SOC 2 Type II compliance, describes ephemeral review environments, and states it does not train on your code (trust center, read June 12, 2026). The August 2025 RCE is a separate matter from its data-retention policy; both belong in an audit.

Qodo describes a zero-retention arrangement with its model provider and says troubleshooting data is deleted within 48 hours, alongside SOC 2 Type II (Qodo security post, read June 12, 2026). Its open-source PR-Agent is a separate path you run yourself, which changes the inference-location answer entirely.

Greptile is the most retentive of the majors by its own description: it states (read June 12, 2026) that it stores code and embeddings on its servers until you revoke access, and that it may use anonymized customer data for AI improvement unless you opt out. Both store-by-default and train-by-default are present; both are reversible, but the default is the opposite of zero-retention. Greptile also offers a self-hosted, air-gapped enterprise deployment that sidesteps this entirely, so the posture you get depends on the tier you buy.

GitHub Copilot uses Free, Pro, and Pro+ interaction data for training unless you opt out, a policy in effect since April 24, 2025; Business and Enterprise are excluded (GitHub policy update). For code review specifically this means the training answer depends on which Copilot plan the reviewing identity is on.

The structural variable: where inference runs

The table above covers retention and training. The third question, where the inference call is made, splits the category along a line that the marketing rarely draws clearly. In the hosted majors, the vendor makes the model call on its own account: your diff goes to the vendor, the vendor sends it to a model provider, and you inherit whatever data-processing terms the vendor negotiated. That can be a perfectly good arrangement, but it is the vendor's arrangement, not yours.

The other shape is bring-your-own-key, where the tool sends the diff to a model endpoint you configure, authenticated with your key, under your data-processing agreement with that provider, or to a model you host. Among the majors this is mostly an enterprise or open-source path: CodeRabbit allows a BYO key only when self-hosted, and Qodo's PR-Agent supports any key, including a local Ollama endpoint, because you run it yourself (PR-Agent, read June 12, 2026). Macroscope, hosted Copilot, and Cursor Bugbot do not offer BYO key or model selection in their hosted products (vendor docs, read June 12, 2026). The reason inference location matters is that it determines which contract governs your code in the one place it is most exposed: in transit to, and being processed by, a large language model.

Where Postil sits, stated plainly

Postil is one option on this map, and here is its data-flow without adjectives. It is BYO-key by construction: the CLI sends your diff directly to the OpenAI-compatible endpoint you configure, OpenRouter, Azure OpenAI, vLLM, LiteLLM, or a local Ollama, authenticated with your key. The hosted control plane never proxies inference; the request to the model goes from the worker to your endpoint, not through a Postil inference service. The relevant line in the open-source CLI is blunt about it: when no key is set, it errors with "Postil never proxies your inference; bring your own key." Because the diff goes to the provider you chose, it is your data-processing agreement with that provider, not ours, that governs it.

On retention, the hosted control plane persists exactly one artifact per review: the envelope, a JSON document with the summary, the findings (path, line, severity, confidence, and the finding text), token usage, and the gate verdict. The diff is fetched at review time, sent to your model endpoint, and discarded with the process. There is no code cache, no embedding index, and no repository clone on our infrastructure, and a self-hosted deployment sends us nothing at all, no telemetry, no license pings, no update checks. When a BYO key is stored for the hosted product, it is sealed with AES-256-GCM before it touches the database and the settings form is write-only: a stored key can be replaced or removed, never read back out.

The GitHub App asks for the smallest permission set that does the job: contents: read to fetch the diff, pull_requests: write to post one batched review, checks: write for the two check-runs, and metadata: read. It never requests contents: write, so even a full compromise of an installation token cannot push a commit. That is the structural counterpart to the RCE lesson above: hold no credential a reviewer does not need. None of this is a claim that Postil is safer than any other tool; it is a description of one posture, checkable against the security page, the envelope schema, and the privacy policy, and against the open-source CLI directly.

Questions to ask any vendor (including us)

A SOC 2 guide making the rounds in 2026 gives the right instinct: get written confirmation, and "read the actual MSA, not the marketing page" (Probo). These are the questions that map to the three variables above:

  • After a review completes, what is retained on your servers, my source or only derived metadata, and for how long? Is there a written zero-retention or deletion commitment?
  • Do you train or improve any model on my code or on telemetry derived from it? Is that off by default, or on with an opt-out, and where is the toggle?
  • Whose account makes the model call, and under whose data-processing agreement? Can I point inference at an endpoint or model I control?
  • What is the exact GitHub (or GitLab) permission scope, and does it include write access to code? If the pipeline were compromised, what could an attacker reach?
  • Does the default posture differ from the enterprise tier? Quote me the default, since that is what I run on day one.

A vendor that can answer these in writing, default first, is one you can actually audit. A vendor that can only point at a trust badge is asking you to take the data-flow on faith, which, after August 2025, is the one thing this category has not earned.

Sources

Audit our data-flow, not our adjectives.

Postil stores the review, not the code, and never proxies your inference. The full posture is on the security page.