Blog

Where does your code actually go? A data-flow audit of AI code review tools

June 27, 2026 · Postil team

To review a pull request, a tool has to read the diff, and often the surrounding files. That is unavoidable. What is not fixed is what happens next: whether your code is held on the vendor's servers after the review, whether it is used to train or improve a model, and whose infrastructure the inference call runs on. Those three questions, retention, training, and inference location, are what a security review of an AI reviewer actually turns on, and they vary far more between products than catch rates do.

This is a comparative explainer, not a scorecard. It groups the tools by how they handle code rather than ranking them, states each vendor's posture with source links, and ends with the questions to put to any vendor, including us. For Postil's own controls, structural detail lives on the security page; this piece is about the category, so we keep our part short and point there.

Why this is the question procurement asks first

Security functions as a procurement gate. As one CTO checklist circulating in 2026 puts it, "a tool scoring 0 or 1 on security will not survive procurement regardless of its capabilities elsewhere" (Augment CTO checklist). The category also has a concrete reason for the scrutiny. In an August 2025 disclosure, security researchers at Kudelski described achieving remote code execution inside CodeRabbit's review pipeline via a malicious linter config in a pull request, exfiltrating environment variables that included a GitHub App private key carrying write access across roughly a million repositories (HN discussion). CodeRabbit reported the issue fixed; the durable lesson is not about one vendor but about blast radius. The same incident drove a search pattern around whether CodeRabbit is safe. People are asking, and the honest answer requires reading the data-flow, not the marketing page.

Three questions that decide everything

Strip away the feature lists and an AI reviewer's data posture reduces to three independent questions:

Retention. After a review finishes, does your source (or an embedding of it) persist on the vendor's servers, and for how long? Ephemeral-then-deleted, a few-day troubleshooting window, and indefinite storage are three very different answers.
Training. Is your code, or telemetry derived from it, used to train or improve a model? If so, is that on by default with an opt-out, or off unless you opt in?
Inference location. Whose account makes the call to the model, and under whose data-processing agreement? A vendor calling its own model account is a different exposure from a tool that calls a model endpoint you control under your own contract.

A tool can be excellent on one and weak on another. Greptile, for example, has a genuine self-hosted, air-gapped deployment for enterprise buyers and a default hosted posture that is the most retentive among the majors. The questions are orthogonal, so audit them separately.

The hosted majors, by stated posture

What follows is each vendor's own published position for its default hosted product. Postures change; re-verify against the linked page and, for anything that matters, the contract rather than the marketing copy. Enterprise tiers often differ from defaults, which is exactly why the default is worth stating.

Tool	Code retention (default hosted)	Training on your code
CodeRabbit	Ephemeral review environments; SOC 2 Type II	States it does not train on your code
Qodo	Zero-retention model account; troubleshooting data deleted within 48 hours	Zero-retention posture; SOC 2 Type II
Greptile	Stores code and embeddings on its servers until access is revoked	May use anonymized customer data for AI improvement unless you opt out
GitHub Copilot	Hosted on GitHub infrastructure	Free/Pro interaction data used for training unless opted out; Business/Enterprise excluded
Postil (hosted)	Stores the review envelope only; source code never persisted	Hosted default uses Postil's provider path; hosted BYOK routes through the worker to your provider

The detail behind each row:

CodeRabbit publishes SOC 2 Type II compliance, describes ephemeral review environments, and states it does not train on your code (trust center). The August 2025 RCE is a separate matter from its data-retention policy; both belong in an audit.

Qodo describes a zero-retention arrangement with its model provider and says troubleshooting data is deleted within 48 hours, alongside SOC 2 Type II (Qodo security post). Its open-source PR-Agent is a separate path you run yourself, which changes the inference-location answer entirely.

Greptile is the most retentive of the majors by its own description: it states that it stores code and embeddings on its servers until you revoke access, and that it may use anonymized customer data for AI improvement unless you opt out. Both store-by-default and train-by-default are present; both are reversible, but the default is the opposite of zero-retention. Greptile also offers a self-hosted, air-gapped enterprise deployment that sidesteps this entirely, so the posture you get depends on the tier you buy.

GitHub Copilot uses Free, Pro, and Pro+ interaction data for training unless you opt out, a policy in effect since April 24, 2025; Business and Enterprise are excluded (GitHub policy update). For code review specifically this means the training answer depends on which Copilot plan the reviewing identity is on.

The structural variable: where inference runs

The table above covers retention and training. The third question, where the inference call is made, splits the category along a line that the marketing rarely draws clearly. In the hosted majors, the vendor makes the model call on its own account: your diff goes to the vendor, the vendor sends it to a model provider, and you inherit whatever data-processing terms the vendor negotiated. That can be a perfectly good arrangement, but it is the vendor's arrangement, not yours.

The other shape is bring-your-own-key, where the tool sends the diff to a provider endpoint selected by the customer, authenticated with the customer's key, under that customer's data-processing agreement with the provider, or to a model the customer hosts. Among the majors this is mostly an enterprise or open-source path: CodeRabbit allows BYOK only when self-hosted, and Qodo's PR-Agent supports any key, including a local Ollama endpoint, because you run it yourself (PR-Agent). Macroscope, hosted Copilot, and Cursor Bugbot do not offer BYOK or model selection in their hosted products. The reason inference location matters is that it determines which contract governs your code in the one place it is most exposed: in transit to, and being processed by, a large language model.

Where Postil sits, stated plainly

Postil is one option on this map, and here is its data-flow without adjectives. CLI and self-hosted deployments send your diff directly to the OpenAI-compatible endpoint you configure, OpenRouter, Azure OpenAI, vLLM, LiteLLM, or a local Ollama, authenticated with your key. Hosted organizations can use the same BYOK model settings; in that case, the worker sends the diff to your configured provider and region under your provider relationship. Hosted organizations without BYOK model settings use Postil's configured provider credentials, with diffs sent from the worker to Postil's OpenRouter-compatible provider path and downstream model providers under Postil's provider relationship.

On retention, the hosted control plane persists exactly one artifact per review: the envelope, a JSON document with the summary, the findings (path, line, severity, confidence, and the finding text), token usage, and the gate verdict. The diff is fetched at review time, sent through the worker to either Postil's configured provider path or the provider your org configures for BYOK, and discarded with the process. There is no code cache, no embedding index, and no repository clone on our infrastructure, and a self-hosted deployment sends us nothing at all, no telemetry, no license pings, no update checks. When a BYOK credential is stored for the hosted product, it is sealed with AES-256-GCM before it touches the database and the settings form is write-only: a stored key can be replaced or removed, never read back out.

The GitHub App asks for the smallest permission set that does the job: contents: read to fetch the diff, pull_requests: write to post one batched review, checks: write for the two check-runs, and issues: write for explicit command replies, members: read to verify organization admins before recording approvals, and metadata: read. It never requests contents: write, so even a full compromise of an installation token cannot push a commit. That is the structural counterpart to the RCE lesson above: hold no credential a reviewer does not need. None of this is a claim that Postil is safer than any other tool; it is a description of one posture, checkable against the security page, the envelope schema, and the privacy policy, and against the open-source CLI directly.

Questions to ask any vendor (including us)

A SOC 2 guide making the rounds in 2026 gives the right instinct: get written confirmation, and "read the actual MSA, not the marketing page" (Probo). These are the questions that map to the three variables above:

After a review completes, what is retained on your servers, my source or only derived metadata, and for how long? Is there a written zero-retention or deletion commitment?
Do you train or improve any model on my code or on telemetry derived from it? Is that off by default, or on with an opt-out, and where is the toggle?
Whose account makes the model call, and under whose data-processing agreement? Can I point inference at an endpoint or model I control?
What is the exact GitHub (or GitLab) permission scope, and does it include write access to code? If the pipeline were compromised, what could an attacker reach?
Does the default posture differ from the enterprise tier? Quote me the default, since that is what I run on day one.

A vendor that can answer these in writing, default first, is one you can actually audit. A vendor that can only point at a trust badge is asking you to take the data-flow on faith, which, after August 2025, is the one thing this category has not earned.

Sources

Kudelski Security: CodeRabbit RCE write-up (Aug 19, 2025); HN discussion
CodeRabbit trust center (SOC 2 Type II, ephemeral reviews, no training on code)
Qodo security post (zero-retention, 48-hour troubleshooting deletion)
Greptile security page (stores code and embeddings until access revoked; anonymized data used for AI improvement unless opted out)
GitHub: Copilot interaction data usage policy (in effect Apr 24, 2025): Free/Pro train unless opted out, Business/Enterprise excluded
Qodo PR-Agent (open-source, BYOK including Ollama)
Augment: CTO AI coding checklist (security as a procurement gate)
Probo: AI coding tools and SOC 2 compliance (read the MSA, not the marketing page)
Postil controls, verifiable in source: the security page, the envelope schema, and the privacy policy

Audit our data-flow yourself.

Only the review envelope is retained. Hosted inference routing depends on provider settings; the full posture is on the security page.

Read the security page Install the CLI