Postil

Blog

Research notes.

Writing about AI code review the way we build it: claims dated, sources linked, competitors compared on the record. We say less. What we say is right.

June 2026

Where does your code actually go? A data-flow audit of AI code review tools

AI reviewers differ less on what they find than on where your code goes, who keeps it, and whether it trains a model. A class-by-class audit of retention, training, and inference location, every fact dated and sourced.

June 2026

Self-hosted AI code review without the 500-seat enterprise gate

CodeRabbit gates self-hosting behind 500 seats; most rivals don't offer it at all. Run a full AI code reviewer locally with Ollama in about 15 minutes: free at any team size, bring your own key, no markup.

June 2026

Every AI code review benchmark has the same winner: its author

Greptile scored 82% on its own benchmark and 45% when a rival re-ran it. Why vendor code-review benchmarks are marketing in a lab coat, and a five-point test for spotting a rigged one.

June 2026

Why GitHub Copilot can't block your merge (and how a real AI merge gate works)

Branch protection blocks only on required status checks that conclude failure, not on review comments or neutral checks. Copilot posts a Comment review, while Claude Code review and Macroscope conclude neutral. The mechanic, and how a real two-check gate works.

June 2026

AI code review pricing in 2026: what a 20-developer team actually pays

Four vendors changed pricing models in ninety days. We run the same 20-developer team through seven tools, assumptions stated, arithmetic shown, every price dated and sourced.

June 2026

Best AI code review tools in 2026: an evidence-first comparison

CodeRabbit, Qodo, Macroscope, Greptile, Copilot, Bugbot, and Postil, compared on noise, merge gating, self-hosting, data handling, and a pricing landscape that changed four times in ninety days. Every claim dated and sourced.

June 2026

The silence rate: the AI code review metric nobody publishes

Developers stop reading AI reviewers that are wrong a third of the time. The metric that predicts it is the one no vendor benchmark reports: how often the tool says nothing.