LLM PR Regression Checker

Context

The problem it solves

Traditional code review is good at spotting style issues, obvious mistakes, and architectural concerns. It is much less reliable at noticing when a new change quietly resembles a past production failure, especially in large or fast-moving codebases.

Quality teams often have valuable defect history spread across Sentry, issue trackers, and commit logs, but that knowledge rarely shows up at the exact moment a risky pull request is under review.

Design goal

Turn historical bug knowledge into something reviewable and immediate, not something buried in old tickets after a regression has already shipped.

Workflow

How it works

1. Build the knowledge base

Past bugs, incidents, issue tickets, and blame history are collected and stored in a vector-backed retrieval layer so the system can surface similar historical failures for any new change.

2. Inspect the pull request diff

The agent analyses changed files and code patterns, then retrieves the most relevant historical examples before generating any review output.

3. Reason over evidence

GPT-4o is used as a reasoning layer, not the source of truth. It compares the diff against retrieved examples and identifies likely regression patterns with supporting references.

4. Post review comments

Findings are sent back to GitHub as structured review comments so the PR author gets signal in the same workflow where decisions are already happening.

Key Decisions

Important design choices

Retrieval before generation The system anchors analysis in real defect history instead of asking the model to speculate from code alone.
Evidence-backed comments only Review output is useful because it links risky changes to historical failures, not because it sounds intelligent.
Continuous learning loop Confirmed regressions feed back into the knowledge base so the reviewer improves over time.

Stack

Tools involved

Python
OpenAI GPT-4o
GitHub API
Sentry API
Asana API
Vector database

View on GitHub → ← All case studies