Context
The problem it solves
Traditional code review is good at spotting style issues, obvious mistakes, and architectural concerns. It is much less reliable at noticing when a new change quietly resembles a past production failure, especially in large or fast-moving codebases.
Quality teams often have valuable defect history spread across Sentry, issue trackers, and commit logs, but that knowledge rarely shows up at the exact moment a risky pull request is under review.
Design goal
Turn historical bug knowledge into something reviewable and immediate, not something buried in old tickets after a regression has already shipped.
Workflow
How it works
1. Build the knowledge base
Past bugs, incidents, issue tickets, and blame history are collected and stored in a vector-backed retrieval layer so the system can surface similar historical failures for any new change.
2. Inspect the pull request diff
The agent analyses changed files and code patterns, then retrieves the most relevant historical examples before generating any review output.
3. Reason over evidence
GPT-4o is used as a reasoning layer, not the source of truth. It compares the diff against retrieved examples and identifies likely regression patterns with supporting references.
4. Post review comments
Findings are sent back to GitHub as structured review comments so the PR author gets signal in the same workflow where decisions are already happening.
Key Decisions
Important design choices
- Retrieval before generation The system anchors analysis in real defect history instead of asking the model to speculate from code alone.
- Evidence-backed comments only Review output is useful because it links risky changes to historical failures, not because it sounds intelligent.
- Continuous learning loop Confirmed regressions feed back into the knowledge base so the reviewer improves over time.
Stack
Tools involved
- Python
- OpenAI GPT-4o
- GitHub API
- Sentry API
- Asana API
- Vector database