Overview
The problem it solves
Running all 60 instrumented tests on every Android pull request takes 45 minutes on a managed emulator. Most of that time is spent on tests with no relation to the changed code. This agent cuts that down by using Claude to analyse the diff and select only the test classes that actually cover the affected areas.
The result is a four-tier selection system: skip entirely for docs and CI-only changes, run smoke tests for low-risk changes, run targeted tests for specific feature areas, or run the full suite when dependency updates or critical paths are touched. Every decision is explained in the PR comment.
How It Works
Three-phase pipeline
1. Analyse
On PR open or update, Claude reads the diff alongside a full test inventory built by scanning the repository. It applies escalation rules: login changes trigger targeted login tests, dependency bumps trigger the full suite, and docs-only changes skip testing entirely.
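The inventory scan could look roughly like the sketch below. It is an illustration rather than the agent's actual scanner: `buildTestInventory` is a hypothetical name, and the sketch assumes the conventional Android layout where instrumented tests live under `src/androidTest/java` or `src/androidTest/kotlin`, with one test class per file.

```javascript
// Hypothetical helper: derive fully qualified test class names from repo paths.
// Assumes the conventional layout <module>/src/androidTest/{java,kotlin}/<package path>/<Class>.kt
const ANDROID_TEST_FILE = /(?:^|\/)src\/androidTest\/(?:java|kotlin)\/(.+)\.(?:kt|java)$/;

function buildTestInventory(repoPaths) {
  const classes = [];
  for (const path of repoPaths) {
    const match = ANDROID_TEST_FILE.exec(path);
    if (match) {
      // File-level approximation: the package path plus file name stands in for the class name.
      classes.push(match[1].replaceAll("/", "."));
    }
  }
  return classes.sort();
}
```

Because the inventory is rebuilt from the file tree on each run, a newly added test class becomes selectable without anyone editing a mapping file.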
2. Run
The selected test classes run on a Nexus One API 34 managed emulator with Gradle build caching. The full suite runs across up to 4 parallel shards, cutting worst-case execution from 45 minutes to around 15.
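The source does not say how classes are assigned to shards. As one possible approach, the sketch below distributes class names round-robin across at most four shards; `splitIntoShards` is a hypothetical name, and a real pipeline might balance by historical test duration instead.

```javascript
// Hypothetical helper: distribute test classes across up to `maxShards` groups.
// Round-robin is an assumption; it balances counts, not execution time.
function splitIntoShards(testClasses, maxShards = 4) {
  // Never create more shards than there are classes to run.
  const shardCount = Math.min(maxShards, testClasses.length);
  const shards = Array.from({ length: shardCount }, () => []);
  testClasses.forEach((name, index) => {
    shards[index % shardCount].push(name);
  });
  return shards;
}
```

An even four-way split of a 45-minute suite would take a little over 11 minutes per shard, so the quoted figure of around 15 minutes plausibly includes per-shard emulator start-up and build overhead.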
3. Diagnose
Test results (JUnit XML) are parsed, and failing tests are sent to Claude for root cause analysis. The PR comment is then updated with a per-test diagnosis, suggested fixes, and coverage gaps inferred from what the diff changed.
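The parsing step could be sketched as follows. This is a deliberately simplified, regex-based illustration with hypothetical names (`extractFailures`, `readAttribute`). It assumes attribute values are double-quoted and contain no raw `>` characters, and it does not decode XML entities; a production pipeline would more likely use a proper XML parser.

```javascript
// Read one attribute value from a raw attribute string, or null if absent.
function readAttribute(attributes, name) {
  const match = new RegExp(`\\b${name}="([^"]*)"`).exec(attributes);
  return match ? match[1] : null;
}

// Hypothetical helper: collect failed or errored test cases from JUnit XML text.
function extractFailures(junitXml) {
  // Matches either a self-closing <testcase .../> or <testcase ...>body</testcase>.
  const testcasePattern = /<testcase\b([^>]*?)(?:\/>|>([\s\S]*?)<\/testcase>)/g;
  const problemPattern = /<(failure|error)\b([^>]*?)(?:\/>|>([\s\S]*?)<\/\1>)/;
  const failures = [];
  for (const [, attributes, body] of junitXml.matchAll(testcasePattern)) {
    // Self-closing test cases have no body, so they cannot contain a failure.
    const problem = body ? problemPattern.exec(body) : null;
    if (!problem) continue;
    failures.push({
      className: readAttribute(attributes, "classname"),
      testName: readAttribute(attributes, "name"),
      kind: problem[1],
      message: readAttribute(problem[2], "message"),
      details: (problem[3] ?? "").trim(),
    });
  }
  return failures;
}
```

Only the entries returned here would need to be sent for diagnosis, which keeps passing tests out of the prompt.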
Reusable
Built as a reusable GitHub Actions workflow. Any Android repository can integrate it without duplicating the agent logic. A local analysis CLI lets developers run diff analysis before pushing to catch issues earlier.
Test Selection
Four-tier suite logic
| Suite | Trigger | Execution time |
|---|---|---|
| Skip | Docs, CI config, or README-only changes | 0 min |
| Smoke | Low-risk changes with no feature area match | ~10 min |
| Targeted | Feature-specific changes (login, content, sync, etc.) | ~20 min |
| Full | Dependency updates, auth flows, build config changes | ~15 min across 4 parallel shards (~45 min unsharded) |
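
In the agent, Claude makes this decision from the diff. As a rough deterministic approximation of the table, the sketch below classifies a list of changed paths into one of the four tiers. The function name `selectSuite` and every path pattern in it are assumptions chosen for illustration, not rules taken from the project.

```javascript
// All patterns below are illustrative assumptions, not the project's real rules.
const DOCS_OR_CI = [/\.md$/i, /^docs\//, /^\.github\//];
const FULL_SUITE = [
  /(^|\/)build\.gradle(\.kts)?$/,
  /(^|\/)gradle\/libs\.versions\.toml$/,
  /(^|\/)auth\//,
];
const FEATURE_AREAS = { login: /\/login\//, content: /\/content\//, sync: /\/sync\// };

// Hypothetical helper: map changed file paths to a suite tier.
function selectSuite(changedPaths) {
  const matchesAny = (path, patterns) => patterns.some((p) => p.test(path));
  // Skip only when every changed file is docs or CI config.
  if (changedPaths.length > 0 && changedPaths.every((p) => matchesAny(p, DOCS_OR_CI))) {
    return { suite: "skip", areas: [] };
  }
  // Any dependency, build, or auth change escalates to the full suite.
  if (changedPaths.some((p) => matchesAny(p, FULL_SUITE))) {
    return { suite: "full", areas: [] };
  }
  const areas = Object.entries(FEATURE_AREAS)
    .filter(([, pattern]) => changedPaths.some((p) => pattern.test(p)))
    .map(([name]) => name);
  // No feature area matched: fall back to smoke tests.
  return areas.length > 0 ? { suite: "targeted", areas } : { suite: "smoke", areas: [] };
}
```

The order of the checks matters: escalation to the full suite is tested before feature matching, so a pull request that touches both a login screen and a build file still runs everything.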
Design Decisions
Key choices that matter
- Claude selects tests from a live inventory scan, not a manually maintained mapping file
- Sharding the full suite four ways brings worst-case time from 45 min to around 15 min
- PR comments update across all three phases so reviewers see progress, not just final results
- Test class names are sanitised before passing to Gradle to guard against injection
- Reusable workflow pattern means one repo maintains the agent, many repos consume it
- Local analysis CLI lets developers run diff analysis without triggering CI
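
The sanitisation step in the list above could be sketched as an allowlist check. The source does not say whether unsafe names are stripped or rejected; this sketch rejects them. `buildClassArgument` is a hypothetical name, while `android.testInstrumentationRunnerArguments.class` is the standard Android Gradle plugin property for passing a class filter to the instrumentation runner.

```javascript
// Fully qualified class name, optionally followed by #methodName.
// Anything else (spaces, quotes, semicolons, $, backticks) is rejected outright.
const SAFE_TEST_TARGET =
  /^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*(#[A-Za-z_][A-Za-z0-9_]*)?$/;

// Hypothetical helper: build the Gradle property argument for the selected classes.
function buildClassArgument(testClasses) {
  if (testClasses.length === 0) {
    throw new Error("No test classes selected");
  }
  const rejected = testClasses.filter((name) => !SAFE_TEST_TARGET.test(name));
  if (rejected.length > 0) {
    throw new Error(`Refusing unsafe test class names: ${JSON.stringify(rejected)}`);
  }
  return `-Pandroid.testInstrumentationRunnerArguments.class=${testClasses.join(",")}`;
}
```

Rejecting rather than escaping keeps the rule easy to audit: since the names come from model output, anything that does not look like a plain class name never reaches the command line.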
Stack
Tools involved
- Claude API
- GitHub Actions
- Node.js 22
- Android Emulator (API 34)
- Gradle
- JUnit XML
- ESLint 9