Android PR Acceptance Testing Agent

Overview

The problem it solves

Running all 60 instrumented tests on every Android pull request takes 45 minutes on a managed emulator. Most of that time is spent on tests with no relation to the changed code. This agent cuts that down by using Claude to analyse the diff and select only the test classes that actually cover the affected areas.

The result is a four-tier selection system: skip entirely for docs and CI-only changes, run smoke tests for low-risk changes, run targeted tests for specific feature areas, or run the full suite when dependency updates or critical paths are touched. Every decision is explained in the PR comment.

How It Works

Three-phase pipeline

1. Analyse

On PR open or update, Claude reads the diff alongside a full test inventory built by scanning the repository. It applies escalation rules: login changes trigger targeted login tests, dependency bumps trigger the full suite, and docs-only changes skip testing entirely.
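The inventory scan could look roughly like the sketch below. It is not the agent's actual code: the helper name `buildTestInventory`, the conventional `src/androidTest/{java,kotlin}/` layout, and the `*Test` file-name suffix are all assumptions made for illustration.

```typescript
// Hypothetical helper: derive fully qualified test class names from repository file paths.
// Assumes <module>/src/androidTest/{java,kotlin}/<package path>/<Name>Test.{kt,java}.
function buildTestInventory(paths: string[]): string[] {
  const pattern = /src\/androidTest\/(?:java|kotlin)\/(.+Test)\.(?:kt|java)$/;
  const classes: string[] = [];
  for (const path of paths) {
    const match = pattern.exec(path);
    const relative = match ? match[1] : undefined;
    if (relative) {
      // Turn the package directory path into a dotted class name.
      classes.push(relative.replace(/\//g, "."));
    }
  }
  return classes.sort();
}
```

Because the list is rebuilt from the file tree on every run, a newly added test class becomes selectable without anyone editing a mapping file.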

2. Run

The selected test classes run on a Nexus One API 34 managed emulator with Gradle build caching. The full suite runs across up to 4 shards in parallel, cutting worst-case execution from 45 minutes to around 15.
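One way to picture the sharding step is a round-robin split of the selected classes, with one Gradle invocation per shard. The source does not say how the workflow actually divides the work, so treat this as a sketch: the function names and the task name used in the test are hypothetical. The `class` instrumentation runner argument and the `--build-cache` flag are standard Android Gradle and Gradle options.

```typescript
// Split selected classes round-robin across at most maxShards groups (4 in the article).
function partitionIntoShards(testClasses: string[], maxShards: number = 4): string[][] {
  const shardCount = Math.min(maxShards, testClasses.length);
  const shards: string[][] = [];
  for (let s = 0; s < shardCount; s++) {
    shards.push(testClasses.filter((_, index) => index % shardCount === s));
  }
  return shards;
}

// Build the Gradle arguments for one shard. The task name is supplied by the caller
// because it depends on the managed device and build variant configured in the project.
function gradleArgsForShard(task: string, shard: string[]): string[] {
  return [
    task,
    "--build-cache",
    "-Pandroid.testInstrumentationRunnerArguments.class=" + shard.join(","),
  ];
}
```

An ideal four-way split of a 45-minute suite would take a little over 11 minutes. The quoted figure of around 15 suggests some fixed per-shard cost, although the source does not break this down.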

3. Diagnose

Test results (JUnit XML) are parsed, and failing tests are sent to Claude for root cause analysis. The PR comment is updated with a per-test diagnosis, suggested fixes, and coverage gaps identified from what the diff changed.
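The parsing half of this phase might be sketched as follows. The sketch assumes the common JUnit XML shape, where each `<testcase>` carries `classname` and `name` attributes and failing cases contain a `<failure>` or `<error>` child. It uses regular expressions to stay dependency-free. A production parser would more likely use a real XML library, and the names `FailedTest` and `extractFailures` are illustrative only.

```typescript
interface FailedTest {
  className: string;
  testName: string;
  message: string;
}

// Read one attribute value from a raw attribute string; returns "" when absent.
// XML entities in the value are left undecoded.
function readAttribute(attributes: string, attributeName: string): string {
  const match = new RegExp("\\b" + attributeName + '="([^"]*)"').exec(attributes);
  return match && match[1] !== undefined ? match[1] : "";
}

// Collect every testcase that contains a <failure> or <error> element.
function extractFailures(junitXml: string): FailedTest[] {
  const testcase = /<testcase\b([^>]*?)(?:\/>|>([\s\S]*?)<\/testcase>)/g;
  const failures: FailedTest[] = [];
  let match: RegExpExecArray | null;
  while ((match = testcase.exec(junitXml)) !== null) {
    const attributes = match[1] || "";
    const body = match[2] || ""; // empty for self-closing (passing) testcases
    const failure = /<(?:failure|error)\b([^>]*)/.exec(body);
    if (failure) {
      failures.push({
        className: readAttribute(attributes, "classname"),
        testName: readAttribute(attributes, "name"),
        message: readAttribute(failure[1] || "", "message"),
      });
    }
  }
  return failures;
}
```

Only the records returned here would need to go to Claude for diagnosis, which keeps passing tests out of the prompt.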

Reusable

Built as a reusable GitHub Actions workflow. Any Android repository can integrate it without duplicating the agent logic. A local analysis CLI lets developers run diff analysis before pushing to catch issues earlier.

Test Selection

Four-tier suite logic

Suite	Trigger	Execution time
Skip	Docs, CI config, or README-only changes	0 min
Smoke	Low-risk changes with no feature area match	~10 min
Targeted	Feature-specific changes (login, content, sync, etc.)	~20 min
Full	Dependency updates, auth flows, build config changes	~15 min across 4 shards (~45 min unsharded)
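In the agent, Claude makes this decision from the diff. The deterministic sketch below only illustrates the order in which the four tiers escalate. Every path pattern and the `selectSuite` name are hypothetical placeholders, not rules taken from the real agent.

```typescript
type Suite = "skip" | "smoke" | "targeted" | "full";

// Hypothetical path patterns, for illustration only.
const DOCS_OR_CI: RegExp[] = [/\.md$/i, /^docs\//, /^\.github\//];
const FULL_TRIGGERS: RegExp[] = [/\.gradle(\.kts)?$/, /^gradle\//, /\/auth\//];
const FEATURE_AREAS: { area: string; pattern: RegExp }[] = [
  { area: "login", pattern: /\/login\// },
  { area: "content", pattern: /\/content\// },
  { area: "sync", pattern: /\/sync\// },
];

function selectSuite(changedFiles: string[]): { suite: Suite; areas: string[] } {
  const matchesAny = (file: string, patterns: RegExp[]): boolean =>
    patterns.some((p) => p.test(file));

  // Tier 1: nothing but docs or CI config changed.
  if (changedFiles.every((f) => matchesAny(f, DOCS_OR_CI))) {
    return { suite: "skip", areas: [] };
  }
  // Tier 4: any dependency, build config, or auth change escalates to the full suite.
  if (changedFiles.some((f) => matchesAny(f, FULL_TRIGGERS))) {
    return { suite: "full", areas: [] };
  }
  // Tier 3: run only the feature areas the diff touches.
  const areas = FEATURE_AREAS
    .filter((entry) => changedFiles.some((f) => entry.pattern.test(f)))
    .map((entry) => entry.area);
  // Tier 2: code changed, but no feature area matched.
  return areas.length > 0 ? { suite: "targeted", areas } : { suite: "smoke", areas: [] };
}
```

Checking the full-suite triggers before the feature areas means a PR that touches both a login screen and a dependency version still gets the full run.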

Design Decisions

Key choices that matter

Claude selects tests from a live inventory scan, not a manually maintained mapping file
Splitting the full suite across four shards brings worst-case time from 45 min to around 15 min
PR comments update across all three phases so reviewers see progress, not just final results
Test class names are sanitised before passing to Gradle to guard against injection
Reusable workflow pattern means one repo maintains the agent, many repos consume it
Local analysis CLI lets developers run diff analysis without triggering CI
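The sanitisation step mentioned above can be sketched as a strict allowlist: accept only strings shaped like a fully qualified class name, optionally followed by `#method`, and drop everything else. The exact rule the agent applies is not given in the source. The pattern, the `sanitiseTestClasses` name, and the extra cross-check against the scanned inventory are assumptions.

```typescript
// Accepts com.example.LoginTest or com.example.LoginTest#methodName and nothing else.
// Shell-significant characters (spaces, ';', '$', quotes, '-') never match.
const TEST_TARGET =
  /^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*(?:#[A-Za-z_][A-Za-z0-9_]*)?$/;

function sanitiseTestClasses(candidates: string[], inventory: string[]): string[] {
  return candidates
    .map((candidate) => candidate.trim())
    // Reject anything that is not a plain class or class#method identifier.
    .filter((candidate) => TEST_TARGET.test(candidate))
    // Assumed extra safeguard: the class must exist in the scanned inventory.
    .filter((candidate) => inventory.indexOf(candidate.replace(/#.*$/, "")) !== -1);
}
```

Rejecting by pattern rather than escaping means a malformed name from the model is simply dropped instead of being passed to Gradle in altered form.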

Stack

Tools involved

Claude API
GitHub Actions
Node.js 22
Android Emulator (API 34)
Gradle
JUnit XML
ESLint 9
