
Quality Assurance · Android · CI/CD

Android PR Acceptance Testing Agent

Claude-powered GitHub Actions agent that reads a PR diff, picks only the relevant Android test classes, runs them on a managed emulator, and posts failure diagnosis and coverage recommendations back to the PR.

The problem it solves

Running all 60 instrumented tests on every Android pull request takes 45 minutes on a managed emulator. Most of that time is spent on tests with no relation to the changed code. This agent cuts that down by using Claude to analyse the diff and select only the test classes that actually cover the affected areas.

The result is a four-tier selection system: skip testing entirely for docs-only and CI-only changes, run smoke tests for low-risk changes, run targeted tests for specific feature areas, or run the full suite when dependencies, build configuration or critical paths such as auth flows change. Every decision is explained in the PR comment.
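To make the four tiers concrete, here is a minimal sketch of the escalation logic written as deterministic path rules. In the agent itself Claude makes this decision from the diff. The path patterns, feature-area names and the `selectTier` function are hypothetical, and a rule-based version like this could serve as a fallback or sanity check. The key property is that the PR as a whole gets the highest tier that any single changed file requires.

```typescript
// Hypothetical rule-based approximation of the four-tier escalation.
// In the agent, Claude makes this call; these path patterns are illustrative only.
type Tier = "skip" | "smoke" | "targeted" | "full";

const TIER_RANK: Record<Tier, number> = { skip: 0, smoke: 1, targeted: 2, full: 3 };

interface Decision {
  tier: Tier;
  reasons: string[]; // one line per changed file, suitable for the PR comment
}

// Docs, CI config and README changes need no test run.
const SKIP_PATTERNS: RegExp[] = [/\.md$/i, /^docs\//, /^\.github\//];

// Build config, dependency catalogue and auth-flow changes force the full suite.
const FULL_PATTERNS: RegExp[] = [
  /(^|\/)build\.gradle(\.kts)?$/,
  /(^|\/)settings\.gradle(\.kts)?$/,
  /(^|\/)libs\.versions\.toml$/,
  /\/auth\//,
];

// Hypothetical feature areas, keyed by a path fragment.
const FEATURE_AREAS: Record<string, RegExp> = {
  login: /\/login\//,
  content: /\/content\//,
  sync: /\/sync\//,
};

function classify(path: string): { tier: Tier; reason: string } {
  if (SKIP_PATTERNS.some((p) => p.test(path))) {
    return { tier: "skip", reason: `${path}: docs or CI only` };
  }
  if (FULL_PATTERNS.some((p) => p.test(path))) {
    return { tier: "full", reason: `${path}: build, dependency or auth change` };
  }
  for (const [area, pattern] of Object.entries(FEATURE_AREAS)) {
    if (pattern.test(path)) {
      return { tier: "targeted", reason: `${path}: ${area} feature area` };
    }
  }
  return { tier: "smoke", reason: `${path}: no feature area match` };
}

// The PR as a whole gets the highest tier required by any single file.
function selectTier(changedFiles: string[]): Decision {
  let tier: Tier = "skip";
  const reasons: string[] = [];
  for (const file of changedFiles) {
    const result = classify(file);
    reasons.push(result.reason);
    if (TIER_RANK[result.tier] > TIER_RANK[tier]) tier = result.tier;
  }
  return { tier, reasons };
}
```

For example, a PR touching `README.md` and `app/src/main/java/com/example/login/LoginViewModel.kt` resolves to `"targeted"`, because the login file outranks the README.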

Three-phase pipeline

1. Analyse

When a PR is opened or updated, Claude reads the diff alongside a full test inventory built by scanning the repository. It then applies escalation rules: login changes trigger the targeted login tests, dependency bumps trigger the full suite, and docs-only changes skip testing entirely.
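A rough sketch of the inventory scan, assuming the workflow has already read instrumented test sources (under `src/androidTest/`) into memory as path/content pairs. `buildInventory`, the regex heuristics and the output shape are illustrative assumptions, not the agent's actual code; a real scanner would likely need a proper Kotlin/Java parser to cope with nested classes and commented-out code.

```typescript
// Hypothetical input and output shapes for the inventory scan.
interface SourceFile {
  path: string;
  content: string;
}

interface InventoryEntry {
  className: string; // fully qualified, e.g. com.example.login.LoginTest
  path: string;
  testCount: number; // number of @Test annotations found
}

const PACKAGE_RE = /^\s*package\s+([\w.]+)/m;
// Requires whitespace after "class", so Kotlin's `AndroidJUnit4::class)` is not matched.
const CLASS_RE = /\bclass\s+(\w+)/;
// `\b` stops @TestInstance and similar annotations from counting.
const TEST_ANNOTATION_RE = /@Test\b/g;

function buildInventory(files: SourceFile[]): InventoryEntry[] {
  const inventory: InventoryEntry[] = [];
  for (const file of files) {
    // Only instrumented test sources written in Kotlin or Java.
    if (!file.path.includes("/androidTest/")) continue;
    if (!/\.(kt|java)$/.test(file.path)) continue;

    const testCount = (file.content.match(TEST_ANNOTATION_RE) ?? []).length;
    if (testCount === 0) continue; // helpers, rules and base classes

    const pkg = PACKAGE_RE.exec(file.content)?.[1];
    const cls = CLASS_RE.exec(file.content)?.[1];
    if (!cls) continue;

    inventory.push({ className: pkg ? `${pkg}.${cls}` : cls, path: file.path, testCount });
  }
  // Stable ordering makes the inventory diff-friendly in logs and prompts.
  return inventory.sort((a, b) => a.className.localeCompare(b.className));
}
```

Because the inventory is rebuilt on every run, a newly added test class becomes selectable without anyone editing a mapping file.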

2. Run

The selected test classes run on a Nexus One API 34 managed emulator with Gradle build caching. Full-suite runs are split across up to 4 shards in parallel, cutting worst-case execution time from 45 minutes to around 15.
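One simple way to split the selected classes across up to four shards is a round-robin partition, emitted as a GitHub Actions matrix so each shard runs as its own job. `shardClasses`, `toMatrix` and the `shard`/`classes` matrix keys are hypothetical; balancing shards by recorded test durations is a common refinement over round-robin.

```typescript
// Hypothetical helper: split selected test classes into at most `maxShards` groups.
function shardClasses(classes: string[], maxShards = 4): string[][] {
  if (classes.length === 0) return [];
  // Never create more shards than there are classes, and always at least one.
  const shardCount = Math.max(1, Math.min(maxShards, classes.length));
  const shards: string[][] = Array.from({ length: shardCount }, () => [] as string[]);
  // Sort first so the same input always produces the same assignment.
  classes
    .slice()
    .sort()
    .forEach((cls, i) => {
      shards[i % shardCount].push(cls);
    });
  return shards;
}

interface ShardMatrix {
  include: { shard: number; classes: string }[];
}

// Shape the shards as a GitHub Actions matrix (hypothetical keys `shard` and `classes`).
function toMatrix(shards: string[][]): ShardMatrix {
  return {
    include: shards.map((classes, shard) => ({ shard, classes: classes.join(",") })),
  };
}
```

The matrix object could be serialised to a step output and consumed with `fromJSON` in a downstream job's `strategy.matrix`, giving one emulator job per shard.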

3. Diagnose

Test results (JUnit XML) are parsed and failing tests are sent to Claude for root-cause analysis. The PR comment is then updated with a per-test diagnosis, suggested fixes, and recommendations for coverage gaps in the code the diff touched.
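Before failures go to Claude they have to be extracted from the JUnit XML. The sketch below uses regular expressions so it has no dependencies, and it assumes the common JUnit XML layout: `<testcase>` elements with `classname` and `name` attributes, containing `<failure>` or `<error>` children. `parseJUnitFailures`, `TestFailure` and the `maxDetailChars` cap are hypothetical; a production version would be safer with a real XML parser. Capping stack-trace length keeps the diagnosis prompt small.

```typescript
// Hypothetical shape of one failure passed on to Claude for diagnosis.
interface TestFailure {
  className: string;
  testName: string;
  kind: "failure" | "error";
  message: string;
  details: string; // stack trace or other element text
}

// Decode the five predefined XML entities (&amp; last so it is not double-decoded).
function decodeXml(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Read one double-quoted attribute from a tag's attribute text.
function attr(attrs: string, name: string): string {
  const m = new RegExp(`\\b${name}="([^"]*)"`).exec(attrs);
  return m ? decodeXml(m[1]) : "";
}

// If the text is a single CDATA section, return it verbatim; otherwise decode entities.
function textContent(raw: string): string {
  const trimmed = raw.trim();
  const cdata = /^<!\[CDATA\[([\s\S]*)\]\]>$/.exec(trimmed);
  return cdata ? cdata[1] : decodeXml(trimmed);
}

function parseJUnitFailures(xml: string, maxDetailChars = 2000): TestFailure[] {
  const failures: TestFailure[] = [];
  // Matches <testcase ...>...</testcase>; self-closing testcases (passes) never match.
  const caseRe = /<testcase\b((?:[^>\/]|\/(?!>))*)>([\s\S]*?)<\/testcase>/g;
  // A <failure> or <error> child, either self-closing or with a text body.
  const problemRe = /<(failure|error)\b([^>]*?)(?:\/>|>([\s\S]*?)<\/\1>)/;
  let m: RegExpExecArray | null;
  while ((m = caseRe.exec(xml)) !== null) {
    const [, caseAttrs, body] = m;
    const problem = problemRe.exec(body);
    if (!problem) continue; // passed or skipped
    failures.push({
      className: attr(caseAttrs, "classname"),
      testName: attr(caseAttrs, "name"),
      kind: problem[1] as "failure" | "error",
      message: attr(problem[2], "message"),
      details: textContent(problem[3] ?? "").slice(0, maxDetailChars),
    });
  }
  return failures;
}
```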

Reusable

Built as a reusable GitHub Actions workflow. Any Android repository can integrate it without duplicating the agent logic. A local analysis CLI lets developers run diff analysis before pushing to catch issues earlier.

Four-tier suite logic

| Suite | Trigger | Approx. execution time |
|---|---|---|
| Skip | Docs, CI config, or README-only changes | 0 min |
| Smoke | Low-risk changes with no feature-area match | ~10 min |
| Targeted | Feature-specific changes (login, content, sync, etc.) | ~20 min |
| Full | Dependency updates, auth flows, build config changes | ~15 min across 4 parallel shards (~45 min unsharded) |

Key choices that matter

  • Claude selects tests from a live inventory scan, not a manually maintained mapping file
  • Splitting the full suite across four shards cuts worst-case time from 45 min to around 15 min
  • PR comments update across all three phases so reviewers see progress, not just final results
  • Test class names are sanitised before being passed to Gradle to guard against command or argument injection
  • Reusable workflow pattern means one repo maintains the agent, many repos consume it
  • Local analysis CLI lets developers run diff analysis without triggering CI
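The sanitisation step can be sketched as an allow-list check: every class name Claude returns must be a syntactically valid fully qualified name (optionally with a `#method` suffix) and must also appear in the inventory scan. The inventory cross-check, `sanitizeTestClasses` and `gradleClassFilterArg` are assumptions for illustration. The `-Pandroid.testInstrumentationRunnerArguments.class=...` property is the usual way to forward AndroidJUnitRunner's `class` filter through Gradle; passing it as its own argv entry, rather than interpolating it into a shell string, adds a second layer of protection.

```typescript
// Letters, digits and underscores only, dot-separated, with an optional #method suffix.
// Shell metacharacters, spaces and quotes can never pass this pattern.
const CLASS_NAME_RE = /^[A-Za-z_]\w*(\.[A-Za-z_]\w*)*(#[A-Za-z_]\w*)?$/;

interface SanitizeResult {
  accepted: string[]; // safe to hand to Gradle, deduplicated, in input order
  rejected: string[]; // original strings, useful for the PR comment
}

// Hypothetical: `knownClasses` is the fully qualified class set from the inventory scan.
function sanitizeTestClasses(candidates: string[], knownClasses: Set<string>): SanitizeResult {
  const accepted: string[] = [];
  const rejected: string[] = [];
  const seen = new Set<string>();
  for (const raw of candidates) {
    const name = raw.trim();
    const baseClass = name.split("#")[0];
    if (CLASS_NAME_RE.test(name) && knownClasses.has(baseClass)) {
      if (!seen.has(name)) {
        seen.add(name);
        accepted.push(name);
      }
    } else {
      rejected.push(raw);
    }
  }
  return { accepted, rejected };
}

// Build the single Gradle argument carrying the instrumentation runner's class filter.
function gradleClassFilterArg(classes: string[]): string {
  // An empty filter would be meaningless; the caller should fall back to another tier.
  if (classes.length === 0) throw new Error("no valid test classes to run");
  return `-Pandroid.testInstrumentationRunnerArguments.class=${classes.join(",")}`;
}
```

Rejecting names that are well-formed but absent from the inventory also catches classes Claude may have hallucinated, not only malicious input.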

Tools involved

  • Claude API
  • GitHub Actions
  • Node.js 22
  • Android Emulator (API 34)
  • Gradle
  • JUnit XML
  • ESLint 9