Guide · 8 min read · 2026-04-15

A practical guide to multi-phone Android testing

If you have more than two test phones, you've probably hit at least one of these:

Eight patterns we've seen work for teams running QA on 3-50 real devices, plus the anti-patterns to skip.

Pattern 1: Name your phones, always

Don't trust the model name to identify a device. Two Pixel 7s on a desk are indistinguishable in any list view. Name them — "Pixel 7 · Test slot 1" or "Samsung A16 · Hebrew RTL" or "OnePlus 12 · low-spec proxy". You'll remember the names for years; you'll never remember SM-A165F_f67595.

In DroidFleet you do this once per phone and the name follows the device across reinstalls. In a USB setup you can stick a label on the back; same idea.

Pattern 2: Have a "physical identification" workflow

The single most underrated feature of any multi-device setup is a button that makes one specific phone beep, vibrate, or show a toast. When you have three identical phones plugged in and the dashboard says "phone 2 is failing", you need to find out which physical device is phone 2 within 3 seconds.

In DroidFleet this is the 📳 icon next to each phone. Without it, debugging on a fleet is hell.

Pattern 3: Pair builds with git commits

Every install should be tagged with the git SHA it was built from. Otherwise:

One simple discipline: install-wireless?gitSha=$(git rev-parse HEAD)&gitBranch=$(git branch --show-current). Two seconds of CI config saves weeks of "wait, when did this start?"
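As a minimal sketch of that discipline, the helper below builds the install URL with the SHA and branch attached as query parameters. The host name is a placeholder and the function name is made up for illustration; only the install-wireless path and the gitSha / gitBranch parameter names come from the text above.

```python
from urllib.parse import urlencode


def build_install_url(base_url: str, git_sha: str, git_branch: str) -> str:
    """Attach the commit SHA and branch to an install request as query parameters."""
    # urlencode escapes characters such as "/" in branch names like "feature/audio".
    query = urlencode({"gitSha": git_sha, "gitBranch": git_branch})
    return f"{base_url}?{query}"


# Hypothetical host. In CI the two values would come from
# `git rev-parse HEAD` and `git branch --show-current`.
url = build_install_url(
    "https://fleet.example.test/install-wireless",
    "0123456789abcdef0123456789abcdef01234567",
    "feature/audio",
)
```

One thing to watch: many CI systems check out a detached HEAD, and in that state `git branch --show-current` prints nothing, so the branch may have to come from a variable your CI provides instead.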

Pattern 4: Capture screenshots at well-defined points

"Screenshots after the test" is not a useful artifact. Screenshots at well-defined moments — cold start, after rotate, after navigate-to-settings, after returning from background — are diff-able across runs. That's where you catch the layout regression that broke a button on Samsung but not on Pixel.

Naming convention matters. cold-start.png beats screenshot1.png by miles. Using the same labels on every run is what makes screenshot diff possible.
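Here is a rough sketch of why stable labels matter: once every run uses the same labels, comparing two runs is a dictionary lookup. Both function names are hypothetical, and the byte-for-byte comparison is a deliberate simplification. A real setup would likely mask the status bar and allow a pixel tolerance.

```python
import hashlib
import re


def label_to_filename(label: str) -> str:
    """Turn a checkpoint label such as 'Cold start' into 'cold-start.png'."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{slug}.png"


def changed_labels(baseline: dict[str, bytes], current: dict[str, bytes]) -> list[str]:
    """Return labels whose screenshot bytes differ, or that exist in only one run."""
    changed = []
    for label in sorted(baseline.keys() | current.keys()):
        old, new = baseline.get(label), current.get(label)
        if old is None or new is None:
            # A checkpoint that appears in only one run is itself a finding.
            changed.append(label)
        elif hashlib.sha256(old).digest() != hashlib.sha256(new).digest():
            changed.append(label)
    return changed
```

With names like screenshot1.png the keys would not line up between runs, and there would be nothing to compare.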

Pattern 5: Stream logs, don't dump them

The post-hoc adb logcat -d > log.txt is fine for one run on one phone. It's useless when you want to watch a behavior unfold in real time across three devices.

SSE / WebSocket-streamed logs let you tail multiple phones in one window with filters. The first time you watch a race condition appear in the logs simultaneously on three devices is the moment you understand why this matters.
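The sketch below covers only the source side of that idea, under a few assumptions: each phone is reachable through adb by serial number, and something else (an SSE or WebSocket endpoint, or just a terminal) consumes the merged lines. The helper names are illustrative.

```python
import re
from typing import Iterable, Iterator


def logcat_command(serial: str) -> list[str]:
    """Command that streams (rather than dumps) the log of one device."""
    # No -d flag: without it, logcat keeps running and emits lines as they arrive.
    return ["adb", "-s", serial, "logcat", "-v", "threadtime"]


def tag_and_filter(name: str, lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Prefix each matching line with the phone's name so merged output stays readable."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield f"[{name}] {line.rstrip()}"
```

In practice you would start one `subprocess.Popen(logcat_command(serial), stdout=subprocess.PIPE, text=True)` per phone, read each process's stdout on its own thread, and pass the lines through `tag_and_filter` before forwarding them to whatever displays the combined stream.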

Pattern 6: Aggressive crash dedup

Without dedup, a 100-phone fleet generates 100 crash reports for one bug. With it, you see one card with "100 hits" and a list of which phones are affected. The signal-to-noise improvement is dramatic.

The right dedup key is the top 5 application stack frames with line numbers stripped. Strip line numbers because they shift across builds (even a whitespace-only change moves them). Take the top 5 to handle deep call stacks where the bottom is shared but the middle differs.
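A minimal sketch of such a key, assuming Java/Kotlin-style stack traces and assuming "application frame" means a frame whose qualified name starts with your app's package prefix. The function name, the regex, and the choice of SHA-256 are illustrative, not a description of how any particular tool does it.

```python
import hashlib
import re

# Matches frames such as "    at com.example.app.Player.start(Player.kt:42)".
FRAME = re.compile(r"^\s*at\s+([\w.$<>]+)\(([^:)]*)(?::\d+)?\)")


def dedup_key(stack_trace: str, app_prefix: str, depth: int = 5) -> str:
    """Hash the top `depth` application frames with line numbers removed."""
    frames = []
    for line in stack_trace.splitlines():
        match = FRAME.match(line)
        if match and match.group(1).startswith(app_prefix):
            # Keep the qualified method and the file name, drop the line number.
            frames.append(f"{match.group(1)}({match.group(2)})")
        if len(frames) == depth:
            break
    return hashlib.sha256("\n".join(frames).encode("utf-8")).hexdigest()
```

Two reports of the same crash from builds where the file shifted by a few lines produce the same key, while a crash in a different method produces a different one. Framework frames (android.*, java.*) are skipped entirely, so they neither use up the five slots nor affect the key.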

Pattern 7: Use cohorts for risky changes

You're about to ship a refactor that touches the audio pipeline. Three options:

  1. Ship it to all phones, hope.
  2. Ship to one phone, eyeball it, then to the rest.
  3. Ship to 20% of phones (deterministically), let it bake for a day, compare crash rates against the 80% on the previous build.

Option 3 catches subtle regressions that #1 hides and #2 misses. The trick is that cohort assignment must be sticky — the same phone is always in the same group, otherwise you can't compare anything.
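One common way to get sticky, deterministic assignment is to hash a stable device identifier together with the experiment name and bucket the result. This is a sketch under that assumption. The function name and identifiers are hypothetical.

```python
import hashlib


def in_canary(device_id: str, experiment: str, percent: int = 20) -> bool:
    """Deterministically place a device in the canary group for one experiment."""
    # Including the experiment name means a phone is not stuck in the
    # canary group for every rollout, only consistently for this one.
    digest = hashlib.sha256(f"{experiment}:{device_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

A cryptographic hash is used rather than Python's built-in `hash()`, because string hashing in Python is randomized per process by default, so the built-in would reshuffle the cohorts on every run.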

Pattern 8: Wake phones on demand

Phones go to sleep. They drop off the network. Batteries die. Without a way to wake them on demand, you're stuck physically picking up each phone, unlocking it, and opening your test app. With 10 phones, this is the difference between a 30-second test cycle and a 15-minute one.

FCM (Firebase Cloud Messaging) is the right answer on Android. The phone listens for a "wake" push, fires up the test agent, and you push your build. It's free, reliable, and built into every phone with Google Play services.
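As a hedged sketch, a wake push could be a data-only message in the shape used by the FCM HTTP v1 API. The `action` and `gitSha` data keys and the function name are assumptions for illustration. The agent on the phone would need to agree on them. High priority is requested because normal-priority messages can be deferred while a device is in Doze.

```python
import json


def build_wake_message(device_token: str, build_sha: str) -> dict:
    """Build a data-only FCM v1 message asking the test agent to wake up."""
    return {
        "message": {
            "token": device_token,
            # A short TTL: a wake request that arrives ten minutes late is useless.
            "android": {"priority": "HIGH", "ttl": "60s"},
            # FCM data payloads are string-to-string maps.
            "data": {"action": "wake", "gitSha": build_sha},
        }
    }


body = json.dumps(build_wake_message("device-token-placeholder", "0123abc"))
```

Actually sending it means POSTing this body to the project's `messages:send` endpoint with an OAuth 2.0 access token, which is left out here because it depends on how your credentials are managed.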

Anti-patterns to avoid

Don't use ADB over Wi-Fi as your primary path

It works for two minutes, then drops. The reconnection sequence is finicky. Save it for one-off debug sessions.

Don't pair a single test session to a single device

If your test framework can only test one phone per run, you'll never catch regressions that depend on parallel state (e.g. multi-user features, server-side races). Run on multiple phones in parallel from the start.
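Running in parallel from the start does not need much machinery. A minimal sketch, assuming each phone is addressed by a serial string and that `task` is whatever your per-device test step is (both are placeholders here):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_on_all(serials: list[str], task: Callable[[str], bool]) -> dict[str, bool]:
    """Run task(serial) for every phone concurrently and collect pass/fail per phone."""
    if not serials:
        return {}
    # One worker per phone: the work is I/O-bound (waiting on devices), so threads are enough.
    with ThreadPoolExecutor(max_workers=len(serials)) as pool:
        results = pool.map(task, serials)
        return dict(zip(serials, results))
```

Because every phone starts at roughly the same moment, this is also the shape that gives server-side races and multi-user features a chance to show up.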

Don't store APKs on a single dev's laptop

Every team has the "wait, the build is on Sarah's laptop and she's sick today" anti-pattern. Push every build to a shared registry — your DroidFleet account, your CI artifacts, an S3 bucket. Whatever survives a single laptop being closed.

Don't mix prod and test FCM tokens

Sending a test wake-up to your production app's FCM topic IS a production incident. Use a dedicated test FCM project; the agent should default to dev tokens.

Putting it together

If you adopt all eight patterns, your test loop looks like:

  1. You push a commit.
  2. CI builds the APK with the git SHA stamped in BuildConfig.
  3. CI calls install-wireless?gitSha=... against your phone fleet.
  4. FCM wakes any sleeping phone.
  5. The APK installs in parallel on all named phones.
  6. Auto-test runs: cold-start, rotation, screenshots at labeled points.
  7. Logs stream to your dashboard. Crashes (if any) get deduplicated and grouped.
  8. Slack notification fires when the run completes — green or red.

Total elapsed: 60-90 seconds. Setup time: one afternoon. Ongoing maintenance: ~zero.

This is what DroidFleet is built for. But the patterns work in any tool — that's the actual point of the post. Get the loop right, then the tool is just an implementation detail.
