Pi-Droid

Control Android devices in real time with AI agents using Pi-Droid.

Give your AI agent hands on an Android phone.

pi install pi-droid

Pi-Droid is a pi-agent extension that gives AI agents direct, real-time control over Android devices via ADB. Annotated screenshots with numbered element indices let the agent tap, swipe, and type without guessing pixel coordinates, so the same approach works on any screen size and any device.
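To illustrate why numbered indices remove the need to guess coordinates, here is a minimal sketch of how an index could be resolved to a tap point. The `IndexedElement` shape and the `tapPointForIndex` helper are hypothetical and are not Pi-Droid's actual types; the only assumption is that each annotated element carries its on-screen bounds.

```typescript
// Hypothetical shapes for illustration; Pi-Droid's real types may differ.
interface ElementBounds { left: number; top: number; right: number; bottom: number; }
interface IndexedElement { index: number; text: string; bounds: ElementBounds; }

// Resolve a numbered element to the pixel at the center of its bounds.
function tapPointForIndex(elements: IndexedElement[], index: number): { x: number; y: number } {
  const el = elements.find((e) => e.index === index);
  if (!el) throw new Error(`No element with index ${index}`);
  return {
    x: Math.round((el.bounds.left + el.bounds.right) / 2),
    y: Math.round((el.bounds.top + el.bounds.bottom) / 2),
  };
}

// Sample data standing in for what an annotated screenshot might report.
const sampleElements: IndexedElement[] = [
  { index: 1, text: "Settings", bounds: { left: 0, top: 100, right: 200, bottom: 180 } },
  { index: 2, text: "Wi-Fi", bounds: { left: 0, top: 200, right: 1080, bottom: 320 } },
];

console.log(tapPointForIndex(sampleElements, 2)); // { x: 540, y: 260 }
```

Because the agent only names an index, the same instruction stays valid when the bounds differ on another screen size.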


What it does

| Category | Tools | Highlights |
| --- | --- | --- |
| See | 6 perception tools | Annotated screenshots with numbered indices, raw UI tree, OCR fallback |
| Touch | 5 input tools | Tap by text/ID/coords, type with clear-first, swipe, scroll, key events |
| Navigate | 3 app tools | Launch/stop apps, wait for elements, wait for activities |
| System | 6 system tools | Battery, settings, processes, logcat, shell, APK install |
| Manage | 3 device tools | Multi-device registry, preflight checks, WiFi ADB |
| Lock | 4 lock tools | Query/set/clear PIN and pattern locks |
| Record | 2 recording tools | Screen recording, gesture macro record/save/replay |
| Automate | 3 automation tools | Wake+unlock, find-and-tap with scrolling, scroll-until-found |
| Extend | 4 plugin tools | Discover capabilities, run plugin actions, health check, heartbeat cycle |

36 tools total, organized into 4 skills that scope tool sets for focused agent behavior.


Prerequisites

| Requirement | Notes |
| --- | --- |
| Node.js >= 18 | ESM support required |
| ADB on PATH | `adb devices` should list your device |
| Android device | USB debugging enabled, USB or WiFi connected |
| ADBKeyboard | Required for Unicode text input via android_type |
| Tesseract OCR (optional) | Enables android_ocr and the OCR fallback in android_look. Without it, OCR is silently skipped; all other tools work normally. Install: `sudo apt install tesseract-ocr` |

Installation

pi install pi-droid

Set your device serial (optional if only one device is connected):

export ANDROID_SERIAL=your_device_serial

Verify

npx tsx run.mts screen

Development

git clone https://github.com/ArtemisAI/pi-droid.git
cd pi-droid
npm install
npm run build
npm test

Tool Reference

Perception (6 tools)

| Tool | Description |
| --- | --- |
| android_look | Annotated screenshot with numbered element index; primary perception tool |
| android_screenshot | Raw screenshot; use for failure diagnosis only |
| android_ui_dump | Raw UI tree XML for full element hierarchy |
| android_ocr | Tesseract OCR on current screen or saved screenshot |
| android_observe | Continuous screen state observation |
| android_screen_state | Current activity, package, orientation, lock state as JSON |

Input (5 tools)

| Tool | Description |
| --- | --- |
| android_tap | Tap by coordinates, text match, or resource ID; supports long press |
| android_type | Type text into the focused field; optional clear_first to replace |
| android_swipe | Swipe between two coordinates with configurable duration |
| android_scroll | Scroll up or down on the current screen |
| android_key | Press a key: back, home, enter, tab, or any KEYCODE_* |

App and Navigation (3 tools)

| Tool | Description |
| --- | --- |
| android_app | Launch, stop, or check status of an app by package name |
| android_wait | Wait for an element to appear (by text or resource ID) with timeout |
| android_wait_activity | Wait for a specific activity to reach the foreground |
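The wait tools above describe a poll-with-timeout pattern. The sketch below shows one way such a loop could look; `pollUntil` and its parameters are hypothetical names chosen for illustration, and the probe is a stand-in for a real UI lookup rather than Pi-Droid's implementation.

```typescript
// Hypothetical helper: call `probe` repeatedly until it yields a value
// or the timeout elapses. Returns undefined on timeout.
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  timeoutMs: number,
  intervalMs = 500,
): Promise<T | undefined> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const found = await probe();
    if (found !== undefined) return found;
    // Stop if sleeping one more interval would pass the deadline.
    if (Date.now() + intervalMs > deadline) return undefined;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage with a fake probe that "finds" the element on the third attempt.
let attempts = 0;
pollUntil(async () => (++attempts >= 3 ? "Settings" : undefined), 2000, 50)
  .then((found) => console.log(found, "after", attempts, "attempts"));
```

Returning `undefined` instead of throwing leaves it to the caller to decide whether a timeout is an error.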

System (6 tools)

| Tool | Description |
| --- | --- |
| android_device_info | Battery, network, and hardware info |
| android_settings | Read/write system settings (WiFi, Bluetooth, brightness, volume, etc.) |
| android_processes | List running processes, kill by PID or name |
| android_logcat | Capture, search, and clear logcat |
| android_shell | Execute arbitrary ADB shell commands |
| android_install | Install/uninstall APKs, check package versions |

Device Management (3 tools)

| Tool | Description |
| --- | --- |
| android_devices | List connected devices, register/unregister, set active device |
| android_preflight | Run device readiness checks (ADB, screen, battery, etc.) |
| android_wifi | Connect/disconnect WiFi ADB, auto-discover devices |

Lock Management (4 tools)

| Tool | Description |
| --- | --- |
| android_lock_status | Query current lock state and type |
| android_lock_clear | Remove existing lock |
| android_lock_set_pattern | Set a pattern lock |
| android_lock_set_pin | Set a PIN lock |

Recording and Macros (2 tools)

| Tool | Description |
| --- | --- |
| android_record | Start/stop screen recording, pull recordings |
| android_macro | Record, save, load, and replay gesture macros |

Automation (3 tools)

| Tool | Description |
| --- | --- |
| android_ensure_ready | Wake screen, unlock, dismiss overlays; call before any automation |
| android_find_and_tap | Search UI tree for element and tap it; retries with scrolling |
| android_scroll_find | Scroll until an element appears, then return it |
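The scroll-until-found behavior described above can be sketched as a bounded loop. Everything here is an assumption made for illustration: the `ScrollFindOps` callbacks, the `scrollUntilFound` name, and the idea of comparing a screen signature to detect the end of a list are not taken from Pi-Droid's source.

```typescript
// Hypothetical callbacks standing in for real device operations.
interface ScrollFindOps<E> {
  find: () => Promise<E | undefined>;     // look for the target on the current screen
  scrollDown: () => Promise<void>;        // scroll one step
  screenSignature: () => Promise<string>; // e.g. a hash of the UI tree
}

async function scrollUntilFound<E>(ops: ScrollFindOps<E>, maxScrolls = 10): Promise<E | undefined> {
  let previous = await ops.screenSignature();
  for (let i = 0; i <= maxScrolls; i++) {
    const el = await ops.find();
    if (el !== undefined) return el;
    if (i === maxScrolls) break;        // scroll budget exhausted
    await ops.scrollDown();
    const current = await ops.screenSignature();
    if (current === previous) break;    // screen did not change: end of list
    previous = current;
  }
  return undefined;
}

// In-memory fake: each inner array is one "screen" of visible labels.
function fakeOps(pages: string[][], target: string): ScrollFindOps<string> {
  let pos = 0;
  const visible = (): string[] => pages[pos] ?? [];
  return {
    find: async () => (visible().indexOf(target) !== -1 ? target : undefined),
    scrollDown: async () => { if (pos < pages.length - 1) pos++; },
    screenSignature: async () => visible().join("|"),
  };
}

scrollUntilFound(fakeOps([["A", "B"], ["C", "Target"]], "Target"))
  .then((hit) => console.log(hit)); // logs "Target"
```

The two exit conditions (scroll budget and unchanged screen) keep the loop from spinning forever on a list that never contains the target.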

Plugin System (4 tools)

| Tool | Description |
| --- | --- |
| android_skills | Discover all loaded plugin capabilities and parameters |
| android_plugin_action | Execute a plugin action with approval gates for sensitive operations |
| android_plugin_status | Get health status of all loaded plugins |
| android_plugin_cycle | Run a plugin's autonomous heartbeat cycle |

Skills

Pi-Droid defines four skills that scope tool sets for focused agent behavior. Skills are discovered automatically by pi-agent's package manager.

| Skill | Description | Key Tools |
| --- | --- | --- |
| android-screen | Perceive device state | look, screen_state, screenshot, ui_dump, ocr, observe |
| android-interact | Perform actions on the device | tap, type, swipe, scroll, key, app |
| android-automate | High-level automation sequences | ensure_ready, find_and_tap, scroll_find, wait, wait_activity, preflight |
| android-plugin | Manage and execute app plugins | skills, plugin_action, plugin_status, plugin_cycle |
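As a rough illustration of what "scoping tool sets" means, the sketch below maps each skill to the key tools listed in the table and computes the tools visible for a set of active skills. The `SKILL_TOOLS` map and `toolsForSkills` function are hypothetical; the actual skill definitions live in the `skills/` directory and may include more tools than the key ones shown here.

```typescript
// Key tools per skill, copied from the table above (short names without the android_ prefix).
const SKILL_TOOLS: Record<string, readonly string[]> = {
  "android-screen": ["look", "screen_state", "screenshot", "ui_dump", "ocr", "observe"],
  "android-interact": ["tap", "type", "swipe", "scroll", "key", "app"],
  "android-automate": ["ensure_ready", "find_and_tap", "scroll_find", "wait", "wait_activity", "preflight"],
  "android-plugin": ["skills", "plugin_action", "plugin_status", "plugin_cycle"],
};

// Hypothetical helper: full tool names exposed when the given skills are active.
function toolsForSkills(active: string[]): string[] {
  const names = new Set<string>();
  for (const skill of active) {
    for (const tool of SKILL_TOOLS[skill] ?? []) {
      names.add(`android_${tool}`);
    }
  }
  return Array.from(names);
}

console.log(toolsForSkills(["android-screen", "android-interact"]).length); // 12
```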

Plugin System

Pi-Droid's plugin system lets you add app-specific automation behind a standard interface. Plugins can declare approval gates for sensitive actions (sending messages, making purchases, posting content).
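The idea of an approval gate can be sketched independently of Pi-Droid's real plugin interface. In the example below, `PluginAction`, `Approver`, and `runWithGate` are hypothetical names invented for illustration; the actual gate mechanism (for example the exported `ApprovalQueue`) may work differently.

```typescript
// Hypothetical types; not Pi-Droid's plugin API.
type Approver = (action: string, params: Record<string, unknown>) => Promise<boolean>;

interface PluginAction {
  name: string;
  requiresApproval: boolean; // true for sensitive actions such as sending a message
  run: (params: Record<string, unknown>) => Promise<string>;
}

// Run an action, asking the approver first when the action is gated.
async function runWithGate(
  action: PluginAction,
  params: Record<string, unknown>,
  approve: Approver,
): Promise<string> {
  if (action.requiresApproval && !(await approve(action.name, params))) {
    throw new Error(`Action "${action.name}" was not approved`);
  }
  return action.run(params);
}

// Example gated action; the body is a placeholder for real automation.
const sendMessage: PluginAction = {
  name: "send_message",
  requiresApproval: true,
  run: async (params) => `sent: ${String(params["text"])}`,
};

runWithGate(sendMessage, { text: "hello" }, async () => true)
  .then((result) => console.log(result)); // logs "sent: hello"
```

Keeping the approver as an injected callback means the same gate can be backed by a chat prompt, a queue, or an automatic policy.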

Building a plugin

Extend CliPlugin for CLI-backed apps, or implement PiDroidPlugin for full control:

import { CliPlugin } from "pi-droid";

export class WeatherPlugin extends CliPlugin {
  name = "weather";
  // ...
}

Loading plugins

Plugins are configured in config/default.json and loaded automatically on session start:

{
  "plugins": {
    "weather": {
      "enabled": true,
      "package": "@example/pi-droid-weather"
    }
  }
}
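A loader reading this configuration would presumably skip disabled entries. The sketch below shows that selection step only; the `PluginEntry` and `PluginsConfig` shapes mirror the JSON above, while `enabledPackages` is a hypothetical helper and not part of Pi-Droid.

```typescript
// Shapes mirroring the JSON example above.
interface PluginEntry { enabled: boolean; package: string; }
interface PluginsConfig { plugins: Record<string, PluginEntry>; }

// Hypothetical helper: package names of all enabled plugins, in config order.
function enabledPackages(config: PluginsConfig): string[] {
  const packages: string[] = [];
  for (const name of Object.keys(config.plugins)) {
    const entry = config.plugins[name];
    if (entry !== undefined && entry.enabled) packages.push(entry.package);
  }
  return packages;
}

const exampleConfig: PluginsConfig = {
  plugins: {
    weather: { enabled: true, package: "@example/pi-droid-weather" },
    notes: { enabled: false, package: "@example/pi-droid-notes" }, // made-up second entry
  },
};

console.log(enabledPackages(exampleConfig)); // [ '@example/pi-droid-weather' ]
```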

Distribute plugins as npm packages. See PLUGINS.md for the full plugin development guide, manifest schema, and marketplace integration.


Configuration

Pi-Droid reads configuration from config/default.json:

{
  "adb": {
    "serial": "your_device_serial"
  },
  "plugins": {},
  "routing": {}
}

| Key | Purpose |
| --- | --- |
| adb.serial | Device serial (overridden by ANDROID_SERIAL env var) |
| plugins | Plugin configurations keyed by name |
| routing | Input router settings for deterministic action dispatch |

Environment variables:

| Variable | Purpose |
| --- | --- |
| ANDROID_SERIAL | Target device serial (takes precedence over config) |
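The precedence rule between the environment variable and `adb.serial` can be expressed in a few lines. This is a sketch, not Pi-Droid's code: `resolveSerial` is a hypothetical helper, and treating an empty `ANDROID_SERIAL` as unset is an assumption.

```typescript
// Minimal shape of the relevant part of config/default.json.
interface AdbSection { adb?: { serial?: string }; }

// Hypothetical helper: ANDROID_SERIAL wins over adb.serial.
// Assumption: an empty environment value is treated as "not set".
function resolveSerial(
  config: AdbSection,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromEnv = env["ANDROID_SERIAL"];
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return config.adb?.serial;
}

const fileConfig: AdbSection = { adb: { serial: "config_serial" } };
console.log(resolveSerial(fileConfig, { ANDROID_SERIAL: "env_serial" })); // env_serial
console.log(resolveSerial(fileConfig, {}));                               // config_serial
```

In a real script the second argument would typically be `process.env`. A result of `undefined` corresponds to the single-device case, where no serial is needed.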

Programmatic Usage

Pi-Droid exports its full ADB layer for use in custom automation scripts, external plugins, or standalone tools:

import {
  Device,
  tap, swipe, typeText, keyEvent,
  takeScreenshot, annotatedScreenshot,
  getScreenState, waitForActivity,
  launchApp, stopApp,
  dumpUiTree, findElement, waitForElement,
  ensureReady, findAndTap, scrollToFind,
  getBatteryInfo, getDeviceInfo,
  adbShell,
} from "pi-droid";

// Connect to a device
const device = await Device.connect(process.env.ANDROID_SERIAL);

// Take an annotated screenshot with numbered elements
const annotated = await annotatedScreenshot();

// Find and tap an element by text
await findAndTap({ text: "Settings" });

// Wait for an activity transition
await waitForActivity("com.android.settings/.Settings");

// Run a raw ADB shell command
const result = await adbShell("dumpsys battery");
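Raw `dumpsys battery` output is a block of indented `key: value` lines, so a script usually needs to parse it before use. The sketch below assumes the shell call yields the command's stdout as a string (the return type of `adbShell` is not documented here); `parseKeyValueLines` is a hypothetical helper and the sample text is abbreviated.

```typescript
// Hypothetical helper: parse "key: value" lines into a lookup table.
// Lines without a value (such as the header) are skipped.
function parseKeyValueLines(output: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of output.split("\n")) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).trim();
    if (key !== "" && value !== "") fields[key] = value;
  }
  return fields;
}

// Abbreviated sample in the format printed by `dumpsys battery`.
const sampleBatteryOutput = [
  "Current Battery Service state:",
  "  AC powered: false",
  "  USB powered: true",
  "  level: 85",
  "  scale: 100",
].join("\n");

const battery = parseKeyValueLines(sampleBatteryOutput);
console.log(Number(battery["level"]), battery["USB powered"]); // 85 true
```

For common cases the exported `getBatteryInfo` is likely the simpler route; a parser like this is mainly useful for `dumpsys` services that have no dedicated helper.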

Available exports

  • Device abstraction: Device
  • Command execution: adb, adbShell, AdbError, listDevices, isDeviceReady
  • Input: tap, swipe, typeText, keyEvent, pressBack, pressHome, pressEnter, scrollDown, scrollUp
  • Screen state: getScreenState, waitForActivity, getActivityStack, isKeyboardVisible, getOrientation
  • Screenshots and perception: takeScreenshot, screenshotBase64, annotatedScreenshot, dumpUiTree, findElements, findElement, waitForElement
  • App management: launchApp, stopApp, getAppInfo, listPackages, wakeScreen, isScreenOn
  • Monitoring: getBatteryInfo, getNetworkInfo, getDeviceInfo, isScreenLocked, getRunningApps
  • Automation: ensureReady, findAndTap, scrollToFind, DefaultStuckDetector, createTaskBudget
  • OCR: runOcrOnImage, runOcrOnCurrentScreen
  • Plugin system: PluginManager, CliPlugin, TelegramPlugin, ApprovalQueue
  • Types: AdbExecOptions, UIElement, Bounds, ElementSelector, StuckEvent, and more

Development

npm install            # Install dependencies
npm run build          # Compile TypeScript
npm run lint           # Type-check without emitting
npm test               # Run all tests (473+)
npm run test:unit      # Unit tests only (mocked ADB)
npm run test:ci        # CI-safe subset (unit + sandbox integration)
npm run test:integration # All integration tests
npm run test:device    # Full suite including device E2E
npm run dev            # Development mode (load as pi extension)

Project structure

src/
  index.ts          Extension entry point
  adb/              ADB primitives (app-agnostic, 28 modules)
  tools/            LLM tool registrations
  plugins/          Plugin system (loader, CLI base class, marketplace)
  notifications/    Notification channels and approval queues
skills/             Skill definitions (scoped tool sets)
tests/              473+ tests mirroring src/ structure
config/             Default configuration

Contributing

See CONTRIBUTING.md for development guidelines, coding standards, and the PR checklist.

Security

See SECURITY.md for vulnerability reporting and security considerations.

Changelog

See CHANGELOG.md for release history.

License

MIT -- ArtemisAI