Documentation

PhysiClaw gives an AI agent a physical body to operate a phone: one overhead camera reads the screen and a stylus on a 3-axis arm taps it — no APIs, and no app on the phone. These docs go from a parts list to a running agent, plus the tools, gestures, and built-in agent that drive it.

New here? Read the Introduction, see how it works, then build one. For the bigger picture — why PhysiClaw exists — read the Vision.

Get started

Introduction: What PhysiClaw is and the problem it solves.

How it works: One full look → decide → move → check loop.

What you'll build: The rig, the cost, and the road ahead.

Vision: Why PhysiClaw exists (a personal assistant you truly own).

Concepts

System architecture: Agent → MCP server → camera + arm → phone.

How it sees: peek vs. screenshot, OCR + icon detection, bounding boxes.

Build & reference

Bill of materials: Every part for a ~$112 build.

MCP tools: The tools an agent calls to drive the phone.

Gestures: How taps, long-presses, and swipes reach the screen.