Documentation
PhysiClaw gives an AI agent a physical body to operate a phone: one overhead camera reads the screen and a stylus on a 3-axis arm taps it, with no APIs and no app on the phone. These docs take you from a parts list to a running agent, and cover the tools, gestures, and built-in agent that drive the rig.
New here? Read the Introduction, see how it works, then build one. For the bigger picture — why PhysiClaw exists — read the Vision.
Get started
Introduction: What PhysiClaw is and the problem it solves.
How it works: One full look → decide → move → check loop.
What you'll build: The rig, the cost, and the road ahead.
Vision: Why PhysiClaw exists — a personal assistant you truly own.
Concepts
System architecture: Agent → MCP server → camera + arm → phone.
How it sees: peek vs screenshot, OCR + icon detection, bboxes.
Build & reference
Bill of materials: Every part for a ~$112 build.
MCP tools: The tools an agent calls to drive the phone.
Gestures: How taps, long-presses, and swipes reach the screen.