PhysiClaw isn’t only a set of tools an outside model calls — it ships its own agent, a brain that can drive the phone on its own.
You can run PhysiClaw two ways. As a plain MCP server, it hands the twelve tap/swipe/peek tools to whatever agent you already use — Claude Desktop, an IDE, your own client — and that external model does the deciding. As a built-in agent, PhysiClaw is the model: it runs its own look → decide → act loop, in its own process, with no external client attached. Same robot, same tools — the difference is whose mind is in charge.
A plain MCP server is reactive: it sits still until some external client sends a tool call. The built-in agent closes that gap — it can be the initiator.
It owns the loop
The agent runs the full look → decide → act cycle itself: peek to see, choose a
bbox and a gesture, tap, then peek again to check. No external model in the loop.
It runs unattended
It wakes on its own — on a schedule or when the phone screen changes — operates the phone, and goes back to sleep. Nobody has to be sitting at a client.
It remembers
A persistent memory carries facts across wakes, so the agent isn’t starting cold every time. (See Memory & skills.)
It learns routines
Skills are reusable, app-specific playbooks the agent discovers and follows — “how to send a WeChat message,” “how to place a grocery order” — instead of re-figuring-out each app every time.
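The two wake conditions and the cross-wake memory described above can be pictured with a small sketch. This is an illustration only, not PhysiClaw's actual implementation: `wake_reason`, `load_memory`, `save_memory`, the JSON file format, and the string stand-ins for screen contents are all hypothetical names and choices made up for this example.

```python
import json
from pathlib import Path
from typing import Optional


def wake_reason(now: float, next_due: float, last_screen: str, screen: str) -> Optional[str]:
    """Return why the agent should wake, or None to keep sleeping (hypothetical helper)."""
    if now >= next_due:
        return "schedule"        # a scheduled time has arrived
    if screen != last_screen:
        return "screen_change"   # the phone screen differs from the last one seen
    return None


def load_memory(path: Path) -> dict:
    """Read facts saved by earlier wakes; an absent file means a cold start."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_memory(path: Path, memory: dict) -> None:
    """Persist facts so the next wake does not start cold."""
    path.write_text(json.dumps(memory, indent=2))
```

A wake would then load memory, run the loop, and save any new facts before going back to sleep. How PhysiClaw really detects a screen change or stores memory is not specified here.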
Every wake runs the same loop you already met in How it works — the agent just drives it instead of an external client:
    wake ──► LOOK ──► DECIDE ──► ACT ──► LOOK ──► … ──► close
    trigger  peek     pick a     tap /   peek           (DONE / WAIT /
    fires    (camera) bbox +     swipe   again,          FAIL / IDLE)
                      gesture            re-decide

A few rules keep the loop honest. Each turn is shaped as exactly [note, one-other]:
one running-summary note plus one real action. The agent takes one step, records why,
and never fires a burst of taps blind. Every turn ends by looking at the result, so a
popup or a slow load is just the next state to react to, not a script to fall out of. And
each session ends with a one-word verdict (DONE, WAIT, FAIL, or IDLE) that says
what happened and whether to follow up.
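As a rough sketch of those rules, the session loop might look like the following. Everything here is assumed for illustration: the `Turn` and `Verdict` types, the `run_session` function, the `max_turns` cap, and the `peek`, `decide`, and `act` callables are hypothetical stand-ins, not PhysiClaw's real API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    DONE = "DONE"
    WAIT = "WAIT"
    FAIL = "FAIL"
    IDLE = "IDLE"


@dataclass(frozen=True)
class Turn:
    """One turn in the [note, one-other] shape: a note plus exactly one action."""
    note: str                          # running summary of why this step is taken
    action: str                        # hypothetical action name, e.g. "tap" or "close"
    verdict: Optional[Verdict] = None  # set only on the turn that closes the session


def run_session(
    peek: Callable[[], str],
    decide: Callable[[str], Turn],
    act: Callable[[str], None],
    max_turns: int = 20,               # assumed safety cap, not from the source
) -> tuple[Verdict, list[str]]:
    notes: list[str] = []
    screen = peek()                    # LOOK
    for _ in range(max_turns):
        turn = decide(screen)          # DECIDE: one note, one action
        notes.append(turn.note)
        if turn.verdict is not None:   # close: the session ends with a verdict
            return turn.verdict, notes
        act(turn.action)               # ACT: a single gesture, never a burst
        screen = peek()                # LOOK again before deciding anything else
    return Verdict.FAIL, notes         # ran out of turns without closing
```

Because `decide` only ever sees the most recent `peek`, a popup or a slow load shows up as an ordinary screen state on the next turn.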
You don’t have to choose once and for all — the same install does both.
| | Plain MCP server | Built-in agent |
|---|---|---|
| Who decides | your external client (Claude Desktop, an IDE) | PhysiClaw itself |
| Starts a task | you, by prompting the client | a trigger: a schedule or a screen change |
| Runs unattended | no — needs a client connected | yes — wakes, acts, sleeps |
| Memory & skills | up to your client | built in |
Reach for the plain server when you want to keep an existing agent in the loop and just give it hands. Reach for the built-in agent when you want PhysiClaw to run on its own — a recurring chore, a watch-and-react task, a phone that does things while you’re away.