
Introduction

You’ve seen an AI agent click around a screen. PhysiClaw lets it touch a real one.

PhysiClaw is a small desktop robot that gives an AI agent a physical body to operate a phone. A single camera looks down at the screen; a capacitive stylus on a 3-axis arm reaches out and taps the glass. The agent reads what’s on screen, decides what to do, and the arm does it — the same loop a person runs, just with a camera for an eye and a stylus for a finger.
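Before the arm can tap a box the camera found, a pixel position in the camera image has to become a position the arm can drive to. One minimal way to do that is to touch three known arm positions during calibration, record where the camera sees each touch, and fit a map from those three pairs. This sketch assumes the camera looks roughly straight down, so an affine map is accurate enough; a noticeably tilted camera would call for a full homography instead. `PixelToArm` and `fit` are hypothetical names for illustration, not PhysiClaw's actual code.

```python
from dataclasses import dataclass

Point = tuple[float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve m @ x = rhs for a 3x3 system using Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("calibration points are collinear")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]  # copy, then swap column `col` for rhs
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(_det3(mc) / d)
    return solution


@dataclass
class PixelToArm:
    """Hypothetical affine map from camera pixels (u, v) to arm coordinates (x, y)."""
    ax: list[float]  # (a, b, c) in x = a*u + b*v + c
    ay: list[float]  # (d, e, f) in y = d*u + e*v + f

    @classmethod
    def fit(cls, pixels: list[Point], arm: list[Point]) -> "PixelToArm":
        # Three non-collinear correspondences determine the six coefficients.
        m = [[u, v, 1.0] for u, v in pixels]
        return cls(_solve3(m, [x for x, _ in arm]),
                   _solve3(m, [y for _, y in arm]))

    def __call__(self, u: float, v: float) -> Point:
        a, b, c = self.ax
        d, e, f = self.ay
        return (a * u + b * v + c, d * u + e * v + f)
```

For example, `PixelToArm.fit([(100, 100), (500, 100), (100, 900)], [(0, 0), (40, 0), (0, 80)])` maps the center of a box seen at pixel (300, 500) to approximately (20.0, 40.0) in arm units.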

There’s no app to install on the phone, no API to integrate, and no account to connect. To the phone, the stylus is indistinguishable from a fingertip — so any app works, iOS or Android, with zero per-app setup.

Software agents that “use your phone” rely on one of three approaches, and each one hits a wall.

APIs & OAuth

Every service needs its own integration, keys, and consent screens. New app, new wall — and the apps you most want (your bank, a delivery app) often have no public API at all.

Accessibility hooks

Automation frameworks and screen-reader bridges are detectable and blockable, and they break the moment an app redesigns.

Jailbreaks

Rooting a device to inject taps is fragile, unsafe, and off-limits for most people.

PhysiClaw sidesteps all three by not touching the software stack at all. The only thing reaching the phone is a stylus tip — so there is nothing to integrate, nothing to detect, and nothing to jailbreak.

The loop is deliberately simple, and it’s the same every time:

  1. Look. The overhead camera photographs the screen. On-device vision boxes and labels every button, icon, and line of text it finds.
  2. Decide. The agent reads that annotated view and picks a target — this box — and an action: tap, swipe, long-press.
  3. Move & touch. The arm drives the stylus to the target and the tip drops to register the touch, then lifts away.
  4. Check. The camera looks again. Did the screen change the way it should? If yes, on to the next action; if not, try again.
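The Decide and Move & touch steps above can be sketched as turning an action into a short list of stylus waypoints: travel at a safe height, drop the tip to the glass, and lift away. This is a minimal sketch under stated assumptions: the `plan` function, the `Waypoint` type, the heights `HOVER_Z` and `TOUCH_Z`, and the dwell times are all hypothetical, not measured PhysiClaw parameters.

```python
from dataclasses import dataclass
from typing import Optional

HOVER_Z = 5.0  # hypothetical: tip height (mm) above the glass while travelling
TOUCH_Z = 0.0  # hypothetical: tip height (mm) at which the glass registers contact


@dataclass(frozen=True)
class Waypoint:
    x: float
    y: float
    z: float
    dwell_s: float = 0.0  # how long to hold at this point before the next move


def plan(action: str,
         start: tuple[float, float],
         end: Optional[tuple[float, float]] = None,
         hold_s: float = 0.8) -> list[Waypoint]:
    """Turn a tap, long-press, or swipe into stylus waypoints (hypothetical helper)."""
    x0, y0 = start
    if action == "tap":
        # Hover over the target, touch briefly, lift away.
        return [Waypoint(x0, y0, HOVER_Z),
                Waypoint(x0, y0, TOUCH_Z, dwell_s=0.05),
                Waypoint(x0, y0, HOVER_Z)]
    if action == "long_press":
        # Same path as a tap, but hold contact for hold_s seconds.
        return [Waypoint(x0, y0, HOVER_Z),
                Waypoint(x0, y0, TOUCH_Z, dwell_s=hold_s),
                Waypoint(x0, y0, HOVER_Z)]
    if action == "swipe":
        if end is None:
            raise ValueError("swipe needs an end point")
        x1, y1 = end
        # Touch down at the start, drag along the glass, lift at the end.
        return [Waypoint(x0, y0, HOVER_Z),
                Waypoint(x0, y0, TOUCH_Z),
                Waypoint(x1, y1, TOUCH_Z),
                Waypoint(x1, y1, HOVER_Z)]
    raise ValueError(f"unknown action: {action!r}")
```

Keeping every gesture as "down, optionally move, up" means the motion code only has to know how to reach a point and wait; the three actions differ only in their waypoint lists.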

Because every step ends by looking at the result, a surprise (a popup, an ad, a slow load) is just new state to react to, not something that derails a fixed script. The How it works page walks through one full loop in detail.
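The loop's key property, that every touch is followed by a fresh look, fits in a few lines. Here `look`, `decide`, and `act` are hypothetical callables standing in for the camera plus vision, the agent, and the arm; `decide` returns `None` when the task is finished, and each `Step` carries an `expect` check for what the screen should look like afterwards. Because `decide` always sees the latest screen, a popup after a tap is simply handled on the next pass, and a counter gives up after repeated unexpected results. This illustrates the idea and is not PhysiClaw's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Box:
    label: str
    center: tuple[float, float]  # camera pixels


Screen = list[Box]


@dataclass(frozen=True)
class Step:
    action: str                       # "tap", "swipe", or "long_press"
    target: Box
    expect: Callable[[Screen], bool]  # did the screen change the way it should?


def run(look: Callable[[], Screen],
        decide: Callable[[Screen], Optional[Step]],
        act: Callable[[Step], None],
        max_steps: int = 50,
        max_retries: int = 3) -> bool:
    """Look, decide, move and touch, check. Returns True when decide() reports done."""
    retries = 0
    screen = look()                   # Look
    for _ in range(max_steps):
        step = decide(screen)         # Decide from the current view, not a script
        if step is None:
            return True
        act(step)                     # Move & touch
        screen = look()               # Check: always re-observe after touching
        if step.expect(screen):
            retries = 0
        else:
            retries += 1              # unexpected result; decide() sees it next pass
            if retries > max_retries:
                return False
    return False
```

For instance, if tapping “Open” unexpectedly shows a popup, the expectation fails once, `decide` sees the popup on the next pass and can dismiss it, and the loop carries on.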

The “nothing installed” line is almost true, and the difference matters — so here it is straight:

PhysiClaw is for builders who want agents to act in the real world: automation tinkerers, robotics learners, and anyone tired of writing one more integration to do something a finger could do. A full build costs about $112 in off-the-shelf parts and takes an afternoon of assembly and calibration, with no soldering and no custom boards.