# Mobile GUI Agent Automation
## Architecture
A seven-layer closed-loop pipeline: capture → understand → ground → plan → execute → verify → loop
```
src/
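├── capture/    # L1 - ADB/scrcpy screen capture
├── vision/     # L2 - VLM screen understanding
├── grounding/  # L3 - element grounding (natural language → coordinates)
├── planner/    # L4 - task planning and decomposition
├── executor/   # L5 - ADB action execution
└── verifier/   # L6+L7 - verification and error correction + state memory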
```
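The closed loop above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_loop`, `StepResult`, and the six callables are hypothetical names standing in for whatever the `src/` packages expose, and the stage signatures are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class StepResult:
    action: Any   # whatever the planner returned for this step
    ok: bool      # whether verification passed


def run_loop(
    capture: Callable[[], Any],
    understand: Callable[[Any], Any],
    ground: Callable[[Any, Any], Any],
    plan: Callable[[Any, List[StepResult]], Optional[Any]],
    execute: Callable[[Any], None],
    verify: Callable[[Any, Any], bool],
    max_steps: int = 10,
) -> List[StepResult]:
    """Run capture → understand → ground → plan → execute → verify in a loop."""
    history: List[StepResult] = []
    for _ in range(max_steps):
        screen = capture()                   # L1: take a screenshot
        state = understand(screen)           # L2: describe the screen
        elements = ground(screen, state)     # L3: map elements to coordinates
        action = plan(elements, history)     # L4: choose the next action
        if action is None:                   # planner signals the task is done
            break
        execute(action)                      # L5: perform the action
        ok = verify(capture(), action)       # L6: re-capture and check
        history.append(StepResult(action, ok))  # L7: remember the outcome
    return history
```

Passing the stages in as callables keeps the loop testable without a device attached; each stage can be replaced by a stub.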
## Startup
- `python -m src.main`: main service, port 4380
- `python scripts/test_device.py`: test the ADB connection
## Tech Stack
- Python 3.11+
- ADB + scrcpy (screen capture and control)
- Qwen2.5-VL / UI-TARS-1.5 (visual understanding)
- FastAPI (web console)
- Poe API / OpenRouter (LLM calls, per user preference)
## Environment Variables
- `DEVICE_SERIAL`: Android device serial number (list with `adb devices`)
- `VLM_PROVIDER`: VLM provider, one of `local` / `poe` / `openrouter`
- `VLM_MODEL`: model name, defaults to `Qwen/Qwen2.5-VL-7B-Instruct`
- `POE_API_KEY`: Poe API key (required when `VLM_PROVIDER=poe`)
- `OPENROUTER_API_KEY`: OpenRouter key (fallback)
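One way these variables might be read and validated at startup is sketched below. `Settings` and `load_settings` are hypothetical names, and the `local` default for `VLM_PROVIDER` is an assumption; the list above does not state a default for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"
PROVIDERS = ("local", "poe", "openrouter")


@dataclass(frozen=True)
class Settings:
    device_serial: Optional[str]
    vlm_provider: str
    vlm_model: str
    poe_api_key: Optional[str]
    openrouter_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping, validating as we go."""
    # Assumption: fall back to "local" when VLM_PROVIDER is unset.
    provider = env.get("VLM_PROVIDER", "local")
    if provider not in PROVIDERS:
        raise ValueError(f"VLM_PROVIDER must be one of {PROVIDERS}, got {provider!r}")

    poe_key = env.get("POE_API_KEY")
    # POE_API_KEY is required only when the provider is poe.
    if provider == "poe" and not poe_key:
        raise ValueError("POE_API_KEY is required when VLM_PROVIDER=poe")

    return Settings(
        device_serial=env.get("DEVICE_SERIAL"),
        vlm_provider=provider,
        vlm_model=env.get("VLM_MODEL", DEFAULT_MODEL),
        poe_api_key=poe_key,
        openrouter_api_key=env.get("OPENROUTER_API_KEY"),
    )
```

Calling `load_settings()` with no arguments reads the real process environment; passing a plain dict makes the validation easy to exercise in tests.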
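## Rules
- Capture screenshots with `adb exec-out screencap`, not the scrcpy recording stream (saves resources)
- After every action, wait, then take a fresh screenshot to verify the result
- Save all screenshots to `data/screenshots/` for debugging
- Use percentage coordinates (0-1) everywhere; convert to device pixels only at execution time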
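The capture and coordinate rules can be illustrated with a small sketch that only builds the `adb` command lines, so it runs without a device. The helper names (`to_pixels`, `adb_prefix`, `screencap_cmd`, `tap_cmd`) are hypothetical, and clamping to the last pixel is a design assumption rather than something the rules specify.

```python
from typing import List, Optional, Tuple


def to_pixels(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Convert percentage coordinates (0-1) to device pixels."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError(f"coordinates must be within 0-1, got ({x}, {y})")
    # Clamp so that 1.0 maps to the last valid pixel, not one past the edge.
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base adb command, targeting a specific device when a serial is given."""
    return ["adb"] + (["-s", serial] if serial else [])


def screencap_cmd(serial: Optional[str] = None) -> List[str]:
    """Command that writes a PNG screenshot to stdout."""
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def tap_cmd(x: float, y: float, width: int, height: int,
            serial: Optional[str] = None) -> List[str]:
    """Command that taps at percentage coordinates, converted at call time."""
    px, py = to_pixels(x, y, width, height)
    return adb_prefix(serial) + ["shell", "input", "tap", str(px), str(py)]
```

The returned lists can be handed to `subprocess.run(cmd, capture_output=True)`; for `screencap_cmd` the PNG bytes arrive on `stdout` and could then be written under `data/screenshots/`.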