From b4bcbea2e6836d2c80f7550426a10e261d691d0f Mon Sep 17 00:00:00 2001
From: kang <wankang2050@gmail.com>
Date: Fri, 24 Apr 2026 20:15:03 +0800
Subject: [PATCH] auto-save 2026-04-24 20:14 (~2)

---
 .memory/source-analysis.md | 768 ++++++++++++++++++++++++++++++++++++-
 .memory/worklog.json       |   7 +
 2 files changed, 766 insertions(+), 9 deletions(-)

diff --git a/.memory/source-analysis.md b/.memory/source-analysis.md
index f847644..bbe7b16 100644
--- a/.memory/source-analysis.md
+++ b/.memory/source-analysis.md
@@ -1,16 +1,766 @@
-# ARES Source Analysis (withmartian RL Agent Training Framework) Source Analysis
+# ARES Source Analysis
 
-> Created: 2026-04-24
-> Upstream version: to be filled in
+> **ARES = Agentic Research & Evaluation Suite**
+> An **RL-first training / evaluation framework for LLM agents** from withmartian
+> Analyzed: 2026-04-24 · Upstream version: `c804aa2` · Repo: https://github.com/withmartian/ares
 
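-## Overview
+---
 
-TBD
+## 0. TL;DR
 
-## Core Modules
+- **Positioning**: the "infrastructure layer" for agent RL (environments + containers + observation/action adapters). It is neither a training algorithm nor a finished agent. Analogy: the gym plus the exam proctoring system, used by trainers such as `trl / verl / OpenPipe`.
+- **Core abstraction**: an LLM request is the observation and an LLM response is the action, which fits LLM-as-agent into the dm_env protocol.
+- **Key pattern**: `QueueMediatedLLMClient` uses an `asyncio.Queue` to intercept the agent's LLM calls, so the agent's linear code is driven by the RL environment without noticing.
+- **Size**: 8,339 LOC of Python (non-test) + 3,561 LOC of tests + 5 Go files for the proxy. 7 core modules plus the standalone `ares-proxy` subproject.
+- **Two container backends**: Daytona (cloud, default) + Docker (local), with a Janitor that uses `atexit` as a last-resort cleanup.
+- **Two agents**: `MiniSWECodeAgent` (thin wrapper) + `Terminus2Agent` (production-grade, 1,110 lines, with a tmux session + proactive summarization).
+- **Two interception paths**: in-process `asyncio.Queue` (no network overhead) + the `ares-proxy` Go HTTP proxy (cross-process).
 
-TBD
+---
 
-## Key Flows
+## 1. Positioning and Ecosystem Niche
 
-TBD
+### 1.1 What it is not
+
+| Common misconception | Reality |
+|---|---|
+| A finished agent product (like Manus) | ❌ Not an end product; it does not serve users directly |
+| A training-algorithm library (like trl / verl) | ❌ Does not perform PPO / GRPO weight updates |
+| An LLM router (withmartian's main business) | ❌ That is a separate product line |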
+
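+### 1.2 What it is
+
+**The "exam-hall facilities" for the agent RL loop**:
+
+```
+Trainer ──train──→ [ARES environment]
+                       ↓
+                ┌──────────────────────────┐
+                │  [Agent sandbox]         │
+                │   Agent LLM solves task  │  ← rollout
+                │   Tools run in container │
+                └──────────────────────────┘
+                       ↓
+                reward → training framework → weight update
+```
+
+ARES builds the exam hall (environment, sandbox, observation/action channel); the training framework (trl / openpipe / verl) takes the scores and computes gradients.
+
+### 1.3 Who uses it
+
+- Researchers doing agent post-training / fine-tuning
+- Evaluators running SWE-bench Verified style benchmarks
+- Engineering teams that want to plug their own agent into large-scale parallel evaluation
+- withmartian's own Router / Agent product lines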
+
+---
+
+## 2. Core Abstraction: Treating the Agent as RL
+
+References: `CLAUDE.md:97-106`, `src/ares/environments/base.py:31-150`
+
+```
+┌─────────────────────────────────────────────────────┐
+│                    RL Loop                          │
+│                                                     │
+│  Env.reset()         ─────────→  TimeStep(FIRST)    │
+│      │                              ↓               │
+│      │                     observation = LLMRequest │
+│      │                              ↓               │
+│      ↓                         Agent LLM            │
+│  Env.step(action)   ←─────────  action = LLMResponse│
+│      │                              ↑               │
+│      ↓                         agent continues      │
+│  TimeStep(MID, reward=0)  ─→  下一个 LLMRequest     │
+│      │                                              │
+│      ↓                                              │
+│  TimeStep(LAST, reward) ← end (250 steps/done/error)│
+└─────────────────────────────────────────────────────┘
+```
+
+**Data flow**:
+
+- **Observation = `LLMRequest`** (`src/ares/llms/request.py`): contains messages, temperature, and tool definitions.
+- **Action = `LLMResponse`** (`src/ares/llms/response.py`): contains the ChatCompletion payload and cost statistics.
+- **Reward**: read from `/reward.txt` or `/reward.json` inside the container when the episode ends (`src/ares/environments/code_env.py:302-315`).
+
+**Environment protocol** (`src/ares/environments/base.py:71-138`):
+
+```python
+class Environment(Protocol[ActionType, ObservationType, RewardType, DiscountType]):
+    async def reset(self) -> TimeStep[ObservationType, RewardType, DiscountType]: ...
+    async def step(self, action: ActionType) -> TimeStep[...]: ...
+    async def close(self) -> None: ...
+```
+
+**TimeStep** (`base.py:31-69`) is a namedtuple `(step_type, reward, discount, observation)` with `step_type ∈ {FIRST, MID, LAST}`; the reward of a FIRST step must be None.
+
+---
+
+## 3. Architecture Overview
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│                    Public API (__init__.py)                  │
+│   ares.make() · ares.info() · @register_env · TimeStep       │
+└──────────────────────────────────────────────────────────────┘
+           │
+           ↓
+┌──────────────────────────────────────────────────────────────┐
+│   Registry (registry.py) + Presets (presets.py)              │
+│   · HarborSpec × {mini_swe_agent, terminus2_agent}           │
+│   · TwentyQuestionsSpec                                      │
+│   · Selector syntax: `sbv-mswea:0:10`, `sbv-mswea@2/8`       │
+└──────────────────────────────────────────────────────────────┘
+           │
+           ↓
+┌────────────────────────┐     ┌─────────────────────────────┐
+│ CodeEnvironment        │────→│ Container                   │
+│ (code_env.py)          │     │ ├── DaytonaContainer (cloud)│
+│ · reset / step loop    │     │ └── DockerContainer (local) │
+│ · 250-step limit       │     │ Janitor (atexit cleanup)    │
+│ · reward read          │     └─────────────────────────────┘
+└────────────────────────┘
+           │
+           ↓ starts the agent as a separate asyncio Task
+┌────────────────────────┐     ┌─────────────────────────────┐
+│ CodeAgent (protocol)   │────→│ LLMClient (protocol)        │
+│ ├── MiniSWECodeAgent   │     │ ├── QueueMediatedLLMClient  │
+│ └── Terminus2Agent     │     │ │   ← intercepted by env    │
+│     (tmux + summaries) │     │ ├── ChatCompletionClient    │
+└────────────────────────┘     │ ├── LlamaCppClient (local)  │
+           │                   │ └── HookedTransformerClient │
+           ↓                   └─────────────────────────────┘
+    ┌──────────────────────────────┐
+    │ ares-proxy (Go)              │
+    │ HTTP queue-mediated variant  │
+    │ cross-process/container use  │
+    └──────────────────────────────┘
+```
+
+---
+
+## 4. Module Deep Dive
+
+### 4.1 Environment layer
+
+#### 4.1.1 Base protocol (`src/ares/environments/base.py`)
+
+- **Environment Protocol** (lines 71-138): four generic parameters `[ActionType, ObservationType, RewardType, DiscountType]`, three async methods `reset() / step(action) / close()`, plus `__aenter__`/`__aexit__`.
+- **TimeStep** (lines 31-69): a namedtuple with `step_type: Literal["FIRST", "MID", "LAST"]` and a `last()` helper method.
+
+#### 4.1.2 CodeEnvironment main loop (`src/ares/environments/code_env.py:58-161`)
+
+Class signature:
+```python
+class CodeEnvironment(base.Environment[response.LLMResponse, request.LLMRequest | None, float, float])
+```
+Action type = `LLMResponse`; observation type = `LLMRequest | None` (None marks the end of the episode).
+
+**`reset()` flow** (lines 95-127):
+
+1. Reset the step counter and stop the old container (lines 104-108)
+2. Pick a random task with `_reset_task()` (line 110)
+3. Start the container with `_start_container()` (line 111)
+4. Start the agent as a separate asyncio Task (line 112): the agent code is linear, and the environment runs it in the background
+5. Wait for the agent's first LLM request via `_get_time_step()` (line 114)
+6. Wrap it in a FIRST TimeStep and return it
+
+**`step(action)` flow** (lines 129-161):
+
+1. Increment the step counter (line 138)
+2. Feed `action` (an LLMResponse) back to the agent with `_llm_req_future.set_result(action)` (line 142), which wakes up the agent's `await`
+3. Wait for the agent's next LLM request, or for the agent task to finish (line 146)
+4. If `step_count >= step_limit` (default 250), force LAST and cancel the agent task (lines 148-153)
+5. Otherwise return MID(reward=0.0); the score is only computed when the episode ends
+
+**Reward reading** (lines 302-315):
+
+- First try `harbor_paths.EnvironmentPaths.reward_text_path` (usually `/reward.txt`)
+- Fall back to `.reward_json_path` (`/reward.json`, taking the value of its single key)
+- If neither exists → `ValueError`
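+
+The fallback order above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `code_env.py`: the real environment reads the files from inside the container, while the hypothetical `read_reward` helper below reads local paths so that the sketch stays self-contained.
+
+```python
+import json
+from pathlib import Path
+
+
+def read_reward(text_path: Path, json_path: Path) -> float:
+    """Hypothetical helper: prefer the text file, then a single-key JSON file."""
+    if text_path.exists():
+        return float(text_path.read_text().strip())
+    if json_path.exists():
+        data = json.loads(json_path.read_text())
+        if len(data) != 1:
+            raise ValueError(f"expected exactly one key, got {list(data)}")
+        (value,) = data.values()  # the value of the only key
+        return float(value)
+    raise ValueError("no reward file found")
+```
+
+Raising when both files are missing makes a broken task fail loudly instead of silently scoring 0.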
+
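+**Three ways an episode ends**:
+
+| Trigger | Location | Outcome |
+|---|---|---|
+| Agent task finishes | lines 174-193 | LAST(reward read from /reward.*) |
+| Step limit exceeded | lines 148-153 | LAST(reward = previous reward) |
+| `step` called after LAST | lines 135-136 | Raises and asks for a reset |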
+
+---
+
+### 4.2 Container layer
+
+#### 4.2.1 Container Protocol (`src/ares/containers/containers.py:24-131`)
+
+Key methods:
+```python
+async def start(env: dict[str, str] | None) -> None
+async def exec_run(command, workdir, env, timeout_s) -> ExecResult
+async def upload_files/download_files/upload_dir/download_dir
+def stop_and_remove() -> None   # the only synchronous method, for atexit
+```
+
+`ExecResult` is a dataclass with `output: str` and `exit_code: int` (lines 10-13).
+
+#### 4.2.2 DaytonaContainer (`src/ares/containers/daytona.py`)
+
+- **Startup** (lines 86-107): creates the sandbox via `daytona.Image.base(image)` or `.from_dockerfile()`.
+  - `auto_stop_interval=30` (auto-stops after 30 minutes of inactivity, line 102)
+  - `auto_delete_interval=0` (deleted as soon as it stops, line 103)
+  - `labels`: tags the sandbox with the user name (line 104)
+- **Retries** (lines 42-46): decorated with `tenacity.retry`, 10 attempts + `wait_exponential_jitter(max=60)` (line 35). **Retries only on `DaytonaError`, not on timeouts** (line 141).
+- **File transfer** (lines 181-211): native API `sbx.fs.upload_files() / download_files()`.
+- **Synchronous fallback** (lines 158-179): `stop_and_remove()` uses the synchronous client with `timeout=10s` (line 176), for use from `atexit`.
+
+#### 4.2.3 DockerContainer (`src/ares/containers/docker.py`)
+
+- **Startup** (lines 58-87):
+  - If a Dockerfile is given, the image is built on the spot (lines 67-72)
+  - Runs `tail -f /dev/null` to keep the container alive (line 83); otherwise it dies as soon as CMD exits
+- **Exec** (lines 96-121): `["bash", "-lc", command]`; `asyncio.wait_for` raises on timeout (**no retries**)
+- **File transfer** (lines 130-185): tar archives
+  - Upload: pack into a tar, then `put_archive()` unpacks it inside the container
+  - Download: `get_archive()` returns a tar, which is unpacked locally
+
+#### 4.2.4 Comparison of the two implementations
+
+| Feature | Daytona | Docker |
+|---|---|---|
+| Launch medium | Cloud API | docker-py, local build |
+| Retries | 10 attempts, exponential backoff | None |
+| Timeout | Raises TimeoutError | `asyncio.wait_for` |
+| File transfer | Native SDK | tar archives |
+| Resource config | CPU/Memory/Disk/GPU | ❌ TODO |
+| Cleanup | auto_stop + auto_delete | force remove |
+
+#### 4.2.5 Janitor fallback (`src/ares/environments/code_env.py:348-389`)
+
+Mechanism = **an atexit registry**:
+
+- On `__init__` or `__aenter__`, the environment calls `_ENVIRONMENT_JANITOR.register_for_cleanup(self)` (indexed by `id(env)`, lines 359-361)
+- A normal exit from `async with env:` calls `unregister_for_cleanup()` (lines 363-365)
+- If the interpreter exits while environments are still registered (for example after an unhandled exception), `atexit` fires `_sync_cleanup()` (lines 375-384), which calls `container.stop_and_remove()` on each of them
+
+Key point: an atexit callback **cannot run async code**, so every container must provide a synchronous `stop_and_remove()`.
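+
+A minimal sketch of such a registry, assuming only what the bullets above describe. The class name `Janitor` and the `env.container` attribute are illustrative, not copied from `code_env.py`.
+
+```python
+import atexit
+
+
+class Janitor:
+    """Tracks live environments and stops their containers at interpreter exit."""
+
+    def __init__(self) -> None:
+        self._envs: dict[int, object] = {}
+        atexit.register(self._sync_cleanup)
+
+    def register_for_cleanup(self, env) -> None:
+        self._envs[id(env)] = env
+
+    def unregister_for_cleanup(self, env) -> None:
+        self._envs.pop(id(env), None)
+
+    def _sync_cleanup(self) -> None:
+        # atexit callbacks are synchronous, so only sync methods can be called here.
+        for env in list(self._envs.values()):
+            try:
+                env.container.stop_and_remove()
+            except Exception as exc:  # keep cleaning up the remaining environments
+                print(f"cleanup failed: {exc}")
+        self._envs.clear()
+```
+
+Note that `atexit` handlers do not run when the process is killed by an unhandled signal or exits through `os._exit()`, so this is a safety net and not a guarantee.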
+
+---
+
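+### 4.3 Code Agents layer
+
+#### 4.3.1 Protocol
+
+```python
+class CodeAgent(Protocol):
+    async def run(self, task: str) -> None: ...
+```
+
+#### 4.3.2 MiniSWECodeAgent (`src/ares/code_agents/mini_swe_agent.py`)
+
+A thin wrapper around the mini-swe-agent library.
+
+**`run()` main loop** (lines 156-191):
+1. Collects system info (`uname`) and renders the instance template
+2. Loop: `step()` → `query()` (lines 198-227) → `execute_action()` (lines 229-258)
+3. `query()`: sends an `LLMRequest` and checks the step/cost limits (`_step_limit`, `_cost_limit`, lines 141-142)
+4. `execute_action()`: extracts bash from the markdown code block and runs it in the container
+
+**Error tiers**: `_NonTerminatingError` (recoverable, the loop continues) vs `_TerminatingError` (ends the run).
+
+#### 4.3.3 Terminus2Agent (`src/ares/code_agents/terminus2/terminus2_agent.py`, 1,110 lines)
+
+**Production-grade complexity**, built on Terminal-Bench style tmux sessions.
+
+**`run()` main loop** (lines 482-645):
+1. `_ensure_tmux_session()` (lines 196-319): checks for / installs tmux, creates a `160x40` session, sets a 50k-line history
+2. Main loop (line 507 onward):
+   - Checks that tmux is alive (`_is_tmux_session_alive()`, lines 321-339)
+   - `_query_llm()` (lines 651-723)
+   - Parses the response (JSON or XML, chosen in `__post_init__`, lines 158-163)
+   - `_execute_commands()` (lines 725-849), with `shlex.quote` handling (lines 791-810)
+   - Two-step completion confirmation (lines 611-625): keeps the agent from submitting by mistake
+
+**Context management** (a highlight):
+- **Proactive summarization** (lines 661-680): estimates tokens (`2 characters = 1 token`, line 666) and triggers Harbor's three-step summarization above 200k tokens
+- **Reactive rescue** (lines 700-723): recovers after catching `context_length_exceeded`
+
+**Incremental output tracking** (lines 416-461):
+- Keeps the previous buffer and uses `rfind` to locate what is new
+- Fallback (lines 453-456): if the previous buffer cannot be found, it degrades to emitting the whole screen (possibly repeating output)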
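+
+The `rfind`-based idea can be sketched like this. It is an assumption about the general technique, not the logic in `terminus2_agent.py`, and the function name `new_output` is hypothetical.
+
+```python
+def new_output(previous: str, current: str) -> str:
+    """Return the part of `current` that follows the last occurrence of `previous`."""
+    if not previous:
+        return current
+    pos = current.rfind(previous)
+    if pos == -1:
+        # Fallback: the old buffer scrolled away or was redrawn, so emit the
+        # whole screen and accept that some output may repeat.
+        return current
+    return current[pos + len(previous):]
+```
+
+Searching from the right matters for terminals: if the same text appears twice on screen, the most recent occurrence is the one that marks where new output begins.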
+
+**Max turns = 1M** (line 149), far above the default of 250 on the MiniSWE side.
+
+#### 4.3.4 Parser fault tolerance (`terminus2/json_parser.py`, 274 lines + `xml_parser.py`, 229 lines)
+
+**Three-level fallback** (`json_parser.py:75-99`):
+
+```python
+try:
+    data = json.loads(json_str)
+except JSONDecodeError:
+    json_str = self._auto_fix_json(json_str)       # fill in missing brackets/quotes
+    try:
+        data = json.loads(json_str)
+    except JSONDecodeError:
+        fallback_parsed = self._parse_with_regex(original)  # regex fallback
+```
+
+On the XML side, `salvage_truncated_response` (`xml_parser.py`) additionally rescues valid tags from a truncated response.
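+
+A self-contained sketch of the same strict → auto-fix → regex ladder. The repair heuristic and the `"keystrokes"` field name are assumptions chosen for illustration and are deliberately simpler than whatever `json_parser.py` actually does.
+
+```python
+import json
+import re
+
+
+def auto_fix_json(text: str) -> str:
+    """Naive repair: close an open string and any unbalanced braces/brackets."""
+    stack, in_str, escaped = [], False, False
+    for ch in text:
+        if in_str:
+            if escaped:
+                escaped = False
+            elif ch == "\\":
+                escaped = True
+            elif ch == '"':
+                in_str = False
+        elif ch == '"':
+            in_str = True
+        elif ch in "{[":
+            stack.append("}" if ch == "{" else "]")
+        elif ch in "}]" and stack:
+            stack.pop()
+    return text + ('"' if in_str else "") + "".join(reversed(stack))
+
+
+def parse_commands(raw: str) -> tuple[dict, str]:
+    """Return (parsed data, name of the level that succeeded)."""
+    try:
+        return json.loads(raw), "strict"
+    except json.JSONDecodeError:
+        pass
+    try:
+        return json.loads(auto_fix_json(raw)), "auto_fix"
+    except json.JSONDecodeError:
+        pass
+    # Last resort: pull out anything that looks like a "keystrokes" value.
+    found = re.findall(r'"keystrokes"\s*:\s*"([^"]*)"', raw)
+    return {"commands": [{"keystrokes": k} for k in found]}, "regex"
+```
+
+Returning the level name alongside the data makes it easy to log how often the model's output needed repair.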
+
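+#### 4.3.5 Comparison of the two agents
+
+| Dimension | MiniSWECodeAgent | Terminus2Agent |
+|---|---|---|
+| Purpose | Classic SWE-bench evaluation | Long-running interaction / production-grade |
+| Step limit | 250 | 1M |
+| Session model | Stateless per step | Persistent tmux session |
+| Context management | None | Proactive + reactive summarization |
+| Error model | Exception classes | Layered recovery |
+| Code size | About 260 lines | 1,110 lines |
+| Best for | Quick benchmark runs | Complex tasks, very long trajectories |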
+
+---
+
+### 4.4 LLM Client layer (the most important)
+
+#### 4.4.1 Core protocol (`src/ares/llms/`)
+
+```python
+class LLMClient(Protocol):
+    async def __call__(self, request: LLMRequest) -> LLMResponse: ...
+```
+
+- `LLMRequest`: messages + optional temperature + tool definitions
+- `LLMResponse`: ChatCompletion + cost tracking
+
+#### 4.4.2 QueueMediatedLLMClient (`src/ares/llms/queue_mediated_client.py`): **the most important 50 lines in the framework**
+
+```python
+async def __call__(self, req: request.LLMRequest) -> response.LLMResponse:
+    future = asyncio.Future[response.LLMResponse]()
+    await self.q.put(async_utils.ValueAndFuture(value=req, future=future))
+    return await future
+```
+
+**How it works**:
+
+1. The agent calls `llm_client(request)` and believes it is waiting on an LLM
+2. In reality the request is put on an `asyncio.Queue` and the agent is suspended on `await future`
+3. The environment side calls `q.get()` and receives the `(request, future)` pair
+4. The environment returns the request to its caller (the trainer) as the **observation**
+5. Once the trainer has produced a response, it passes it back through `step(response)`
+6. Inside `step()`, `future.set_result(response)` makes the agent's `await` return
+7. The agent carries on, **with no idea that an RL environment is driving it**
+
+This is the core mechanism that lets ARES join "linear agent code" to the "RL environment protocol".
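+
+The seven steps above can be reproduced in a few lines of plain asyncio. This is a toy sketch: strings stand in for `LLMRequest` / `LLMResponse`, and `QueueClient`, `agent`, and `run_episode` are hypothetical names, not ARES APIs.
+
+```python
+import asyncio
+import dataclasses
+
+
+@dataclasses.dataclass(frozen=True)
+class ValueAndFuture:
+    value: str              # stand-in for LLMRequest
+    future: asyncio.Future  # resolved with the stand-in LLMResponse
+
+
+class QueueClient:
+    """Looks like an LLM client to the agent; really hands requests to a queue."""
+
+    def __init__(self) -> None:
+        self.q: asyncio.Queue = asyncio.Queue()
+
+    async def __call__(self, req: str) -> str:
+        future = asyncio.get_running_loop().create_future()
+        await self.q.put(ValueAndFuture(value=req, future=future))
+        return await future  # the agent suspends here
+
+
+async def agent(llm: QueueClient) -> list[str]:
+    # Plain linear agent code with no knowledge of the environment.
+    first = await llm("step 1")
+    second = await llm(f"step 2 after {first}")
+    return [first, second]
+
+
+async def run_episode() -> tuple[list[str], list[str]]:
+    llm = QueueClient()
+    task = asyncio.create_task(agent(llm))
+    observations = []
+    for _ in range(2):
+        item = await llm.q.get()                    # observation = the agent's request
+        observations.append(item.value)
+        item.future.set_result(item.value.upper())  # action = the response
+    return observations, await task
+```
+
+In this sketch the loop in `run_episode` plays both the environment and the trainer; in ARES those two roles sit on opposite sides of `reset()` / `step()`.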
+
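+#### 4.4.3 The ValueAndFuture abstraction (`src/ares/async_utils.py`)
+
+Only 8-10 lines:
+
+```python
+@dataclasses.dataclass(frozen=True)
+class ValueAndFuture[ValType, FutureType]:
+    value: ValType
+    future: asyncio.Future[FutureType]
+```
+
+It binds the "request value" and the "response future" into one immutable object. The same shape is reusable in any "linear code + external controller" scenario (simulators, game engines, multi-tenant queues).
+
+#### 4.4.4 ChatCompletionCompatibleLLMClient (`src/ares/llms/chat_completions_compatible.py`)
+
+- **Thread-local cache** (lines 22-41): an `httpx.AsyncClient` is bound to an event loop, so keeping one per thread avoids cross-thread deadlocks
+- **Retry decorator** (lines 44-53): `tenacity`, 3 attempts + exponential(1-60s) + random jitter
+- **GPT-5 special case** (lines 66-67): drops temperature dynamically (GPT-5 does not support it)
+- **Cost tracking** (line 72): `accounting.get_llm_cost()` is accumulated into `response.cost`
+
+#### 4.4.5 Conversion layer: two converters, not one
+
+The two OpenAI APIs use different formats:
+
+- **`openai_chat_converter.py`** (395 lines): converts to the Chat Completions API format
+  - messages[0] = system
+  - `ToolCallMessage` is flattened into `AssistantMessage.tool_calls`
+  - Lossy-conversion detection: `top_k` (a Claude feature), more than 4 `stop_sequences`
+- **`openai_responses_converter.py`** (435 lines): converts to the Responses API format
+  - `input` is a polymorphic array (`function_call / function_call_output / message`)
+  - `system_prompt` → the `instructions` parameter
+  - `stop_sequences` is not supported at all
+
+The message and tool structures of the two APIs differ enough that a single converter would get messy, so each API gets its own.
+
+#### 4.4.6 Cost accounting (`accounting.py`)
+
+- `martian_cost_list()` (lines 38-67): fetches the price list from the Martian API, with an LRU cache
+- `get_llm_cost()` (lines 70-97): `request_cost + prompt_tokens × prompt_price + completion_tokens × completion_price`
+- ⚠️ **Not included**: `cached_tokens` and `reasoning_tokens` (Martian does not currently distinguish them, see the comment on line 90)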
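+
+As a worked instance of that formula, here is a minimal sketch. The `Price` dataclass, the use of `Decimal`, and the per-token prices in the usage note are illustrative assumptions, not values or types taken from `accounting.py`.
+
+```python
+from dataclasses import dataclass
+from decimal import Decimal
+
+
+@dataclass(frozen=True)
+class Price:
+    request: Decimal     # flat cost per request
+    prompt: Decimal      # cost per prompt token
+    completion: Decimal  # cost per completion token
+
+
+def llm_cost(price: Price, prompt_tokens: int, completion_tokens: int) -> Decimal:
+    """request_cost + prompt_tokens * prompt_price + completion_tokens * completion_price."""
+    return (
+        price.request
+        + prompt_tokens * price.prompt
+        + completion_tokens * price.completion
+    )
+```
+
+With made-up prices `Price(Decimal("0.001"), Decimal("0.000002"), Decimal("0.000008"))`, a call with 1,000 prompt tokens and 500 completion tokens costs 0.001 + 0.002 + 0.004 = 0.007.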
+
+---
+
+### 4.5 ares-proxy (the Go HTTP version of queue mediation)
+
+Location: `ares-proxy/`, 5 Go files.
+
+#### 4.5.1 Why is it needed?
+
+The in-process `QueueMediatedLLMClient` only works when **the agent and the environment share one Python process**. In real deployments the agent often runs in an isolated container, and a Python process inside the container cannot see the host's asyncio.Queue.
+
+`ares-proxy` = **a cross-process queue mediator** that moves the asyncio.Queue onto HTTP.
+
+#### 4.5.2 Data flow across the three endpoints
+
+```
+Agent in container           Host Environment
+  │                               │
+  ├──POST /v1/chat/completions──→ │
+  │   (blocks for response)       │
+  │                               │
+  │                        ←──GET /poll──┤
+  │                     (fetch requests) │
+  │                               │      │
+  │                            ┌──┘      │
+  │                            ↓         │
+  │                  Python env handles  │
+  │                     return LLMResp   │
+  │                            │         │
+  │                       ──POST /respond─→
+  │                                      │
+  │◀────────response returns to Agent────┘
+```
+
+**Endpoint implementation**:
+
+| Endpoint | File:lines | Behavior |
+|---|---|---|
+| `POST /v1/chat/completions` | `main.go:34-59` + `broker.go:36-73` | Generates a UUID, creates a `responseChan`, adds it to the map + queue, then blocks (default 15 min timeout) |
+| `GET /poll` | `main.go:64-80` + `broker.go:90-102` | Atomically reads the whole `requestQueue`, **clears it immediately** (line 99), and returns a JSON array |
+| `POST /respond` | `main.go:85-109` + `broker.go:106-122` | Looks up the ID, sends `responseChan <- response`, closes the channel |
+
+#### 4.5.3 Data structures
+
+- `PendingRequest = {ID (UUID), Request (json.RawMessage), Timestamp}`
+- `Broker` maintains two structures (`broker.go:14-22`):
+  - `pendingRequests map[string]chan json.RawMessage`: ID → response channel
+  - `requestQueue []PendingRequest`: the queue waiting to be polled
+- A `mutex` guards concurrent access
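+
+To make the three-endpoint protocol concrete, here is an in-memory Python analogue of the broker. It is a sketch under stated assumptions: the real proxy is written in Go and speaks HTTP, futures stand in for Go channels, and the names `Broker`, `submit`, `poll`, `respond`, and `demo` are chosen for illustration.
+
+```python
+import asyncio
+import uuid
+
+
+class Broker:
+    def __init__(self) -> None:
+        self.pending: dict[str, asyncio.Future] = {}  # ID -> response slot
+        self.queue: list[dict] = []                   # requests awaiting a poll
+
+    async def submit(self, request: dict, timeout_s: float = 900.0) -> dict:
+        """Agent side: the role of POST /v1/chat/completions."""
+        req_id = str(uuid.uuid4())
+        fut = asyncio.get_running_loop().create_future()
+        self.pending[req_id] = fut
+        self.queue.append({"id": req_id, "request": request})
+        try:
+            return await asyncio.wait_for(fut, timeout_s)
+        finally:
+            self.pending.pop(req_id, None)
+
+    def poll(self) -> list[dict]:
+        """Environment side: the role of GET /poll, draining the whole queue."""
+        drained, self.queue = self.queue, []
+        return drained
+
+    def respond(self, req_id: str, response: dict) -> bool:
+        """Environment side: the role of POST /respond."""
+        fut = self.pending.get(req_id)
+        if fut is None or fut.done():
+            return False
+        fut.set_result(response)
+        return True
+
+
+async def demo() -> dict:
+    broker = Broker()
+    agent = asyncio.create_task(broker.submit({"messages": ["hi"]}))
+    await asyncio.sleep(0)  # let the agent enqueue its request
+    (item,) = broker.poll()
+    broker.respond(item["id"], {"echo": item["request"]["messages"]})
+    return await agent
+```
+
+No lock is needed in this single-threaded asyncio sketch; the Go version needs its mutex because HTTP handlers run on separate goroutines.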
+
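+#### 4.5.4 Why Go?
+
+- High concurrency (goroutines + channels are a natural fit for a queue proxy)
+- Pure stdlib, no external dependencies
+- Single-binary deployment: drop it into a container and it runs
+- Doing the same in Python inside the container would pull in a pile of extra dependencies
+
+#### 4.5.5 When to use which
+
+| Scenario | Use |
+|---|---|
+| Agent and environment in the same Python process | `QueueMediatedLLMClient` (no network hop) |
+| Agent inside a Docker/Daytona container | `ares-proxy` (RTT 10-100ms) |
+| Distributed evaluation (agents across machines) | `ares-proxy` |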
+
+---
+
+### 4.6 Registry + Presets
+
+#### 4.6.1 Task selectors (`src/ares/registry.py:31-217`)
+
+Three kinds of selector:
+
+| Class | Lines | Effect |
+|---|---|---|
+| `IndexSelector(index)` | 47-58 | `tasks[index]` |
+| `SliceSelector(start, end)` | 62-75 | `tasks[start:end]` |
+| `ShardSelector(shard_index, total)` | 79-109 | Even sharding, `start = round(i × len / total)` |
+
+**Syntax parsing** (`parse_selector`, lines 112-217):
+
+```
+sbv-mswea           → SliceSelector(None, None)  # select all
+sbv-mswea:5         → IndexSelector(5)
+sbv-mswea:0:10      → SliceSelector(0, 10)
+sbv-mswea:5:        → SliceSelector(5, None)
+sbv-mswea@2/8       → ShardSelector(2, 8)
+```
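+
+The mapping above can be sketched as a small parser. This is an illustration of the syntax only, not the code in `registry.py`: the return shape `(name, selector)` and the `bounds` helper are assumptions, and the shard end bound is assumed to mirror the documented start formula.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class IndexSelector:
+    index: int
+
+
+@dataclass(frozen=True)
+class SliceSelector:
+    start: int | None
+    end: int | None
+
+
+@dataclass(frozen=True)
+class ShardSelector:
+    shard_index: int
+    total: int
+
+    def bounds(self, n: int) -> tuple[int, int]:
+        # start = round(i * len / total); the end bound is assumed symmetric.
+        return (round(self.shard_index * n / self.total),
+                round((self.shard_index + 1) * n / self.total))
+
+
+def parse_selector(spec: str):
+    """Split 'name', 'name:i', 'name:a:b' or 'name@i/n' into (name, selector)."""
+    if "@" in spec:
+        name, shard = spec.split("@", 1)
+        i, n = shard.split("/")
+        return name, ShardSelector(int(i), int(n))
+    name, *parts = spec.split(":")
+    if not parts:
+        return name, SliceSelector(None, None)
+    if len(parts) == 1:
+        return name, IndexSelector(int(parts[0]))
+    start, end = (int(p) if p else None for p in parts)
+    return name, SliceSelector(start, end)
+```
+
+For example, `ShardSelector(2, 8).bounds(80)` gives `(20, 30)`, so 8 workers can each take one shard of an 80-task dataset without overlap.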
+
+#### 4.6.2 EnvironmentSpec (lines 239-276)
+
+```python
+class EnvironmentSpec(Protocol):
+    def get_info(self) -> EnvironmentInfo: ...  # (name, description, num_tasks)
+    def get_env(self, selector, container_factory, tracker) -> Environment: ...
+```
+
+The global registry is `_REGISTRY: dict[str, EnvironmentSpec]` (line 279).
+
+#### 4.6.3 Registered presets (`src/ares/presets.py`)
+
+- **HarborSpec family** (lines 39-82):
+  - Enumerates all Harbor datasets dynamically from `code_env.list_harbor_datasets()`
+  - Full Cartesian product of dataset × agent (`mini_swe_agent` / `terminus2_agent`)
+  - Naming: `{dataset_id}-{code_agent_id}`, for example `sbv-mswea`, `sbv-terminus2`
+- **TwentyQuestionsSpec** (lines 85-119):
+  - The 20 Questions game (no container, text only)
+  - 125 built-in objects
+  - Shows that ARES is not limited to SWE-bench
+
+#### 4.6.4 The `@register_env` decorator (lines 320-424)
+
+- Automatically generates an `EnvironmentSpec` class for any function
+- Parameters: `name` (defaults to the function name), `description` (defaults to the docstring), `num_tasks` (required)
+
+#### 4.6.5 The `make()` entry point (lines 527-605)
+
+1. Parse `preset_id` (splitting off the selector syntax)
+2. Look it up in `_REGISTRY`; raise KeyError if it is missing
+3. Call `spec.get_env(selector=..., container_factory=..., tracker=...)`
+4. Return the environment instance
+
+---
+
+### 4.7 Progressive examples (`examples/`)
+
+**01 → 02 → 03 → 20q** is a deliberately designed learning gradient:
+
+#### 4.7.1 `01_sequential_eval_with_local_llm.py` (L23-78)
+
+- Minimal environment loop: `async with ares.make("sbv-mswea:0")` + an async loop
+- Loads a local Qwen2-0.5B with `llama_cpp.create_qwen2_0_5b_instruct_llama_cpp_client()`
+- Uses a Docker container by default
+
+#### 4.7.2 `02_sequential_eval_with_api.py` (L19-68)
+
+- The only difference: `agent = ChatCompletionCompatibleLLMClient(model="openai/gpt-5-mini")`
+- **Demonstrates ARES's modularity**: LLM, container, and environment are fully decoupled, so swapping one component leaves the others untouched
+
+#### 4.7.3 `03_parallel_eval_with_api.py` (the core of parallel evaluation)
+
+- **Semaphore flow control** (L121): `sem = asyncio.Semaphore(args.num_parallel_workers)`, default 20
+- **Wrapper** (L130-134): `_await_with_semaphore()` makes each task acquire the semaphore
+- **Collection with gather** (L143): `asyncio.gather(*tasks, return_exceptions=True)` launches the whole batch
+- **TUI dashboard** (L124-128): tracks step/reward/cost/duration of every task in real time
+- **Bottlenecks**: `num_parallel_workers` + the container factory (Daytona API quota)
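+
+The semaphore-plus-gather pattern can be sketched without any ARES dependency. In this illustration `asyncio.sleep` stands in for one evaluation episode, and `run_all` / `evaluate` are hypothetical names, not functions from the example script.
+
+```python
+import asyncio
+
+
+async def run_all(task_ids: list[int], num_parallel_workers: int = 20):
+    sem = asyncio.Semaphore(num_parallel_workers)
+    active = peak = 0
+
+    async def evaluate(task_id: int) -> int:
+        nonlocal active, peak
+        async with sem:  # at most num_parallel_workers episodes run at once
+            active += 1
+            peak = max(peak, active)
+            await asyncio.sleep(0.01)  # stand-in for one episode
+            active -= 1
+            if task_id == 3:
+                raise RuntimeError("episode failed")
+            return task_id * 2
+
+    # return_exceptions=True keeps one failed episode from cancelling the batch.
+    results = await asyncio.gather(
+        *(evaluate(t) for t in task_ids), return_exceptions=True
+    )
+    return results, peak
+```
+
+All coroutines are created up front, and the semaphore alone decides how many of them hold a container at any moment, which is why the worker count and the container quota are the two knobs that matter.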
+
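+#### 4.7.4 20 Questions case study (5-phase mechanistic interpretability)
+
+`examples/20q_case_study/`:
+
+| Phase | File | What it does |
+|---|---|---|
+| Collection | `collect_20q_data.py:L1-80` | Runs the game and uses `HookedTransformer` to capture Llama-3.2-1B activations, in parallel across GPUs; saves 50 episodes to disk |
+| Probing | `phase1_probe.py:L1-80` | Trains a linear probe to detect the "invalid question" label, testing what the mid-layer residual stream encodes |
+| Intervention | `phase2_steer.py:L1-80` | Contrastive Activation Addition (CAA), intervening on the model at step t* |
+
+This shows a use of ARES beyond benchmark scores: **a full closed loop for interpretability research**.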
+
+---
+
+### 4.8 Integration tests + mocks
+
+#### 4.8.1 Integration tests (`integration_tests/`)
+
+`test_default_workdir.py:L10-48`:
+- Verifies the working directory: `/testbed` for SWE-bench Verified vs `/app` for TerminalBench
+- **Uses real Daytona**, no mocks
+- Flow: `ares.make(preset)` → `reset()` → `exec_run("pwd")` → assert
+
+#### 4.8.2 Mocks (`src/ares/testing/`)
+
+- **`MockContainer`** (`mock_container.py:L10-130`): records all `exec_commands` / `uploaded_files` / `downloaded_files`, and supports an `exec_handler` callback that generates responses dynamically
+- **`MockLLMClient`** (`mock_llm.py:L10-72`): cycles through a preset list of responses or uses a custom `response_handler`, records every request, and exposes `get_last_request()` / `get_request_messages()` for assertions
+
+Strategy: **mocks for unit tests, real containers for integration tests**.
+
+---
+
+### 4.9 StatTracker + experiment tracking
+
+#### 4.9.1 Protocol (`stat_tracker.py:16-21`)
+
+```python
+class StatTracker(Protocol):
+    @contextlib.contextmanager
+    def timeit(self, name: str) -> Generator: ...
+    def scalar(self, name: str, value: float) -> None: ...
+```
+
+#### 4.9.2 Three implementations
+
+| Implementation | File:lines | Mechanism |
+|---|---|---|
+| `NullStatTracker` | `stat_tracker.py:23-30` | No-op |
+| `LoggingStatTracker` | `stat_tracker.py:33-62` | A background task logs percentiles (p0/p25/p50/p75/p100) of every metric each 60s, via `np.percentile()` |
+| `TensorboardStatTracker` | `tensorboard.py:14-42` | Calls `SummaryWriter.add_histogram()` on a 60s period |
+
+**Constraints**: no MLflow / wandb (tensorboard only), and the 60s period is fixed and not configurable.
+
+---
+
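+### 4.10 contrib add-on modules
+
+#### 4.10.1 mech_interp (mechanistic interpretability)
+
+Three core files:
+
+| File | Lines | Role |
+|---|---|---|
+| `hooked_transformer_client.py` | 13-140 | Implements `LLMClient` on top of `transformer-lens.HookedTransformer.generate()`; supports temperature / message formatting / length limits |
+| `activation_capture.py` | 13-89 | `TrajectoryActivations`: a list holding the `ActivationCache` of each step, persisted with `torch.save/load` |
+| `hook_utils.py` | 20-100 | **Zero-ablation hooks** (20-63): ablate positions/heads; **path-patching hooks** (66-100): swap clean → corrupted activations for causal analysis |
+
+**How does this relate to training?**
+- It is **not** a direct training signal
+- It **is** offline interpretability: rollout → capture activations → train probes offline → identify directions → intervene during online rollouts → causal validation
+- The 20q case study shows the complete loop
+
+#### 4.10.2 llama_cpp.py (local LLM, L49-111)
+
+- `LlamaCppLLMClient` talks to GGUF models
+- Uses `asyncio.to_thread()` to move the blocking `create_chat_completion()` onto a thread pool
+- Factory function `create_qwen2_0_5b_instruct_llama_cpp_client()` (L108-110)
+- Local inference has zero API cost and no API latency, but the model is weak
+
+#### 4.10.3 eval_visualizer.py (TUI dashboard, L72-697)
+
+- `TrackedEnvironment` wrapper: automatically captures step/reward/cost and pushes them to the dashboard
+- `EvaluationDashboard` (Textual TUI):
+  - Top summary: running / completed / errors / success rate / mean return / total cost
+  - Middle DataTable: live step/reward/cost/duration per task
+  - Right-hand histogram: distribution of agent steps
+  - Bottom log: redirected stdout/stderr (L610-638)
+- Shortcuts: `p` to pause, `q` to quit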
+
+---
+
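+## 5. Key Design Patterns
+
+### 5.1 Queue-Mediated Communication
+**The most important one**. `asyncio.Queue + Future` joins the agent's linear code to the RL environment without the agent noticing. The file is only about 50 lines, yet the abstraction carries the whole framework.
+
+### 5.2 Protocol-Oriented Design
+Almost every core type is a `typing.Protocol` (`Environment`, `CodeAgent`, `Container`, `LLMClient`, `ContainerFactory`, `CodeAgentFactory`, `StatTracker`). Structural subtyping, no inheritance tree.
+
+### 5.3 Factory Pattern
+The environment takes "factories" instead of "instances": `container_factory`, `code_agent_factory`. This makes it easy to switch between local/cloud backends and to A/B different agents.
+
+### 5.4 Async Context Manager
+All resources go through `async with`, which guarantees cleanup in `__aexit__`.
+
+### 5.5 Frozen Dataclass
+Most dataclasses are `frozen=True`, which keeps them safe to share across concurrent async tasks.
+
+### 5.6 Atexit Janitor
+A fallback for abnormal exits: containers of all live environments are cleaned up synchronously, preventing cloud resource leaks.
+
+### 5.7 YAGNI philosophy
+`CLAUDE.md:250-251` says it explicitly: no over-abstraction. `CodeEnvironment` implements the `Environment` protocol directly and does not use base-class inheritance.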
+
+---
+
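+## 6. Highlights, Pitfalls, and Reusable Snippets
+
+### 6.1 Highlights
+
+1. **Queue mediation in 50 lines**: cleanly decouples "linear agent code" from the "RL environment protocol"
+2. **Three-level parser fallback** (JSON → auto-fix → regex): very tolerant of malformed output
+3. **Incremental output tracking** (`terminus2_agent.py:416-461`): copes with very long tmux sessions
+4. **Proactive + reactive summarization**: a 200k-token threshold plus `context_length_exceeded` rescue as a second safeguard
+5. **Two interception paths**: in-process Python Queue + out-of-process Go HTTP, covering every deployment shape
+6. **Built-in cost tracking**: every `LLMResponse` carries a `cost`, enabling fine-grained accounting
+7. **Thread-local httpx client**: avoids cross-thread deadlocks on the async event loop
+8. **Janitor atexit**: last-resort cleanup of cloud resources
+
+### 6.2 Pitfalls
+
+1. **Terminus2Agent tmux initialization is complex** (lines 196-319): it installs tmux dynamically with apt-get; preinstalling tmux in production images is advisable
+2. **The 200k-token threshold is hard-coded** (line 666), and the `2 characters = 1 token` estimate is crude
+3. **The ares-proxy response channel has size 1** (`broker.go:41`): delays occur if the agent does not read promptly
+4. **The Chat and Responses converters duplicate code** (the tool_choice part)
+5. **Fallback when incremental output cannot be located** (lines 453-456): when rfind returns -1 the whole screen is emitted, possibly repeating output
+6. **The Docker implementation does not support resource configuration** (CPU/Memory is a TODO)
+7. **The StatTracker period is hard-coded to 60s**, with no way to configure it
+8. **No wandb / mlflow**, tensorboard only
+
+### 6.3 Reusable snippets (for your own projects)
+
+**A. Queue mediation in 50 lines** (`queue_mediated_client.py:47-50` is the core method)
+Reusable in any "linear code + external control" scenario: simulators, multi-tenant inference, game AI.
+
+**B. Three-level fallback parser** (`json_parser.py:75-99`)
+A good template for parsing LLM output.
+
+**C. Tenacity retry decorator** (`chat_completions_compatible.py:44-53`)
+General-purpose retries for async API clients.
+
+**D. Thread-local httpx client** (`chat_completions_compatible.py:22-41`)
+Avoids event-loop conflicts in multi-threaded async code.
+
+**E. Janitor atexit pattern** (`code_env.py:348-389`)
+Worth copying in any system that manages external resources (containers, temp files, remote sessions).
+
+**F. The `ValueAndFuture` abstraction** (`async_utils.py`)
+An 8-line generic dataclass that packs "value + response future" into one atomic unit.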
+
+---
+
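+## 7. Comparisons and Takeaways
+
+### 7.1 vs other agent RL frameworks
+
+| | ARES | Verl | OpenPipe Mini-ART | Gymnasium (formerly OpenAI Gym) |
+|---|---|---|---|---|
+| Positioning | Environment layer | Trainer (primary) + environment | Fine-tuning + eval in one | General RL environment standard |
+| Agent support | SWE + terminal | SWE | Multiple scenarios | Mostly non-LLM |
+| Sandbox | Daytona + Docker | In-house | In-house | None |
+| Interception | asyncio.Queue + Go proxy | RPC | Direct calls | N/A |
+| Interpretability | mech_interp add-on | None | None | None |
+
+### 7.2 Compared with Manus
+
+Manus = a finished agent (**application layer**); ARES = training/evaluation infrastructure (**infrastructure layer**). However:
+
+**The agent runtime core inside ARES (`terminus2_agent` + `ares-proxy` + Daytona sandbox) ≈ a Manus-like agent runner**.
+
+If you **strip out** ARES's RL training hooks (reward reading, concurrent rollouts, gather aggregation), what remains can be reused as a standalone agent runtime. That "second half" is the part most worth taking.
+
+### 7.3 Takeaways for personal projects
+
+- **HiClaw / OpenClaw modifications**: the two interception paths (`QueueMediatedLLMClient` + `ares-proxy`) can be borrowed directly to give multi-agent orchestration one unified observation interface
+- **Mobile GUI agent**: `Terminus2Agent`'s tmux incremental output tracking + proactive summarization strategy can carry over to long GUI trajectories
+- **Hermes Personal**: copy the thread-local client + tenacity retry template from `ChatCompletionCompatibleLLMClient` as is
+- **General**: the three-level parser fallback + Janitor atexit pattern are "use it once you have seen it" material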
+
+---
+
+## 8. Recommended Reading Path
+
+To understand ARES with the least effort, read in this order:
+
+1. **`CLAUDE.md`** (13,139 bytes, shipped in the repo, denser than the README)
+2. **`src/ares/__init__.py`** (the public API list)
+3. **`src/ares/environments/base.py`** (Environment protocol + TimeStep)
+4. **`src/ares/llms/queue_mediated_client.py`** (50 lines; one look explains the core)
+5. **`src/ares/async_utils.py`** (`ValueAndFuture`)
+6. **`src/ares/environments/code_env.py`** (the RL main loop, roughly 250 lines)
+7. **`src/ares/code_agents/mini_swe_agent.py`** (the simple agent)
+8. **`src/ares/containers/docker.py`** (to get familiar with the container abstraction)
+9. **`examples/03_parallel_eval_with_api.py`** (end-to-end usage)
+10. **`ares-proxy/*.go`** (the cross-process queue mediator)
+11. **`src/ares/code_agents/terminus2/terminus2_agent.py`** (1,110 lines, the production-grade agent)
+12. **`src/ares/contrib/mech_interp/*`** (the interpretability add-on)
+
+---
+
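+## 9. Open Items / Further Digging
+
+- [ ] Details of the Harbor dataset spec (external library `harbor` 0.1.32)
+- [ ] How `ares-proxy` integrates with Daytona (host / container network topology)
+- [ ] cached_token / reasoning_token support in cost tracking (the comment at `accounting.py:90` marks it as pending)
+- [ ] Extending StatTracker to wandb / mlflow (tensorboard only today)
+- [ ] Tuning experience for the Terminus2Agent summarization threshold (is the 200k-token threshold, estimated at 2 characters per token, reasonable?)
+- [ ] Example adapter code for local trainers (trl / verl / openpipe)
+
+---
+
+**Analysis complete.** Based on commit `c804aa2`; 72 Python files + 5 Go files analyzed, about 11,900 LOC in total.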
diff --git a/.memory/worklog.json b/.memory/worklog.json
index 59136cd..ad2fa30 100644
--- a/.memory/worklog.json
+++ b/.memory/worklog.json
@@ -27,6 +27,13 @@
       "message": "auto-save 2026-04-24 20:03 (~1)",
       "hash": "5440e8b",
       "files_changed": 1
+    },
+    {
+      "ts": "2026-04-24T20:09:20+08:00",
+      "type": "commit",
+      "message": "auto-save 2026-04-24 20:09 (~1)",
+      "hash": "3e3043c",
+      "files_changed": 1
     }
   ]
 }