<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Track-On-R: Real-World Point Tracking</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body {
      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
      background: #0a0a0a; color: #e0e0e0;
      min-height: 100vh; padding: 2rem;
    }
    .container { max-width: 1200px; margin: 0 auto; }
    h1 {
      font-size: 2.5rem; font-weight: 700;
      background: linear-gradient(135deg, #60a5fa, #a78bfa);
      -webkit-background-clip: text; -webkit-text-fill-color: transparent;
      margin-bottom: 0.5rem;
    }
    .subtitle { color: #888; font-size: 1.1rem; margin-bottom: 0.5rem; }
    .meta { color: #666; font-size: 0.9rem; margin-bottom: 2rem; }
    .meta a { color: #60a5fa; text-decoration: none; }
    .meta a:hover { text-decoration: underline; }
    .card {
      background: #141414; border: 1px solid #222; border-radius: 12px;
      padding: 2rem; margin-bottom: 1.5rem;
    }
    .card h2 { color: #60a5fa; margin-bottom: 1rem; font-size: 1.3rem; }
    .card p, .card li { line-height: 1.8; color: #aaa; }
    .card ul { padding-left: 1.5rem; }
    .card li { margin-bottom: 0.5rem; }
    .highlight { color: #a78bfa; font-weight: 600; }
    .tag {
      display: inline-block; background: #1e293b; color: #60a5fa;
      padding: 0.25rem 0.75rem; border-radius: 6px; font-size: 0.85rem;
      margin: 0.25rem 0.25rem 0.25rem 0;
    }
    table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
    th, td {
      padding: 0.75rem 1rem; text-align: left;
      border-bottom: 1px solid #222;
    }
    th { color: #60a5fa; font-weight: 600; }
    td { color: #aaa; }
    .grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; }
    @media (max-width: 768px) { .grid { grid-template-columns: 1fr; } }
    code {
      background: #1e1e1e; padding: 0.2rem 0.5rem; border-radius: 4px;
      font-family: "SF Mono", Monaco, monospace; font-size: 0.9rem; color: #7dd3fc;
    }
    .pipeline {
      display: flex; align-items: center; gap: 0; flex-wrap: wrap;
      margin: 1rem 0;
    }
    .pipeline-step {
      background: #1e293b; padding: 0.75rem 1.25rem; border-radius: 8px;
      text-align: center; font-size: 0.9rem; color: #e0e0e0;
    }
    .pipeline-arrow { color: #60a5fa; font-size: 1.5rem; padding: 0 0.5rem; }
    .status-badge {
      display: inline-block; background: #164e63; color: #22d3ee;
      padding: 0.3rem 0.8rem; border-radius: 20px; font-size: 0.8rem;
      font-weight: 600;
    }
  </style>
</head>
<body>
  <div class="container">
    <h1>Track-On-R</h1>
    <p class="subtitle">Real-World Point Tracking with Verifier-Guided Pseudo-Labeling</p>
    <p class="meta">
      CVPR 2026 |
      Gorkay Aydemir, Fatma Guney, Weidi Xie |
      <a href="https://kuis-ai.github.io/track_on_r/" target="_blank">Project Page</a> |
      <a href="https://arxiv.org/abs/2603.12217" target="_blank">Paper</a> |
      <a href="https://github.com/gorkaydemir/track_on" target="_blank">GitHub</a>
      <span class="status-badge">Source cloned</span>
    </p>

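    <!-- Core concept -->
    <div class="card">
      <h2>What Is Point Tracking?</h2>
      <p>Pick any pixel in the first frame of a video, and the algorithm locates that point precisely in every subsequent frame, even when the target is occluded, the lighting changes, or the object deforms. It is a fundamental computer vision capability that underpins video editing, robot vision, autonomous driving, AR/VR, and similar applications.</p>
    </div>
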
    <!-- Track-On family -->
    <div class="card">
      <h2>The Track-On Model Family</h2>
      <table>
        <tr><th>Model</th><th>Venue</th><th>Key contribution</th></tr>
        <tr>
          <td>Track-On</td>
          <td>ICLR 2025</td>
          <td>First to propose online, frame-by-frame point tracking with a compact Transformer memory mechanism</td>
        </tr>
        <tr>
          <td>Track-On2</td>
          <td>TPAMI 2026</td>
          <td>Improved architecture with stronger performance and efficiency</td>
        </tr>
        <tr>
          <td><span class="highlight">Track-On-R</span></td>
          <td>CVPR 2026</td>
          <td>Verifier-guided pseudo-labels for fine-tuning on real videos; state-of-the-art results</td>
        </tr>
      </table>
    </div>

    <!-- Technical architecture -->
    <div class="card">
      <h2>Track-On-R Technical Architecture</h2>
      <p style="margin-bottom: 1rem;">Three-stage training pipeline:</p>
      <div class="pipeline">
        <div class="pipeline-step">
          <strong>Stage 1</strong><br>
          Track-On2<br>
          <small style="color:#888">Pre-training on synthetic data<br>(Kubric MOVi-F)</small>
        </div>
        <span class="pipeline-arrow">→</span>
        <div class="pipeline-step">
          <strong>Stage 2</strong><br>
          Verifier training<br>
          <small style="color:#888">K-Epic dataset<br>Learns to judge tracking quality</small>
        </div>
        <span class="pipeline-arrow">→</span>
        <div class="pipeline-step">
          <strong>Stage 3</strong><br>
          Track-On-R<br>
          <small style="color:#888">Fine-tuning on real videos<br>Verifier-filtered pseudo-labels</small>
        </div>
      </div>
      <ul style="margin-top: 1rem;">
        <li><span class="highlight">Online processing</span>: handles video frame by frame, with no need to see the whole video and then backtrack, which suits real-time and streaming scenarios</li>
        <li><span class="highlight">Transformer memory</span>: a compact memory module stores information from past frames, balancing accuracy and efficiency</li>
        <li><span class="highlight">Verifier guidance</span>: a trained "quality inspector" scores the predictions of 6 teacher models, and only high-quality pseudo-labels are used for fine-tuning</li>
        <li><span class="highlight">DINOv3 backbone</span>: feature extraction based on Meta's DINOv3 ViT-S/16+</li>
      </ul>
    </div>
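
    <!-- Illustrative sketch: verifier-guided label selection -->
    <div class="card">
      <h2>Sketch: Verifier-Guided Label Selection</h2>
      <p>The verifier-guided step described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the verifier yields one score per teacher per frame, and the function name <code>select_pseudo_labels</code>, the input layout, and the <code>min_score</code> cutoff are all hypothetical.</p>
      <pre style="overflow-x: auto; margin-top: 1rem;"><code style="display: block; padding: 1rem; line-height: 1.5;">def select_pseudo_labels(candidates, scores, min_score=0.5):
    """Pick, per frame, the teacher prediction the verifier scores highest.

    candidates: {teacher_name: [(x, y), ...]}, one point per frame (assumed layout)
    scores:     {teacher_name: [float, ...]}, verifier score per frame (assumed layout)
    min_score:  hypothetical cutoff; frames whose best score falls below it get no label
    """
    teachers = list(candidates)
    num_frames = len(candidates[teachers[0]])
    labels = []
    for t in range(num_frames):
        # Teacher whose prediction the verifier trusts most at frame t.
        best = max(teachers, key=lambda name: scores[name][t])
        if scores[best][t] >= min_score:
            x, y = candidates[best][t]
            labels.append((x, y, best))
        else:
            labels.append(None)  # no teacher is reliable enough here
    return labels


# Toy example with two made-up teachers and two frames.
toy_candidates = {"teacher_a": [(1, 1), (2, 2)], "teacher_b": [(5, 5), (6, 6)]}
toy_scores = {"teacher_a": [0.9, 0.2], "teacher_b": [0.4, 0.3]}
print(select_pseudo_labels(toy_candidates, toy_scores))</code></pre>
      <p style="margin-top: 0.75rem; font-size: 0.85rem;">Frames where no teacher passes the cutoff are returned as <code>None</code>, so a training loop could simply skip them.</p>
    </div>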

    <div class="grid">
      <!-- Performance -->
      <div class="card">
        <h2>Performance (δ_avg)</h2>
        <table>
          <tr><th>Dataset</th><th>Track-On2</th><th>Track-On-R</th></tr>
          <tr><td>DAVIS</td><td>79.9</td><td><span class="highlight">80.3</span></td></tr>
          <tr><td>Kinetics</td><td>69.3</td><td><span class="highlight">71.0</span></td></tr>
          <tr><td>RoboTAP</td><td>80.5</td><td><span class="highlight">82.6</span></td></tr>
          <tr><td>EgoPoints</td><td>61.7</td><td><span class="highlight">67.3</span></td></tr>
          <tr><td>Dynamic Replica</td><td>74.5</td><td><span class="highlight">75.1</span></td></tr>
          <tr><td>PointOdyssey</td><td>45.1</td><td><span class="highlight">53.4</span></td></tr>
        </table>
        <p style="margin-top: 0.5rem; font-size: 0.85rem;">After fine-tuning on real-world video, EgoPoints improves by 5.6 points and PointOdyssey by 8.3 points.</p>
      </div>

      <!-- Teacher ensemble -->
      <div class="card">
        <h2>Teacher Ensemble (6 Models)</h2>
        <ul>
          <li>Track-On2 (the model itself)</li>
          <li>BootsTAPNext (Google DeepMind)</li>
          <li>BootsTAPIR (Google DeepMind)</li>
          <li>CoTracker3 window (Meta)</li>
          <li>Anthro-LocoTrack (KAIST)</li>
          <li>AllTracker</li>
        </ul>
        <p style="margin-top: 0.75rem; font-size: 0.85rem;">The verifier scores each teacher's prediction, and the best-scoring result becomes the pseudo-label used to train Track-On-R.</p>
      </div>
    </div>
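
    <!-- Illustrative sketch: the delta_avg metric -->
    <div class="card">
      <h2>Sketch: Computing δ_avg</h2>
      <p>For readers unfamiliar with the metric in the table, the sketch below assumes the usual TAP-Vid-style definition: the fraction of visible points predicted within 1, 2, 4, 8, and 16 pixels of the ground truth, averaged over the five thresholds. The function name <code>delta_avg</code> and the input layout are illustrative and are not taken from the repository's evaluation scripts, which may differ in details such as the evaluation resolution.</p>
      <pre style="overflow-x: auto; margin-top: 1rem;"><code style="display: block; padding: 1rem; line-height: 1.5;">import math

# Pixel thresholds of the assumed TAP-Vid-style definition.
THRESHOLDS_PX = (1, 2, 4, 8, 16)


def delta_avg(pred, gt, visible, thresholds=THRESHOLDS_PX):
    """Return the position accuracy averaged over thresholds, in percent.

    pred, gt: lists of (x, y) positions, one entry per point and frame
    visible:  list of bools; only ground-truth-visible points are scored
    """
    errors = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy), vis in zip(pred, gt, visible)
        if vis
    ]
    if not errors:
        raise ValueError("no visible ground-truth points to evaluate")
    # Fraction of visible points strictly inside each threshold.
    fractions = [
        sum(thr > err for err in errors) / len(errors) for thr in thresholds
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: one exact hit, one 3 px miss, one occluded point (ignored).
print(delta_avg([(0, 0), (3, 0), (100, 100)],
                [(0, 0), (0, 0), (0, 0)],
                [True, True, False]))</code></pre>
    </div>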

    <!-- Pretrained models -->
    <div class="card">
      <h2>Pretrained Weights</h2>
      <table>
        <tr><th>Model</th><th>Training data</th><th>Download</th></tr>
        <tr>
          <td>Track-On-R</td>
          <td>Kubric + real videos</td>
          <td><a href="https://huggingface.co/gorkaydemir/track_on_r/resolve/main/track_on_r.pt" style="color:#60a5fa">HuggingFace</a></td>
        </tr>
        <tr>
          <td>Track-On2</td>
          <td>Kubric</td>
          <td><a href="https://huggingface.co/gorkaydemir/track_on2/resolve/main/trackon2_dinov3_checkpoint.pt" style="color:#60a5fa">HuggingFace</a></td>
        </tr>
        <tr>
          <td>Verifier</td>
          <td>K-Epic</td>
          <td><a href="https://huggingface.co/gorkaydemir/track_on_r/resolve/main/verifier.pt" style="color:#60a5fa">HuggingFace</a></td>
        </tr>
      </table>
      <p style="margin-top: 0.5rem; font-size: 0.85rem; color: #f59e0b;">
        ⚠ The DINOv3 backbone weights require a separate access request (Meta license restriction) and are downloaded automatically on first run.
      </p>
    </div>

    <!-- Runtime environment -->
    <div class="card">
      <h2>Runtime Requirements</h2>
      <div style="margin-bottom: 1rem;">
        <span class="tag">Python 3.12</span>
        <span class="tag">PyTorch 2.4.1</span>
        <span class="tag">CUDA 12.1</span>
        <span class="tag">mmcv 2.2.0</span>
        <span class="tag">DINOv3</span>
      </div>
      <ul>
        <li><span class="highlight">An NVIDIA GPU is required</span> (Macs do not support CUDA, so the code cannot run there)</li>
        <li>Recommended GPUs: A100 / RTX 3090 / RTX 4090 / H100</li>
        <li>Environment management: <code>mamba</code> or <code>conda</code></li>
      </ul>
    </div>
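
    <!-- Illustrative sketch: checking for a usable CUDA device -->
    <div class="card">
      <h2>Sketch: Checking the GPU Requirement</h2>
      <p>Before setting up the full environment, a quick check like the one below can confirm whether the machine meets the NVIDIA GPU requirement. It is a minimal sketch, not part of the repository: the helper name <code>cuda_status</code> is hypothetical, and it only uses standard PyTorch calls if PyTorch happens to be installed.</p>
      <pre style="overflow-x: auto; margin-top: 1rem;"><code style="display: block; padding: 1rem; line-height: 1.5;">import importlib.util


def cuda_status():
    """Describe whether PyTorch can see a CUDA device (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} found, but no CUDA device is visible"
    name = torch.cuda.get_device_name(0)
    return f"CUDA OK: {name} (torch {torch.__version__})"


print(cuda_status())</code></pre>
    </div>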

    <!-- Applications -->
    <div class="card">
      <h2>Applications</h2>
      <div class="grid" style="margin-top: 0.5rem;">
        <ul>
          <li><strong>Video editing</strong>: track objects for visual effects, matting, and replacement</li>
          <li><strong>Robot vision</strong>: track keypoints on grasp targets</li>
          <li><strong>Autonomous driving</strong>: track pedestrian and vehicle keypoints</li>
        </ul>
        <ul>
          <li><strong>Motion analysis</strong>: track the joint trajectories of athletes</li>
          <li><strong>AR/VR</strong>: track spatial anchors in real time</li>
          <li><strong>Sign language recognition</strong>: track finger and hand-gesture keypoints</li>
        </ul>
      </div>
    </div>

    <!-- Local files -->
    <div class="card">
      <h2>Local Project Structure</h2>
      <p style="font-family: monospace; font-size: 0.9rem; line-height: 2;">
        <code>source/</code>: Track-On source code (cloned from GitHub)<br>
        <code>source/demo.py</code>: demo script that can be run directly<br>
        <code>source/model/</code>: model definitions (<code>Predictor</code> class)<br>
        <code>source/config/</code>: training and inference YAML configs<br>
        <code>source/evaluation/</code>: evaluation scripts for the 6 benchmarks<br>
        <code>source/ensemble/</code>: teacher model ensemble<br>
        <code>source/verifier/</code>: verifier model<br>
      </p>
    </div>

    <p style="text-align: center; color: #444; margin-top: 2rem; font-size: 0.85rem;">
      Track-On-R research page · port 4130 · to be run locally once an NVIDIA GPU is available
    </p>
  </div>
</body>
</html>