<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>InSpatio-World 4D World Model: In-Depth Fact-Check</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
background: #0a0a0a; color: #e0e0e0;
min-height: 100vh; padding: 2rem;
}
.container { max-width: 1200px; margin: 0 auto; }
h1 {
font-size: 2.5rem; font-weight: 700;
background: linear-gradient(135deg, #60a5fa, #a78bfa);
-webkit-background-clip: text; -webkit-text-fill-color: transparent;
margin-bottom: 0.5rem;
}
.subtitle { color: #888; font-size: 1.1rem; margin-bottom: 0.5rem; }
.meta { color: #666; font-size: 0.9rem; margin-bottom: 2rem; }
.meta a { color: #60a5fa; text-decoration: none; }
.meta a:hover { text-decoration: underline; }
.card {
background: #141414; border: 1px solid #222; border-radius: 12px;
padding: 2rem; margin-bottom: 1.5rem;
}
.card h2 { color: #60a5fa; margin-bottom: 1rem; font-size: 1.3rem; }
.card h3 { color: #a78bfa; margin: 1.2rem 0 0.6rem; font-size: 1.1rem; }
.card p, .card li { line-height: 1.8; color: #aaa; }
.card ul { padding-left: 1.5rem; }
.card li { margin-bottom: 0.5rem; }
.highlight { color: #a78bfa; font-weight: 600; }
.highlight-green { color: #4ade80; font-weight: 600; }
.highlight-red { color: #f87171; font-weight: 600; }
.highlight-yellow { color: #fbbf24; font-weight: 600; }
.tag {
display: inline-block; background: #1e293b; color: #60a5fa;
padding: 0.25rem 0.75rem; border-radius: 6px; font-size: 0.85rem;
margin: 0.25rem 0.25rem 0.25rem 0;
}
.tag-warn {
display: inline-block; background: #422006; color: #fbbf24;
padding: 0.25rem 0.75rem; border-radius: 6px; font-size: 0.85rem;
margin: 0.25rem 0.25rem 0.25rem 0;
}
.tag-red {
display: inline-block; background: #450a0a; color: #f87171;
padding: 0.25rem 0.75rem; border-radius: 6px; font-size: 0.85rem;
margin: 0.25rem 0.25rem 0.25rem 0;
}
table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
th, td {
padding: 0.75rem 1rem; text-align: left;
border-bottom: 1px solid #222;
}
th { color: #60a5fa; font-weight: 600; }
td { color: #aaa; }
.grid { display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; }
@media (max-width: 768px) { .grid { grid-template-columns: 1fr; } }
code {
background: #1e1e1e; padding: 0.2rem 0.5rem; border-radius: 4px;
font-family: "SF Mono", Monaco, monospace; font-size: 0.9rem; color: #7dd3fc;
}
.pipeline {
display: flex; align-items: center; gap: 0; flex-wrap: wrap;
margin: 1rem 0;
}
.pipeline-step {
background: #1e293b; padding: 0.75rem 1.25rem; border-radius: 8px;
text-align: center; font-size: 0.9rem; color: #e0e0e0;
min-width: 140px;
}
.pipeline-arrow { color: #60a5fa; font-size: 1.5rem; padding: 0 0.5rem; }
.status-badge {
display: inline-block; background: #164e63; color: #22d3ee;
padding: 0.3rem 0.8rem; border-radius: 20px; font-size: 0.8rem;
font-weight: 600;
}
.verdict-badge {
display: inline-block; background: #422006; color: #fbbf24;
padding: 0.3rem 0.8rem; border-radius: 20px; font-size: 0.8rem;
font-weight: 600;
}
.verdict-box {
background: #1a1a2e; border: 2px solid #fbbf24; border-radius: 12px;
padding: 2rem; margin-bottom: 1.5rem; text-align: center;
}
.verdict-box h2 { color: #fbbf24; font-size: 1.5rem; margin-bottom: 0.75rem; }
.verdict-box p { color: #ccc; font-size: 1.05rem; line-height: 1.8; }
.meter-bar {
height: 8px; background: #222; border-radius: 4px; margin: 0.5rem 0;
overflow: hidden;
}
.meter-fill {
height: 100%; border-radius: 4px;
transition: width 0.6s ease;
}
.person-card {
display: flex; align-items: flex-start; gap: 1rem;
padding: 1rem; background: #1a1a1a; border-radius: 8px; margin-bottom: 0.75rem;
}
.person-avatar {
width: 48px; height: 48px; border-radius: 50%;
background: linear-gradient(135deg, #60a5fa, #a78bfa);
display: flex; align-items: center; justify-content: center;
font-size: 1.2rem; font-weight: 700; color: #fff; flex-shrink: 0;
}
.person-info h4 { color: #e0e0e0; margin-bottom: 0.25rem; }
.person-info p { font-size: 0.85rem; color: #888; line-height: 1.6; }
.claim-row {
display: grid; grid-template-columns: 1fr 1fr auto; gap: 1rem;
padding: 1rem 0; border-bottom: 1px solid #222; align-items: start;
}
.claim-row:last-child { border-bottom: none; }
.claim-label { color: #e0e0e0; font-weight: 500; }
.claim-reality { color: #aaa; font-size: 0.95rem; }
@media (max-width: 768px) {
.claim-row { grid-template-columns: 1fr; }
h1 { font-size: 1.8rem; }
}
</style>
</head>
<body>
<div class="container">
<h1>InSpatio-World</h1>
<p class="subtitle">Video-Conditioned 4D World Model: In-Depth Fact-Check Report</p>
<p class="meta">
InSpatio (2026.03.18) &nbsp;|&nbsp;
1.3B parameters · open source under Apache-2.0 &nbsp;|&nbsp;
<a href="https://inspatio.github.io/inspatio-world/" target="_blank">Project Page</a> &nbsp;|&nbsp;
<a href="https://github.com/inspatio/inspatio-world" target="_blank">GitHub</a> &nbsp;|&nbsp;
<a href="https://arxiv.org/abs/2603.11911" target="_blank">WorldFM Paper</a>
&nbsp;&nbsp;<span class="verdict-badge">Fairly heavy on hype</span>
</p>
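<!-- Verdict -->
<div class="verdict-box">
<h2>Verdict: A Credible Team, but the Marketing Runs Ahead of the Technology</h2>
<p>Not a scam, but the marketing language is clearly inflated.<br>
Actual level: a capable startup team has released an <span class="highlight-yellow">early open-source prototype</span>,<br>
but the promotional copy is written as if it were a "validated breakthrough". The 4D model has <span class="highlight-red">no technical paper</span> and zero independent evaluations.</p>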
</div>
<!-- Team background -->
<div class="card">
<h2>Team Background: This Part Is Real</h2>
<div class="person-card">
<div class="person-avatar"></div>
<div class="person-info">
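<h4>Guofeng Zhang (章国锋), Founder &amp; CEO</h4>
<p>Professor at the State Key Lab of CAD&amp;CG, Zhejiang University · former chief scientist at SenseTime · 20+ years of 3D vision research · ISMAR Best Paper award · highly cited on Google Scholar</p>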
</div>
</div>
<div class="person-card">
<div class="person-avatar"></div>
<div class="person-info">
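<h4>Haomin Liu (刘浩敏), Co-founder &amp; CTO</h4>
<p>Former R&amp;D director at SenseTime · PhD from Zhejiang University · pioneer of commercial mobile SLAM solutions (predating ARKit/ARCore) · led SenseTime's StarGen foundation model</p>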
</div>
</div>
<p style="margin-top: 1rem;">
<span class="tag">ZJU CAD&amp;CG Lab</span>
<span class="tag">Former SenseTime 3D vision core team</span>
<span class="tag">18-person R&amp;D team</span>
<span class="tag">Founded in 2025</span>
</p>
<p style="margin-top: 0.75rem; color: #4ade80; font-size: 0.9rem;">
The team's academic credentials are solid, with a real and verifiable track record in SLAM and 3D reconstruction.
</p>
</div>
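<!-- Technical architecture -->
<div class="card">
<h2>Technical Architecture</h2>
<p style="margin-bottom: 1rem;">The input video serves as the condition that anchors a "local world state", from which an interactive 4D scene is generated:</p>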
<div class="pipeline">
<div class="pipeline-step">
<strong>Input video</strong><br>
<small style="color:#888">Any video clip</small>
</div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-step">
<strong>Depth-Anything-3</strong><br>
<small style="color:#888">Depth estimation</small>
</div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-step">
<strong>Florence-2</strong><br>
<small style="color:#888">Video caption generation</small>
</div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-step">
<strong>Wan2.1 + Self-Forcing</strong><br>
<small style="color:#888">4D world-state generation</small>
</div>
<span class="pipeline-arrow">&rarr;</span>
<div class="pipeline-step">
<strong>4D roaming</strong><br>
<small style="color:#888">Multiple viewpoints + time axis</small>
</div>
</div>
<p style="margin-top: 1rem;">
<span class="tag">1.3B parameters</span>
<span class="tag">Self-Forcing (NeurIPS 2025 Spotlight)</span>
<span class="tag">Wan2.1 (Alibaba video diffusion)</span>
<span class="tag">Depth-Anything-3</span>
<span class="tag">Florence-2</span>
<span class="tag">Apache-2.0</span>
</p>
<p style="margin-top: 0.75rem; color: #4ade80; font-size: 0.9rem;">
Every component of the underlying stack is a published, verifiable, mature method. The way they are combined is itself reasonable.
</p>
</div>
<!-- Marketing vs. reality: item-by-item check -->
<div class="card">
<h2>Marketing Claims vs. What Can Be Verified</h2>
<div class="claim-row">
<div>
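<div class="claim-label">"The first video-conditioned 4D world model"</div>
</div>
<div class="claim-reality">
"First" only under a narrowed definition. NeoVerse (CVPR 2026), D4RT (DeepMind), DeepVerse, Kinema4D, and others pursue similar directions, so whether it is "first" depends on where the category boundary is drawn.
</div>
<div><span class="tag-warn">Narrow definition</span></div>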
</div>
<div class="claim-row">
<div>
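<div class="claim-label">"Real-time roaming at 24 FPS on a single GPU"</div>
</div>
<div class="claim-reality">
24 FPS requires a data-center-class GPU (A100/H100). A consumer RTX 4090 measures 7-10 FPS. The official README notes that the code has "not yet been optimized for speed".
</div>
<div><span class="tag-red">Misleading</span></div>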
</div>
<div class="claim-row">
<div>
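<div class="claim-label">"Eliminates the object-disappearance hallucination"</div>
</div>
<div class="claim-reality">
A theoretical aspiration. The team's own WorldFM paper acknowledges poor handling of dynamic scenes, "noticeable inter-frame jitter", and motion-boundary artifacts during online inference.
</div>
<div><span class="tag-red">Self-contradictory</span></div>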
</div>
<div class="claim-row">
<div>
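<div class="claim-label">"No. 1 on the WorldScore-Dynamic leaderboard"</div>
</div>
<div class="claim-reality">
Qualified: first "among models with real-time inference speed". That is a ranking within a subset, not first place overall, and the ranking cannot be verified from any independent source.
</div>
<div><span class="tag-warn">Selective disclosure</span></div>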
</div>
<div class="claim-row">
<div>
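<div class="claim-label">"Trained on no more than 100 GPUs"</div>
</div>
<div class="claim-reality">
With 1.3B parameters and fine-tuning from Wan2.1, a scale of 100 GPUs is plausible, though it has not been independently verified.
</div>
<div><span class="tag">Plausible</span></div>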
</div>
</div>
<!-- Credibility scores -->
<div class="card">
<h2>Credibility Scores</h2>
<table>
<tr>
<th style="width:40%">Dimension</th>
<th style="width:40%">Rating</th>
<th style="width:20%">Score</th>
</tr>
<tr>
<td>Team academic background</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:90%; background: #4ade80;"></div></div>
</td>
<td><span class="highlight-green">9/10</span></td>
</tr>
<tr>
<td>Soundness of the technical approach</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:75%; background: #60a5fa;"></div></div>
</td>
<td><span style="color:#60a5fa; font-weight:600;">7.5/10</span></td>
</tr>
<tr>
<td>Match between marketing and reality</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:35%; background: #f87171;"></div></div>
</td>
<td><span class="highlight-red">3.5/10</span></td>
</tr>
<tr>
<td>Papers / peer review</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:20%; background: #f87171;"></div></div>
</td>
<td><span class="highlight-red">2/10</span></td>
</tr>
<tr>
<td>Independent third-party verification</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:5%; background: #f87171;"></div></div>
</td>
<td><span class="highlight-red">0.5/10</span></td>
</tr>
<tr>
<td>Openness of the code</td>
<td>
<div class="meter-bar"><div class="meter-fill" style="width:80%; background: #4ade80;"></div></div>
</td>
<td><span class="highlight-green">8/10</span></td>
</tr>
</table>
</div>
<div class="grid">
<!-- Most critical issues -->
<div class="card">
<h2>Most Critical Weaknesses</h2>
<ul>
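<li><span class="highlight-red">InSpatio-World (the 4D model) has no technical paper</span>: arXiv 2603.11911 describes WorldFM (a 3D frame model), which is a different system</li>
<li><span class="highlight-red">No standard quantitative metrics</span>: LPIPS, PSNR, and FID are all unreported</li>
<li><span class="highlight-red">No comparison table</span>: there is no head-to-head comparison against competitors such as NeoVerse or D4RT</li>
<li><span class="highlight-red">Zero community discussion</span>: no independent technical discussion can be found on Reddit, HN, or Twitter</li>
<li><span class="highlight-red">Contradicts its own admissions</span>: the WorldFM paper concedes weak dynamic-scene handling and frame jitter, yet the InSpatio-World project page claims to "eliminate hallucination"</li>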
</ul>
</div>
<!-- Genuine strengths -->
<div class="card">
<h2>Genuine Strengths</h2>
<ul>
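<li><span class="highlight-green">Solid team</span>: Guofeng Zhang is a top scholar in 3D vision; this is not a "slide-deck startup"</li>
<li><span class="highlight-green">Fully open source</span>: Apache-2.0 license, so the code can be inspected and reproduced</li>
<li><span class="highlight-green">Reliable foundations</span>: Self-Forcing (NeurIPS 2025), Wan2.1, and Depth-Anything-3 are all mature methods</li>
<li><span class="highlight-green">Architectural novelty</span>: the idea of "video as a persistent world-state anchor" is a genuine technical contribution</li>
<li><span class="highlight-green">Training cost</span>: fine-tuning an existing model on &le;100 GPUs is a sensible path</li>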
</ul>
</div>
</div>
<!-- Competitor comparison -->
<div class="card">
<h2>Contemporary Competitors</h2>
<table>
<tr>
<th>Project</th>
<th>Source</th>
<th>Method</th>
<th>Paper</th>
<th>Speed</th>
</tr>
<tr>
<td><span class="highlight">InSpatio-World</span></td>
<td>InSpatio (startup)</td>
<td>Video conditioning + diffusion world model</td>
<td><span class="highlight-red">None</span></td>
<td>7-10 FPS (RTX 4090) / 24 FPS (A100/H100)</td>
</tr>
<tr>
<td>NeoVerse</td>
<td>CreateAI · CVPR 2026</td>
<td>Feed-forward 4D Gaussian Splatting</td>
<td>CVPR 2026</td>
<td>&lt;30 s inference</td>
</tr>
<tr>
<td>D4RT</td>
<td>Google DeepMind</td>
<td>4D reconstruction + tracking</td>
<td></td>
<td>1-minute video in ~5 s (TPU)</td>
</tr>
<tr>
<td>DeepVerse</td>
<td>arxiv 2506.01103</td>
<td>4D autoregressive video world model</td>
<td>arxiv</td>
<td></td>
</tr>
<tr>
<td>Kinema4D</td>
<td>arxiv 2603.16669</td>
<td>Kinematic 4D world modeling</td>
<td>arxiv</td>
<td></td>
</tr>
</table>
<p style="margin-top: 0.75rem; font-size: 0.85rem; color: #888;">
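What sets InSpatio-World apart is the "video → persistent world state" concept, but without a paper or comparative data its relative merits are hard to judge objectively.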
</p>
</div>
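<!-- Limitations acknowledged in the WorldFM paper -->
<div class="card">
<h2>Limitations Acknowledged in the WorldFM Paper (2603.11911)</h2>
<p style="margin-bottom: 0.75rem; color: #888;">The following come from the team's own WorldFM paper (InSpatio-World is built on the same architecture):</p>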
<ul>
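<li><span class="highlight-yellow">Poor handling of dynamic scenes</span>: the training data is "dominated by static scenes", so generalization to moving objects is limited</li>
<li><span class="highlight-yellow">Inter-frame jitter</span>: "noticeable jitter" between frames during online inference, with no temporal-consistency constraint</li>
<li><span class="highlight-yellow">Motion-boundary artifacts</span>: boundary artifacts appear under large viewpoint changes</li>
<li><span class="highlight-yellow">No standard metrics</span>: the paper does not report common quantitative metrics such as LPIPS, PSNR, FID, or SSIM</li>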
</ul>
</div>
<!-- Closing summary -->
<div class="card" style="border-color: #333;">
<h2>One-Sentence Summary</h2>
<p style="font-size: 1.1rem; color: #ccc; line-height: 2;">
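<span class="highlight-green">Not a scam</span>, but <span class="highlight-yellow">very heavy on hype</span>.
<br>
Actual level: a capable startup team releasing an early open-source prototype.
<br>
Marketing level: copy written as if this were a "validated breakthrough".
<br>
Advice: the direction is worth <span class="highlight">following</span>, but do not treat the numbers on the project page as paper-grade results.
<br>
Reserve judgment until <span class="highlight">a technical paper is published and the results are independently reproduced</span>.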
</p>
</div>
<p style="text-align: center; color: #444; margin-top: 2rem; font-size: 0.85rem;">
InSpatio-World In-Depth Fact-Check Report · 2026-03-22 · Port 4150
</p>
</div>
</body>
</html>