ltx2/index.html
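kang 3f76496c90 feat: complete LTX-2 joint audio-video research page
Covers the model overview, dual-stream DiT architecture, 3D control, technical specifications,
development timeline, hardware requirements, ComfyUI workflow, and related resources.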

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:11:38 +08:00

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>LTX-2 Joint Audio-Video Generation Research</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500&family=Noto+Sans+SC:wght@400;500;600;700&display=swap" rel="stylesheet">
<style>
:root {
--bg: #09090b;
--bg-card: #111113;
--bg-card-hover: #18181b;
--border: #222225;
--border-light: #2a2a2e;
--text: #e4e4e7;
--text-muted: #71717a;
--text-dim: #52525b;
--accent-blue: #60a5fa;
--accent-purple: #a78bfa;
--accent-cyan: #22d3ee;
--accent-green: #4ade80;
--accent-orange: #fb923c;
--accent-red: #f87171;
--font-sans: 'DM Sans', 'Noto Sans SC', system-ui, -apple-system, sans-serif;
--font-mono: 'JetBrains Mono', ui-monospace, monospace;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: var(--font-sans);
background: var(--bg);
color: var(--text);
min-height: 100vh;
-webkit-font-smoothing: antialiased;
line-height: 1.7;
}
::-webkit-scrollbar { width: 6px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb { background: #333; border-radius: 3px; }
/* ===== Hero ===== */
.hero {
position: relative;
padding: 5rem 2rem 4rem;
text-align: center;
overflow: hidden;
}
.hero::before {
content: '';
position: absolute;
top: -40%;
left: 50%;
transform: translateX(-50%);
width: 800px;
height: 800px;
background: radial-gradient(ellipse, rgba(96,165,250,0.08) 0%, rgba(167,139,250,0.04) 40%, transparent 70%);
pointer-events: none;
}
.hero-badge {
display: inline-flex;
align-items: center;
gap: 8px;
padding: 6px 16px;
border: 1px solid var(--border);
border-radius: 100px;
font-size: 12px;
color: var(--text-muted);
letter-spacing: 0.08em;
text-transform: uppercase;
margin-bottom: 2rem;
animation: fadeInDown 0.6s ease-out;
}
.hero-badge .dot {
width: 6px; height: 6px;
background: var(--accent-green);
border-radius: 50%;
animation: pulse-dot 2s infinite;
}
@keyframes pulse-dot {
0%, 100% { opacity: 1; }
50% { opacity: 0.4; }
}
.hero h1 {
font-size: clamp(2.5rem, 5vw, 4rem);
font-weight: 700;
letter-spacing: -0.035em;
line-height: 1.08;
margin-bottom: 1rem;
animation: fadeInUp 0.6s ease-out;
}
.hero h1 .gradient {
background: linear-gradient(135deg, var(--accent-blue), var(--accent-purple), var(--accent-cyan));
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}
.hero .subtitle {
font-size: clamp(1rem, 2vw, 1.2rem);
color: var(--text-muted);
max-width: 640px;
margin: 0 auto 2.5rem;
animation: fadeInUp 0.6s ease-out 0.1s both;
}
.hero-stats {
display: flex;
justify-content: center;
gap: 3rem;
flex-wrap: wrap;
animation: fadeInUp 0.6s ease-out 0.2s both;
}
.hero-stat {
text-align: center;
}
.hero-stat .num {
font-size: 2rem;
font-weight: 700;
font-family: var(--font-mono);
letter-spacing: -0.02em;
}
.hero-stat .num.blue { color: var(--accent-blue); }
.hero-stat .num.purple { color: var(--accent-purple); }
.hero-stat .num.cyan { color: var(--accent-cyan); }
.hero-stat .num.green { color: var(--accent-green); }
.hero-stat .label {
font-size: 12px;
color: var(--text-dim);
letter-spacing: 0.08em;
text-transform: uppercase;
margin-top: 4px;
}
/* ===== Container ===== */
.container {
max-width: 1200px;
margin: 0 auto;
padding: 0 2rem;
}
/* ===== Section ===== */
.section {
padding: 4rem 0;
border-top: 1px solid var(--border);
}
.section-label {
font-size: 11px;
letter-spacing: 0.12em;
text-transform: uppercase;
color: var(--text-dim);
font-weight: 500;
margin-bottom: 1rem;
}
.section-title {
font-size: clamp(1.75rem, 3.5vw, 2.2rem);
font-weight: 600;
letter-spacing: -0.03em;
margin-bottom: 1rem;
}
.section-desc {
color: var(--text-muted);
font-size: 1rem;
max-width: 680px;
margin-bottom: 2.5rem;
}
/* ===== Cards Grid ===== */
.cards-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
gap: 1rem;
}
.card {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 2rem;
transition: all 0.2s ease;
}
.card:hover {
background: var(--bg-card-hover);
border-color: var(--border-light);
}
.card-icon {
width: 40px;
height: 40px;
border-radius: 10px;
display: flex;
align-items: center;
justify-content: center;
font-size: 20px;
margin-bottom: 1.2rem;
}
.card-icon.blue { background: rgba(96,165,250,0.12); color: var(--accent-blue); }
.card-icon.purple { background: rgba(167,139,250,0.12); color: var(--accent-purple); }
.card-icon.cyan { background: rgba(34,211,238,0.12); color: var(--accent-cyan); }
.card-icon.green { background: rgba(74,222,128,0.12); color: var(--accent-green); }
.card-icon.orange { background: rgba(251,146,60,0.12); color: var(--accent-orange); }
.card-icon.red { background: rgba(248,113,113,0.12); color: var(--accent-red); }
.card h3 {
font-size: 1.1rem;
font-weight: 600;
margin-bottom: 0.6rem;
letter-spacing: -0.02em;
}
.card p {
color: var(--text-muted);
font-size: 14px;
line-height: 1.7;
}
/* ===== Architecture Diagram ===== */
.arch-diagram {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 3rem 2rem;
margin-bottom: 2rem;
overflow-x: auto;
}
.arch-flow {
display: flex;
align-items: center;
justify-content: center;
gap: 0.5rem;
flex-wrap: wrap;
min-width: 600px;
}
.arch-node {
padding: 12px 20px;
border-radius: 10px;
font-size: 13px;
font-weight: 500;
text-align: center;
white-space: nowrap;
min-width: 120px;
}
.arch-node.input {
background: rgba(96,165,250,0.1);
border: 1px solid rgba(96,165,250,0.25);
color: var(--accent-blue);
}
.arch-node.process {
background: rgba(167,139,250,0.1);
border: 1px solid rgba(167,139,250,0.25);
color: var(--accent-purple);
}
.arch-node.core {
background: rgba(34,211,238,0.1);
border: 1px solid rgba(34,211,238,0.25);
color: var(--accent-cyan);
padding: 16px 28px;
font-size: 14px;
font-weight: 600;
}
.arch-node.output {
background: rgba(74,222,128,0.1);
border: 1px solid rgba(74,222,128,0.25);
color: var(--accent-green);
}
.arch-arrow {
color: var(--text-dim);
font-size: 18px;
flex-shrink: 0;
}
.arch-node small {
display: block;
font-size: 11px;
opacity: 0.7;
font-weight: 400;
margin-top: 2px;
}
/* ===== Dual Stream ===== */
.dual-stream {
display: grid;
grid-template-columns: 1fr auto 1fr;
gap: 1.5rem;
align-items: stretch;
margin-top: 2rem;
}
.stream-card {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 2rem;
}
.stream-card h3 {
font-size: 1.1rem;
font-weight: 600;
margin-bottom: 0.3rem;
}
.stream-card .params {
font-family: var(--font-mono);
font-size: 2rem;
font-weight: 700;
margin: 0.8rem 0;
}
.stream-card ul {
list-style: none;
padding: 0;
}
.stream-card li {
font-size: 14px;
color: var(--text-muted);
padding: 6px 0;
border-bottom: 1px solid var(--border);
display: flex;
align-items: center;
gap: 8px;
}
.stream-card li:last-child { border-bottom: none; }
.stream-card li::before {
content: '▸';
font-size: 10px;
}
.stream-connector {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
gap: 0.5rem;
}
.connector-line {
width: 2px;
flex: 1;
background: linear-gradient(to bottom, rgba(96,165,250,0.3), rgba(167,139,250,0.3));
}
.connector-label {
font-size: 11px;
color: var(--text-dim);
writing-mode: vertical-rl;
text-orientation: mixed;
letter-spacing: 0.1em;
text-transform: uppercase;
padding: 8px 0;
}
/* ===== Spec Table ===== */
.spec-table {
width: 100%;
border-collapse: collapse;
font-size: 14px;
}
.spec-table th {
text-align: left;
padding: 12px 16px;
font-weight: 500;
color: var(--text-dim);
font-size: 11px;
letter-spacing: 0.1em;
text-transform: uppercase;
border-bottom: 1px solid var(--border);
}
.spec-table td {
padding: 12px 16px;
border-bottom: 1px solid var(--border);
color: var(--text-muted);
}
.spec-table tr:last-child td { border-bottom: none; }
.spec-table .value {
font-family: var(--font-mono);
color: var(--text);
font-weight: 500;
}
/* ===== Timeline ===== */
.timeline {
position: relative;
padding-left: 2rem;
}
.timeline::before {
content: '';
position: absolute;
left: 7px;
top: 8px;
bottom: 8px;
width: 2px;
background: linear-gradient(to bottom, var(--accent-blue), var(--accent-purple), var(--accent-cyan));
opacity: 0.3;
}
.timeline-item {
position: relative;
padding-bottom: 2rem;
}
.timeline-item:last-child { padding-bottom: 0; }
.timeline-dot {
position: absolute;
left: -2rem;
top: 6px;
width: 16px;
height: 16px;
border-radius: 50%;
border: 2px solid var(--border);
background: var(--bg);
}
.timeline-dot.active {
border-color: var(--accent-cyan);
background: rgba(34,211,238,0.15);
}
.timeline-date {
font-family: var(--font-mono);
font-size: 12px;
color: var(--text-dim);
margin-bottom: 4px;
}
.timeline-title {
font-weight: 600;
margin-bottom: 4px;
}
.timeline-desc {
font-size: 14px;
color: var(--text-muted);
}
/* ===== 3D Control ===== */
.control-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
gap: 1rem;
}
.control-card {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 1.5rem;
transition: all 0.2s ease;
}
.control-card:hover {
border-color: var(--border-light);
}
.control-card .tag {
display: inline-block;
font-size: 11px;
letter-spacing: 0.08em;
text-transform: uppercase;
padding: 3px 10px;
border-radius: 100px;
margin-bottom: 0.8rem;
font-weight: 500;
}
.control-card .tag.depth {
background: rgba(96,165,250,0.12);
color: var(--accent-blue);
}
.control-card .tag.pose {
background: rgba(167,139,250,0.12);
color: var(--accent-purple);
}
.control-card .tag.camera {
background: rgba(34,211,238,0.12);
color: var(--accent-cyan);
}
.control-card .tag.keyframe {
background: rgba(74,222,128,0.12);
color: var(--accent-green);
}
.control-card h4 {
font-size: 1rem;
font-weight: 600;
margin-bottom: 0.5rem;
}
.control-card p {
font-size: 13px;
color: var(--text-muted);
line-height: 1.7;
}
/* ===== Compare ===== */
.compare-block {
display: grid;
grid-template-columns: 1fr auto 1fr;
gap: 1.5rem;
align-items: start;
}
.compare-side {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 2rem;
}
.compare-side h3 {
font-size: 1rem;
font-weight: 600;
margin-bottom: 1rem;
}
.compare-side.old h3 { color: var(--text-dim); }
.compare-side.new h3 { color: var(--accent-cyan); }
.compare-vs {
display: flex;
align-items: center;
justify-content: center;
font-size: 14px;
font-weight: 600;
color: var(--text-dim);
padding-top: 2.5rem;
}
.compare-side ol {
padding-left: 1.2rem;
}
.compare-side li {
font-size: 14px;
color: var(--text-muted);
padding: 4px 0;
}
.compare-side.old li { text-decoration: line-through; opacity: 0.5; }
/* ===== Hardware ===== */
.hw-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
gap: 1rem;
}
.hw-card {
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 1.5rem;
}
.hw-card .tier {
font-size: 11px;
letter-spacing: 0.1em;
text-transform: uppercase;
font-weight: 500;
margin-bottom: 0.5rem;
}
.hw-card .tier.best { color: var(--accent-green); }
.hw-card .tier.good { color: var(--accent-blue); }
.hw-card .tier.ok { color: var(--accent-orange); }
.hw-card .tier.hard { color: var(--accent-red); }
.hw-card h4 {
font-size: 1rem;
font-weight: 600;
margin-bottom: 0.3rem;
}
.hw-card .vram {
font-family: var(--font-mono);
font-size: 13px;
color: var(--text-muted);
margin-bottom: 0.8rem;
}
.hw-card p {
font-size: 13px;
color: var(--text-muted);
line-height: 1.6;
}
/* ===== Links ===== */
.links-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
gap: 1rem;
}
.link-card {
display: block;
background: var(--bg-card);
border: 1px solid var(--border);
border-radius: 12px;
padding: 1.5rem;
text-decoration: none;
transition: all 0.2s ease;
}
.link-card:hover {
background: var(--bg-card-hover);
border-color: var(--border-light);
}
.link-card .link-label {
font-size: 11px;
letter-spacing: 0.08em;
text-transform: uppercase;
color: var(--text-dim);
margin-bottom: 0.5rem;
}
.link-card .link-title {
font-weight: 600;
color: var(--text);
margin-bottom: 0.3rem;
}
.link-card .link-url {
font-family: var(--font-mono);
font-size: 12px;
color: var(--accent-blue);
word-break: break-all;
}
/* ===== Footer ===== */
.footer {
padding: 3rem 0;
border-top: 1px solid var(--border);
text-align: center;
color: var(--text-dim);
font-size: 13px;
}
/* ===== Animations ===== */
@keyframes fadeInUp {
from { opacity: 0; transform: translateY(20px); }
to { opacity: 1; transform: translateY(0); }
}
@keyframes fadeInDown {
from { opacity: 0; transform: translateY(-10px); }
to { opacity: 1; transform: translateY(0); }
}
.fade-in {
opacity: 0;
transform: translateY(16px);
transition: opacity 0.5s ease, transform 0.5s ease;
}
.fade-in.visible {
opacity: 1;
transform: translateY(0);
}
/* ===== Responsive ===== */
@media (max-width: 768px) {
.hero { padding: 3rem 1.5rem 2.5rem; }
.hero-stats { gap: 1.5rem; }
.container { padding: 0 1.5rem; }
.dual-stream { grid-template-columns: 1fr; }
.stream-connector { flex-direction: row; padding: 0.5rem 0; }
.connector-line { width: auto; height: 2px; flex: 1; }
.connector-label { writing-mode: horizontal-tb; }
.compare-block { grid-template-columns: 1fr; }
.compare-vs { padding: 0; }
.arch-flow { justify-content: flex-start; }
}
</style>
</head>
<body>
<!-- ===== Hero ===== -->
<section class="hero">
<div class="hero-badge">
<span class="dot"></span>
<span>Lightricks · Open Source · Apache 2.0</span>
</div>
<h1>
<span class="gradient">LTX-2</span> Joint Audio-Video Generation
</h1>
<p class="subtitle">
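The first production-grade open-source AI model to generate video and audio together in a single inference run: synchronized sound and picture in one step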
</p>
<div class="hero-stats">
<div class="hero-stat">
<div class="num blue">19B</div>
<div class="label">Parameters</div>
</div>
<div class="hero-stat">
<div class="num purple">4K</div>
<div class="label">Native Resolution</div>
</div>
<div class="hero-stat">
<div class="num cyan">50fps</div>
<div class="label">Max Frame Rate</div>
</div>
<div class="hero-stat">
<div class="num green">20s</div>
<div class="label">Max Clip Length</div>
</div>
</div>
</section>
<div class="container">
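<!-- ===== What Is LTX-2 ===== -->
<section class="section fade-in">
<p class="section-label">Overview</p>
<h2 class="section-title">What Is LTX-2</h2>
<p class="section-desc">
LTX-2, developed by the Israeli AI company <strong>Lightricks</strong>, is a 19B-parameter diffusion transformer (DiT) model.
Its core breakthrough: instead of generating video first and dubbing it afterward, it produces picture and sound together in <strong>the same forward pass</strong>.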
</p>
<div class="cards-grid">
<div class="card">
<div class="card-icon blue">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><polygon points="23 7 16 12 23 17 23 7"/><rect x="1" y="5" width="15" height="14" rx="2" ry="2"/></svg>
</div>
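<h3>Video Generation</h3>
<p>A 14B-parameter video stream with 1080p / 1440p / 4K output and an adjustable frame rate from 24 to 50 fps, with finer image detail than comparable open-source models.</p>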
</div>
<div class="card">
<div class="card-icon purple">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M9 18V5l12-2v13"/><circle cx="6" cy="18" r="3"/><circle cx="18" cy="16" r="3"/></svg>
</div>
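<h3>Audio Generation</h3>
<p>A 5B-parameter audio stream that generates dialogue, sound effects, ambient sound, and background music. Lip sync is accurate, and footsteps follow the rhythm of the motion.</p>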
</div>
<div class="card">
<div class="card-icon cyan">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M4 15s1-1 4-1 5 2 8 2 4-1 4-1V3s-1 1-4 1-5-2-8-2-4 1-4 1z"/><line x1="4" y1="22" x2="4" y2="15"/></svg>
</div>
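<h3>Bidirectional Cross-Attention</h3>
<p>The video and audio streams are coupled through bidirectional cross-attention layers and share timestep conditioning, which keeps the two modalities aligned throughout denoising.</p>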
</div>
<div class="card">
<div class="card-icon green">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 16V8a2 2 0 0 0-1-1.73l-7-4a2 2 0 0 0-2 0l-7 4A2 2 0 0 0 3 8v8a2 2 0 0 0 1 1.73l7 4a2 2 0 0 0 2 0l7-4A2 2 0 0 0 21 16z"/></svg>
</div>
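<h3>Precise 3D Control</h3>
<p>Generation is conditioned on 3D spatial information such as depth maps, pose skeletons, and camera paths, giving precise, reproducible control over the picture.</p>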
</div>
<div class="card">
<div class="card-icon orange">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="2" y="3" width="20" height="14" rx="2" ry="2"/><line x1="8" y1="21" x2="16" y2="21"/><line x1="12" y1="17" x2="12" y2="21"/></svg>
</div>
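<h3>Consumer Hardware</h3>
<p>The FP8-quantized build runs on a single RTX 4090 (24GB). No professional GPU cluster is required, so individual creators can deploy it locally.</p>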
</div>
<div class="card">
<div class="card-icon red">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M12 22s8-4 8-10V5l-8-3-8 3v7c0 6 8 10 8 10z"/></svg>
</div>
<h3>Fully Open Source</h3>
<p>Apache 2.0 license. Free for academic use, and free for companies with annual revenue &lt;$10M. Weights, code, and ComfyUI nodes are all public.</p>
</div>
</div>
</section>
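<!-- ===== Joint Generation vs. Traditional Pipeline ===== -->
<section class="section fade-in">
<p class="section-label">Core Innovation</p>
<h2 class="section-title">What Does "Joint Audio-Video Generation" Mean?</h2>
<p class="section-desc">
One inference run produces video and audio in sync, replacing the fragmented workflow of generating video first, dubbing it, and aligning the two by hand.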
</p>
<div class="compare-block">
<div class="compare-side old">
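<h3>Traditional Pipeline</h3>
<ol>
<li>Generate video with model A</li>
<li>Generate or match sound effects with model B</li>
<li>Adjust lip sync by hand</li>
<li>Composite in post and re-check repeatedly</li>
<li>Frequent misalignment and a disjointed result</li>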
</ol>
</div>
<div class="compare-vs">VS</div>
<div class="compare-side new">
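<h3>LTX-2 Joint Generation</h3>
<ol>
<li>Provide a text or image prompt</li>
<li>The model generates video and audio together</li>
<li>Lip movement, motion, and sound effects are synchronized by construction</li>
<li>A complete audio-video file comes out directly</li>
<li>One step, with no alignment in post</li>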
</ol>
</div>
</div>
</section>
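<!-- ===== Model Architecture ===== -->
<section class="section fade-in">
<p class="section-label">Architecture</p>
<h2 class="section-title">Asymmetric Dual-Stream DiT Architecture</h2>
<p class="section-desc">
Two separate diffusion transformer streams (video 14B + audio 5B) stay aligned at every denoising step through bidirectional cross-attention.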
</p>
<!-- Flow Diagram -->
<div class="arch-diagram">
<div class="arch-flow">
<div class="arch-node input">
Text Prompt<small>Input</small>
</div>
<div class="arch-arrow">→</div>
<div class="arch-node process">
Text Embedding<small>Text Encoder</small>
</div>
<div class="arch-arrow">→</div>
<div class="arch-node core">
Dual-Stream DiT<small>Video 14B + Audio 5B</small>
</div>
<div class="arch-arrow">→</div>
<div class="arch-node process">
Modality VAEs<small>Video + Audio Decoder</small>
</div>
<div class="arch-arrow">→</div>
<div class="arch-node output">
Audio-Video Output<small>Synced A/V</small>
</div>
</div>
</div>
<!-- Dual Stream Detail -->
<div class="dual-stream">
<div class="stream-card">
<h3 style="color: var(--accent-blue);">Video Stream</h3>
<div class="params" style="color: var(--accent-blue);">14B</div>
<ul>
<li>Built on DiT (Diffusion Transformer)</li>
<li>Native 4K / 1440p / 1080p support</li>
<li>Frame rates: 24 / 25 / 48 / 50 fps</li>
<li>Spatiotemporally compressed VAE encoding</li>
<li>Depth map / Canny / pose conditioning inputs</li>
</ul>
</div>
<div class="stream-connector">
<div class="connector-line"></div>
<div class="connector-label">Bidirectional Cross-Attention</div>
<div class="connector-line"></div>
</div>
<div class="stream-card">
<h3 style="color: var(--accent-purple);">Audio Stream</h3>
<div class="params" style="color: var(--accent-purple);">5B</div>
<ul>
<li>Spoken dialogue (lip-synced)</li>
<li>Ambient sound effects (footsteps, wind)</li>
<li>Background music</li>
<li>Audio VAE encoding/decoding</li>
<li>Shared timestep conditioning</li>
</ul>
</div>
</div>
</section>
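<!-- ===== 3D Control ===== -->
<section class="section fade-in">
<p class="section-label">3D Control</p>
<h2 class="section-title">Only 3D Gives Precise Control over AI</h2>
<p class="section-desc">
Text prompts alone are too ambiguous. Precise, controllable video generation takes 3D information such as depth maps, pose skeletons, and camera paths.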
</p>
<div class="control-grid">
<div class="control-card">
<span class="tag depth">Depth</span>
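<h4>Depth Map Control</h4>
<p>A depth map injected through IC-LoRA enforces the camera geometry and controls large-scale layout and subject distance. Within those constraints, the model is free to choose textures and lighting.</p>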
</div>
<div class="control-card">
<span class="tag pose">Pose</span>
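<h4>Pose Control</h4>
<p>Human skeletal motion is extracted from a reference video and transferred into the generated video through a pose IC-LoRA. Body movement is reproduced precisely, with natural transitions in facial expression.</p>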
</div>
<div class="control-card">
<span class="tag camera">Camera</span>
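<h4>30 Cinematic Camera Moves</h4>
<p>Ships with 30 presets, including dolly (push in / pull out), trucking, rotation, and tracking. Orbit moves, lateral motion, and depth-aware composition cover the mainstream vocabulary of cinematic camera work.</p>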
</div>
<div class="control-card">
<span class="tag keyframe">Keyframe</span>
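<h4>Multi-Keyframe Conditioning</h4>
<p>Specify a first frame, intermediate frames, and a last frame, and the model interpolates between them. This yields seamless single-shot motion and precise control over where the video goes.</p>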
</div>
</div>
</section>
<!-- ===== Technical Specifications ===== -->
<section class="section fade-in">
<p class="section-label">Specifications</p>
<h2 class="section-title">Technical Specifications</h2>
<div class="card" style="overflow-x: auto;">
<table class="spec-table">
<thead>
<tr>
<th>Parameter</th>
<th>Specification</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total parameters</td>
<td class="value">19B</td>
<td>Video 14B + audio 5B</td>
</tr>
<tr>
<td>Architecture</td>
<td class="value">Asymmetric dual-stream DiT</td>
<td>Asymmetric Dual-Stream Diffusion Transformer</td>
</tr>
<tr>
<td>Maximum resolution</td>
<td class="value">4K (3840×2160)</td>
<td>1080p and 1440p also supported</td>
</tr>
<tr>
<td>Frame rate</td>
<td class="value">24 / 25 / 48 / 50 fps</td>
<td>Can be doubled with the 2× Temporal Upscaler</td>
</tr>
<tr>
<td>Maximum clip length</td>
<td class="value">20 s</td>
<td>Audio and video generated in sync</td>
</tr>
<tr>
<td>Supported inputs</td>
<td class="value">Text / image / video</td>
<td>T2V, I2V, and V2V all supported</td>
</tr>
<tr>
<td>Control conditions</td>
<td class="value">Depth map / Canny / pose</td>
<td>Injected through IC-LoRA</td>
</tr>
<tr>
<td>Spatial upscaling</td>
<td class="value">2× Spatial Upscaler</td>
<td>ltx-2-spatial-upscaler-x2</td>
</tr>
<tr>
<td>Temporal upscaling</td>
<td class="value">2× Temporal Upscaler</td>
<td>ltx-2-temporal-upscaler-x2</td>
</tr>
<tr>
<td>Precision formats</td>
<td class="value">FP16 / FP8</td>
<td>FP8 runs in 24GB of VRAM</td>
</tr>
<tr>
<td>License</td>
<td class="value">Apache 2.0</td>
<td>Free commercial use for annual revenue &lt;$10M</td>
</tr>
<tr>
<td>Latest version</td>
<td class="value">LTX-2.3</td>
<td>Released 2026-03-05</td>
</tr>
</tbody>
</table>
</div>
</section>
<!-- ===== Development Timeline ===== -->
<section class="section fade-in">
<p class="section-label">Timeline</p>
<h2 class="section-title">Development Timeline</h2>
<p class="section-desc">From video-only to joint audio-video generation: the Lightricks iteration path.</p>
<div class="timeline">
<div class="timeline-item">
<div class="timeline-dot"></div>
<div class="timeline-date">2025</div>
<div class="timeline-title">LTX-Video 1.0</div>
<div class="timeline-desc">First-generation video model built on the DiT architecture, in 2B and 13B parameter variants. Video only, no audio.</div>
</div>
<div class="timeline-item">
<div class="timeline-dot"></div>
<div class="timeline-date">2025-10</div>
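<div class="timeline-title">LTX-2 Released</div>
<div class="timeline-desc">19B parameters and the first synchronized audio-video generation. Asymmetric dual-stream DiT architecture with bidirectional cross-attention.</div>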
</div>
<div class="timeline-item">
<div class="timeline-dot"></div>
<div class="timeline-date">2026-01-06</div>
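<div class="timeline-title">LTX-2 Open-Sourced</div>
<div class="timeline-desc">Weights and code released under the Apache 2.0 license, with day-0 ComfyUI integration. The first production-grade joint audio-video model in open source.</div>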
</div>
<div class="timeline-item">
<div class="timeline-dot active"></div>
<div class="timeline-date">2026-03-05</div>
<div class="timeline-title">LTX-2.3</div>
<div class="timeline-desc">Latest iteration. Image detail, portrait video, and audio quality all improve, and the IC-LoRA 3D control system matures.</div>
</div>
</div>
</section>
<!-- ===== Hardware Requirements ===== -->
<section class="section fade-in">
<p class="section-label">Hardware</p>
<h2 class="section-title">Hardware Requirements</h2>
<p class="section-desc">
The full model requires an NVIDIA CUDA GPU. Mac users are advised to use a cloud option.
</p>
<div class="hw-grid">
<div class="hw-card">
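<div class="tier best">Recommended</div>
<h4>NVIDIA A100 / H100</h4>
<div class="vram">80GB VRAM</div>
<p>Runs at full FP16 precision. 4K output is smooth and multi-batch inference is comfortable. The first choice for professional workstations or the cloud.</p>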
</div>
<div class="hw-card">
<div class="tier good">Usable</div>
<h4>NVIDIA RTX 4090</h4>
<div class="vram">24GB VRAM</div>
<p>Runs with FP8 quantization. 1080p output works well; 4K has to be processed in stages. The best value for individual creators.</p>
</div>
<div class="hw-card">
<div class="tier ok">Marginal</div>
<h4>NVIDIA RTX 4080 / 3090</h4>
<div class="vram">16-24GB VRAM</div>
<p>Requires FP8 quantization plus VRAM optimizations. 720p to 1080p is usable; generation is slow, but it runs.</p>
</div>
<div class="hw-card">
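<div class="tier hard">Not Recommended</div>
<h4>Apple Silicon (Mac)</h4>
<div class="vram">No CUDA support</div>
<p>The MPS backend is unreliable with large models, and the joint audio-video workflow is likely to fail with errors. A cloud alternative is recommended.</p>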
</div>
</div>
</section>
<!-- ===== ComfyUI Workflow ===== -->
<section class="section fade-in">
<p class="section-label">Workflow</p>
<h2 class="section-title">ComfyUI Workflow</h2>
<p class="section-desc">
LTX-2 is natively integrated into ComfyUI, with day-0 support. The standard workflow stages are:
</p>
<div class="cards-grid">
<div class="card">
<div class="card-icon blue">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/></svg>
</div>
<h3>1. Load the Model</h3>
<p>Load <code>ltx-2-19b-dev</code> or the FP8-quantized build. The official ComfyUI-LTXVideo node pack is recommended.</p>
</div>
<div class="card">
<div class="card-icon purple">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><line x1="17" y1="10" x2="3" y2="10"/><line x1="21" y1="6" x2="3" y2="6"/><line x1="21" y1="14" x2="3" y2="14"/><line x1="17" y1="18" x2="3" y2="18"/></svg>
</div>
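<h3>2. Text Prompt</h3>
<p>A single-paragraph description of no more than 200 words. Cover the action, appearance, background, camera movement, and lighting; the more specific the prompt, the better the result.</p>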
</div>
<div class="card">
<div class="card-icon cyan">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="10"/><polygon points="10 8 16 12 10 16 10 8"/></svg>
</div>
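<h3>3. Joint Audio-Video Generation</h3>
<p>A single inference run outputs both video frames and an audio waveform. The dual-stream DiT aligns them automatically, so no manual sync is needed.</p>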
</div>
<div class="card">
<div class="card-icon green">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><polyline points="15 3 21 3 21 9"/><polyline points="9 21 3 21 3 15"/><line x1="21" y1="3" x2="14" y2="10"/><line x1="3" y1="21" x2="10" y2="14"/></svg>
</div>
<h3>4. Optional Upscaling</h3>
<p>The 2× Spatial Upscaler raises resolution and the 2× Temporal Upscaler raises frame rate. Chain them to enhance the output step by step.</p>
</div>
</div>
</section>
<!-- ===== Related Links ===== -->
<section class="section fade-in">
<p class="section-label">Resources</p>
<h2 class="section-title">Related Resources</h2>
<div class="links-grid">
<a class="link-card" href="https://github.com/Lightricks/LTX-Video" target="_blank" rel="noopener">
<div class="link-label">GitHub</div>
<div class="link-title">Lightricks/LTX-Video</div>
<div class="link-url">github.com/Lightricks/LTX-Video</div>
</a>
<a class="link-card" href="https://github.com/Lightricks/ComfyUI-LTXVideo" target="_blank" rel="noopener">
<div class="link-label">ComfyUI Nodes</div>
<div class="link-title">ComfyUI-LTXVideo</div>
<div class="link-url">github.com/Lightricks/ComfyUI-LTXVideo</div>
</a>
<a class="link-card" href="https://arxiv.org/abs/2601.03233" target="_blank" rel="noopener">
<div class="link-label">Paper</div>
<div class="link-title">LTX-2: Joint Audio-Visual Foundation Model</div>
<div class="link-url">arxiv.org/abs/2601.03233</div>
</a>
<a class="link-card" href="https://ltx.io/model/ltx-2" target="_blank" rel="noopener">
<div class="link-label">Official Site</div>
<div class="link-title">LTX-2 Model Page</div>
<div class="link-url">ltx.io/model/ltx-2</div>
</a>
<a class="link-card" href="https://docs.comfy.org/tutorials/video/ltx/ltx-2" target="_blank" rel="noopener">
<div class="link-label">Tutorial</div>
<div class="link-title">ComfyUI LTX-2 Workflow Documentation</div>
<div class="link-url">docs.comfy.org/tutorials/video/ltx/ltx-2</div>
</a>
<a class="link-card" href="https://www.digitfold.com/2870.html" target="_blank" rel="noopener">
<div class="link-label">Chinese Tutorial</div>
<div class="link-title">Digitfold (数字折叠): ComfyUI-LTXVideo Plugin Guide</div>
<div class="link-url">digitfold.com/2870.html</div>
</a>
</div>
</section>
<!-- ===== Footer ===== -->
<footer class="footer">
<p>LTX-2 Joint Audio-Video Generation Research · 2026-03-27 · Lightricks open-source project</p>
</footer>
</div>
<script>
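// Illustrative sketch only, not part of any Lightricks tooling: a rough
// weight-only memory estimate behind the hardware tiers listed on this page.
// Assumptions (not stated on the page): FP16 stores 2 bytes per parameter,
// FP8 stores 1 byte, and 1 GB = 1e9 bytes. Activations, the text encoder,
// and the VAEs are ignored, so real VRAM use will be higher.
// The function and constant names are hypothetical.
function estimateWeightGB(paramsBillions, bytesPerParam) {
  // (paramsBillions * 1e9 params) * bytesPerParam / 1e9 bytes per GB
  return paramsBillions * bytesPerParam;
}
const BYTES_PER_PARAM = { fp16: 2, fp8: 1 };
const ltx2WeightGB = {
  // 19B parameters in FP16: larger than a 24GB card, within an 80GB card
  fp16: estimateWeightGB(19, BYTES_PER_PARAM.fp16),
  // 19B parameters in FP8: below the 24GB of an RTX 4090, with little headroom
  fp8: estimateWeightGB(19, BYTES_PER_PARAM.fp8),
};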
// Scroll-triggered fade-in
const observer = new IntersectionObserver((entries) => {
entries.forEach(entry => {
if (entry.isIntersecting) {
entry.target.classList.add('visible');
}
});
}, { threshold: 0.08, rootMargin: '-40px' });
document.querySelectorAll('.fade-in').forEach(el => observer.observe(el));
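// Illustrative sketch only: the arithmetic behind the optional upscaling step
// of the ComfyUI workflow described on this page. It assumes "2x spatial"
// doubles both width and height and "2x temporal" doubles the frame rate;
// it does not call ComfyUI or any LTX-2 model. applyUpscalers and the clip
// object shape are hypothetical names introduced for this example.
function applyUpscalers(clip, { spatial = false, temporal = false } = {}) {
  const spatialFactor = spatial ? 2 : 1;
  const temporalFactor = temporal ? 2 : 1;
  return {
    width: clip.width * spatialFactor,
    height: clip.height * spatialFactor,
    fps: clip.fps * temporalFactor,
  };
}
// Chaining both upscalers on a 1080p, 25 fps base clip reaches the page's
// headline figures of 4K (3840x2160) at 50 fps.
const upscaledExample = applyUpscalers(
  { width: 1920, height: 1080, fps: 25 },
  { spatial: true, temporal: true }
);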
</script>
</body>
</html>