feat: add synced video waveform timeline

2026-05-17 14:58:12 +08:00
parent 27a6ef0818
commit 120dacf8b6
2 changed files with 238 additions and 59 deletions


@@ -588,7 +588,7 @@
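<tr><td><code>web/next.config.mjs</code></td><td>Next.js build configuration: static export, unoptimized images, the Next Dev Indicator in the bottom-left corner disabled in development, and the top-level <code>eslint</code> option (no longer supported in Next 16) removed so that local dev does not show a configuration Issue warning.</td></tr>
<tr><td><code>web/app/globals.css</code></td><td>Global theme variables, login page visual styles, the ReactFlow stylesheet import, and a rule that hides the <code>nextjs-portal</code> overlay in local development.</td></tr>
<tr><td><code>web/app/page.tsx</code></td><td>Main state of the product workbench (jobs, activeJobId, generation task state). The main render is a full-screen material input column plus the audio analysis worksheet. The “Start” orchestration state only triggers <code>triggerTranscribe</code> automatically once the download completes; it no longer triggers frame extraction, Vision scanning, or saving a storyboard draft by default. The bottom sticky audio bar is no longer rendered from the main view.</td></tr>
<tr><td><code>web/components/ad-recreation-board.tsx</code></td><td>Audio analysis worksheet for feed ads: material input on the left; on the right, the video download status, the audio script basis (collapsed by default), and a unified audio analysis results panel. The top of the panel is a one-row summary of speaker / rhythm / background audio, with the sentence-by-sentence timeline below it. The old storyboard cards, frame extraction controls, and video generation components remain in the file but are not rendered on the current main path.</td></tr>
<tr><td><code>web/components/ad-recreation-board.tsx</code></td><td>Audio analysis worksheet for feed ads: material input on the left; on the right, the video download status, the audio script basis (collapsed by default), and a unified audio analysis results panel. The top of the panel is a one-row summary of speaker / rhythm / background audio; below it the original video player sits on the left and the sentence-by-sentence timeline on the right, with a horizontal audio waveform at the bottom for spotting cut points and the rhythm of peak moments. Video playback highlights and scrolls to the current sentence in sync, and clicking the waveform or a subtitle row seeks the original video to that time. The old storyboard cards, frame extraction controls, and video generation components remain in the file but are not rendered on the current main path.</td></tr>
<tr><td><code>web/app/login/page.tsx</code></td><td>Production login page: access account / access key form, stay signed in, error/success states. It currently overlays a single combined login box on the original Digital Oasis animated background; on desktop the left side is an animated character and the right side is an icon-based login form. The top-left corner of the panel shows the SKG wordmark from the official site and the Chinese system label “营销内容工作台” (Marketing Content Workbench).</td></tr>
<tr><td><code>web/app/login/layout.tsx</code></td><td>Layout dedicated to the login route: overrides the site-wide default page title and description with empty values, so that <code>/login</code> does not inherit the workbench metadata and expose copy other than the login screen text in the page source.</td></tr>
<tr><td><code>web/components/login/oasis-canvas.tsx</code></td><td>Full-screen animated visual layer of the login page: an iframe directly hosts the original WebGPU / Three.js grass field source from the downloaded package at <code>web/public/oasis-source/index.html</code>. The parent login page only overlays its own copy and form, and in the capture phase forwards global mouse coordinates to the iframe via both native events and <code>postMessage</code>, so the grass keeps responding to the mouse when the login panel or an input field covers it.</td></tr>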
@@ -625,7 +625,7 @@
web/app/page.tsx
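-> Audio analysis worksheet: web/components/ad-recreation-board.tsx
-> Start: create/activate job → audio processing is triggered automatically once the download completes
-> Left material input column + right audio script basis (collapsed by default) + unified audio analysis results panel (sound summary on top, sentence-by-sentence timeline below)
-> Left material input column + right audio script basis (collapsed by default) + unified audio analysis results panel (sound summary on top, original video and sentence-by-sentence timeline side by side, linked waveform at the bottom)
-> Bottom audio bar: no longer rendered; audio results are consolidated in the right-hand worksheet
-> Old node / deep material panels: web/components/nodes/index.tsx, web/components/lightbox.tsx, web/components/storyboard-workbench.tsx (kept underneath, currently not the main entry point)
-> API contract: web/lib/api.ts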
@@ -648,7 +648,7 @@ api/main.py
<div class="flow-row">
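<div><strong>Area you see</strong><span>Audio analysis results table</span></div>
<div><strong>Main source</strong><span><code>AudioIntakePanel</code> / <code>AudioIntakeStatus</code> in <code>web/components/ad-recreation-board.tsx</code>; reuses <code>triggerTranscribe</code> and <code>AudioScript</code></span></div>
<div><strong>How to describe it</strong><span>“Which additional fields the sentence-by-sentence timeline, speaker, rhythm, and background audio still need; whether the full script basis should be expandable.”</span></div>
<div><strong>How to describe it</strong><span>“How the original video playback, the audio waveform, and the scrolling, highlighting, and seek linkage of the sentence-by-sentence timeline should still be adjusted.”</span></div>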
</div>
<div class="flow-row">
<div><strong>Area you see</strong><span>Old deep material panel (currently not the main path)</span></div>
@@ -819,7 +819,7 @@ SubjectAsset {
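<tr><td>Delete input video</td><td><code>DELETE /jobs/{id}</code></td><td><code>deleteJob</code></td><td>Removes the entire job from the task queue, the URL, and the on-disk <code>jobs/&lt;id&gt;</code> directory, including the source video, keyframes, extracted element images, and generated videos.</td></tr>
<tr><td>Analyze video</td><td><code>POST /jobs/{id}/analyze?frames=&amp;target=&amp;mode=&amp;quality=</code></td><td><code>analyzeJob</code></td><td>Frame extraction capability retained for later stages. Defaults to <code>frames=12</code>; <code>target</code> supports transparent skeleton figure, general, clear subject, transition change, expression moment, and action peak. The current first-step main flow does not call this endpoint automatically.</td></tr>
<tr><td>Audio script track</td><td><code>POST /jobs/{id}/transcribe</code></td><td><code>triggerTranscribe</code></td><td>If the audio track has not been split out yet, first extracts <code>audio.wav</code> from <code>source.mp4</code> and backfills <code>source_audio_url</code>. It then uses ASR to extract the original script, translates it into Chinese, and writes <code>audio_script.source_text</code>, <code>source_zh</code>, and the sentence-by-sentence <code>transcript</code>. If the remote <code>ASR_MODEL</code> fails, it first falls back to the local <code>LOCAL_ASR_BIN</code>/<code>LOCAL_ASR_MODEL</code> (default <code>mlx_whisper</code>), then tries <code>ASR_FALLBACK_MODEL</code>. The backend rejects results with repeated text, fake per-second subtitles, or coverage that is too low, and no longer writes multimodal output that cannot actually be heard in the audio into the timeline. It then uses <code>ASR_FALLBACK_MODEL</code> multimodal audio analysis to characterize the speaker, speaking pace and rhythm, pauses, and background music / ambient sound / sound effects, writing <code>speaker_profile</code>, <code>rhythm_profile</code>, and <code>background_audio_profile</code>. The current first step does not generate the new SKG voiceover script or MiniMax dubbing by default.</td></tr>
<tr><td>Original audio file</td><td><code>GET /jobs/{id}/audio.wav</code></td><td><code>sourceAudioUrl</code></td><td>Returns the wav produced by track splitting. The main view no longer renders the bottom audio bar; the right-hand audio analysis worksheet uses <code>transcript</code> and <code>audio_script</code> directly to display the text and sound analysis results.</td></tr>
<tr><td>Original audio file</td><td><code>GET /jobs/{id}/audio.wav</code></td><td><code>sourceAudioUrl</code></td><td>Returns the wav produced by track splitting. The main view no longer renders the bottom sticky audio bar; the right-hand audio analysis worksheet reads this file to generate a horizontal waveform that is linked with the original video and the sentence-by-sentence timeline.</td></tr>
<tr><td>Rewritten voiceover file</td><td><code>GET /jobs/{id}/audio-script.mp3</code></td><td><code>apiAssetUrl(job.audio_script.voice_url)</code></td><td>MiniMax T2A artifact reserved for the later new-voiceover stage. The current first step does not generate this file by default.</td></tr>
<tr><td>Add frame manually</td><td><code>POST /jobs/{id}/frames?t=</code></td><td><code>addManualFrame</code></td><td>Extracts one frame at the given video timestamp; index increments, but frames are sorted by timestamp.</td></tr>
<tr><td>Vision recognition</td><td><code>POST /frames/{idx}/describe</code></td><td><code>describeFrame</code></td><td>Writes frame.description; candidate elements can later be added from objects.</td></tr>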
@@ -941,6 +941,18 @@ SubjectAsset {
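<h2>Changelog</h2>
<p>This record is not a replacement for git log. It records “how the product understanding changed, which source files were affected, and how you should phrase requirements from now on.” Add an entry every time a feature changes.</p>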
<div class="changelog">
<article class="change">
<header>
<h3>2026-05-17 · Added linked original video and waveform review</h3>
<span class="tag rose">UI</span>
<span class="tag cyan">Workflow</span>
</header>
<div class="body">
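<p><strong>Problem:</strong> With only the sentence-by-sentence timeline it is hard to judge the original video's rhythm, pauses, and peak moments; users need to watch the original video alongside the subtitles and quickly identify structural cut points from the audio waveform.</p>
<p><strong>Change:</strong> <code>web/components/ad-recreation-board.tsx</code> adds an original video player and a horizontal audio waveform to the audio analysis results panel, laid out as original video on the left, sentence-by-sentence timeline on the right, and waveform at the bottom. During playback, the current subtitle row is highlighted and auto-scrolled based on <code>currentTime</code>; clicking the waveform or a subtitle row seeks the original video to that time. The waveform is decoded from <code>audio.wav</code>; on failure, locally generated fallback peaks keep the layout usable.</p>
<p><strong>Affected:</strong> <code>web/components/ad-recreation-board.tsx</code>, <code>docs/source-analysis.html</code>. If peak-moment marking is needed later, add markers on top of the current waveform and subtitle timeline rather than restoring the bottom audio bar.</p>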
</div>
</article>
<article class="change">
<header>
<h3>2026-05-17 · Tightened the layout of the audio analysis first step</h3>


@@ -1,6 +1,6 @@
"use client"
import { type ReactNode, type RefObject, useEffect, useRef, useState } from "react"
import { type ReactNode, type RefObject, useEffect, useMemo, useRef, useState } from "react"
import {
AlertTriangle, Check, ChevronDown, Circle, Film, FileText, Image as ImageIcon, Link2, Loader2,
Mic, Package, PanelRight, Play, Plus, Scissors, Sparkles, Trash2, Upload, Wand2,
@@ -24,6 +24,7 @@ import {
generatedImageUrl,
hasCutout,
representativeCutoutUrl,
sourceAudioUrl,
updateStoryboard,
videoUrl,
} from "@/lib/api"
@@ -92,6 +93,47 @@ function formatSeconds(raw?: number) {
return `${raw.toFixed(1)}s`
}
function clampNumber(value: number, min: number, max: number) {
return Math.min(max, Math.max(min, value))
}
function fallbackPeaks(count: number, seedText: string) {
let seed = 0
for (let i = 0; i < seedText.length; i++) seed = (seed * 31 + seedText.charCodeAt(i)) % 9973
return Array.from({ length: count }, (_, i) => {
const wave = Math.sin((i + seed) * 0.39) * 0.35 + Math.sin((i + seed) * 0.12) * 0.24
const pulse = ((i + seed) % 11) / 24
return clampNumber(0.18 + Math.abs(wave) + pulse, 0.12, 1)
})
}
async function decodeWaveform(url: string, targetPeaks = 180) {
const res = await fetch(url)
if (!res.ok) throw new Error(`audio ${res.status}`)
const arrayBuffer = await res.arrayBuffer()
const AudioContextClass = window.AudioContext || (window as typeof window & { webkitAudioContext?: typeof AudioContext }).webkitAudioContext
if (!AudioContextClass) throw new Error("AudioContext unavailable")
const ctx = new AudioContextClass()
try {
const buffer = await ctx.decodeAudioData(arrayBuffer.slice(0))
const data = buffer.getChannelData(0)
const bucket = Math.max(1, Math.floor(data.length / targetPeaks))
let maxPeak = 0.01
const raw: number[] = []
for (let i = 0; i < targetPeaks; i++) {
const start = i * bucket
const end = Math.min(data.length, start + bucket)
let peak = 0
for (let j = start; j < end; j++) peak = Math.max(peak, Math.abs(data[j] || 0))
raw.push(peak)
maxPeak = Math.max(maxPeak, peak)
}
return raw.map((p) => clampNumber(p / maxPeak, 0.08, 1))
} finally {
void ctx.close().catch(() => {})
}
}
function frameLabel(frame: KeyFrame, order: number) {
return `S${String(order + 1).padStart(2, "0")} · ${frame.timestamp.toFixed(1)}s`
}
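Review note: the bucketing and normalization inside `decodeWaveform` can be exercised without a browser if pulled into a pure function. The sketch below is one way to do that, not code from this commit; `computePeaks` and `clampValue` are hypothetical names, and the input is assumed to be the first-channel sample array that `getChannelData(0)` would return.

```typescript
// Hypothetical helper, not part of the commit: same bucketing and
// normalization as decodeWaveform, but independent of Web Audio.
function clampValue(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value))
}

function computePeaks(samples: ArrayLike<number>, targetPeaks = 180): number[] {
  // Each output bar covers `bucket` consecutive samples (at least one).
  const bucket = Math.max(1, Math.floor(samples.length / targetPeaks))
  let maxPeak = 0.01 // floor avoids dividing by zero on silent audio
  const raw: number[] = []
  for (let i = 0; i < targetPeaks; i++) {
    const start = i * bucket
    const end = Math.min(samples.length, start + bucket)
    let peak = 0
    // Take the largest absolute amplitude inside the bucket.
    for (let j = start; j < end; j++) peak = Math.max(peak, Math.abs(samples[j] || 0))
    raw.push(peak)
    maxPeak = Math.max(maxPeak, peak)
  }
  // Scale so the loudest bucket is 1; the 0.08 floor keeps quiet bars visible.
  return raw.map((p) => clampValue(p / maxPeak, 0.08, 1))
}

// Eight samples reduced to four bars (two samples per bar).
const demoPeaks = computePeaks(new Float32Array([0, 0.5, -1, 0.25, 0, 0, 0.1, -0.2]), 4)
console.log(demoPeaks)
```

Because `bucket` is rounded down, samples past `targetPeaks * bucket` are ignored, and inputs shorter than `targetPeaks` yield trailing bars at the 0.08 floor. Both behaviors match the loop in the diff above.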
@@ -409,10 +451,6 @@ export function AdRecreationBoard({
<div className="min-h-0 flex-1 overflow-y-auto p-4">
<AudioIntakePanel job={job} />
</div>
<footer className="shrink-0 border-t border-white/10 p-3">
<AudioStepSummary job={job} audioReady={audioReady} />
</footer>
</section>
</div>
</div>
@@ -540,17 +578,62 @@ function AudioIntakeStatus({ job, audioReady }: { job: Job | null; audioReady: b
}
function AudioIntakePanel({ job }: { job: Job | null }) {
if (!job) {
return <EmptyState text="先在左侧粘贴 TK 链接或上传本地视频。点击开始后,会先下载视频,再自动解析原音频文案、讲话人节奏和背景音。" />
}
const script = job.audio_script
const [currentTime, setCurrentTime] = useState(0)
const [mediaDuration, setMediaDuration] = useState(0)
const [peaks, setPeaks] = useState<number[]>(() => fallbackPeaks(180, "initial-waveform"))
const videoRef = useRef<HTMLVideoElement | null>(null)
const rowRefs = useRef<Record<number, HTMLDivElement | null>>({})
const script = job?.audio_script
const audioSrcUrl = job ? apiAssetUrl(job.source_audio_url) || sourceAudioUrl(job.id) : ""
const profiles = [
{ label: "讲话人", value: script?.speaker_profile },
{ label: "节奏", value: script?.rhythm_profile },
{ label: "背景音", value: script?.background_audio_profile },
]
const processing = job.status === "transcribing" || script?.status === "rewriting"
const processing = !!job && (job.status === "transcribing" || script?.status === "rewriting")
const timelineDuration = useMemo(() => {
if (!job) return 1
const lastTranscriptEnd = job.transcript.reduce((max, segment) => Math.max(max, segment.end || 0), 0)
return Math.max(
mediaDuration,
job.duration ?? 0,
lastTranscriptEnd,
1,
)
}, [job, mediaDuration])
const activeSegment = job?.transcript.find((segment) => currentTime >= segment.start && currentTime <= Math.max(segment.end, segment.start + 0.2))
useEffect(() => {
if (!job?.id || !audioSrcUrl) return
setCurrentTime(0)
setMediaDuration(0)
setPeaks(fallbackPeaks(180, `${job.id}-loading`))
let cancelled = false
decodeWaveform(audioSrcUrl)
.then((next) => {
if (!cancelled) setPeaks(next)
})
.catch(() => {
if (!cancelled) setPeaks(fallbackPeaks(180, job.id))
})
return () => { cancelled = true }
}, [audioSrcUrl, job?.id])
useEffect(() => {
if (activeSegment) rowRefs.current[activeSegment.index]?.scrollIntoView({ block: "nearest" })
}, [activeSegment?.index])
const seekTo = (time: number) => {
const next = clampNumber(time, 0, timelineDuration)
if (videoRef.current) videoRef.current.currentTime = next
setCurrentTime(next)
}
if (!job) {
return <EmptyState text="先在左侧粘贴 TK 链接或上传本地视频。点击开始后,会先下载视频,再自动解析原音频文案、讲话人节奏和背景音。" />
}
const videoSrcUrl = apiAssetUrl(job.video_url) || videoUrl(job.id)
return (
<section className="rounded-lg border border-white/10 bg-black/28 p-2.5">
@@ -568,34 +651,141 @@ function AudioIntakePanel({ job }: { job: Job | null }) {
))}
</div>
<div className="mb-2 flex items-center justify-between gap-3 border-t border-white/8 pt-2">
<SectionTitle icon={<FileText className="h-4 w-4" />} title="逐句时间轴" />
<span className="rounded-md border border-white/10 bg-black/35 px-2 py-1 text-[11px] text-white/45">{job.transcript.length} </span>
</div>
{job.transcript.length ? (
<div className="overflow-hidden rounded-md border border-white/10">
<div className="grid grid-cols-[82px_minmax(0,1fr)_minmax(0,1fr)] border-b border-white/10 bg-white/[0.04] px-3 py-1.5 text-[11px] font-semibold text-white/50">
<div></div>
<div></div>
<div></div>
<div className="grid gap-2 border-t border-white/8 pt-2">
<div className="rounded-md border border-white/10 bg-black/32 p-2">
<div className="mb-1 flex items-center justify-between text-[10px] text-white/40">
<span> / </span>
<span className="font-mono">{currentTime.toFixed(1)}s</span>
</div>
<AudioWaveform
peaks={peaks}
currentTime={currentTime}
duration={timelineDuration}
segments={job.transcript}
onSeek={seekTo}
/>
</div>
<div className="grid gap-2 xl:grid-cols-[230px_minmax(0,1fr)]">
<div className="min-w-0">
<div className="mb-2 flex items-center justify-between gap-3">
<SectionTitle icon={<Play className="h-4 w-4" />} title="原版视频" />
<span className="font-mono text-[11px] text-white/38">{currentTime.toFixed(1)}s / {formatSeconds(timelineDuration)}</span>
</div>
<div className="max-h-[164px] overflow-y-auto">
{job.transcript.map((segment) => (
<div key={segment.index} className="grid grid-cols-[82px_minmax(0,1fr)_minmax(0,1fr)] gap-3 border-b border-white/8 px-3 py-1.5 text-[11.5px] leading-snug text-white/64 last:border-b-0">
<div className="font-mono text-[11px] text-white/38">{segment.start.toFixed(1)}-{segment.end.toFixed(1)}s</div>
<div className="truncate" title={segment.en}>{segment.en || <span className="text-white/30">-</span>}</div>
<div className="truncate" title={segment.zh}>{segment.zh || <span className="text-white/30"></span>}</div>
</div>
))}
<div className="overflow-hidden rounded-md border border-white/10 bg-black/45">
{job.video_url ? (
<video
ref={videoRef}
controls
playsInline
className="h-[238px] w-full bg-black object-contain"
src={videoSrcUrl}
onTimeUpdate={(event) => setCurrentTime(event.currentTarget.currentTime)}
onSeeked={(event) => setCurrentTime(event.currentTarget.currentTime)}
onLoadedMetadata={(event) => {
setMediaDuration(Number.isFinite(event.currentTarget.duration) ? event.currentTarget.duration : 0)
setCurrentTime(event.currentTarget.currentTime)
}}
/>
) : (
<div className="flex h-[238px] items-center justify-center text-[12px] text-white/38"></div>
)}
</div>
</div>
) : (
<EmptyState text={processing ? "音频解析中,完成后这里会按时间列出原文案和中文翻译。" : "下载完成后会自动解析音频;也可以点击右上角“解析音频”手动重试。"} />
)}
<div className="min-w-0">
<div className="mb-2 flex items-center justify-between gap-3">
<SectionTitle icon={<FileText className="h-4 w-4" />} title="逐句时间轴" />
<span className="rounded-md border border-white/10 bg-black/35 px-2 py-1 text-[11px] text-white/45">{job.transcript.length} </span>
</div>
{job.transcript.length ? (
<div className="overflow-hidden rounded-md border border-white/10">
<div className="grid grid-cols-[82px_minmax(0,1fr)_minmax(0,1fr)] border-b border-white/10 bg-white/[0.04] px-3 py-1.5 text-[11px] font-semibold text-white/50">
<div></div>
<div></div>
<div></div>
</div>
<div className="max-h-[238px] overflow-y-auto">
{job.transcript.map((segment) => {
const active = activeSegment?.index === segment.index
return (
<div
key={segment.index}
ref={(node) => { rowRefs.current[segment.index] = node }}
onClick={() => seekTo(segment.start)}
className={`grid cursor-pointer grid-cols-[82px_minmax(0,1fr)_minmax(0,1fr)] gap-3 border-b px-3 py-1.5 text-[11.5px] leading-snug transition last:border-b-0 ${
active
? "border-emerald-300/18 bg-emerald-300/[0.12] text-white"
: "border-white/8 text-white/64 hover:bg-white/[0.045]"
}`}
>
<div className={`font-mono text-[11px] ${active ? "text-emerald-100" : "text-white/38"}`}>{segment.start.toFixed(1)}-{segment.end.toFixed(1)}s</div>
<div className="truncate" title={segment.en}>{segment.en || <span className="text-white/30">-</span>}</div>
<div className="truncate" title={segment.zh}>{segment.zh || <span className="text-white/30"></span>}</div>
</div>
)
})}
</div>
</div>
) : (
<EmptyState text={processing ? "音频解析中,完成后这里会按时间列出原文案和中文翻译。" : "下载完成后会自动解析音频;也可以点击右上角“解析音频”手动重试。"} />
)}
</div>
</div>
</div>
</section>
)
}
function AudioWaveform({
peaks,
currentTime,
duration,
segments,
onSeek,
}: {
peaks: number[]
currentTime: number
duration: number
segments: Array<{ start: number; end: number }>
onSeek: (time: number) => void
}) {
const pointerPct = clampNumber((currentTime / Math.max(duration, 1)) * 100, 0, 100)
return (
<div
className="relative h-16 cursor-pointer overflow-hidden rounded-md border border-white/10 bg-black/35 px-2"
onClick={(event) => {
const rect = event.currentTarget.getBoundingClientRect()
onSeek(((event.clientX - rect.left) / Math.max(rect.width, 1)) * duration)
}}
>
<div className="absolute inset-y-1 left-2 right-2 flex items-center gap-[2px]">
{peaks.map((peak, index) => (
<div
key={index}
className="flex-1 rounded-full bg-cyan-200/55"
style={{
height: `${Math.round(6 + peak * 42)}px`,
opacity: 0.32 + peak * 0.48,
}}
/>
))}
</div>
{segments.map((segment, index) => (
<div
key={`${segment.start}-${index}`}
className="absolute inset-y-0 w-px bg-white/12"
style={{ left: `${clampNumber((segment.start / Math.max(duration, 1)) * 100, 0, 100)}%` }}
/>
))}
<div
className="pointer-events-none absolute inset-y-0 w-[2px] bg-emerald-200 shadow-[0_0_16px_rgba(110,231,183,0.85)]"
style={{ left: `${pointerPct}%` }}
/>
</div>
)
}
function ProfileTile({ label, value, running }: { label: string; value?: string; running?: boolean }) {
return (
<div className="min-h-[74px] rounded-md border border-white/10 bg-black/35 p-2.5">
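Review note: the playback sync in this hunk rests on two small calculations, finding the transcript segment that contains `currentTime` and converting a click position on the waveform into a time. The sketch below restates them as standalone functions under stated assumptions: `findActiveSegment`, `clickToTime`, `clampTime`, and the `Segment` shape are illustrative names, and the clamp that the commit applies in `seekTo` is folded into `clickToTime` here.

```typescript
// Hypothetical standalone versions of the sync logic; names are illustrative.
interface Segment {
  index: number
  start: number
  end: number
}

function clampTime(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value))
}

// A segment is active while currentTime lies in [start, end]. Segments
// shorter than 0.2 s get a minimum 0.2 s window so they can still highlight.
function findActiveSegment(segments: Segment[], currentTime: number): Segment | undefined {
  return segments.find(
    (s) => currentTime >= s.start && currentTime <= Math.max(s.end, s.start + 0.2),
  )
}

// Map a click x-coordinate inside the waveform box to a timeline position.
function clickToTime(clientX: number, rectLeft: number, rectWidth: number, duration: number): number {
  const ratio = (clientX - rectLeft) / Math.max(rectWidth, 1)
  return clampTime(ratio * duration, 0, duration)
}

const demoSegments: Segment[] = [
  { index: 0, start: 0, end: 1.5 },
  { index: 1, start: 1.6, end: 1.65 }, // very short line
  { index: 2, start: 3, end: 5 },
]
console.log(findActiveSegment(demoSegments, 1.75)?.index) // 1
console.log(clickToTime(150, 100, 200, 40)) // 10
```

Since `find` returns the first match, when two segments overlap or share a boundary timestamp the earlier one is highlighted.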
@@ -1039,29 +1229,6 @@ function SegmentBand({ icon, title, children }: { icon: ReactNode; title: string
)
}
function AudioStepSummary({ job, audioReady }: { job: Job | null; audioReady: boolean }) {
const downloading = !!job && ["created", "downloading"].includes(job.status)
const audioRunning = !!job && (job.status === "transcribing" || job.audio_script?.status === "rewriting")
return (
<div className="flex items-center justify-between gap-3 rounded-lg border border-white/10 bg-black/35 px-3 py-2">
<div className="flex min-w-0 items-center gap-2">
<PanelRight className="h-4 w-4 shrink-0 text-rose-200" />
<div className="min-w-0">
<div className="text-[13px] font-semibold text-white"></div>
<div className="truncate text-[11px] text-white/40">
{job?.message || "等待素材输入;完成后再进入分镜规划和素材生成。"}
</div>
</div>
</div>
<div className="flex shrink-0 items-center gap-2 text-[11px] text-white/52">
<Requirement label="下载" ready={!!job?.video_url} detail={downloading ? "running" : job?.video_url ? "ready" : "wait"} />
<Requirement label="音频" ready={!!job?.source_audio_url} detail={audioRunning ? "running" : job?.source_audio_url ? "ready" : "wait"} />
<Requirement label="文案" ready={audioReady} detail={audioReady ? `${job?.transcript.length ?? 0}` : "wait"} />
</div>
</div>
)
}
function ComposeSummary({
audioReady,
selectedVideoCount,