13 Commits

16 changed files with 3264 additions and 2391 deletions


@@ -1,6 +1,6 @@
# Project Handoff
- Generated: May 21, 2026 at 02:06
- Generated: May 22, 2026 at 09:01
- Project: SKG Marketing Studio / SKG 营销内容工作台
- Path: /Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证
- Status: active
@@ -9,7 +9,7 @@
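## Recent Assistant Sessions Overview
- Claude: a9e0449c-d9cb-4a2a-bb16-16596dfb552a · time unknown
- Codex: 019e447d-68c7-7db1-a499-b5eb6a98a7c2 · time unknown
- Codex: 019e49d1-d114-7d52-8cf0-fbb4585f5ce6 · time unknown
- Cursor: no recent session matching the current project was found
## Recent Claude Sessions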
@@ -92,42 +92,57 @@
## Recent Codex Sessions
- Session ID: 019e447d-68c7-7db1-a499-b5eb6a98a7c2
- Transcript: /Users/kangwan/.codex/sessions/2026/05/20/rollout-2026-05-20T16-25-30-019e447d-68c7-7db1-a499-b5eb6a98a7c2.jsonl
- Session ID: 019e49d1-d114-7d52-8cf0-fbb4585f5ce6
- Transcript: /Users/kangwan/.codex/sessions/2026/05/21/rollout-2026-05-21T17-15-47-019e49d1-d114-7d52-8cf0-fbb4585f5ce6.jsonl
- Working directory: /Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证
- Branch: main
- Sensitive fields: token / key / password / secret values have been redacted
### Recent User Requests
- What I'm seeing hasn't changed. I think this has been tweaked until it's completely broken
- Is the Figma MCP connected?
- 连一些被 我都打开了
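- In the storyboard section, every click on a video turns into a selection. I only want to open it and take a look. This logic isn't right, please check it
- Why does it need to be selected? Just give me a download button to click
- And every generated video should of course be saved automatically. Why would I still need to click save?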
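### Recent Codex Replies
> Command-line access to the local port was blocked once by the sandbox; `nc` explicitly returned `Operation not permitted`. I'll test again with authorized external execution, which distinguishes "Figma isn't running" from "the Codex sandbox won't allow connections to a local port"
> The production deploy is running its second round: the remote backup and sync have finished and the Web image is building; I'm waiting for the script verification to finish
> Connected to the local Figma Desktop MCP
> Right, you have a point. Generated videos are already written into the current job's `generated_videos`, and the mp4 is backfilled automatically on completion, so users shouldn't be led to think they still need to click "save video"
>
> Verification performed:
> The change is done and live:
>
> - Figma Desktop has the Design file open and shows `MCP server enabled on http://127.0.0.1:3845/mcp`
> - The local port is confirmed listening: `Figma ... 127.0.0.1:3845 (LISTEN)`
> - MCP `initialize` returned successfully: `Figma Dev Mode MCP Server 1.0.0`
> - The tool list is available: `get_design_context`, `get_screenshot`, `get_metadata`, `get_variable_defs`
> - An actual `get_metadata` call succeeded and returned the current file's page: `0:1: Page 1`
> - The storyboard video candidate track now shows `自动保存` (auto-saved)
> - A plain click on a video candidate now only opens the preview
> - An explicit `下载` (download) button is provided in the top-right corner
> - "Save all / Save this plan" was renamed to "Sync plans / Sync this plan"; it only expresses syncing the three-field copy plan and no longer reads like saving a video
> - The source analysis page was updated to match this interaction semantics
>
> Note: the official remote `figma` server is not connected yet, because the current environment has no `FIGMA_OAUTH_TOKEN`, and `codex mcp login figma` also returned `invalid_redirect_uri`. The local `figma-desktop` server does work. The official Figma documentation describes the same flow: open a Design file in the desktop app, enable the local MCP server, then configure `http://127.0.0.1:3845/mcp`.
> Verification and release:
>
> No code in the current repository was changed; the status is still:
> - `web/npm run build` passes
> - Pushed to Gitea:
> - `4efb2ce fix: make storyboard video clicks previewable`
> - `eb4347a fix: clarify storyboard videos auto-save`
> - Ran `./scripts/deploy-prod-safe.sh` to go live; production verification passed: `/login/ 200`, `/api/health 401`, in-container `api:health ok`
>
> Current `git status -sb`: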
>
> ```text
> ## main...origin/main
> M .memory/worklog.json
> ```
>
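> Source: official Figma Codex MCP setup documentation https://help.figma.com/hc/en-us/articles/39888629089175-Codex-and-Figma-Set-up-the-MCP-server
> Only the auto-generated log `.memory/worklog.json` remains; business code and docs have all been committed and pushed.
>
> Additional note: `npm run lint` cannot run right now because the project has no executable `eslint`; `tsc --noEmit` still reports a batch of pre-existing dashboard/ui type errors that were not introduced by this change.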
>
> <oai-mem-citation>
> <citation_entries>
> MEMORY.md:214-286|note=[used SKG worktable context and docs update expectation]
> </citation_entries>
> <rollout_ids>
> 019e2fef-97a7-7523-a1ff-051a8c6…
## Recent Cursor Sessions
@@ -136,8 +151,8 @@
## Current Repository Status
- Current branch: main
- Uncommitted changes: 8
- Latest commit: docs: record image timeout deployment
- Uncommitted changes: 1
- Latest commit: fix: clarify storyboard videos auto-save
- Changed files:
- M .memory/worklog.json

File diff suppressed because it is too large


@@ -78,6 +78,11 @@
"type" : "backend",
"url" : "https:\/\/marketing.skg.com\/api"
},
{
"label" : "agent-cut-preview",
"type" : "app",
"url" : "http:\/\/2.24.28.41:4290\/agent\/"
},
{
"label" : "git",
"type" : "repo",


@@ -11,11 +11,14 @@
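- See the project-kickoff decision section of `CLAUDE.md` + the seven-step pipeline breakdown in `.memory/plan.md`
- Style: `04-Dark-Gallery-Ambient` (path: `~/Projects/research/20260305-网页风格库/04-Dark-Gallery-Ambient.md`)
- First sprint: steps 1-4 (download / track splitting / keyframes / ASR + translation)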
- 当前产品方向2026-05-20 再确认):信息流广告快速复刻默认进入“三字段候选生成”工作流。主界面为“左侧素材输入列 + 右侧信息流复刻工作表”;工作台已取消 1800x1000 固定画布和整页缩放,改为正常流式桌面容器,宽度跟随浏览器展开,只保留 1280px 最低操作宽度防止核心表格被压烂,不再通过应用层 `zoom` 把整页缩小导致文字发虚。用户粘贴 TK 链接或上传视频后点击“开始分析”,系统自动下载源视频;下载完成后并行启动两条路:音频文案路提取原音频文案/字幕,并分析讲话人、语速节奏、背景音乐/环境声/音效;视频视觉路自动抽取参考帧。源视频工作区主体链路改为“上方参考帧池 + 转换层、下方主体元素结果栏”:参考帧池竖向排列;转换层是轻量对话式生图确认区,参考图可通过左侧缩略图 `+`、参考帧拖拽、胶片拖拽或本地图片拖入进入转换层,用户选择 GPT/Gemini 套件后先分析参考图;识别结果里的特征 chip 只作为“保留元素”本地选择,点亮=保留、再点取消,点击不立即请求模型,随下一条发送消息提交;用户再在下方发送区发送复刻/创新/卡通和画面要求,界面只保留生成要求输入框、张数控件和提示词就绪状态,不展示当前要求摘要、保留元素副本、收起记录计数或重复模型确认话术,生成数量通过发送区旁边的张数控件控制;后端返回英文出图 prompt 后不再自动弹窗,发送区主按钮直接切换为“确认生成 N 张”,用户点击才生成对应数量的统一多角度套图。主体元素结果栏在转换层下方横向展示套图输出、文件夹分组、单张重生、删除和 hover 预览,空态只保留紧凑提示,不再挤占右侧整列。旧下方“相似主体 / 主体模板库”不再作为主路径。波形下方的画面胶片只是临时预览,点击只跳转原视频时间点,双击或拖进参考帧池才正式加入关键帧,已加入的胶片直接显示“已添加”。产品图上传后独立形成产品资产包,自动识别视角/结构/比例并补缺角度。分镜工作台按逐句时间轴默认只露“文案 / 场景一句话 / 人物+产品+动作”,产品素材池、批量控制、三字段、视频候选和高级区都必须可折叠;视频候选无内容时默认不占大面积,有候选时默认只显示迷你缩略条,展开后才显示 4-grid。单条默认生成 4 个视频候选,顶部支持整片批量生成候选;首尾帧、视觉规划、产品出现方式和旧 6 字段保留在“高级”抽屉与后端 quick-plan 自动展开中,不能再作为客户默认闸门。
- 当前产品方向2026-05-20 再确认):信息流广告快速复刻默认进入“三字段候选生成”工作流。主界面为“左侧素材输入列 + 右侧信息流复刻工作表”;工作台已取消 1800x1000 固定画布和整页缩放,改为正常流式桌面容器,宽度跟随浏览器展开,只保留 1280px 最低操作宽度防止核心表格被压烂,不再通过应用层 `zoom` 把整页缩小导致文字发虚。用户粘贴 TK 链接或上传视频后点击“开始分析”,系统自动下载源视频;下载完成后并行启动两条路:音频文案路自动识别中文、英文和其他多语言原音频文案/字幕,统一补齐中文镜像,并分析讲话人、语速节奏、背景音乐/环境声/音效;视频视觉路自动抽取参考帧。源视频工作区主体链路改为“上方参考帧池 + 转换层、下方主体元素结果栏”:参考帧池竖向排列;转换层是轻量对话式生图确认区,参考图可通过左侧缩略图 `+`、参考帧拖拽、胶片拖拽或本地图片拖入进入转换层,用户选择 GPT/Gemini 套件后先分析参考图;识别结果里的特征 chip 只作为“保留元素”本地选择,点亮=保留、再点取消,点击不立即请求模型,随下一条发送消息提交;用户再在下方发送区发送复刻/创新/卡通和画面要求,界面只保留生成要求输入框、张数控件和提示词就绪状态,不展示当前要求摘要、保留元素副本、收起记录计数或重复模型确认话术,生成数量通过发送区旁边的张数控件控制;后端返回英文出图 prompt 后不再自动弹窗,发送区主按钮直接切换为“确认生成 N 张”,用户点击才生成对应数量的统一多角度套图。主体元素结果栏在转换层下方横向展示套图输出、文件夹分组、单张重生、删除和 hover 预览,空态只保留紧凑提示,不再挤占右侧整列。旧下方“相似主体 / 主体模板库”不再作为主路径。波形下方的画面胶片只是临时预览,点击只跳转原视频时间点,双击或拖进参考帧池才正式加入关键帧,已加入的胶片直接显示“已添加”。产品图上传后独立形成产品资产包,自动识别视角/结构/比例并补缺角度。分镜工作台按逐句时间轴默认只露“文案 / 场景一句话 / 人物+产品+动作”,产品素材池、批量控制、三字段、视频候选和高级区都必须可折叠;视频候选无内容时默认不占大面积,有候选时默认只显示迷你缩略条,展开后才显示 4-grid。单条默认生成 4 个视频候选,顶部支持整片批量生成候选;首尾帧、视觉规划、产品出现方式和旧 6 字段保留在“高级”抽屉与后端 quick-plan 自动展开中,不能再作为客户默认闸门。
## Deployment Facts
- Platform: VPS `76.13.31.179` (Ubuntu 24.04 / Docker Compose / Coolify Traefik)
- Agent Cut 独立预览服务器:`2.24.28.41`Ubuntu 24.04 / Docker Compose / 裸端口 `4290`),部署目录 `/opt/skg-marketing-studio`Compose 入口 `docker-compose.standalone.yml`,访问地址 `http://2.24.28.41:4290/agent/`。该入口用于“一分钟二创出片终端”预览:用户只提交 TikTok 链接和产品图,后端 `AgentRun` 状态机负责下载、抽帧、规划、生成、自动重跑、审片和合成。
- Agent Cut 独立预览验证2026-05-21已在 `2.24.28.41``/opt/skg-marketing-studio``docker-compose.standalone.yml` 启动 `skg-agent-api` / `skg-agent-web`;独立 compose 通过网络别名兼容 Nginx 的 `skg-marketing-api` upstream。该裸 IP HTTP 入口的服务器 `deploy/.env.production` 需要 `WEB_AUTH_COOKIE_SECURE=false`;本次已补齐 `WEB_AUTH_*` 后重启验证通过:未登录 `/agent/` 返回 302 到 `/login/`,登录后 `/agent/` 返回 200`/api/agent-runs` 返回数组,容器内 `/health` 返回 `ok:true``auth_configured:true`
- 发布状态已部署并验证2026-05-20主体元素按套图文件夹分组展示主体生成接口提交后立即返回 queued 占位并后台逐视角生成、逐张回填;工作台外层取消 1800x1000 固定画布和应用层 `zoom` 缩放,改为正常流式桌面容器,最低操作宽度 1280px源视频工作区主体链路为上方竖向参考帧池 + 宽幅对话式转换层、下方主体元素结果栏;转换层通过参考帧 `+` 加入、参考图分析、生图对话,英文 prompt 就绪后由发送区主按钮切换为确认生成,点击后才触发主体套图生成;转换层不再固定 640px 长高,按内容自然高度显示,仅以 560px 最大高度兜底内部滚动;下方主体元素结果栏的套图输出、轮询、文件夹分组、单张重生、删除和 hover 预览逻辑保持不变;胶片双击/拖拽加入参考帧池 + 胶片缓存复用 + 音频解析失败可重试,参考帧缩略图保持小尺寸 9:16 比例 + hover 左侧紧凑预览,旧主体模板区移出主路径 + 逐句时间轴移到原版视频下方并支持双行文案 + 波形同框时间对齐画面胶片 + 胶片密度按钮上移波形顶部 + 去分隔线 + 胶片上下错落 + body 顶层原位大放大 + 隐藏源视频工作区音频解析摘要卡 + 隐藏工作区顶部状态提示条 + 三字段候选生成工作流 + 折叠紧凑候选区);`https://marketing.skg.com` 已启用应用内登录页,未登录 API 返回 401认证后首页 200容器内 `/health` 返回 `ok:true`
- 最近部署验证2026-05-22`6427935` 已通过 `./scripts/deploy-prod-safe.sh` 部署到 `/opt/skg-marketing-studio`;部署前备份为 `/opt/skg-marketing-studio-backups/skg-marketing-preserve-20260522012756.tgz`,生产 Docker 重建后脚本内验证通过web/API 容器 Up、`/` 302、`/login/` 200、缺失 `_next` 资源 404、未登录 `/api/health` 401、容器内 `api:health ok``api:ytdlp_cookie_args []`)。部署后已把生产私有 `deploy/.env.production` 明确固定为多语言本地 ASR 路径并重启 API`ASR_LANGUAGE=auto``FASTER_WHISPER_MODEL=base``ASR_REMOTE_ENABLED=false``ASR_LOCAL_FALLBACK_ENABLED=true``ASR_AUDIO_FALLBACK_ENABLED=false`;复验 `./scripts/verify-prod-docker.sh root@76.13.31.179` 通过,容器内 `/health` 确认 `asr_language=auto``faster_whisper=base`
- 最近部署验证2026-05-21`8458dac` 已按“先本地 Docker、再上传部署”流程上线。上线前在本机 Docker 构建 `skg-marketing-studio-web:latest` / `skg-marketing-studio-api:latest`,并用本地 Compose 容器验证通过:`web:/ 302``web:/login/ 200``web:/_next/does-not-exist.js 404``web:/api/health 401``api:health ok``api:ytdlp_cookie_args []`、静态 bundle 包含 `未来健康 · 营销内容工作台``信息流广告复刻生产`,且未发现本地 API/dev URL 泄漏。随后通过 `./scripts/deploy-prod-safe.sh` 部署到 `/opt/skg-marketing-studio`;部署前备份为 `/opt/skg-marketing-studio-backups/skg-marketing-preserve-20260521070327.tgz`,生产 Docker 重建后脚本内验证通过web/API 容器 Up、`/` 302、`/login/` 200、缺失 `_next` 资源 404、未登录 `/api/health` 401、容器内 `api:health ok`)。
- 最近部署验证2026-05-20`6597db3` 已通过 `./scripts/deploy-prod-safe.sh` 部署到 `/opt/skg-marketing-studio`;部署前备份为 `/opt/skg-marketing-studio-backups/skg-marketing-preserve-20260520151033.tgz`,生产 Docker 重建后脚本内验证通过(`web:/login/ 200``web:/api/health 401``api:health ok`)。线上登录后检查首页静态资源,当前加载 chunk `/_next/static/chunks/c48f07b9aef1cd29.js` 已包含 `min-w-[1280px]``max-w-[1920px]`,未再命中旧的 `h-[1000px]``w-[1800px]``BOARD_SCALE_PRESETS``boardScale`;对应工作台取消固定画布缩放,按浏览器正常流式布局渲染。
- 最近部署验证2026-05-20`2b842fd` 已通过 `./scripts/deploy-prod-safe.sh` 部署到 `/opt/skg-marketing-studio`;部署前备份为 `/opt/skg-marketing-studio-backups/skg-marketing-preserve-20260520145223.tgz`,生产 Docker 重建后脚本内验证通过(`web:/login/ 200``web:/api/health 401``api:health ok`)。线上登录后检查首页静态资源,当前加载 chunk `/_next/static/chunks/743b82648dfa9db9.js` 已包含 `h-32``maxHeight:560``提示词就绪``确认生成`,且未再命中旧的 `height:640` / `h-40`;对应转换层取消固定长高,生成要求输入区回到 128px底部仍由发送区主按钮确认生成。
@@ -57,9 +60,10 @@
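- Server directory: `/opt/skg-marketing-studio`
- Sole production deploy entry point: `./scripts/deploy-prod-safe.sh` (first backs up `deploy/.env.production`, `data/jobs`, the asset libraries, and `secrets` on the server, then syncs code with a protected rsync, and finally rebuilds Docker and runs `verify-prod-docker.sh`)
- Production container rebuild command: `docker compose -f docker-compose.prod.yml --env-file deploy/.env.production up -d --build`; use it only from inside the script or when explicitly restarting containers only, and never again sync manually with a bare `rsync --delete`.
- Standalone preview container rebuild command: on the server, under `/opt/skg-marketing-studio`, run `docker compose -f docker-compose.standalone.yml --env-file deploy/.env.production up -d --build`; Web is exposed as `0.0.0.0:4290->80`, the backend is exposed only on the internal compose network, and `/api/` is reverse-proxied by the Web container's Nginx and reuses the in-app login check.
- Production architecture: the `web` container serves the Next static export through Nginx; static resources required by the login page, such as `/login/`, `/_next/`, `/assets/`, `/skg-logo-black.svg`, and `/oasis-source/`, are publicly accessible; unauthenticated access to the workbench redirects to `/login/`; `/api/` is reverse-proxied to `skg-marketing-api:4291` after Nginx `auth_request` validates the FastAPI session cookie; Traefik attaches on 80/443 through the external `coolify` network
- Web acceptance must be judged against the production Docker form: the frontend is a `next export` static build + Nginx, not `next dev` / `next start`. After deploying any Web change, run `./scripts/verify-prod-docker.sh` and confirm that the checks for `/login/`, `/_next/`, `/api/health`, local API address leakage, and API image `.env` contamination all pass; a local `npm run build` alone is not sufficient grounds for release.
- Current audio transcription: `gpt-4o-transcribe` on `https://ai.skg.com/azure/v1` currently returns `DeploymentNotFound`, and probing the official Azure OpenAI transcription path also found no usable deployment; production temporarily mirrors the strategy that works locally and transcribes directly with in-container `faster-whisper tiny.en`, with the Gemini multimodal audio fallback turned off. Restore `ASR_REMOTE_ENABLED=true` once the real Azure ASR deployment name is available
- Current audio transcription: `gpt-4o-transcribe` on `https://ai.skg.com/azure/v1` currently returns `DeploymentNotFound`, and probing the official Azure OpenAI transcription path also found no usable deployment; production temporarily mirrors the strategy that works locally and transcribes directly with in-container multilingual `faster-whisper`, with the default language set to `auto`, supporting source-text recognition of Chinese, English, and other languages, and with the Gemini multimodal audio fallback turned off. Restore `ASR_REMOTE_ENABLED=true` once the real Azure ASR deployment name is available, and keep `ASR_LANGUAGE` empty or `auto` unless a single language must explicitly be forced
- Persistent directories: the server's `./data/jobs` is mounted at `/data/jobs` in the backend; the global resource center persists in `./data/asset_library`, `./data/prompt_library`, and `./data/_trash`
- TikTok download login state: public videos are downloaded without cookies by default, and the production environment variables `YTDLP_COOKIES_FILE=` and `YTDLP_COOKIES_FROM_BROWSER=` must be explicitly kept empty so the container does not try to read nonexistent browser cookies. Only when TikTok explicitly requires a logged-in state should the server's private cookies file `./secrets/tiktok_cookies.txt` be mounted into the API container at `/run/secrets/tiktok_cookies.txt` with `YTDLP_COOKIES_FILE=/run/secrets/tiktok_cookies.txt` configured; `yt-dlp` writes cookies back when a task finishes, so do not make the mount read-only, do not use a cloud-browser reading approach, and do not commit cookies to the repository. Production containers must never use `YTDLP_COOKIES_FROM_BROWSER=chrome`
- Login credentials: the username is listed under the quick-login section below; the plaintext password backup lives only on the server at `/root/skg-marketing-studio-login.txt`, and the production environment variables `WEB_AUTH_PASSWORD` / `WEB_AUTH_SESSION_SECRET` live only in the server's `deploy/.env.production`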
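
The unauthenticated status codes that the deployment verification notes expect from the production Web container (`/` 302, `/login/` 200, a missing `_next` asset 404, `/api/health` 401) can be written down as a small lookup table. The following is a minimal sketch only: `EXPECTED_UNAUTHENTICATED` and `find_mismatches` are hypothetical names, and this is not the content of `verify-prod-docker.sh`, which is not shown here.

```python
# Hypothetical helper for illustration; not taken from verify-prod-docker.sh.
# Expected HTTP status per path when the request carries no session cookie.
EXPECTED_UNAUTHENTICATED = {
    "/": 302,                          # workbench redirects to the login page
    "/login/": 200,                    # login page is public
    "/_next/does-not-exist.js": 404,   # missing static asset must not fall through
    "/api/health": 401,                # API is gated by the session check
}


def find_mismatches(observed: dict[str, int]) -> list[str]:
    """Return one message per path whose observed status differs from the expectation."""
    problems = []
    for path, expected in EXPECTED_UNAUTHENTICATED.items():
        actual = observed.get(path)  # None when the path was never probed
        if actual != expected:
            problems.append(f"{path}: expected {expected}, got {actual}")
    return problems
```

Feeding the function the status codes gathered by any HTTP client gives an empty list when the deployment matches the documented behavior.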
@@ -91,11 +95,11 @@
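- `LLM_BASE_URL` / `LLM_API_KEY`: OpenAI-compatible gateway used for text and multimodal-understanding model calls such as translation, copy rewriting, and audio analysis
- `ASR_BASE_URL` / `ASR_API_KEY`: OpenAI Audio Transcriptions-compatible gateway used to upload `audio.wav` for real transcription; when `ASR_API_KEY` is not set, `LLM_API_KEY` is reused; production points to `https://ai.skg.com/azure/v1` by default
- `ASR_MODEL`: OpenAI Audio Transcriptions model; the Microsoft channel uses the Azure OpenAI deployment name `gpt-4o-transcribe`, and if the actual deployment name on the Azure side differs, it must be changed here as well
- `ASR_LANGUAGE`: input-language hint for remote ASR, default `en`; Microsoft's official documentation states that specifying an ISO-639-1 language can improve accuracy and latency
- `ASR_LANGUAGE`: optional input-language hint for remote and local ASR, default empty / `auto`, letting the model auto-detect Chinese, English, and other languages; set an ISO-639-1 code to force recognition only when the material is known to be in one fixed language
- `ASR_REMOTE_ENABLED`: whether to enable remote OpenAI Audio Transcriptions; must be `true` for Microsoft ASR acceptance. Production currently sets it to `false` temporarily because `gpt-4o-transcribe` on `https://ai.skg.com/azure/v1` returns `DeploymentNotFound`, and goes straight to in-container `faster-whisper`; restore it once the real Azure deployment name is filled in.
- `ASR_LOCAL_FALLBACK_ENABLED`: whether a failed remote ASR may fall back to local / in-container ASR; production currently sets `true`, mirroring the "real local transcription" strategy of the working local path, with CPU `faster-whisper` in the cloud replacing `mlx_whisper` on the local Mac
- `ASR_AUDIO_FALLBACK_ENABLED`: whether failed remote and local ASR may fall back to multimodal audio recognition; set to `false` for production Microsoft ASR acceptance to avoid silently using Gemini audio
- `FASTER_WHISPER_MODEL` / `FASTER_WHISPER_DEVICE` / `FASTER_WHISPER_COMPUTE_TYPE`: in-container local ASR fallback, enabled only when `ASR_LOCAL_FALLBACK_ENABLED=true`
- `FASTER_WHISPER_MODEL` / `FASTER_WHISPER_DEVICE` / `FASTER_WHISPER_COMPUTE_TYPE`: in-container local ASR fallback, enabled only when `ASR_LOCAL_FALLBACK_ENABLED=true`; defaults to the multilingual `base` model; do not switch back to an English-only `*.en` model, or Chinese and multilingual recognition will degrade.
- `ASR_FALLBACK_MODEL`: multimodal audio fallback model, used for fallback or audio profiling only when `ASR_AUDIO_FALLBACK_ENABLED=true`, default `gemini-2.5-flash`; if the model cannot truly hear the audio or returns suspected fake per-second subtitles, the backend must refuse to write them to the timeline
- `ASR_TIMEOUT_SECONDS`: per-request timeout for remote ASR / translation / audio analysis; currently 45 seconds in production's local-transcription mode, and it can be raised according to material length when Microsoft ASR is re-enabled.
- `LOCAL_ASR_BIN` / `LOCAL_ASR_MODEL` / `LOCAL_ASR_TIMEOUT_SECONDS`: local-machine ASR fallback, default `/opt/homebrew/bin/mlx_whisper` + `mlx-community/whisper-tiny`, used to produce a real sentence-by-sentence timeline when the current SKG gateway's `/audio/transcriptions` is unavailable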
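
How an `ASR_LANGUAGE` value becomes a language hint can be sketched as below. The sketch mirrors the `_asr_language_hint` helper added to the backend in this change set, but takes the raw value as a parameter so it can be tried in isolation; `language_hint` and `transcribe_kwargs` are illustrative names, not functions in the repository.

```python
# Values treated as "let the model detect the language" (same set as the backend change).
AUTO_VALUES = {"", "auto", "detect", "multilingual", "multi"}


def language_hint(raw: str) -> str:
    """Return an explicit language code, or "" when auto-detection is requested."""
    value = raw.strip()
    return "" if value.lower() in AUTO_VALUES else value


def transcribe_kwargs(raw: str) -> dict:
    """Build transcription options; the language key is only added when a code is forced."""
    kwargs = {"beam_size": 1, "vad_filter": True, "condition_on_previous_text": False}
    hint = language_hint(raw)
    if hint:
        kwargs["language"] = hint
    return kwargs
```

Leaving the `language` key out entirely, instead of passing an empty string, is what lets the transcription model fall back to its own language detection.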


@@ -35,6 +35,6 @@ uvicorn main:app --host 127.0.0.1 --port 4291
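- `ffmpeg` system binary (track splitting / frame extraction)
- `yt-dlp` system binary (the Python package also works)
- OpenAI-compatible LLM gateway (ASR / translation / copy rewriting / visual brief; when remote `whisper-1` fails, local `mlx_whisper` is tried first, then Gemini multimodal audio recognition via `ASR_FALLBACK_MODEL`; the backend rejects suspected fake subtitles and timelines with too little coverage)
- OpenAI-compatible LLM gateway (ASR / translation / copy rewriting / visual brief; ASR auto-detects Chinese, English, and other languages by default; when remote ASR fails, in-container multilingual `faster-whisper` / local `mlx_whisper` is tried first, then multimodal audio recognition via `ASR_FALLBACK_MODEL` if the switch allows it; the backend rejects suspected fake subtitles and timelines with too little coverage)
- GPT image gateway (all image generation / retouching / product-angle recognition / subject assets / first and last frames are currently forced to use `gpt-image-2`, with no fallback to other image models)
- Azure OpenAI TTS (the later new-voiceover stage uses `AZURE_OPENAI_API_KEY`; default model `gpt-4o-mini-tts`, trying the voice paths in `AZURE_TTS_PATHS` in order)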
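
The ASR fallback order described in this list (remote transcription, then local transcription, then multimodal audio recognition, each gated by a switch) can be sketched as a generic chain. This is a minimal illustration under stated assumptions: `transcribe_with_fallback` and the `(name, enabled, step)` tuple shape are hypothetical and do not reproduce the backend's `_transcribe_sync`.

```python
from typing import Callable

Segments = list[dict]


def transcribe_with_fallback(
    steps: list[tuple[str, bool, Callable[[], Segments]]],
) -> tuple[str, Segments]:
    """Try each enabled step in order and return the first non-empty result.

    Each step is (name, enabled, callable). Errors are collected so the final
    exception explains why every step was skipped or failed.
    """
    errors: list[str] = []
    for name, enabled, step in steps:
        if not enabled:
            errors.append(f"{name}: disabled")
            continue
        try:
            segments = step()
        except Exception as exc:  # any step failure moves on to the next step
            errors.append(f"{name}: {exc}")
            continue
        if segments:
            return name, segments
        errors.append(f"{name}: no segments")
    raise RuntimeError("; ".join(errors))
```

Passing the enable switches in alongside the callables keeps the ordering in one place, so turning the last step off means it is never tried and the chain fails with an error instead.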


@@ -0,0 +1,3 @@
{
"items": []
}


@@ -29,6 +29,8 @@ load_dotenv()
JOBS_DIR = Path(os.getenv("JOBS_DIR", "./jobs")).resolve()
JOBS_DIR.mkdir(parents=True, exist_ok=True)
AGENT_RUNS_DIR = Path(os.getenv("AGENT_RUNS_DIR", JOBS_DIR.parent / "agent_runs")).resolve()
AGENT_RUNS_DIR.mkdir(parents=True, exist_ok=True)
CORS_ORIGINS = [o.strip() for o in os.getenv("CORS_ORIGINS", "http://localhost:4290,http://127.0.0.1:4290").split(",") if o.strip()]
PRODUCT_LIBRARY_DIR = Path(
os.getenv("PRODUCT_LIBRARY_DIR", Path(__file__).resolve().parent / "product_library" / "skg-products")
@@ -61,13 +63,13 @@ LLM_API_KEY = os.getenv("LLM_API_KEY", "").strip()
ASR_BASE_URL = os.getenv("ASR_BASE_URL", LLM_BASE_URL).strip()
ASR_API_KEY = (os.getenv("ASR_API_KEY") or LLM_API_KEY).strip()
ASR_MODEL = os.getenv("ASR_MODEL", "whisper-1")
ASR_LANGUAGE = os.getenv("ASR_LANGUAGE", "en").strip()
ASR_LANGUAGE = os.getenv("ASR_LANGUAGE", "").strip()
ASR_REMOTE_ENABLED = os.getenv("ASR_REMOTE_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"}
ASR_LOCAL_FALLBACK_ENABLED = os.getenv("ASR_LOCAL_FALLBACK_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"}
ASR_AUDIO_FALLBACK_ENABLED = os.getenv("ASR_AUDIO_FALLBACK_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"}
ASR_FALLBACK_MODEL = os.getenv("ASR_FALLBACK_MODEL", "gemini-2.5-flash").strip() or "gemini-2.5-flash"
ASR_TIMEOUT_SECONDS = max(15, int(os.getenv("ASR_TIMEOUT_SECONDS", "45")))
FASTER_WHISPER_MODEL = os.getenv("FASTER_WHISPER_MODEL", "tiny.en").strip() or "tiny.en"
FASTER_WHISPER_MODEL = os.getenv("FASTER_WHISPER_MODEL", "base").strip() or "base"
FASTER_WHISPER_DEVICE = os.getenv("FASTER_WHISPER_DEVICE", "cpu").strip() or "cpu"
FASTER_WHISPER_COMPUTE_TYPE = os.getenv("FASTER_WHISPER_COMPUTE_TYPE", "int8").strip() or "int8"
LOCAL_ASR_BIN = os.getenv("LOCAL_ASR_BIN", "").strip()
@@ -77,6 +79,20 @@ TRANSLATE_MODEL = os.getenv("TRANSLATE_MODEL", "gemini-2.5-flash")
DEFAULT_GPT_TEXT_MODEL = os.getenv("GPT_TEXT_MODEL", "gpt-4o").strip() or "gpt-4o"
ASR_AUTO_LANGUAGE_VALUES = {"", "auto", "detect", "multilingual", "multi"}


def _asr_language_hint() -> str:
    language = ASR_LANGUAGE.strip()
    if language.lower() in ASR_AUTO_LANGUAGE_VALUES:
        return ""
    return language


def _asr_language_label() -> str:
    return _asr_language_hint() or "auto"


def gpt_model_env(name: str, default: str | None = None) -> str:
    value = os.getenv(name, default or DEFAULT_GPT_TEXT_MODEL).strip()
    if not value or value.lower().startswith("gemini-"):
@@ -2809,7 +2825,7 @@ def _clean_asr_segments(segments: list[dict], duration: float) -> list[dict]:
def _segment_text_key(text: str) -> str:
return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
return re.sub(r"[^\w]+", " ", text.casefold(), flags=re.UNICODE).strip()
def _validate_asr_segments(segments: list[dict], duration: float, source: str) -> list[dict]:
@@ -2907,19 +2923,22 @@ def _transcribe_faster_whisper_sync(wav: Path) -> list[dict]:
device=FASTER_WHISPER_DEVICE,
compute_type=FASTER_WHISPER_COMPUTE_TYPE,
)
raw_segments, _info = model.transcribe(
str(wav.resolve()),
language="en",
beam_size=1,
vad_filter=True,
condition_on_previous_text=False,
)
language_hint = _asr_language_hint()
transcribe_options = {
"beam_size": 1,
"vad_filter": True,
"condition_on_previous_text": False,
}
if language_hint:
transcribe_options["language"] = language_hint
raw_segments, _info = model.transcribe(str(wav.resolve()), **transcribe_options)
detected_language = str(getattr(_info, "language", "") or language_hint or "auto")
segments = [
{"start": float(seg.start), "end": float(seg.end), "text": str(seg.text or "").strip()}
for seg in raw_segments
if str(seg.text or "").strip()
]
return _validate_asr_segments(segments, duration, f"faster-whisper:{FASTER_WHISPER_MODEL}")
return _validate_asr_segments(segments, duration, f"faster-whisper:{FASTER_WHISPER_MODEL}:{detected_language}")
def _transcribe_gemini_sync(wav: Path) -> list[dict]:
@@ -2929,8 +2948,9 @@ def _transcribe_gemini_sync(wav: Path) -> list[dict]:
"Transcribe the attached audio. Return strict JSON only, no markdown. "
"If you cannot truly hear the audio, return {\"can_hear\": false}. Do not guess. "
"If you can hear it, return {\"can_hear\": true, \"segments\": "
"[{\"start\": 0.0, \"end\": 1.2, \"text\": \"English transcript\"}]}. "
"Use English for the transcript. Only include timestamps you can infer from the audio."
"[{\"start\": 0.0, \"end\": 1.2, \"text\": \"original-language transcript\"}]}. "
"Keep the transcript in the spoken source language; do not translate it here. "
"Only include timestamps you can infer from the audio."
)
last_error: Exception | None = None
for attempt in range(3):
@@ -2959,19 +2979,21 @@ def _transcribe_sync(wav: Path) -> list[dict]:
if ASR_REMOTE_ENABLED:
try:
with wav.open("rb") as f:
language_hint = _asr_language_hint()
resp = asr_llm().with_options(timeout=ASR_TIMEOUT_SECONDS).audio.transcriptions.create(
file=(wav.name, f, "audio/wav"),
model=ASR_MODEL,
response_format="verbose_json",
timestamp_granularities=["segment"],
**({"language": ASR_LANGUAGE} if ASR_LANGUAGE else {}),
**({"language": language_hint} if language_hint else {}),
)
raw = resp.model_dump() if hasattr(resp, "model_dump") else resp
segments = raw.get("segments") or []
# Fallback: if the gateway returns no segments, treat the full text as one segment
if not segments and raw.get("text"):
segments = [{"start": 0.0, "end": float(raw.get("duration", 0) or 0), "text": raw["text"]}]
return _validate_asr_segments(segments, duration, ASR_MODEL)
detected_language = str(raw.get("language") or language_hint or "auto")
return _validate_asr_segments(segments, duration, f"{ASR_MODEL}:{detected_language}")
except Exception as e:
errors.append(f"{ASR_MODEL}: {e}")
else:
@@ -2999,11 +3021,13 @@ def _transcribe_sync(wav: Path) -> list[dict]:
def _translate_sync(segments: list[dict]) -> list[str]:
"""Batch-translate to Chinese and return one result per segment."""
payload = [{"i": i, "en": s.get("text", "").strip()} for i, s in enumerate(segments)]
payload = [{"i": i, "text": s.get("text", "").strip()} for i, s in enumerate(segments)]
prompt = (
"你是字幕翻译。把下列英文字幕段翻译为简体中文,保持原意、口语化、自然流畅。"
"严格返回 JSON 数组,不要任何 markdown 或多余文字schema: "
'[{"i": 0, "zh": "..."}, ...]\n\n输入:\n'
"你是多语言字幕翻译。把下列原语言字幕段翻译为简体中文"
"如果原文已经是中文,只做简体中文规范化和口语化整理,不要改写意思。"
"保持原意、口语化、自然流畅。"
"严格返回 JSON object不要任何 markdown 或多余文字schema: "
'{"translations":[{"i": 0, "zh": "..."}]}\n\n输入:\n'
+ json.dumps(payload, ensure_ascii=False)
)
try:
@@ -3430,7 +3454,7 @@ def pipeline_transcribe(job_id: str, manage_job_status: bool = True) -> None:
return
# 1) whisper ASR
progress(f"{ASR_MODEL} 转录中…", 78)
progress(f"{ASR_MODEL} {_asr_language_label()} 语种转录中…", 78)
segments = _transcribe_sync(wav)
if not segments:
raise TranscriptionUnavailable("ASR 未返回可用字幕段")
@@ -4492,7 +4516,7 @@ def health() -> dict:
"voice_base_url": AZURE_OPENAI_BASE_URL,
"models": {
"asr": ASR_MODEL,
"asr_language": ASR_LANGUAGE,
"asr_language": _asr_language_label(),
"asr_base_url": ASR_BASE_URL or LLM_BASE_URL or "openai-default",
"asr_remote_enabled": ASR_REMOTE_ENABLED,
"asr_local_fallback_enabled": ASR_LOCAL_FALLBACK_ENABLED,
@@ -8011,6 +8035,418 @@ def copy_character_library_assets(job_id: str, req: CopyCharacterLibraryAssetReq
}
class AgentRunLog(BaseModel):
ts: float
level: Literal["info", "warn", "error"] = "info"
message: str
class AgentRun(BaseModel):
id: str
job_id: str
status: Literal["draft", "queued", "executing", "reviewing", "completed", "failed"] = "queued"
stage: str = "queued"
progress: int = 0
logs: list[AgentRunLog] = Field(default_factory=list)
video_ids: list[str] = Field(default_factory=list)
final_video_url: str = ""
contact_sheet_url: str = ""
error: str = ""
created_at: float = Field(default_factory=time.time)
updated_at: float = Field(default_factory=time.time)
AGENT_RUNS: dict[str, AgentRun] = {}
AGENT_DEFAULT_PRODUCT_IDS = [
"desktop-skg-product-angle-01",
"desktop-skg-product-angle-02",
"desktop-skg-product-angle-03",
"desktop-skg-product-angle-04",
]
AGENT_DEFAULT_CHARACTER_ID = os.getenv("AGENT_DEFAULT_CHARACTER_ID", "character-02").strip() or "character-02"
AGENT_SHOT_COUNT = max(8, min(12, int(os.getenv("AGENT_SHOT_COUNT", "12"))))
AGENT_SHOT_DURATION_SECONDS = max(4.0, min(8.0, float(os.getenv("AGENT_SHOT_DURATION_SECONDS", "5"))))
AGENT_VIDEO_TIMEOUT_SECONDS = max(300, int(os.getenv("AGENT_VIDEO_TIMEOUT_SECONDS", "1500")))
def agent_run_dir(run_id: str) -> Path:
return AGENT_RUNS_DIR / run_id
def agent_run_path(run_id: str) -> Path:
return agent_run_dir(run_id) / "state.json"
def save_agent_run(run: AgentRun) -> None:
run.updated_at = time.time()
d = agent_run_dir(run.id)
d.mkdir(parents=True, exist_ok=True)
agent_run_path(run.id).write_text(run.model_dump_json(indent=2), encoding="utf-8")
AGENT_RUNS[run.id] = run
def agent_log(
run: AgentRun,
message: str,
*,
stage: str | None = None,
progress: int | None = None,
status: Literal["draft", "queued", "executing", "reviewing", "completed", "failed"] | None = None,
level: Literal["info", "warn", "error"] = "info",
) -> None:
if stage is not None:
run.stage = stage
if progress is not None:
run.progress = max(0, min(100, int(progress)))
if status is not None:
run.status = status
run.logs = (run.logs + [AgentRunLog(ts=time.time(), level=level, message=message)])[-240:]
save_agent_run(run)
async def save_agent_product_upload(job_id: str, upload: UploadFile, index: int) -> dict:
if not upload.filename:
raise HTTPException(400, "product image filename required")
content_type = (upload.content_type or "").lower()
suffix = Path(upload.filename).suffix.lower()
if content_type and not content_type.startswith("image/"):
raise HTTPException(400, f"product image must be image/*, got {content_type}")
if not content_type and suffix not in {".jpg", ".jpeg", ".png", ".webp", ".bmp"}:
raise HTTPException(400, f"unsupported product image: {suffix}")
out_dir = job_dir(job_id) / "assets"
out_dir.mkdir(parents=True, exist_ok=True)
asset_id = uuid.uuid4().hex[:12]
tmp = out_dir / f"{asset_id}.upload"
out = out_dir / f"{asset_id}.jpg"
try:
await _save_upload_to_path(upload, tmp)
meta = normalize_product_asset_image(tmp, out)
except Exception as e:
try:
out.unlink()
except OSError:
pass
raise HTTPException(400, f"product upload failed: {e}")
finally:
try:
tmp.unlink()
except OSError:
pass
return {
"kind": "asset",
"frame_idx": -1,
"element_id": asset_id,
"cutout_id": asset_id,
"label": f"用户产品图 {index} · {upload.filename}",
"asset_meta": meta,
}
def agent_fallback_product_refs(job_id: str) -> list[dict]:
refs: list[dict] = []
for product_id in AGENT_DEFAULT_PRODUCT_IDS:
try:
refs.append(copy_product_library_asset(job_id, CopyProductLibraryAssetReq(product_id=product_id)))
except Exception:
continue
return refs
def agent_subject_refs(job_id: str) -> list[dict]:
try:
payload = copy_character_library_assets(job_id, CopyCharacterLibraryAssetReq(character_id=AGENT_DEFAULT_CHARACTER_ID))
except Exception:
return []
images = payload.get("images") or []
preferred = []
for ref in images:
label = str(ref.get("label") or "")
if any(key in label for key in ("正面", "左45", "半身近景", "侧面")):
preferred.append(ref)
return (preferred or images)[:4]
def agent_base_prompt() -> str:
return (
"Vertical 9:16 original SKG short-form ad. Do not copy the real person from the source video. "
"Use the provided transparent anatomy subject as the recurring character when a person is needed. "
"Use the provided SKG white U-shaped neck-and-shoulder massager product references as rigid product truth: "
"one clean U-shaped wearable device, silver contact pads, red heat/light accents, premium white shell, correct scale around the neck and shoulders. "
"No captions, no platform UI, no watermark, no medical treatment claims. Natural creator-demo pacing, clean premium lighting."
)
def agent_shot_plan() -> list[dict]:
base = agent_base_prompt()
shots = [
("hook", "Hook close-up: transparent anatomy character faces camera and raises the SKG neck-and-shoulder massager into the foreground, fast creator-ad opening energy, clean blue-white studio background."),
("pain", "Pain-point scene: the character sits at a desk after long screen work, shoulders tense, then notices the SKG massager beside the laptop; show neck and shoulder area clearly."),
("product_macro", "Macro product detail: slow moving close-up across the SKG U-shaped device, buttons, inner massage nodes, silver pads, premium white plastic and red heat accents."),
("wear", "Wear demo: the character places the SKG U-shaped massager externally around the back of the neck and upper shoulders, hands guiding both arms into position."),
("contact", "Heat/contact moment: close-up of silver massage pads aligned with side neck and upper trapezius, subtle red warmth glow, product outside the transparent body, no clipping."),
("office_use", "Office use beat: the character works calmly at a desk while wearing the SKG massager, small relief gesture, device stable and visible around neck and shoulders."),
("living_room", "Comfort beat: relaxed home setting, character leans back slightly, SKG device running, premium wellness mood, smooth gentle camera drift."),
("angle_proof", "Product angle proof: clean tabletop shot with the SKG U-shaped massager rotating or being lifted by hand, show thickness, contact pads, seams, and control button."),
("mobility", "Daily mobility scene: character walks from desk to sofa wearing the SKG massager, lightweight lifestyle demonstration, product silhouette remains accurate."),
("benefit", "Benefit visualization: transparent anatomy view emphasizes neck and shoulder contact zones with tasteful red warmth accents while the device stays opaque and external."),
("packaging", "Brand proof shot: SKG product and packaging on a clean surface, hand picks up the device, premium white product photography look, no extra text overlays."),
("cta", "Ending CTA: character faces camera wearing the SKG massager, then the final frame lands on a clean product hero angle with confident premium ad finish."),
]
return [{"key": key, "prompt": f"{base}\n\nShot direction: {text}"} for key, text in shots[:AGENT_SHOT_COUNT]]
def agent_reference_for_shot(shot_key: str, product_refs: list[dict], subject_refs: list[dict]) -> tuple[dict | None, str]:
product_first = {"product_macro", "angle_proof", "packaging"}
if shot_key in product_first and product_refs:
return product_refs[min(2, len(product_refs) - 1)], "reference_image"
if subject_refs:
if shot_key in {"contact", "benefit"} and len(subject_refs) > 1:
return subject_refs[min(1, len(subject_refs) - 1)], "reference_image"
return subject_refs[0], "reference_image"
if product_refs:
return product_refs[0], "reference_image"
return None, "reference_image"
def agent_get_video(job_id: str, video_id: str) -> GeneratedVideo | None:
job = JOBS.get(job_id)
if not job:
return None
return next((item for item in job.generated_videos if item.id == video_id), None)
def agent_wait_videos(run: AgentRun, ids: list[str], *, target_completed: int) -> list[str]:
deadline = time.time() + AGENT_VIDEO_TIMEOUT_SECONDS
last_summary = ""
while time.time() < deadline:
completed: list[str] = []
active = 0
failed = 0
for video_id in ids:
item = agent_get_video(run.job_id, video_id)
if not item:
active += 1
continue
if item.status == "completed" and item.url:
completed.append(video_id)
elif item.status == "failed":
failed += 1
else:
active += 1
summary = f"视频生成中 · 完成 {len(completed)}/{target_completed} · 运行 {active} · 失败 {failed}"
if summary != last_summary:
agent_log(run, summary, stage="execute", progress=58 + min(24, len(completed) * 2))
last_summary = summary
if len(completed) >= target_completed or active == 0:
return completed
time.sleep(6)
return [video_id for video_id in ids if (agent_get_video(run.job_id, video_id) and agent_get_video(run.job_id, video_id).status == "completed")]
def agent_submit_shot(
run: AgentRun,
frame: KeyFrame,
shot: dict,
product_refs: list[dict],
subject_refs: list[dict],
retry: int = 0,
) -> str:
first_ref, primary_role = agent_reference_for_shot(str(shot["key"]), product_refs, subject_refs)
if not first_ref:
raise RuntimeError("no reference image available for video generation")
job = JOBS[run.job_id]
prompt = str(shot["prompt"])
if retry:
prompt += f"\n\nRetry pass {retry}: keep the same idea but simplify motion, keep the product shape stable, avoid strange anatomy or deformed product."
req = GenerateStoryboardVideoReq(
prompt=prompt,
duration=AGENT_SHOT_DURATION_SECONDS,
count=1,
storyboard_row_idx=len(run.video_ids),
first_image=first_ref,
product_images=product_refs[:6],
subject_images=subject_refs[:4],
model="seedance",
size="720x1280",
)
# _enqueue_storyboard_videos derives the primary role from first_image. Keep the
# local variable above for future provider-specific tuning without changing API.
_ = primary_role
ids = _enqueue_storyboard_videos(job, frame, req, None)
return ids[0]


def agent_compose_final(agent: AgentRun, ordered_ids: list[str]) -> None:
    d = agent_run_dir(agent.id)
    d.mkdir(parents=True, exist_ok=True)
    final_dir = job_dir(agent.job_id) / "final"
    final_dir.mkdir(parents=True, exist_ok=True)
    final = final_dir / f"agent-{agent.id}.mp4"
    concat_file = d / "concat.txt"
    # Collect only clips that exist on disk and are non-empty, in submission order.
    paths: list[Path] = []
    for video_id in ordered_ids:
        p = job_dir(agent.job_id) / "storyboard_videos" / video_id / "video.mp4"
        if p.exists() and p.stat().st_size > 0:
            paths.append(p.resolve())
    if not paths:
        raise RuntimeError("no completed video files to compose")
    # Write the ffmpeg concat list; a single quote inside a path is escaped as '\''.
    concat_file.write_text(
        "".join(f"file '{str(p).replace(chr(39), chr(39) + chr(92) + chr(39) + chr(39))}'\n" for p in paths),
        encoding="utf-8",
    )
    try:
        # First attempt: stream copy, which only works when all clips share codec parameters.
        run_cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat_file), "-c", "copy", "-movflags", "+faststart", str(final)]
        run(run_cmd)
    except Exception:
        # Fallback: re-encode everything to a uniform 720x1280, 24 fps stream.
        run_cmd = [
            "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat_file),
            "-vf", "scale=720:1280,setsar=1", "-r", "24", "-c:v", "mpeg4", "-q:v", "4",
            "-c:a", "aac", "-b:a", "160k", "-movflags", "+faststart", str(final),
        ]
        run(run_cmd)
    # Contact sheet: one frame every 120 frames, tiled 12 across. Failure here is non-fatal.
    contact = d / "contact.jpg"
    try:
        run([
            "ffmpeg", "-y", "-i", str(final),
            "-vf", "select='not(mod(n,120))',scale=180:320,tile=12x1",
            "-frames:v", "1", str(contact),
        ])
        agent.contact_sheet_url = f"/agent-runs/{agent.id}/contact.jpg"
    except Exception as e:
        agent_log(agent, f"抽帧审片图生成失败:{str(e)[:180]}", level="warn")
    agent.final_video_url = f"/agent-runs/{agent.id}/final.mp4"
    save_agent_run(agent)


def agent_run_worker(run_id: str, product_refs: list[dict]) -> None:
    run = AGENT_RUNS[run_id]
    try:
        agent_log(run, "接管任务:创建 1 分钟二创出片流程", status="executing", stage="download", progress=4)
        pipeline_download(run.job_id)
        job = JOBS[run.job_id]
        if job.status == "failed":
            raise RuntimeError(job.error or job.message or "source video download failed")
        agent_log(run, f"源视频就绪 · {job.duration:.1f}s · {job.width}x{job.height}", stage="download", progress=14)
        refs = product_refs[:6] or agent_fallback_product_refs(run.job_id)
        if not refs:
            raise RuntimeError("需要至少 1 张产品图")
        update(job, product_refs=refs, message=f"Agent 已接入产品图 · {len(refs)}")
        agent_log(run, f"产品素材就绪 · {len(refs)}", stage="assets", progress=20)
        subject_refs = agent_subject_refs(run.job_id)
        if subject_refs:
            agent_log(run, f"主体参考就绪 · {len(subject_refs)} 张透明骨架角色", stage="assets", progress=24)
        else:
            agent_log(run, "未找到主体角色库,改用产品图和文本约束生成", stage="assets", progress=24, level="warn")
        agent_log(run, "抽取源视频节奏帧 · 12 张", stage="analyze", progress=28)
        pipeline_analyze(run.job_id, frame_count=12, target="transparent_human", mode="replace", quality="auto")
        job = JOBS[run.job_id]
        if not job.frames:
            raise RuntimeError(job.error or "keyframe extraction failed")
        agent_log(run, f"节奏帧完成 · {len(job.frames)}", stage="plan", progress=40)
        shots = agent_shot_plan()
        agent_log(run, f"生成二创镜头计划 · {len(shots)}× {AGENT_SHOT_DURATION_SECONDS:g}s", stage="plan", progress=46)
        submitted: list[str] = []
        for idx, shot in enumerate(shots):
            frame = job.frames[idx % len(job.frames)]
            video_id = agent_submit_shot(run, frame, shot, refs, subject_refs)
            submitted.append(video_id)
            run.video_ids = submitted
            save_agent_run(run)
            agent_log(run, f"提交镜头 {idx + 1:02d}/{len(shots)} · {shot['key']} · {video_id}", stage="execute", progress=48 + idx)
        completed = agent_wait_videos(run, submitted, target_completed=len(shots))
        failed_positions = [i for i, video_id in enumerate(submitted) if video_id not in completed]
        if failed_positions:
            agent_log(run, f"{len(failed_positions)} 段未完成,自动重跑一次", stage="execute", progress=82, level="warn")
            for pos in failed_positions:
                frame = job.frames[pos % len(job.frames)]
                retry_id = agent_submit_shot(run, frame, shots[pos], refs, subject_refs, retry=1)
                submitted[pos] = retry_id
                run.video_ids = submitted
                save_agent_run(run)
                agent_log(run, f"重跑镜头 {pos + 1:02d} · {retry_id}", stage="execute", progress=83)
            completed = agent_wait_videos(run, submitted, target_completed=len(shots))
        ordered_completed = [video_id for video_id in submitted if video_id in completed]
        if len(ordered_completed) < max(8, len(shots) - 2):
            raise RuntimeError(f"可用镜头不足:{len(ordered_completed)}/{len(shots)}")
        agent_log(run, f"自动审片通过 · 可用 {len(ordered_completed)}/{len(shots)}", status="reviewing", stage="review", progress=88)
        agent_log(run, "合成最终成片", stage="compose", progress=92)
        agent_compose_final(run, ordered_completed)
        agent_log(run, f"成片完成 · {len(ordered_completed)}", status="completed", stage="final", progress=100)
    except Exception as e:
        run.error = str(e)[:600]
        agent_log(run, f"任务失败:{run.error}", status="failed", stage="failed", progress=100, level="error")


@app.post("/agent-runs", response_model=AgentRun)
async def create_agent_run(
    tk_url: str = Form(...),
    product_files: list[UploadFile] | None = File(None),
) -> AgentRun:
    if not tk_url.strip():
        raise HTTPException(400, "tk_url required")
    job_id = uuid.uuid4().hex[:12]
    run_id = uuid.uuid4().hex[:12]
    job = Job(id=job_id, url=tk_url.strip())
    JOBS[job_id] = job
    save_state(job)
    refs: list[dict] = []
    for index, upload in enumerate((product_files or [])[:6], start=1):
        refs.append(await save_agent_product_upload(job_id, upload, index))
    run = AgentRun(id=run_id, job_id=job_id, status="queued", stage="queued", progress=1)
    save_agent_run(run)
    agent_log(run, f"任务已入队 · job={job_id} · 产品图 {len(refs)}", status="queued", stage="queued", progress=1)
    threading.Thread(target=agent_run_worker, args=(run_id, refs), daemon=True).start()
    return run


@app.get("/agent-runs", response_model=list[AgentRun])
def list_agent_runs(limit: int = 20) -> list[AgentRun]:
    for p in AGENT_RUNS_DIR.iterdir():
        if p.is_dir() and (p / "state.json").exists() and p.name not in AGENT_RUNS:
            try:
                AGENT_RUNS[p.name] = AgentRun.model_validate_json((p / "state.json").read_text(encoding="utf-8"))
            except Exception:
                pass
    items = list(AGENT_RUNS.values())
    items.sort(key=lambda item: item.updated_at, reverse=True)
    return items[:max(1, min(100, limit))]


@app.get("/agent-runs/{run_id}", response_model=AgentRun)
def get_agent_run(run_id: str) -> AgentRun:
    run = AGENT_RUNS.get(run_id)
    if not run and agent_run_path(run_id).exists():
        run = AgentRun.model_validate_json(agent_run_path(run_id).read_text(encoding="utf-8"))
        AGENT_RUNS[run_id] = run
    if not run:
        raise HTTPException(404, "agent run not found")
    return run


@app.get("/agent-runs/{run_id}/final.mp4")
def get_agent_run_final(run_id: str):
    run = get_agent_run(run_id)
    p = job_dir(run.job_id) / "final" / f"agent-{run.id}.mp4"
    if not p.exists():
        raise HTTPException(404, "final video not found")
    return FileResponse(p, media_type="video/mp4")
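

@app.get("/agent-runs/{run_id}/contact.jpg")
def get_agent_run_contact(run_id: str):
    # Resolve the run first (404 if unknown), mirroring the final.mp4 route,
    # so an arbitrary run_id is never turned into a filesystem path.
    run = get_agent_run(run_id)
    p = agent_run_dir(run.id) / "contact.jpg"
    if not p.exists():
        raise HTTPException(404, "contact sheet not found")
    return FileResponse(p, media_type="image/jpeg")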


def product_image_alpha(img: Image.Image) -> Image.Image:
    rgba = img.convert("RGBA")
    rgb = rgba.convert("RGB")
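
The concat list written in `agent_compose_final` escapes single quotes with `chr()` arithmetic, which is hard to read at a glance. A minimal sketch of the same escaping, assuming POSIX-style paths; the helper names `concat_entry` and `build_concat_list` are hypothetical and do not appear in the diff:

```python
from pathlib import PurePosixPath


def concat_entry(path: PurePosixPath) -> str:
    """Return one `file '...'` line for ffmpeg's concat demuxer.

    Inside a single-quoted value, a literal quote is written as '\\''
    (close the quote, emit an escaped quote, reopen the quote).
    """
    escaped = str(path).replace("'", "'\\''")
    return f"file '{escaped}'\n"


def build_concat_list(paths: list[PurePosixPath]) -> str:
    """Join the entries in order; the result is what gets written to concat.txt."""
    return "".join(concat_entry(p) for p in paths)


if __name__ == "__main__":
    demo = [PurePosixPath("/data/jobs/a/video.mp4"), PurePosixPath("/data/jobs/it's/video.mp4")]
    print(build_concat_list(demo), end="")
```

The replacement string here is the same four characters the original builds with `chr(39) + chr(92) + chr(39) + chr(39)`, so behavior should be unchanged; only readability differs.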


@@ -0,0 +1,3 @@
{
  "items": []
}


@@ -10,11 +10,13 @@ services:
      - ./deploy/.env.production
    environment:
      JOBS_DIR: /data/jobs
      AGENT_RUNS_DIR: /data/agent_runs
      ASSET_LIBRARY_DIR: /data/asset_library
      PROMPT_LIBRARY_DIR: /data/prompt_library
      CORS_ORIGINS: https://marketing.skg.com
    volumes:
      - ./data/jobs:/data/jobs
      - ./data/agent_runs:/data/agent_runs
      - ./data/asset_library:/data/asset_library
      - ./data/prompt_library:/data/prompt_library
      - ./data/_trash:/data/_trash


@@ -0,0 +1,47 @@
name: skg-agent-cut

services:
  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    container_name: skg-agent-api
    env_file:
      - ./deploy/.env.production
    environment:
      JOBS_DIR: /data/jobs
      AGENT_RUNS_DIR: /data/agent_runs
      ASSET_LIBRARY_DIR: /data/asset_library
      PROMPT_LIBRARY_DIR: /data/prompt_library
      CORS_ORIGINS: http://2.24.28.41:4290,http://localhost:4290
    volumes:
      - ./data/jobs:/data/jobs
      - ./data/agent_runs:/data/agent_runs
      - ./data/asset_library:/data/asset_library
      - ./data/prompt_library:/data/prompt_library
      - ./data/_trash:/data/_trash
      - ./secrets/tiktok_cookies.txt:/run/secrets/tiktok_cookies.txt
    restart: unless-stopped
    networks:
      skg-agent-internal:
        aliases:
          - skg-marketing-api

  web:
    build:
      context: .
      dockerfile: Dockerfile.web
      args:
        NEXT_PUBLIC_API_BASE: /api
    container_name: skg-agent-web
    depends_on:
      - api
    ports:
      - "0.0.0.0:4290:80"
    restart: unless-stopped
    networks:
      - skg-agent-internal

networks:
  skg-agent-internal:
    name: skg-agent-internal
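
The `CORS_ORIGINS` value in this compose file is a comma-separated list of origins. How the API consumes it is not shown in this diff; the following is a minimal sketch, assuming the backend splits on commas and trims whitespace. The helper name `parse_cors_origins` is hypothetical.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list into clean, de-duplicated entries.

    Assumption: blank items are ignored and a trailing slash is dropped,
    since browsers send the Origin header without one.
    """
    origins: list[str] = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins


if __name__ == "__main__":
    print(parse_cors_origins("http://2.24.28.41:4290,http://localhost:4290"))
```

A list in this shape could then be handed to whatever CORS middleware the API uses; that wiring is outside what the diff shows.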

File diff suppressed because one or more lines are too long

web/app/agent/page.tsx

@@ -0,0 +1,332 @@
"use client"
import { useEffect, useMemo, useRef, useState } from "react"
import {
ArrowDownToLine,
CheckCircle2,
CircleAlert,
Film,
ImagePlus,
Link2,
Loader2,
Play,
RotateCcw,
TerminalSquare,
Upload,
} from "lucide-react"
const API_BASE = process.env.NEXT_PUBLIC_API_BASE ?? "http://localhost:4291"
type AgentRunLog = {
ts: number
level: "info" | "warn" | "error"
message: string
}
type AgentRun = {
id: string
job_id: string
status: "draft" | "queued" | "executing" | "reviewing" | "completed" | "failed"
stage: string
progress: number
logs: AgentRunLog[]
video_ids: string[]
final_video_url: string
contact_sheet_url: string
error: string
created_at: number
updated_at: number
}
const STAGES = [
{ key: "download", label: "下载" },
{ key: "assets", label: "素材" },
{ key: "analyze", label: "拆解" },
{ key: "plan", label: "规划" },
{ key: "execute", label: "生成" },
{ key: "review", label: "审片" },
{ key: "compose", label: "合成" },
{ key: "final", label: "成片" },
]
function formatClock(ts: number) {
if (!ts) return "--:--:--"
return new Date(ts * 1000).toLocaleTimeString("zh-CN", { hour12: false })
}
function runVideoUrl(run: AgentRun | null) {
if (!run?.final_video_url) return ""
return `${API_BASE}${run.final_video_url}`
}
function runContactUrl(run: AgentRun | null) {
if (!run?.contact_sheet_url) return ""
return `${API_BASE}${run.contact_sheet_url}`
}
export default function AgentPage() {
const [url, setUrl] = useState("")
const [files, setFiles] = useState<File[]>([])
const [run, setRun] = useState<AgentRun | null>(null)
const [recent, setRecent] = useState<AgentRun[]>([])
const [submitting, setSubmitting] = useState(false)
const [error, setError] = useState("")
const terminalRef = useRef<HTMLDivElement>(null)
const previews = useMemo(() => files.map((file) => ({ file, url: URL.createObjectURL(file) })), [files])
useEffect(() => () => previews.forEach((item) => URL.revokeObjectURL(item.url)), [previews])
useEffect(() => {
fetch(`${API_BASE}/agent-runs?limit=8`, { cache: "no-store" })
.then((res) => (res.ok ? res.json() : []))
.then((items: AgentRun[]) => {
setRecent(items)
const latest = items.find((item) => item.status === "executing" || item.status === "reviewing" || item.status === "completed")
if (latest) setRun(latest)
})
.catch(() => undefined)
}, [])
useEffect(() => {
if (!run || run.status === "completed" || run.status === "failed") return
const timer = window.setInterval(async () => {
try {
const res = await fetch(`${API_BASE}/agent-runs/${run.id}`, { cache: "no-store" })
if (!res.ok) return
const next = await res.json()
setRun(next)
} catch {
/* keep current state */
}
}, 2000)
return () => window.clearInterval(timer)
}, [run?.id, run?.status])
useEffect(() => {
const el = terminalRef.current
if (el) el.scrollTop = el.scrollHeight
}, [run?.logs.length])
async function submit() {
setError("")
if (!url.trim()) {
setError("需要 TikTok 链接")
return
}
setSubmitting(true)
try {
const form = new FormData()
form.append("tk_url", url.trim())
files.slice(0, 6).forEach((file) => form.append("product_files", file))
const res = await fetch(`${API_BASE}/agent-runs`, { method: "POST", body: form })
if (!res.ok) {
const text = await res.text().catch(() => "")
throw new Error(text.slice(0, 260) || `HTTP ${res.status}`)
}
const created = await res.json()
setRun(created)
setRecent((prev) => [created, ...prev.filter((item) => item.id !== created.id)].slice(0, 8))
} catch (e) {
setError(e instanceof Error ? e.message : String(e))
} finally {
setSubmitting(false)
}
}
const activeStageIndex = run ? Math.max(0, STAGES.findIndex((item) => item.key === run.stage)) : -1
const canStart = !!url.trim() && !submitting
const videoSrc = runVideoUrl(run)
const contactSrc = runContactUrl(run)
return (
<main className="min-h-screen bg-[#f3f4f7] text-[#111318]">
<div className="mx-auto flex min-h-screen w-full max-w-[1720px] flex-col gap-5 px-5 py-5">
<header className="flex items-center justify-between rounded-[28px] border border-black/5 bg-white/80 px-5 py-4 shadow-[0_24px_80px_rgba(20,25,38,0.08)] backdrop-blur-xl">
<div>
<div className="text-[12px] font-semibold uppercase tracking-[0.18em] text-[#7b8190]">SKG Agent Cut</div>
<h1 className="mt-1 text-[26px] font-semibold tracking-normal text-[#111318]"></h1>
</div>
<div className="hidden items-center gap-2 rounded-full bg-[#111318] px-3 py-2 text-[12px] font-medium text-white md:flex">
<TerminalSquare className="h-4 w-4 text-[#81d4ff]" />
{run ? `${run.status} · ${run.progress}%` : "standby"}
</div>
</header>
<section className="grid min-h-[calc(100vh-128px)] grid-cols-1 gap-5 xl:grid-cols-[390px_minmax(520px,1fr)_420px]">
<aside className="flex flex-col gap-4 rounded-[30px] border border-black/5 bg-white/85 p-4 shadow-[0_24px_80px_rgba(20,25,38,0.08)] backdrop-blur-xl">
<div className="rounded-[24px] border border-[#dfe3ea] bg-[#f8f9fb] p-4">
<label className="mb-2 flex items-center gap-2 text-[13px] font-semibold text-[#2b3038]">
<Link2 className="h-4 w-4 text-[#0a84ff]" />
TikTok
</label>
<textarea
value={url}
onChange={(e) => setUrl(e.target.value)}
placeholder="https://www.tiktok.com/@..."
className="h-28 w-full resize-none rounded-[18px] border border-[#d9dee8] bg-white px-4 py-3 text-[14px] leading-relaxed text-[#111318] outline-none transition focus:border-[#0a84ff] focus:ring-4 focus:ring-[#0a84ff]/10"
/>
</div>
<div className="rounded-[24px] border border-[#dfe3ea] bg-[#f8f9fb] p-4">
<label className="mb-3 flex items-center gap-2 text-[13px] font-semibold text-[#2b3038]">
<ImagePlus className="h-4 w-4 text-[#34c759]" />
</label>
<label className="flex h-32 cursor-pointer flex-col items-center justify-center rounded-[20px] border border-dashed border-[#c7ceda] bg-white text-center transition hover:border-[#0a84ff] hover:bg-[#f7fbff]">
<Upload className="mb-2 h-6 w-6 text-[#7b8190]" />
<span className="text-[13px] font-medium text-[#2b3038]"></span>
<span className="mt-1 text-[12px] text-[#7b8190]"> 6 </span>
<input
type="file"
accept="image/*"
multiple
className="hidden"
onChange={(e) => {
const next = Array.from(e.target.files ?? []).slice(0, 6)
setFiles(next)
}}
/>
</label>
{previews.length > 0 && (
<div className="mt-3 grid grid-cols-3 gap-2">
{previews.map((item) => (
<div key={`${item.file.name}-${item.file.size}`} className="aspect-square overflow-hidden rounded-[14px] border border-black/5 bg-white">
<img src={item.url} alt={item.file.name} className="h-full w-full object-contain" />
</div>
))}
</div>
)}
</div>
{error && (
<div className="rounded-[18px] border border-[#ff453a]/20 bg-[#ff453a]/10 px-4 py-3 text-[13px] text-[#9f1d17]">
{error}
</div>
)}
<button
type="button"
disabled={!canStart}
onClick={submit}
className="flex h-14 items-center justify-center gap-2 rounded-[20px] bg-[#111318] text-[15px] font-semibold text-white shadow-[0_16px_40px_rgba(17,19,24,0.18)] transition hover:bg-black disabled:cursor-not-allowed disabled:bg-[#b8bec8]"
>
{submitting ? <Loader2 className="h-5 w-5 animate-spin" /> : <Play className="h-5 w-5" />}
</button>
<div className="mt-auto rounded-[24px] border border-[#dfe3ea] bg-[#f8f9fb] p-3">
<div className="mb-2 text-[12px] font-semibold text-[#7b8190]"></div>
<div className="space-y-2">
{recent.slice(0, 4).map((item) => (
<button
key={item.id}
type="button"
onClick={() => setRun(item)}
className="flex w-full items-center justify-between rounded-[16px] bg-white px-3 py-2 text-left text-[12px] text-[#2b3038] transition hover:bg-[#f1f5fb]"
>
<span className="font-medium">{item.id}</span>
<span className="text-[#7b8190]">{item.status}</span>
</button>
))}
</div>
</div>
</aside>
<section className="flex min-h-[680px] flex-col rounded-[30px] border border-black/5 bg-[#111318] p-4 shadow-[0_24px_80px_rgba(20,25,38,0.16)]">
<div className="mb-4 flex items-center justify-between">
<div className="flex items-center gap-3">
<div className="flex h-10 w-10 items-center justify-center rounded-[16px] bg-white/8">
<TerminalSquare className="h-5 w-5 text-[#81d4ff]" />
</div>
<div>
<h2 className="text-[16px] font-semibold text-white">Agent Terminal</h2>
<p className="text-[12px] text-white/45">{run ? `run ${run.id} · job ${run.job_id}` : "waiting for input"}</p>
</div>
</div>
{run?.status === "failed" ? (
<CircleAlert className="h-5 w-5 text-[#ff453a]" />
) : run?.status === "completed" ? (
<CheckCircle2 className="h-5 w-5 text-[#34c759]" />
) : (
<Loader2 className={`h-5 w-5 text-[#81d4ff] ${run ? "animate-spin" : ""}`} />
)}
</div>
<div className="mb-4 grid grid-cols-4 gap-2 lg:grid-cols-8">
{STAGES.map((stage, index) => {
const active = index <= activeStageIndex || run?.status === "completed"
return (
<div key={stage.key} className={`rounded-[14px] px-3 py-2 text-[12px] ${active ? "bg-white text-[#111318]" : "bg-white/6 text-white/40"}`}>
{stage.label}
</div>
)
})}
</div>
<div className="mb-4 h-2 overflow-hidden rounded-full bg-white/8">
<div className="h-full rounded-full bg-[#34c759] transition-all duration-700" style={{ width: `${run?.progress ?? 0}%` }} />
</div>
<div ref={terminalRef} className="min-h-0 flex-1 overflow-auto rounded-[22px] border border-white/8 bg-black px-4 py-4 font-mono text-[12px] leading-relaxed text-[#d8f3dc]">
{!run && <div className="text-white/35">$ idle</div>}
{run?.logs.map((log, index) => (
<div key={`${log.ts}-${index}`} className={log.level === "error" ? "text-[#ff8a80]" : log.level === "warn" ? "text-[#ffd166]" : "text-[#d8f3dc]"}>
<span className="text-white/30">[{formatClock(log.ts)}]</span> {log.message}
</div>
))}
</div>
</section>
<aside className="flex flex-col gap-4 rounded-[30px] border border-black/5 bg-white/85 p-4 shadow-[0_24px_80px_rgba(20,25,38,0.08)] backdrop-blur-xl">
<div className="flex items-center justify-between">
<div>
<div className="text-[12px] font-semibold uppercase tracking-[0.16em] text-[#7b8190]">Final</div>
<h2 className="mt-1 text-[18px] font-semibold text-[#111318]"></h2>
</div>
<Film className="h-5 w-5 text-[#ff9f0a]" />
</div>
<div className="aspect-[9/16] overflow-hidden rounded-[26px] border border-black/8 bg-[#111318]">
{videoSrc ? (
<video key={videoSrc} src={videoSrc} controls playsInline className="h-full w-full bg-black object-contain" />
) : (
<div className="flex h-full flex-col items-center justify-center gap-3 text-[#7b8190]">
<Film className="h-8 w-8" />
<span className="text-[13px]"></span>
</div>
)}
</div>
{contactSrc && (
<div className="overflow-hidden rounded-[18px] border border-black/8 bg-white">
<img src={contactSrc} alt="final contact sheet" className="w-full object-cover" />
</div>
)}
<div className="grid grid-cols-2 gap-2">
<a
href={videoSrc || undefined}
download
className={`flex h-11 items-center justify-center gap-2 rounded-[16px] text-[13px] font-semibold ${videoSrc ? "bg-[#0a84ff] text-white" : "pointer-events-none bg-[#dfe3ea] text-[#8d94a1]"}`}
>
<ArrowDownToLine className="h-4 w-4" />
</a>
<button
type="button"
onClick={() => {
setRun(null)
setError("")
}}
className="flex h-11 items-center justify-center gap-2 rounded-[16px] bg-[#eef1f6] text-[13px] font-semibold text-[#2b3038] transition hover:bg-[#e3e7ef]"
>
<RotateCcw className="h-4 w-4" />
</button>
</div>
</aside>
</section>
</div>
</main>
)
}


@@ -3,7 +3,7 @@
import { type DragEvent as ReactDragEvent, type MouseEvent as ReactMouseEvent, type ReactNode, type RefObject, useEffect, useMemo, useRef, useState } from "react"
import { createPortal } from "react-dom"
import {
AlertTriangle, BookOpen, Check, ChevronDown, Circle, Film, FileText, Image as ImageIcon, Info, Link2, Loader2, Minus,
AlertTriangle, BookOpen, Check, ChevronDown, Circle, Download, Film, FileText, Image as ImageIcon, Info, Link2, Loader2, Minus,
MessageSquare, Mic, Moon, Package, PanelRight, Play, Plus, RefreshCw, Scissors, Send, Sparkles, Sun, Trash2, Upload, Wand2,
} from "lucide-react"
import { toast } from "sonner"
@@ -863,6 +863,17 @@ function videoSrc(video: GeneratedVideo) {
return apiAssetUrl(video.url)
}
function downloadMedia(url: string, filename: string) {
if (!url || typeof document === "undefined") return
const link = document.createElement("a")
link.href = url
link.download = filename
link.rel = "noreferrer"
document.body.appendChild(link)
link.click()
link.remove()
}
function audioPreview(job: Job | null) {
if (!job) return "粘贴 TK 链接或上传视频后,系统会先下载视频;下载完成后自动提取音频文案。"
const source = job.audio_script?.source_text?.trim() || job.audio_script?.source_zh?.trim()
@@ -1324,8 +1335,8 @@ function audioModelTrace(models?: RuntimeModels): ModelTraceSpec {
title: "音频解析",
model: modelList([models?.asr, models?.translate, models?.asr_fallback]),
chain: [
`ASR 转写:远端 ${remoteState},模型 ${modelValue(models?.asr)}${models?.asr_language ? `,语言 ${models.asr_language}` : ""};本机转写 ${localState},使用 ${localModel};多模态兜底${models?.asr_audio_fallback_enabled === false ? "关闭" : `${modelValue(models?.asr_fallback)}`},并拒绝假字幕/重复时间轴`,
`字幕翻译:${modelValue(models?.translate)} 按 ASR 段落输出中文;失败时保留原文时间轴,中文可为空`,
`ASR 转写:远端 ${remoteState},模型 ${modelValue(models?.asr)},语言 ${models?.asr_language || "auto"};本机转写 ${localState},使用 ${localModel} 自动识别中文/多语言;多模态兜底${models?.asr_audio_fallback_enabled === false ? "关闭" : `${modelValue(models?.asr_fallback)}`},并拒绝假字幕/重复时间轴`,
`字幕翻译:${modelValue(models?.translate)}原语言 ASR 段落输出中文;原文已是中文时保留为中文镜像,失败时保留原文时间轴`,
`讲话人 / 节奏 / 背景音:${modelValue(models?.asr_fallback)} 读取 audio.wav + 转写时间轴做多模态分析;失败时用本地时长/段落估算兜底`,
],
note: "点击“解析音频”后触发;开始任务下载完成后也会自动走这条链路。",
@@ -1373,7 +1384,7 @@ function scriptRewriteModelTrace(models?: RuntimeModels): ModelTraceSpec {
chain: [
`主改写:${modelValue(models?.audio_rewrite)} 根据原文案、当前分镜、作者想法生成新口播`,
`模型回退:依次尝试 ${modelValue(models?.asr_fallback)}${modelValue(models?.translate)};全部失败时用本地模板保留分镜可编辑`,
"返回结果写入当前分镜文案编辑框;点击保存规划后才写入 frame.storyboard.action",
"返回结果会先写入当前分镜文案编辑框;生成候选前会自动同步到分镜数据,无需单独保存视频",
],
}
}
@@ -1388,7 +1399,7 @@ function videoModelTrace(models: RuntimeModels | undefined, model: string): Mode
`服务商:${modelValue(models?.video_provider)} · ${modelValue(models?.video_base_url)}`,
"当前主工作台暂停直接提交视频;旧入口误触也会被页面层保护",
"开放后输入会包含已确认首尾帧、当前分镜文案、产品素材、相似主体资产和画面规划",
"输出为异步候选视频,完成后回填到对应分镜行Sora 已停用",
"输出为异步候选视频,提交后立即写入当前任务,完成后自动回填 mp4 到对应分镜行Sora 已停用",
],
}
}
@@ -5584,23 +5595,6 @@ function AudioStoryboardPlanPanel({
}
}
const selectVideoForRow = async (row: AudioStoryboardRow, frame: KeyFrame | null, videoId: string) => {
if (!job || !frame) return
const plannedRow = { ...planForRow(row, frame), skgCopy: copyForRow(row), skgCopyZh: copyZhForRow(row) }
try {
const legacyRowIndex = legacyRowIndexForFrame(frame.index)
const savedSceneForRow = storyboardSceneBelongsToRow(frame.storyboard, row.index, legacyRowIndex)
? frame.storyboard
: null
const scene = buildSceneForPlannedRow(plannedRow, frame, savedSceneForRow, videoId)
const updated = await updateStoryboard(job.id, frame.index, scene)
onJobUpdate?.(updated)
toast.success(`分镜 ${row.index + 1} 已选用该视频`)
} catch (e) {
toast.error("选用视频失败:" + (e instanceof Error ? e.message : String(e)))
}
}
const clearVideosForRow = (videos: GeneratedVideo[]) => {
if (!videos.length) return
for (const video of videos) onDeleteVideo?.(video.id)
@@ -5904,9 +5898,9 @@ function AudioStoryboardPlanPanel({
setStoryboardSaveBusyRow(row.index)
try {
await saveRowStoryboardDraft(row, frame)
toast.success("已保存本条三字段规划")
toast.success("本条三字段规划已同步")
} catch (e) {
toast.error("保存本条规划失败:" + (e instanceof Error ? e.message : String(e)))
toast.error("同步本条规划失败:" + (e instanceof Error ? e.message : String(e)))
} finally {
setStoryboardSaveBusyRow(null)
}
@@ -5918,7 +5912,7 @@ function AudioStoryboardPlanPanel({
.map((row) => ({ row: planForRow(row, referenceFrameForRow(row)), frame: referenceFrameForRow(row) }))
.filter((item): item is { row: AudioStoryboardRow; frame: KeyFrame } => !!item.frame)
if (!jobsToSubmit.length) {
if (!quiet) toast.warning("先完成前置抽帧,让每条分镜有可保存的承载位置")
if (!quiet) toast.warning("先完成前置抽帧,让每条分镜有可同步的承载位置")
return { ok: 0, failed: rows.length }
}
setBatchStoryboardSaveBusy(true)
@@ -5932,12 +5926,12 @@ function AudioStoryboardPlanPanel({
ok += 1
} catch (e) {
failed += 1
console.warn("批量保存分镜规划失败", item.row.index, e)
console.warn("批量同步分镜规划失败", item.row.index, e)
}
}
if (!quiet) {
if (failed) toast.warning(`保存 ${ok} 条规划,${failed} 条失败`)
else toast.success(`保存全部 ${ok} 条分镜规划`)
if (failed) toast.warning(`同步 ${ok} 条规划,${failed} 条失败`)
else toast.success(`同步全部 ${ok} 条分镜规划`)
}
} finally {
setStoryboardSaveBusyRow(null)
@@ -6307,10 +6301,11 @@ function AudioStoryboardPlanPanel({
type="button"
onClick={() => void saveAllStoryboardDrafts()}
disabled={batchStoryboardSaveBusy || !rows.length || !orderedFrames.length}
className="skg-primary-action inline-flex h-9 items-center justify-center gap-1 px-2.5 text-[11px] font-semibold transition disabled:cursor-not-allowed disabled:opacity-40"
title="只同步三字段规划;生成的视频会自动保存到当前任务"
className="skg-secondary-action inline-flex h-9 items-center justify-center gap-1 px-2.5 text-[11px] font-semibold transition disabled:cursor-not-allowed disabled:opacity-40"
>
{batchStoryboardSaveBusy ? <Loader2 className="h-3.5 w-3.5 animate-spin" /> : <Check className="h-3.5 w-3.5" />}
</button>
</div>
</div>
@@ -6468,7 +6463,6 @@ function AudioStoryboardPlanPanel({
job={job}
videos={rowVideos}
enabled={!!referenceFrame}
selectedVideoId={selectedVideoIdForRow(row, referenceFrame)}
busy={quickVideoBusyRow === row.index}
count={rowVideoCount}
onCountChange={(count) => patchRowVideoCount(row.index, count)}
@@ -6476,7 +6470,6 @@ function AudioStoryboardPlanPanel({
onReroll={() => void drawVideosForRow(plannedRow, referenceFrame, rowVideoCount)}
onRegenerate={() => void drawVideosForRow(plannedRow, referenceFrame, 1)}
onClear={() => clearVideosForRow(rowVideos)}
onSelect={(videoId) => void selectVideoForRow(plannedRow, referenceFrame, videoId)}
onDeleteVideo={onDeleteVideo}
/>
</div>
@@ -6697,7 +6690,6 @@ function AudioStoryboardPlanPanel({
videos={rowVideos}
enabled={!!referenceFrame}
expanded={videosOpen}
selectedVideoId={selectedVideoIdForRow(row, referenceFrame)}
busy={quickVideoBusyRow === row.index}
count={rowVideoCount}
onCountChange={(count) => patchRowVideoCount(row.index, count)}
@@ -6706,7 +6698,6 @@ function AudioStoryboardPlanPanel({
onReroll={() => void drawVideosForRow(plannedRow, referenceFrame, rowVideoCount)}
onRegenerate={() => void drawVideosForRow(plannedRow, referenceFrame, 1)}
onClear={() => clearVideosForRow(rowVideos)}
onSelect={(videoId) => void selectVideoForRow(plannedRow, referenceFrame, videoId)}
onDeleteVideo={onDeleteVideo}
/>
<div className="mt-1 flex items-center justify-between gap-2">
@@ -6720,10 +6711,11 @@ function AudioStoryboardPlanPanel({
type="button"
onClick={() => void saveSingleRowStoryboardDraft(plannedRow, referenceFrame)}
disabled={!referenceFrame || savingStoryboard}
className="skg-primary-action mt-1.5 inline-flex h-8 w-full items-center justify-center gap-1 px-2 text-[11px] font-semibold transition disabled:cursor-not-allowed disabled:opacity-40"
title="只同步本条三字段规划;生成的视频会自动保存到当前任务"
className="mt-1.5 inline-flex h-8 w-full items-center justify-center gap-1 rounded-md border border-white/10 bg-white/[0.04] px-2 text-[11px] font-semibold text-white/58 transition hover:border-cyan-300/35 hover:text-cyan-100 disabled:cursor-not-allowed disabled:opacity-40"
>
{savingStoryboard ? <Loader2 className="h-3.5 w-3.5 animate-spin" /> : <Check className="h-3.5 w-3.5" />}
</button>
</StoryboardPlanCell>
</div>
@@ -7062,7 +7054,6 @@ function StoryboardVideoSlots({
job,
videos,
enabled,
selectedVideoId = "",
busy = false,
count = 4,
onCountChange,
@@ -7070,14 +7061,12 @@ function StoryboardVideoSlots({
onReroll,
onRegenerate,
onClear,
onSelect,
onDeleteVideo,
}: {
job: Job
videos: GeneratedVideo[]
enabled: boolean
expanded?: boolean
selectedVideoId?: string
busy?: boolean
count?: number
onCountChange?: (count: number) => void
@@ -7086,12 +7075,10 @@ function StoryboardVideoSlots({
onReroll?: () => void
onRegenerate?: () => void
onClear?: () => void
onSelect?: (videoId: string) => void
onDeleteVideo?: (videoId: string) => void
}) {
const visible = videos
const runningCount = videos.filter((video) => video.status === "queued" || video.status === "in_progress").length
const selectedVideo = selectedVideoId ? videos.find((video) => video.id === selectedVideoId) : null
const targetCount = clampVideoCount(count)
const emptyCount = visible.length ? 0 : Math.max(1, targetCount)
return (
@@ -7103,7 +7090,9 @@ function StoryboardVideoSlots({
<span className="shrink-0 text-[10px] text-white/34">
{videos.length ? `${videos.length}${runningCount ? ` · ${runningCount} 生成中` : ""}` : enabled ? "待生成" : "待抽帧"}
</span>
{selectedVideo ? <span className="rounded border border-emerald-300/20 bg-emerald-300/[0.08] px-1.5 py-0.5 text-[10px] text-emerald-100/72"> {shortId(selectedVideo.id)}</span> : null}
{videos.length ? (
<span className="rounded border border-emerald-300/18 bg-emerald-300/[0.07] px-1.5 py-0.5 text-[10px] text-emerald-100/70"></span>
) : null}
</div>
<div className="flex flex-wrap items-center gap-1.5">
<label className="inline-flex h-7 items-center gap-1 rounded-md border border-white/10 bg-black/36 px-1.5 text-[10px] font-semibold text-white/48">
@@ -7146,9 +7135,7 @@ function StoryboardVideoSlots({
key={video.id}
job={job}
video={video}
selected={selectedVideoId === video.id}
className="h-[168px] w-[94px]"
onSelect={onSelect ? () => onSelect(video.id) : undefined}
onRegenerate={onRegenerate}
onDelete={onDeleteVideo ? () => onDeleteVideo(video.id) : undefined}
/>
@@ -7265,40 +7252,50 @@ function StoryboardVideoPreview({
job,
video,
className = "h-20 w-12",
selected = false,
onSelect,
onRegenerate,
onDelete,
}: {
job: Job
video: GeneratedVideo
className?: string
selected?: boolean
onSelect?: () => void
onRegenerate?: () => void
onDelete?: () => void
}) {
const src = videoSrc(video)
const playableSrc = src && video.status === "completed" ? src : ""
const poster = videoPoster(job, video)
const running = video.status === "queued" || video.status === "in_progress"
return (
<MediaAssetTile
kind="video"
src={src && video.status === "completed" ? src : undefined}
src={playableSrc || undefined}
poster={poster}
href={onSelect ? undefined : src || undefined}
href={playableSrc || undefined}
alt={`片段 ${shortId(video.id)}`}
label={`${shortId(video.id)} · ${video.model}`}
meta={video.status}
className={`shrink-0 bg-black/45 ${className}`}
objectFit="cover"
selected={selected}
onClick={onSelect}
title={`${video.model} · ${video.status}`}
title={playableSrc ? "点击打开视频预览" : `${video.model} · ${video.status}`}
bottom={<span className="block truncate rounded bg-black/70 px-1 py-0.5 text-center font-mono text-[9px] text-white/62">{running ? "生成中" : video.status === "failed" ? "失败" : shortId(video.id)}</span>}
topLeft={selected ? <span className="inline-flex h-5 w-5 items-center justify-center rounded-full bg-emerald-400 text-black"><Check className="h-3 w-3" /></span> : undefined}
topRight={running ? <Loader2 className="h-3 w-3 animate-spin text-cyan-100" /> : undefined}
actions={onRegenerate ? [{ key: "regen", label: "重生一个候选", icon: <RefreshCw className="h-3 w-3" />, onClick: onRegenerate, tone: "cyan" }] : []}
actions={[
...(playableSrc ? [{
key: "download",
label: "下载视频",
icon: <Download className="h-3 w-3" />,
onClick: () => downloadMedia(playableSrc, `skg-storyboard-${shortId(video.id)}.mp4`),
tone: "cyan" as const,
}] : []),
...(onRegenerate ? [{
key: "regen",
label: "重生一个候选",
icon: <RefreshCw className="h-3 w-3" />,
onClick: onRegenerate,
tone: "neutral" as const,
}] : []),
]}
actionsAlwaysVisible={!!playableSrc}
onDelete={onDelete}
deleteLabel="删除这个视频候选"
/>
@@ -7921,13 +7918,13 @@ function ComposeSummary({
<PanelRight className="h-4 w-4 text-rose-200" />
<div>
<div className="text-[13px] font-semibold text-white"></div>
<div className="text-[11px] text-white/40">广</div>
<div className="text-[11px] text-white/40">广</div>
</div>
</div>
<div className="flex items-center gap-2 text-[11px] text-white/52">
<Requirement label="音频" ready={audioReady} detail={audioReady ? "已生成" : "待解析"} />
<Requirement label="候选" ready={generatedVideoCount > 0} detail={`${generatedVideoCount}`} />
<Requirement label="已选" ready={selectedVideoCount > 0} detail={`${selectedVideoCount}`} />
<Requirement label="自动存" ready={generatedVideoCount > 0} detail={generatedVideoCount > 0 ? "已保存" : "待候选"} />
<button type="button" disabled className="inline-flex h-10 cursor-not-allowed items-center justify-center gap-2 rounded-md border border-white/10 bg-white/[0.04] px-3 text-[12px] font-semibold text-white/34">
<Film className="h-4 w-4" />
@@ -8177,19 +8174,31 @@ function VideoCandidate({
const src = videoSrc(video)
const poster = videoPoster(job, video)
const running = video.status === "queued" || video.status === "in_progress"
const playableSrc = src && video.status === "completed" ? src : ""
const thumb = (
<>
{playableSrc ? (
<video src={playableSrc} poster={poster} muted playsInline className="h-full w-full object-cover" />
) : poster ? (
<img src={poster} alt={`Clip ${shortId(video.id)}`} className="h-full w-full object-cover opacity-80" />
) : (
<div className="flex h-full w-full items-center justify-center text-white/30"><Film className="h-4 w-4" /></div>
)}
<div className="absolute right-1 top-1 rounded-full bg-black/70 p-0.5">{selected ? <Check className="h-3 w-3 text-rose-200" /> : <Circle className="h-3 w-3 text-white/55" />}</div>
</>
)
return (
<div className={`rounded-lg border p-2 transition ${selected ? "border-rose-400/70 bg-rose-500/10" : "border-white/10 bg-black/30"}`}>
<div className="flex gap-2">
<button type="button" onClick={onToggle} className="relative h-24 w-14 shrink-0 overflow-hidden rounded-md border border-white/10 bg-black">
{src && video.status === "completed" ? (
<video src={src} poster={poster} muted playsInline className="h-full w-full object-cover" />
) : poster ? (
<img src={poster} alt={`Clip ${shortId(video.id)}`} className="h-full w-full object-cover opacity-80" />
) : (
<div className="flex h-full w-full items-center justify-center text-white/30"><Film className="h-4 w-4" /></div>
)}
<div className="absolute right-1 top-1 rounded-full bg-black/70 p-0.5">{selected ? <Check className="h-3 w-3 text-rose-200" /> : <Circle className="h-3 w-3 text-white/55" />}</div>
</button>
{playableSrc ? (
<a href={playableSrc} target="_blank" rel="noreferrer" className="relative h-24 w-14 shrink-0 overflow-hidden rounded-md border border-white/10 bg-black" title="Open video preview">
{thumb}
</a>
) : (
<div className="relative h-24 w-14 shrink-0 overflow-hidden rounded-md border border-white/10 bg-black">
{thumb}
</div>
)}
<div className="min-w-0 flex-1">
<div className="flex items-center justify-between gap-2">
<div className="truncate font-mono text-[12px] text-white/80">{shortId(video.id)} · {video.model}</div>
@@ -8204,11 +8213,17 @@ function VideoCandidate({
<span>{video.progress}%</span>
</div>
{video.error && <div className="mt-1 line-clamp-2 text-[11px] text-rose-200/80">{video.error}</div>}
{src && video.status === "completed" && (
<a href={src} target="_blank" rel="noreferrer" className="mt-2 inline-flex items-center gap-1 text-[11px] font-medium text-cyan-200 hover:text-cyan-100">
<Play className="h-3 w-3" />
</a>
{playableSrc && (
<div className="mt-2 flex flex-wrap items-center gap-2">
<a href={playableSrc} target="_blank" rel="noreferrer" className="inline-flex items-center gap-1 text-[11px] font-medium text-cyan-200 hover:text-cyan-100">
<Play className="h-3 w-3" />
</a>
<a href={playableSrc} download={`skg-storyboard-${shortId(video.id)}.mp4`} className="inline-flex items-center gap-1 text-[11px] font-medium text-emerald-200 hover:text-emerald-100">
<Download className="h-3 w-3" />
</a>
</div>
)}
</div>
</div>
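
The `VideoCandidate` hunk above hinges on one derived value, `playableSrc`, which gates the preview link, the download link, and the thumbnail. A minimal sketch of that gating and of the download file name follows. `VideoStatus`, `playableSource`, `downloadFileName`, and `shortIdSketch` are hypothetical; in particular, the project's real `shortId` helper is not shown in this diff, so the 8-character truncation here is only an assumption.

```typescript
// Hypothetical status union, based on the status values that appear in the diff.
type VideoStatus = "queued" | "in_progress" | "completed" | "failed"

// A source counts as playable only once the video has completed; otherwise the
// component falls back to a poster image or a placeholder icon.
function playableSource(src: string | undefined, status: VideoStatus): string {
  return src && status === "completed" ? src : ""
}

// Assumed stand-in for the project's shortId helper (real behavior unknown).
function shortIdSketch(id: string): string {
  return id.slice(0, 8)
}

// Mirrors the file name template used by both download entry points.
function downloadFileName(videoId: string): string {
  return `skg-storyboard-${shortIdSketch(videoId)}.mp4`
}
```

One caveat worth checking in review: browsers honor the `download` attribute on an `<a>` element only for same-origin URLs and for `blob:` or `data:` URLs. If the generated videos are served from another origin, the plain download link in this hunk may open the file instead of saving it.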

@@ -319,7 +319,7 @@ export function AudioStrip({ job, open, onClose }: { job: Job | null; open: bool
</div>
) : (
<div className="flex h-full items-center justify-center rounded-lg border border-dashed border-white/12 text-[12px] text-white/45">
</div>
)}
</div>

@@ -46,6 +46,7 @@ type MediaAssetTileProps = {
deleting?: boolean
deleteDisabled?: boolean
actions?: MediaAssetAction[]
actionsAlwaysVisible?: boolean
disablePreview?: boolean
}
@@ -106,6 +107,7 @@ export function MediaAssetTile({
deleting = false,
deleteDisabled = false,
actions = [],
actionsAlwaysVisible = false,
disablePreview = false,
}: MediaAssetTileProps) {
const [position, setPosition] = useState<{ left: number; top: number; width: number } | null>(null)
@@ -200,7 +202,7 @@ export function MediaAssetTile({
{topRight ? <div className="pointer-events-none absolute right-1 top-1 z-10">{topRight}</div> : null}
{bottom ? <div className="pointer-events-none absolute bottom-1 left-1 right-1 z-10">{bottom}</div> : null}
{(actions.length || onDelete) ? (
<div className="absolute right-1 top-1 z-20 flex flex-col gap-0.5 opacity-0 transition group-hover:opacity-100 group-focus-within:opacity-100">
<div className={`absolute right-1 top-1 z-20 flex flex-col gap-0.5 transition ${actionsAlwaysVisible ? "opacity-100" : "opacity-0 group-hover:opacity-100 group-focus-within:opacity-100"}`}>
{actions.map((action) => (
<button
key={action.key}
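
The `actionsAlwaysVisible` change in this hunk comes down to choosing between two sets of opacity classes. The sketch below extracts that choice into a hypothetical helper, `actionOverlayClass`, which does not exist in the diff; the class strings are copied from the changed line.

```typescript
// Returns the overlay class list: always visible when the flag is set,
// otherwise hidden until the tile is hovered or receives keyboard focus.
function actionOverlayClass(actionsAlwaysVisible: boolean): string {
  const base = "absolute right-1 top-1 z-20 flex flex-col gap-0.5 transition"
  const visibility = actionsAlwaysVisible
    ? "opacity-100"
    : "opacity-0 group-hover:opacity-100 group-focus-within:opacity-100"
  return `${base} ${visibility}`
}
```

Because the prop defaults to `false`, existing callers of `MediaAssetTile` keep the hover-only behavior; only the storyboard tile, which passes `!!playableSrc`, shows its actions permanently.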

@@ -2017,7 +2017,7 @@ export function ASRNode({ data, selected }: any) {
onTogglePin={() => d.onToggleNodePin?.("asr")}
>
<div className="text-[11.5px] text-[var(--text-soft)]">
OpenAI-compatible ASR ·
OpenAI-compatible ASR ·
</div>
{d.job && d.job.transcript.length > 0 && (
<div className="mt-2 max-h-24 overflow-y-auto text-[11px] space-y-1 text-[var(--text-strong)]">