23 Commits

Author SHA1 Message Date
c60cb47ee1 chore: record retry analysis worklog 2026-05-18 16:52:23 +08:00
061eb7d867 fix: allow retrying failed source analysis 2026-05-18 16:49:51 +08:00
07384c5e19 chore: record tiktok cookies worklog 2026-05-18 16:43:07 +08:00
4280624810 feat: support tiktok download cookies 2026-05-18 16:39:20 +08:00
028718df0b chore: record voice cleanup worklog 2026-05-18 15:53:26 +08:00
a6eddf1c14 chore: remove personal voice channel remnants 2026-05-18 15:52:44 +08:00
9e307e307c chore: record database backfill worklog 2026-05-18 15:41:25 +08:00
c2e9558f5b fix: backfill database on startup 2026-05-18 15:40:58 +08:00
c626ec51d6 chore: record database worklog 2026-05-18 15:35:53 +08:00
1ac9b1bde3 feat: add backend document database 2026-05-18 15:34:15 +08:00
1c451c6ab3 auto-save 2026-05-18 15:29 (+1, ~5) 2026-05-18 15:29:47 +08:00
408c5fca47 feat: use random subject frame extraction 2026-05-18 15:17:37 +08:00
2a1aa4c994 auto-save 2026-05-18 15:13 (~8) 2026-05-18 15:13:30 +08:00
ebac2e86b5 auto-save 2026-05-18 15:07 (~5) 2026-05-18 15:08:05 +08:00
47653ee319 chore: record voice worklog 2026-05-18 14:52:56 +08:00
4d2a4a0299 fix: force azure openai tts voice path 2026-05-18 14:49:53 +08:00
e6387cf7af auto-save 2026-05-18 14:46 (~7) 2026-05-18 14:46:24 +08:00
fde94f4698 chore: record worklog update 2026-05-18 14:42:13 +08:00
dddf410dcb chore: update development worklog 2026-05-18 14:39:23 +08:00
301ec4fc3b docs: refresh current project status 2026-05-18 14:38:02 +08:00
2cfd7de5d5 chore: force gpt routing for vision and rewrite 2026-05-18 14:34:36 +08:00
a2897ef2be chore: switch vision and rewrite models to gpt 2026-05-18 14:31:59 +08:00
e6a5ea46a6 auto-save 2026-05-18 14:30 (~5) 2026-05-18 14:30:08 +08:00
14 changed files with 1282 additions and 464 deletions


@@ -1,146 +1,110 @@
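-# SKG TK Recreation Validation: Current Status (2026-05-13)
+# SKG TK Recreation Validation: Current Status (2026-05-18)
 ## In one sentence
-Second approach to the SKG AI asset production pipeline: TK link / upload → split tracks → extract keyframes (5 automatic, plus manual additions) → Vision recognition → copy rewrite → image generation → video generation → compositing. **The MVP works through image generation; the remaining 3 nodes are placeholders.**
+The product direction has narrowed to "fast recreation of in-feed ads". After a TK link is submitted or a video is uploaded, the source video is downloaded first, then the audio-copy path and the video-visual path run in parallel. The video-visual path automatically extracts 6 subject-targeted random reference frames. Product assets form their own pool, with automatic view recognition and filling of missing angles. The storyboard workbench follows a sentence-by-sentence timeline to write the new voiceover, character / product requirements, and first / last frame plans. Direct submission to the video model is currently paused; first and last frames are generated and reviewed one storyboard entry at a time.
 ## Paths / ports
-- Path: `~/Projects/business/20260512-20260512-skg-tk-二创验证/`
-- web dev: `cd web && pnpm dev` (port **4290**)
-- api dev: `cd api && source .venv/bin/activate && uvicorn main:app --port 4291 --reload`
-- Test job: `?job=c6767f3a166b` (chrisorb, 71 s vertical TK video)
+- Current working tree: `/Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证-backend/`
+- Main project path: `/Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证/`
+- Start in background: `./scripts/start-dev-background.sh` (frontend 4290, backend 4291, managed by launchd)
+- Stop background services: `./scripts/stop-dev-background.sh`
+- web dev: `cd web && npm run dev`
+- api dev: `cd api && uvicorn main:app --host 127.0.0.1 --port 4291`
+- Note: do not run the backend with `--reload` for long tasks such as downloading, frame extraction, audio processing, and image generation.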
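-## SKG gateway capabilities (measured; critical!)
-`base_url: https://ai.skg.com/ezlink/v1`
-The key is stored in `api/.env` as `LLM_API_KEY`.
-| Endpoint / field | Status | Purpose |
-|---|---|---|
-| `/v1/chat/completions` text-only | ✅ works | translate / rewrite |
-| `/v1/chat/completions` + image_url | ✅ **works** (previously misjudged as broken; the dog.jpg test image was corrupted) | vision image recognition (gemini-2.5-flash recommended) |
-| `/v1/chat/completions` + input_audio | ❌ fails | ASR cannot use this route |
-| `/v1/audio/transcriptions` (whisper) | ❌ 404 | the entire audio endpoint family is not exposed |
-| `/v1/audio/speech` (tts) | ❌ 404 | |
-| `/v1/images/generations` (text→image) | ✅ works | image generation (gemini-3-pro-image-preview = nano-banana-pro) |
-| `/v1/images/generations` + image parameter | ✅ **works** (image-to-image) | confirmed that a reference image can be passed (key finding) |
-| `/v1/images/edits` | ❌ 404 | |
-| `/v1/videos/*` (sora-2) | ❌ 404 | video generation needs IT to enable it, or an external key |
-| `/v1/files` | ❌ 403 "必须指定渠道" (a channel must be specified) | |
+## Current model assignments
+`LLM_BASE_URL` defaults to `https://ai.skg.com/ezlink/v1`; images likewise default to `IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1`; speech defaults to `https://ai.skg.com/azure`; production video defaults to `https://ai.skg.com/doubao`.
+| Task | Current model / channel | Notes |
+|---|---|---|
+| TK download | `yt-dlp` + optional cookies | Public videos download without login; restricted videos can use `YTDLP_COOKIES_FILE` or `YTDLP_COOKIES_FROM_BROWSER`, or the MP4 can be uploaded directly. |
+| Remote ASR | `ASR_MODEL=whisper-1` | On failure, falls back to local ASR and then to the multimodal fallback. |
+| Local ASR | `LOCAL_ASR_MODEL=mlx-community/whisper-tiny` | Default second-level fallback; prioritizes producing a real sentence-level timeline. |
+| ASR fallback / audio analysis | `ASR_FALLBACK_MODEL=gemini-2.5-flash` | Multimodal audio fallback; the backend rejects fake subtitles, repeated text, and results with too little coverage. |
+| Subtitle translation | `TRANSLATE_MODEL=gemini-2.5-flash` | Gemini is retained. |
+| Frame understanding | `VISION_MODEL=gpt-4o` | Keyframe Vision has switched to GPT; if an old environment still sets `gemini-*`, it is normalized to `GPT_TEXT_MODEL`. |
+| General rewrite / storyboard description | `REWRITE_MODEL=gpt-4o` | Switched to GPT; old Gemini override values are normalized automatically. |
+| New voiceover rewrite | `AUDIO_REWRITE_MODEL=gpt-4o` | Follows `REWRITE_MODEL` by default; old Gemini override values are normalized automatically. |
+| Product view recognition | `PRODUCT_VIEW_MODEL=gpt-image-2` | Batch recognition of view, left / right, top / bottom, inner / outer side, purpose, and risk for product images. |
+| All image generation / editing | `gpt-image-2` | Hard-locked server-side with no image-model fallback; covers keyframe image generation, watermark cleanup, element extraction, subject asset packs, product missing-angle fill, and first / last frames. |
+| Voiceover | `VOICE_PROVIDER=azure_openai` + `AZURE_TTS_MODEL=gpt-4o-mini-tts` | Speech is fixed to Azure OpenAI TTS. The backend tries the paths in `AZURE_TTS_PATHS` in order, which helps distinguish a wrong path from the whole speech service being unavailable. |
+| Video | `VIDEO_MODEL=seedance` | Direct submission is paused in the current main flow; the production channel defaults to `ai.skg.com/doubao`, and the real Seedance ID is configured via `VIDEO_MODEL_SEEDANCE`. |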
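The Gemini-to-GPT normalization described in the table can be sketched as follows. This is a minimal illustration, not the code in `api/main.py`: the helper names `normalize_gpt_model` and `resolve_text_models` are hypothetical, and the exact matching rule (a case-insensitive `gemini-` prefix) is an assumption based on the `gemini-*` wording above.

```python
from typing import Dict, Mapping, Optional

DEFAULT_GPT_TEXT_MODEL = "gpt-4o"  # documented default for GPT_TEXT_MODEL


def normalize_gpt_model(value: Optional[str], fallback: str) -> str:
    """Return `fallback` when the value is empty or names a Gemini model (assumed rule)."""
    cleaned = (value or "").strip()
    if not cleaned or cleaned.lower().startswith("gemini-"):
        return fallback
    return cleaned


def resolve_text_models(env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the three GPT-routed models from an environment-like mapping."""
    gpt_text = (env.get("GPT_TEXT_MODEL") or "").strip() or DEFAULT_GPT_TEXT_MODEL
    rewrite = normalize_gpt_model(env.get("REWRITE_MODEL"), gpt_text)
    vision = normalize_gpt_model(env.get("VISION_MODEL"), gpt_text)
    # AUDIO_REWRITE_MODEL follows REWRITE_MODEL unless a non-Gemini value is set.
    audio_rewrite = normalize_gpt_model(env.get("AUDIO_REWRITE_MODEL"), rewrite)
    return {
        "REWRITE_MODEL": rewrite,
        "VISION_MODEL": vision,
        "AUDIO_REWRITE_MODEL": audio_rewrite,
    }


if __name__ == "__main__":
    legacy_env = {"REWRITE_MODEL": "gemini-2.5-pro", "VISION_MODEL": "gemini-2.5-flash"}
    print(resolve_text_models(legacy_env))
```

Passing a mapping instead of reading `os.environ` directly keeps the sketch easy to test; a real backend would likely call it once at startup with `os.environ`.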
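-**Gateway backend = one-hub multi-channel proxy.** The current key group is named 「纯OpenAI+AWSClaude+Gemini官方」 (pure OpenAI + AWS Claude + official Gemini); it lacks an audio channel (`gpt-4o-audio-preview` returns 503 "无可用渠道", no channel available) and a video channel.
-## Model selection (already written to api/.env)
-```
-ASR_MODEL=whisper-1 # ⚠️ endpoint returns 404; ASR has not actually run end to end yet
-TRANSLATE_MODEL=gemini-2.5-flash # ✅ text works
-REWRITE_MODEL=gemini-2.5-pro # placeholder
-VISION_MODEL=gemini-2.5-flash # ✅ recognition works
-IMAGE_MODEL=gemini-3-pro-image-preview # ✅ nano-banana-pro, i2i works
-```
-## Pipeline status (8-node merged version)
-The original 10 nodes have been merged: input + download + split become one node; translate is folded into transcript; videogen and compose are placeholders.
-| Step | Node | Status | Notes |
+## Current main flow
+| Step | Module | Status | Notes |
 |---|---|---|---|
-| 1 | **Input** (download + split merged) | | real yt-dlp download + ffmpeg WAV split |
-| 2 | **Keyframes** | | Heuristic D: 30 candidates → pHash dedup + Laplacian variance scoring + temporal bucketing → 5 frames; manual frame addition OK |
-| 3 | **ASR** | ❌ blocked | SKG gateway audio does not work; waiting for IT to open an audio channel or for an external key |
-| 4 | **Translate** | ❌ blocked | depends on ASR |
-| 5 | **Rewrite** | ⏳ placeholder | waiting for the user to provide a product-info template |
-| 6 | **Image Gen** | ✅ **just finished** | nano-banana-pro i2i + positive / negative prompts |
-| 7 | **Video Gen** | ⏳ placeholder | sora-2 endpoint does not work |
-| 8 | **Compose** | ⏳ placeholder | local ffmpeg + subtitles + TTS |
+| 1 | Input / download | Working | A TK link or uploaded video creates a job; once the download finishes, the job enters the analysis queue. |
+| 2 | Audio-copy path | Working | `audio.wav`, ASR, translation, and speaker / rhythm / background-sound analysis; results are collapsed by default. |
+| 3 | Video-visual path | Working | Automatically extracts 6 subject-targeted random reference frames; the current workspace supports manually adding frames at a playback second of the original 9:16 video. |
+| 4 | Similar-subject assets | Working | Uses keyframes and optional built-in characters to generate 10 white-background views of the same subject. |
+| 5 | Product asset pool | Working | Uploaded and built-in product images go into one pool, with automatic recognition of view, structural points, purpose, and risk; missing angles can be filled with generated images. |
+| 6 | Storyboard workbench | Working | Edit the new voiceover, shot type, character / product toggles, and first / last frame plans along a sentence-by-sentence timeline. |
+| 7 | First / last frame gate | Working | Each storyboard entry first generates first / last frames from similar-subject views and product assets, and saves them after review. |
+| 8 | Video candidates | Direct submission paused | Historical candidates are kept for display; one-click Seedance submission is disabled, and single-entry submission reopens after first / last frames are reviewed. |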
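-## UI architecture (important)
-- **Left sidebar**: 108px, very narrow; 8 stage tiles stacked vertically + DAG path branching
-- **Main area, ReactFlow**: 8-node DAG (input → keyframe/asr → ... → compose)
-- **Clicking a sidebar tile**: a drawer panel slides out from the left (pink / purple / orange Kanban style)
-- **Keyframe lightbox**: **embedded in the keyframe drawer** (not fullscreen), `<FrameLightbox embedded ... />`; the drawer is 760 wide when there is an expandedFrame and 400 otherwise
-- **Above the Input node**: floating strip of multi-video thumbnails + a "+" to add a new video
-- **Above the keyframe node**: 5+ thumbnails at the video's original aspect ratio (aspect-ratio: width/height)
-- **Thumbnail hover**: shows a static enlarged image (keyframes are reference-image material, so no video is played)
-- **Thumbnail click**: opens the lightbox inside the keyframe drawer (large image on the left + recognition panel on the right)
-## Data model (key typescript / pydantic)
-```typescript
-KeyFrame {
-  index: number // stable ID, not contiguous (the frames array is sorted by timestamp)
-  timestamp: number
-  url: string
-  description?: {
-    scene, objects: [{name, position, color, extract_prompt}],
-    style, suggested_prompt
-  }
-  generated_images?: [{ id, prompt, model, mode, url, selected, created_at }]
-}
-Job { frames: KeyFrame[] ... }
-```
-**The frontend must fetch frames with `frames.find(x => x.index === activeIndex)`, not by array position** (a previous bug)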
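 ## Key files
-- `web/app/page.tsx`: multi-job state management (jobs[] + activeJobId), 8-node LAYOUT
-- `web/components/dashboard.tsx`: sidebar + drawer + 9 Kanban sections (input/keyframe/asr/translate/rewrite/imagegen/videogen/compose), `ImageGenCard` subcomponent
-- `web/components/lightbox.tsx`: `FrameLightbox`, supports the `embedded` prop
-- `web/components/video-lightbox.tsx`: player opened by clicking a video thumbnail on the Input node
-- `web/components/nodes/index.tsx`: ReactFlow 8-node definitions
-- `web/lib/api.ts`: API client
-- `api/main.py`: all FastAPI endpoints, KeyFrame / GeneratedImage models
+- `api/main.py`: FastAPI backend, model routing, job state, ASR / translation / audio analysis, image generation, product recognition, first / last frames, and video endpoints.
+- `api/database.py`: backend database layer; currently uses SQLite to store document / job / media asset metadata, while media files stay in `jobs/<jobId>/`.
+- `api/.env.example`: local model and gateway template; already includes `GPT_TEXT_MODEL=gpt-4o`.
+- `deploy/.env.production.example`: production environment template; video defaults to the SKG Doubao / Seedance gateway.
+- `RULES.md`: startup, deployment facts, model environment variables, and project rules.
+- `docs/source-analysis.html`: source-analysis page; any change that affects product understanding, interfaces, model assignments, or operating paths must be synced here.
+- `web/components/ad-recreation-board.tsx`: current main workbench for in-feed ad recreation.
+- `web/components/media-asset-tile.tsx`: unified component for media asset thumbnails, hover zoom, deletion, and status overlays.
+- `web/lib/api.ts`: frontend API client and types annotating the running models.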
-## Working API endpoints
+## Main API
``` ```
-POST /jobs  create a job (link)
-POST /jobs/upload  upload a video
-GET /jobs/{id}  job status
-POST /jobs/{id}/analyze?frames=5  split tracks + extract frames + ASR in one automatic pass
-POST /jobs/{id}/frames?t=<sec>  manually add a frame at a timestamp
-POST /jobs/{id}/frames/{idx}/describe  ✅ Vision recognition (3 retries + reasoning_content fallback)
-POST /jobs/{id}/frames/{idx}/generate  ✅ image generation (i2i / text-only, with negative_prompt)
-GET /jobs/{id}/frames/{idx}/gen/{gen_id}.jpg  generated image binary
-POST /jobs/{id}/frames/{idx}/gen/{gen_id}/select  select a generated image for downstream use
-GET /jobs/{id}/video.mp4  original video
-GET /jobs/{id}/frames/{idx}.jpg  keyframe jpg
 GET /health
+GET /documents
+POST /jobs
+POST /jobs/{id}/download/retry
+POST /jobs/upload
+GET /jobs
+GET /jobs/{id}
+DELETE /jobs/{id}
+POST /jobs/{id}/analyze
+POST /jobs/{id}/transcribe
+POST /jobs/{id}/frames?t=<sec>
+DELETE /jobs/{id}/frames/{idx}
+POST /jobs/{id}/frames/{idx}/describe
+POST /jobs/{id}/frames/{idx}/cleanup
+POST /jobs/{id}/frames/{idx}/cleanup/apply
+POST /jobs/{id}/frames/{idx}/generate
+POST /jobs/{id}/frames/{idx}/scene-asset
+POST /jobs/{id}/frames/{idx}/elements
+POST /jobs/{id}/frames/{idx}/elements/{element_id}/cutout
+POST /jobs/{id}/frames/{idx}/elements/{element_id}/subject-assets
+POST /jobs/{id}/assets
+PUT /jobs/{id}/product-refs
+POST /jobs/{id}/assets/product-views/analyze
+POST /jobs/{id}/assets/product-angle
+POST /jobs/{id}/script/rewrite
+PUT /jobs/{id}/frames/{idx}/storyboard
+POST /jobs/{id}/frames/{idx}/storyboard/video
``` ```
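One way a client could choose between the retry route and a re-analysis for a failed job is sketched below. The routes come from the list above; everything else is an assumption made for illustration: the function name `next_retry_request` is hypothetical, the failure status is assumed to be the string `"failed"`, and re-running analysis via `POST /jobs/{id}/analyze` is a guess at which endpoint the frontend uses.

```python
from typing import Any, Mapping, Optional, Tuple


def next_retry_request(job: Mapping[str, Any]) -> Optional[Tuple[str, str]]:
    """Return the (method, path) to call for a failed job, or None if no retry applies."""
    # Assumption: a failed job reports status == "failed".
    if job.get("status") != "failed":
        return None
    job_id = job["id"]
    if not job.get("video_url"):
        # No source video yet: ask the backend to download again.
        return ("POST", f"/jobs/{job_id}/download/retry")
    # The video already exists: re-run the analysis step (assumed endpoint).
    return ("POST", f"/jobs/{job_id}/analyze")


if __name__ == "__main__":
    print(next_retry_request({"id": "abc123", "status": "failed", "video_url": ""}))
```

Returning a plain `(method, path)` tuple keeps the decision separate from whichever HTTP client actually sends the request.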
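-## Known pitfalls / don'ts
+## Current constraints / pitfalls to avoid
-1. **Keyframe index is not contiguous**: after manually adding frames, the frames array is sorted by timestamp and index is a stable ID. The lightbox must use `frames.find(x => x.index === activeIndex)`; **do not** use `frames[activeIndex]`
-2. **Earlier SKG gateway vision test results were wrong**: the `dog.jpg` test image, a 200px Wikipedia thumbnail, was corrupted or had abnormal metadata, which led to the belief that image input did not work. It works with a standard PNG or a real JPEG
-3. **Gemini 2.5 Flash has thinking on by default**: the `content` field is often empty because the tokens all go to reasoning; as a fallback, extract the JSON from `reasoning_content` with a regex
-4. **Thumbnail aspect-ratio**: must adapt with `aspectRatio: ${job.width}/${job.height}`; do not force `aspect-video` 16:9 (vertical videos get cropped)
-5. **ReactFlow `type="input"` / `"output"` are reserved**: they come with a default white background and need a CSS override, `.react-flow .react-flow__node-input { background: transparent !important; ... }`
-6. **ReactFlow 12 colorMode is independent of next-themes**: link them with `<ReactFlow colorMode={resolvedTheme}>`, otherwise nodes get a white background
-7. **FastAPI BackgroundTasks usage**: `bg.add_task(func, arg)`; do not pass a coroutine
-8. **The ffmpeg 8 mjpeg encoder rejects yuv420p**: frame extraction must add `-pix_fmt yuvj420p`, and `-vsync` is replaced by `-fps_mode`
-9. **Frame extraction speed**: scene-change detection (`select='gt(scene,0.4)'`) is very slow (30 s+ for a 71 s video); switched to uniform sampling with fast seek (5 frames in < 3 s)
+1. Thumbnails for images / videos / extracted frames / product images / generated images / first and last frames / video candidates reuse `web/components/media-asset-tile.tsx` by default
+2. All image-generation entry points allow only `gpt-image-2` server-side; do not re-add Gemini image models or any other fallback
+3. Frame understanding and copy rewriting default to GPT; `VISION_MODEL`, `REWRITE_MODEL`, and `AUDIO_REWRITE_MODEL` intercept old `gemini-*` override values
+4. Gemini is still kept in the ASR fallback / audio analysis / translation chain; do not delete it by mistake
+5. Speech uses only Azure OpenAI TTS; do not add or depend on configuration for other voiceover channels
+6. The current main flow does not batch-submit videos directly; go through "storyboard planning → first / last frames → human review" first
+7. The product asset pool assumes "the same product" by default and makes no judgment about different product identities; view recognition must describe left / right, top / bottom, and inner / outer side from the wearer's perspective
+8. Automatic frame extraction defaults to `frames=6` + `target=random_subject` + `quality=accurate` + `mode=replace`; if a specific action or expression is needed, add it manually with "extract frame at current point"
+9. Documents are the top-level business grouping: each TK link or uploaded video gets one `document` by default, and a `job` belongs to a `document_id`; the DB stores metadata and a file index, while video / image / audio files do not go into the DB
+10. Do not use `--reload` for long backend tasks
+11. A keyframe's `index` is a stable ID, not an array position; the frontend fetches frames with `frames.find(x => x.index === idx)`
+12. TikTok cookies are account login state and may live only in the local machine or the server's private environment; do not commit cookies files or account passwords.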
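The stable-ID rule for keyframes applies equally to any Python code that reads a job's frames. The sketch below is illustrative only: it assumes frames are dictionaries with an `index` key, and `find_frame` is a hypothetical helper rather than a function from `api/main.py`.

```python
from typing import Any, Dict, List, Optional


def find_frame(frames: List[Dict[str, Any]], index: int) -> Optional[Dict[str, Any]]:
    """Return the frame whose stable `index` matches, or None if it does not exist."""
    return next((frame for frame in frames if frame.get("index") == index), None)


# Frames sorted by timestamp; the manually added frame (index 7) sits in the middle,
# so list position and `index` no longer agree.
frames = [
    {"index": 0, "timestamp": 1.0},
    {"index": 7, "timestamp": 2.5},
    {"index": 3, "timestamp": 4.2},
]

if __name__ == "__main__":
    print(find_frame(frames, 7))
```

Indexing the list directly, as in `frames[7]`, would raise `IndexError` here, while `frames[1]` would silently return the wrong frame for `index == 1`; looking up by the `index` field avoids both problems.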
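-## To-do (by priority)
-1. **ASR blocked**: ask SKG IT to open an audio channel, or provide an external ASR key (Deepgram / iFlytek / direct OpenAI)
-2. **Image-generation test feedback**: just finished; waiting for the user to try it in the browser → tune negative prompt / model selection
-3. **Regional retouching (inpainting)**: discussed with the user; options are A pure prompt / B rectangular box / C brush mask / D SAM; shelved for now
-4. **Rewrite**: waiting for the user to provide a product-info card template
-5. **Video generation**: sora-2 via the SKG endpoint does not work; consider an external key (Runway/Kling/Veo3)
-6. **Compose**: fully local ffmpeg + subtitles + TTS
-## Workflow (development session)
-```bash
-# 1. Start the backend (if not running)
-cd ~/Projects/business/20260512-20260512-skg-tk-二创验证/api
-source .venv/bin/activate
-uvicorn main:app --port 4291 --reload
-# 2. Start the frontend (if not running)
-cd ../web
-pnpm dev
-# 3. Browser
-open http://localhost:4290/?job=c6767f3a166b
-```
-## User preference reminders (feedback memory)
-- feedback_image-gen-model: always use nano-banana-pro for image generation ✅
-- feedback_keep-scope-small: keep small requests small
-- feedback_flow-dont-stop: keep executing through delivery; ask only at a real fork
-- feedback_demand-before-infra: before building infrastructure, first ask who needs it, what the pain point is, and how often it occurs
-- feedback_no-guessing-ports: verify before acting
+## Recent changes
+- 2026-05-18: TK link download adds `YTDLP_COOKIES_FILE` / `YTDLP_COOKIES_FROM_BROWSER` support; when a restricted video fails, the frontend prompts the user to upload an MP4 or configure backend cookies login state.
+- 2026-05-18: failed jobs on the asset-input side support re-download / re-analysis; selecting a failed TK asset with no `video_url` calls the backend retry endpoint, while a failed job that already has a video clears its auto-trigger flag and reruns the audio / visual paths.
+- 2026-05-18: cleaned up remnants of the personal voice channel; `/health`, frontend types, environment templates, and docs no longer expose related fields or configuration.
+- 2026-05-18: added the backend database layer; SQLite lands by default at `APP_DB_URL` / `DATABASE_URL` or `JOBS_DIR/app.db`; `/documents` returns the document grouping list and `/health.database` returns DB status.
+- 2026-05-18: `VISION_MODEL`, `REWRITE_MODEL`, and `AUDIO_REWRITE_MODEL` switched to the GPT default model `gpt-4o`, with normalization protection for old Gemini environment variables.
+- 2026-05-18: the voice channel is fixed to Azure OpenAI TTS and tries speech paths in `AZURE_TTS_PATHS` order.
+- 2026-05-18: the main path pauses direct video submission in favor of a per-entry first / last frame gate.
+- 2026-05-18: media asset interactions are consolidated into `MediaAssetTile`.
+- 2026-05-18: product image view recognition and product missing-angle generation converge on `gpt-image-2`.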


@@ -1,105 +1,5 @@
{ {
"entries": [ "entries": [
{
"files_changed": 5,
"hash": "d802701",
"message": "auto-save 2026-05-15 17:22 (~4, -1)",
"ts": "2026-05-15T17:22:54+08:00",
"type": "commit"
},
{
"files_changed": 2,
"message": "Codex 会话活跃 · 最近命令codex · 2 项未提交变更 · 最近提交auto-save 2026-05-15 17:22 (~4, -1)",
"ts": "2026-05-15T09:24:48Z",
"type": "session-heartbeat"
},
{
"files_changed": 3,
"hash": "dcd8560",
"message": "auto-save 2026-05-15 17:28 (~3)",
"ts": "2026-05-15T17:28:27+08:00",
"type": "commit"
},
{
"files_changed": 1,
"hash": "25c4723",
"message": "auto-save 2026-05-15 17:33 (~1)",
"ts": "2026-05-15T17:33:59+08:00",
"type": "commit"
},
{
"files_changed": 1,
"message": "Codex 会话活跃 · 最近命令codex · 1 项未提交变更 · 最近提交auto-save 2026-05-15 17:33 (~1)",
"ts": "2026-05-15T09:34:48Z",
"type": "session-heartbeat"
},
{
"files_changed": 1,
"hash": "1110500",
"message": "auto-save 2026-05-15 17:39 (~1)",
"ts": "2026-05-15T17:39:32+08:00",
"type": "commit"
},
{
"files_changed": 1,
"message": "Codex 会话活跃 · 最近命令codex · 1 项未提交变更 · 最近提交auto-save 2026-05-15 17:39 (~1)",
"ts": "2026-05-15T09:44:48Z",
"type": "session-heartbeat"
},
{
"files_changed": 1,
"hash": "0b97d03",
"message": "auto-save 2026-05-15 17:44 (~1)",
"ts": "2026-05-15T17:45:02+08:00",
"type": "commit"
},
{
"files_changed": 1,
"hash": "eeeaebd",
"message": "auto-save 2026-05-15 17:50 (~1)",
"ts": "2026-05-15T17:50:32+08:00",
"type": "commit"
},
{
"files_changed": 3,
"message": "Codex 会话活跃 · 最近命令codex · 3 项未提交变更 · 最近提交auto-save 2026-05-15 17:50 (~1)",
"ts": "2026-05-15T09:54:48Z",
"type": "session-heartbeat"
},
{
"files_changed": 4,
"hash": "a662130",
"message": "auto-save 2026-05-15 17:55 (+1, ~3)",
"ts": "2026-05-15T17:56:05+08:00",
"type": "commit"
},
{
"files_changed": 2,
"hash": "fae3fb3",
"message": "auto-save 2026-05-15 18:01 (~2)",
"ts": "2026-05-15T18:01:35+08:00",
"type": "commit"
},
{
"files_changed": 1,
"message": "Codex 会话活跃 · 最近命令codex · 1 项未提交变更 · 最近提交auto-save 2026-05-15 18:01 (~2)",
"ts": "2026-05-15T10:04:49Z",
"type": "session-heartbeat"
},
{
"files_changed": 1,
"hash": "84143bc",
"message": "auto-save 2026-05-15 18:06 (~1)",
"ts": "2026-05-15T18:07:06+08:00",
"type": "commit"
},
{
"files_changed": 1,
"hash": "6c8bc42",
"message": "auto-save 2026-05-15 18:12 (~1)",
"ts": "2026-05-15T18:12:39+08:00",
"type": "commit"
},
{ {
"files_changed": 4, "files_changed": 4,
"message": "Codex 会话活跃 · 最近命令codex · 4 项未提交变更 · 最近提交auto-save 2026-05-15 18:12 (~1)", "message": "Codex 会话活跃 · 最近命令codex · 4 项未提交变更 · 最近提交auto-save 2026-05-15 18:12 (~1)",
@@ -3254,6 +3154,111 @@
"message": "auto-save 2026-05-18 07:27 (~6)", "message": "auto-save 2026-05-18 07:27 (~6)",
"hash": "9790e5b", "hash": "9790e5b",
"files_changed": 6 "files_changed": 6
},
{
"ts": "2026-05-18T14:30:08+08:00",
"type": "commit",
"message": "auto-save 2026-05-18 14:30 (~5)",
"hash": "e6a5ea4",
"files_changed": 5
},
{
"ts": "2026-05-18T14:31:59+08:00",
"type": "commit",
"message": "chore: switch vision and rewrite models to gpt",
"hash": "a2897ef",
"files_changed": 0
},
{
"ts": "2026-05-18T14:34:36+08:00",
"type": "commit",
"message": "chore: force gpt routing for vision and rewrite",
"hash": "2cfd7de",
"files_changed": 5
},
{
"ts": "2026-05-18T14:38:02+08:00",
"type": "commit",
"message": "docs: refresh current project status",
"hash": "301ec4f",
"files_changed": 1
},
{
"ts": "2026-05-18T14:39:23+08:00",
"type": "commit",
"message": "chore: update development worklog",
"hash": "dddf410",
"files_changed": 1
},
{
"ts": "2026-05-18T14:46:24+08:00",
"type": "commit",
"message": "auto-save 2026-05-18 14:46 (~7)",
"hash": "e6387cf",
"files_changed": 7
},
{
"ts": "2026-05-18T14:49:53+08:00",
"type": "commit",
"message": "fix: force azure openai tts voice path",
"hash": "4d2a4a0",
"files_changed": 4
},
{
"ts": "2026-05-18T15:08:05+08:00",
"type": "commit",
"message": "auto-save 2026-05-18 15:07 (~5)",
"hash": "ebac2e8",
"files_changed": 5
},
{
"ts": "2026-05-18T15:13:30+08:00",
"type": "commit",
"message": "auto-save 2026-05-18 15:13 (~8)",
"hash": "2a1aa4c",
"files_changed": 8
},
{
"ts": "2026-05-18T15:29:47+08:00",
"type": "commit",
"message": "auto-save 2026-05-18 15:29 (+1, ~5)",
"hash": "1c451c6",
"files_changed": 6
},
{
"ts": "2026-05-18T15:34:15+08:00",
"type": "commit",
"message": "feat: add backend document database",
"hash": "1ac9b1b",
"files_changed": 4
},
{
"ts": "2026-05-18T15:40:58+08:00",
"type": "commit",
"message": "fix: backfill database on startup",
"hash": "c2e9558",
"files_changed": 1
},
{
"ts": "2026-05-18T15:51:30+08:00",
"type": "commit",
"message": "chore: remove personal voice channel remnants",
"hash": "a6eddf1",
"files_changed": 7
},
{
"ts": "2026-05-18T16:35:29+08:00",
"type": "commit",
"message": "feat: support tiktok download cookies",
"hash": "4280624",
"files_changed": 9
},
{
"ts": "2026-05-18T16:49:51+08:00",
"type": "commit",
"message": "fix: allow retrying failed source analysis",
"hash": "061eb7d",
"files_changed": 6
} }
] ]
} }


@@ -11,7 +11,7 @@
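 - See the project-initiation decision section of `CLAUDE.md` + the seven-step pipeline breakdown in `.memory/plan.md`
 - Style: `04-Dark-Gallery-Ambient` (path: `~/Projects/research/20260305-网页风格库/04-Dark-Gallery-Ambient.md`)
 - First sprint: steps 1-4 (download / track splitting / keyframes / ASR + translation)
-- Current product direction (reconfirmed 2026-05-18): first solve step one of fast in-feed ad recreation, and stop following the old approach of "linearly completing frame extraction, storyboarding, element generation, and compositing after start". The main interface is "a left asset-input column + a right in-feed recreation worksheet". After pasting a TK link or uploading a video, the user clicks "Start analysis" and the system downloads the source video automatically. Once the download finishes, two paths start in parallel: the audio-copy path extracts the original audio copy / subtitles and analyzes the speaker, speech rate and rhythm, and background music / ambient sound / sound effects; the video-visual path automatically extracts reference frames so a person can pick a usable subject and generate similar-subject white-background views. Uploaded product images form a separate product asset pack, with automatic recognition of view / structure / proportion and filling of missing angles. The storyboard workbench plans the new voiceover, shot type, first / last frames, character requirements, and how the product appears along a sentence-by-sentence timeline. Direct calls to the video model are currently paused; first and last frames are generated and reviewed entry by entry using "similar-subject views + product asset pool + first / last frame text plan", and only after the plan is saved is it decided which storyboard entries move on to single-entry video candidates.
+- Current product direction (reconfirmed 2026-05-18): first solve step one of fast in-feed ad recreation, and stop following the old approach of "linearly completing frame extraction, storyboarding, element generation, and compositing after start". The main interface is "a left asset-input column + a right in-feed recreation worksheet". After pasting a TK link or uploading a video, the user clicks "Start analysis" and the system downloads the source video automatically. Once the download finishes, two paths start in parallel: the audio-copy path extracts the original audio copy / subtitles and analyzes the speaker, speech rate and rhythm, and background music / ambient sound / sound effects; the video-visual path automatically extracts 6 subject-targeted random reference frames so a person can pick a usable subject and generate similar-subject white-background views. Uploaded product images form a separate product asset pack, with automatic recognition of view / structure / proportion and filling of missing angles. The storyboard workbench plans the new voiceover, shot type, first / last frames, character requirements, and how the product appears along a sentence-by-sentence timeline. Direct calls to the video model are currently paused; first and last frames are generated and reviewed entry by entry using "similar-subject views + product asset pool + first / last frame text plan", and only after the plan is saved is it decided which storyboard entries move on to single-entry video candidates.
 ## Deployment facts
 - Platform: VPS `76.13.31.179` (Ubuntu 24.04 / Docker Compose / Coolify Traefik)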
@@ -24,7 +24,7 @@
 - Server directory: `/opt/skg-marketing-studio`
 - Production start: `docker compose -f docker-compose.prod.yml --env-file deploy/.env.production up -d --build`
 - Production architecture: the `web` container serves the Next static export with Nginx; static resources required by the login page, such as `/login/`, `/_next/`, `/assets/`, `/skg-logo-black.svg`, and `/oasis-source/`, are publicly accessible; unauthenticated access to the workbench redirects to `/login/`; `/api/` is reverse-proxied to `skg-marketing-api:4291` after Nginx `auth_request` validates the FastAPI session cookie; Traefik attaches to 80/443 through the external `coolify` network
-- Persistent directory: the server's `./data/jobs` is mounted at `/data/jobs` in the backend
+- Persistent directory: the server's `./data/jobs` is mounted at `/data/jobs` in the backend; the default backend database is `APP_DB_URL=sqlite:////data/jobs/app.db`, which stores only document / job / media asset metadata and a file index, while source videos, audio, extracted frames, generated images, and video candidates remain in `/data/jobs/<jobId>/`
 - Login credentials: the username is listed under Quick login below; the plaintext password backup lives only on the server at `/root/skg-marketing-studio-login.txt`, and the production environment variables `WEB_AUTH_PASSWORD` / `WEB_AUTH_SESSION_SECRET` live only in the server's `deploy/.env.production`
 ## Quick login
@@ -56,20 +56,21 @@
 - `ASR_TIMEOUT_SECONDS`: per-request timeout for remote ASR / audio analysis, default 45 seconds, so step one does not sit in "transcribing" for a long time
 - `LOCAL_ASR_BIN` / `LOCAL_ASR_MODEL` / `LOCAL_ASR_TIMEOUT_SECONDS`: local ASR fallback; defaults to `/opt/homebrew/bin/mlx_whisper` + `mlx-community/whisper-tiny`, used to produce a real sentence-level timeline while the SKG gateway's `/audio/transcriptions` is unavailable
 - `TRANSLATE_MODEL`: subtitle translation model, default `gemini-2.5-flash`
-- `REWRITE_MODEL`: general rewrite / storyboard description model, default `gemini-2.5-pro`
-- `AUDIO_REWRITE_MODEL`: model for the later audio voiceover rewrite, follows `REWRITE_MODEL` by default; step one does not call voiceover rewrite by default and keeps only the original copy and sound analysis
+- `GPT_TEXT_MODEL`: default GPT text / vision model, default `gpt-4o`; used as the fallback when correcting old Gemini override values
+- `REWRITE_MODEL`: general rewrite / storyboard description model, default `gpt-4o`; if an old environment still sets `gemini-*`, the backend automatically uses `GPT_TEXT_MODEL` instead
+- `VISION_MODEL`: keyframe frame-understanding model, default `gpt-4o`; if an old environment still sets `gemini-*`, the backend automatically uses `GPT_TEXT_MODEL` instead
+- `AUDIO_REWRITE_MODEL`: model for the later audio voiceover rewrite, follows `REWRITE_MODEL` by default; if an old environment still sets `gemini-*`, the backend automatically uses `REWRITE_MODEL` instead
 - `AUDIO_PRODUCT_BRIEF`: SKG product selling points injected during audio voiceover rewrite
 - `PRODUCT_VIEW_MODEL`: view labeling / automatic recognition model for the same-product asset pool; currently forced to `gpt-image-2` per project requirements
 - `IMAGE_BASE_URL` / `IMAGE_API_KEY` / `IMAGE_MODEL`: OpenAI-compatible image-generation gateway; all image-generation entry points are currently forced to `gpt-image-2`, with no fallback to other image models
 - `GPT_IMAGE_MODEL` / `SUBJECT_ASSET_IMAGE_MODEL` / `SUBJECT_ASSET_IMAGE_MODELS`: kept for compatibility with old environment variable names, but the server forces the subject 6-view set and all other image-generation entry points to use only `gpt-image-2`
 - `AI_HTTP_PROXY` / `IMAGE_HTTP_PROXY`: optional outbound proxy for the AI gateway; the local launchd background process does not necessarily inherit the shell's `http_proxy/https_proxy`, so if image generation reports DNS / ConnectError, configure it in the local `api/.env` and restart the backend. `/health` reports only whether a proxy is configured, not its address.
-- `VOICE_PROVIDER`: voiceover channel, currently fixed to `azure_openai`
+- `YTDLP_COOKIES_FILE` / `YTDLP_COOKIES_FROM_BROWSER`: optional TikTok download login state; the cookies file is preferred, then cookies read from the local browser. A cookies file is sensitive login state; it may live only in a private path on the local machine or server and must not be committed to the repository.
+- `VOICE_PROVIDER`: voiceover channel, fixed server-side to `azure_openai`
 - `AZURE_OPENAI_BASE_URL` / `AZURE_OPENAI_API_KEY`: Microsoft Azure OpenAI protocol voiceover gateway; when no separate key is configured locally, it falls back to reusing `LLM_API_KEY`
-- `AZURE_TTS_MODEL` / `AZURE_TTS_VOICE_ID` / `AZURE_TTS_VOICE_POOL` / `AZURE_TTS_PATH`: Azure OpenAI TTS model, default voice, voice pool, and OpenAI-protocol speech path
+- `AZURE_TTS_MODEL` / `AZURE_TTS_VOICE_ID` / `AZURE_TTS_VOICE_POOL` / `AZURE_TTS_PATH` / `AZURE_TTS_PATHS`: Azure OpenAI TTS model, default voice, voice pool, and OpenAI-protocol speech paths; the backend tries `AZURE_TTS_PATHS` in order, which helps distinguish a wrong path from the whole speech service being unavailable
-- `MINIMAX_API_KEY`: MiniMax T2A voiceover key; may only be placed in the local `api/.env` and must not be committed to the repository; not called by default in the current step one
-- `MINIMAX_TTS_BASE_URL` / `MINIMAX_TTS_MODEL` / `MINIMAX_TTS_VOICE_ID`: MiniMax legacy voiceover endpoint, model, and fallback voice configuration, kept only for compatibility; not currently the default voice channel
-- `MINIMAX_TTS_VOICE_POOL`: MiniMax English random voice pool; current defaults are male voice `English_magnetic_voiced_man`, female voice `English_Upbeat_Woman`, and mature voice `English_MaturePartner`, for the later new-voiceover stage
 - `POE_API_KEY` / `VIDEO_API_KEY`: video generation channel keys; may only be placed in local environment variables
+- `APP_DB_URL` / `DATABASE_URL`: backend metadata database; the built-in implementation currently supports `sqlite:///`, and production defaults to `sqlite:////data/jobs/app.db`. Document grouping uses `documents` as the top level: one TK link or one upload is one document by default, and `jobs` and `media_assets` belong to a `document_id`
 - `WEB_AUTH_USERNAME` / `WEB_AUTH_PASSWORD` / `WEB_AUTH_SESSION_SECRET`: production web login and session-signing configuration; the password and session secret live only in server environment variables and are not committed
 - `FFMPEG_BIN` / `FFPROBE_BIN`: optional local media binary paths; when the local Homebrew ffmpeg dynamic libraries are broken, the backend automatically skips the unusable PATH version and tries a local static ffmpeg alternative; production should still use the system ffmpeg/ffprobe
 - Production environment variables: the server uses only `deploy/.env.production`, with `deploy/.env.production.example` as the template; real keys are not committed
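The TTS-related variables above can be read as a small resolution step: pick a key, then build an ordered list of candidate URLs to try. The sketch below illustrates that reading under stated assumptions. `tts_api_key` and `tts_candidate_urls` are hypothetical helper names, and treating `AZURE_TTS_PATH` as the fallback when `AZURE_TTS_PATHS` is unset is an assumption, not confirmed backend behavior.

```python
from typing import List, Mapping

DEFAULT_BASE_URL = "https://ai.skg.com/azure"  # default from the env template
DEFAULT_TTS_PATH = "/audio/speech"


def tts_api_key(env: Mapping[str, str]) -> str:
    """Prefer the dedicated Azure key and fall back to LLM_API_KEY."""
    return (env.get("AZURE_OPENAI_API_KEY") or env.get("LLM_API_KEY") or "").strip()


def tts_candidate_urls(env: Mapping[str, str]) -> List[str]:
    """Build the ordered, de-duplicated list of speech URLs to try."""
    base = (env.get("AZURE_OPENAI_BASE_URL") or DEFAULT_BASE_URL).rstrip("/")
    raw = env.get("AZURE_TTS_PATHS") or env.get("AZURE_TTS_PATH") or DEFAULT_TTS_PATH
    urls: List[str] = []
    for part in raw.split(","):
        path = part.strip()
        if not path:
            continue
        if not path.startswith("/"):
            path = "/" + path
        url = base + path
        if url not in urls:  # keep the first occurrence, preserve order
            urls.append(url)
    return urls


if __name__ == "__main__":
    env = {"AZURE_TTS_PATHS": "/audio/speech,/v1/audio/speech", "LLM_API_KEY": "example-key"}
    print(tts_api_key(env), tts_candidate_urls(env))
```

Trying the URLs in order and recording which one failed, and how, is what would let a caller tell a wrong path apart from an outage of the whole speech service.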


@@ -17,7 +17,9 @@ LOCAL_ASR_BIN=/opt/homebrew/bin/mlx_whisper
 LOCAL_ASR_MODEL=mlx-community/whisper-tiny
 LOCAL_ASR_TIMEOUT_SECONDS=180
 TRANSLATE_MODEL=gemini-2.5-flash
-REWRITE_MODEL=gemini-2.5-pro
+GPT_TEXT_MODEL=gpt-4o
+REWRITE_MODEL=gpt-4o
+VISION_MODEL=gpt-4o
 PRODUCT_VIEW_MODEL=gpt-image-2
 IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1
 IMAGE_API_KEY=
@@ -27,14 +29,17 @@ SUBJECT_ASSET_IMAGE_MODEL=gpt-image-2
 SUBJECT_ASSET_IMAGE_MODELS=gpt-image-2
 # Optional: configure when the local network needs a proxy to reach ai.skg.com; launchd does not necessarily inherit shell proxy variables.
 AI_HTTP_PROXY=
+YTDLP_COOKIES_FILE=
+YTDLP_COOKIES_FROM_BROWSER=
 VIDEO_MODEL=seedance
 VIDEO_MODEL_SEEDANCE=seedance-2-fast
 VIDEO_MODEL_KLING=kling-omni
 VIDEO_MODEL_VEO3=veo-3.1-fast
 # Audio copy rewrite + Azure OpenAI voiceover
-AUDIO_REWRITE_MODEL=gemini-2.5-pro
+AUDIO_REWRITE_MODEL=gpt-4o
 AUDIO_PRODUCT_BRIEF="SKG smart massage products for everyday relaxation of the shoulders and neck, lower back, eyes, knees, or feet; ad copy should feel premium, clean, and credible, with no medical efficacy claims."
+# The voice channel is fixed server-side to Azure OpenAI.
 VOICE_PROVIDER=azure_openai
 AZURE_OPENAI_BASE_URL=https://ai.skg.com/azure
 AZURE_OPENAI_API_KEY=
@@ -42,13 +47,7 @@ AZURE_TTS_MODEL=gpt-4o-mini-tts
 AZURE_TTS_VOICE_ID=alloy
 AZURE_TTS_VOICE_POOL=alloy,verse,shimmer
 AZURE_TTS_PATH=/audio/speech
+AZURE_TTS_PATHS=/audio/speech,/v1/audio/speech
-# MiniMax legacy voiceover channel, kept for compatibility; not used by default
-MINIMAX_API_KEY=
-MINIMAX_TTS_BASE_URL=https://api.minimax.io
-MINIMAX_TTS_MODEL=speech-2.8-turbo
-MINIMAX_TTS_VOICE_ID=English_expressive_narrator
-MINIMAX_TTS_VOICE_POOL=English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner
 # Poe video API, preferred for Seedance / Kling / Veo
 POE_API_BASE_URL=https://api.poe.com/v1
@@ -80,7 +79,8 @@ VIDEO_DURATION_FIELD=seconds
 VIDEO_POLL_TIMEOUT_SECONDS=900
 # Working directory
-KEYFRAME_COUNT=12
+APP_DB_URL=sqlite:///./jobs/app.db
+KEYFRAME_COUNT=6
 JOBS_DIR=./jobs
 # CORS


@@ -1,6 +1,6 @@
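 # SKG TK Recreation API
-FastAPI backend that runs the yt-dlp + ffmpeg + ASR / translation / English SKG product-intro copy + MiniMax English voiceover pipeline.
+FastAPI backend that runs the yt-dlp + ffmpeg + ASR / translation / English SKG product-intro copy + Azure OpenAI English voiceover pipeline.
 ## Startup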
@@ -9,7 +9,7 @@ cd api
 python3 -m venv .venv
 source .venv/bin/activate
 pip install -r requirements.txt
-cp .env.example .env # fill in LLM_API_KEY / MINIMAX_API_KEY as needed
+cp .env.example .env # fill in LLM_API_KEY / AZURE_OPENAI_API_KEY as needed
 uvicorn main:app --host 127.0.0.1 --port 4291
``` ```
@@ -18,21 +18,23 @@ uvicorn main:app --host 127.0.0.1 --port 4291
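 ## Routes
 - `GET /health`: health check + configuration status
+- `GET /documents`: document grouping list from the backend database; one TK link or one uploaded video is one document by default
 - `POST /jobs` `{url}`: creates a job and downloads the source video in the background; once the video is ready, analysis or audio extraction can be triggered manually
 - `GET /jobs/{id}`: current status + artifacts; if the original audio track has been split out, returns `source_audio_url`
-- `POST /jobs/{id}/transcribe`: triggers audio extraction + ASR + translation + SKG English product-intro copy; copy length is estimated from the original audio duration, and once MiniMax is configured a voiceover is generated from the English random voice pool. The frontend Audio node offers "Extract audio / Re-extract audio" buttons; this can run in parallel with frame extraction and is not triggered automatically
+- `POST /jobs/{id}/transcribe`: triggers audio extraction + ASR + translation + SKG English product-intro copy; copy length is estimated from the original audio duration, and once Azure OpenAI TTS is configured a voiceover is generated from the Azure voice pool. The frontend Audio node offers "Extract audio / Re-extract audio" buttons; this can run in parallel with frame extraction and is not triggered automatically
 - `GET /jobs/{id}/video.mp4`: original video
 - `GET /jobs/{id}/audio.wav`: original audio after track splitting, used by the frontend bottom audio bar to render the waveform
-- `GET /jobs/{id}/audio-script.mp3`: MiniMax voiceover of the rewritten English copy
+- `GET /jobs/{id}/audio-script.mp3`: Azure OpenAI TTS voiceover of the rewritten English copy
 - `GET /jobs/{id}/frames/{i}.jpg`: keyframe i (0-9)
 ## Mock mode
-When `LLM_API_KEY` is not set, transcription uses a local mock, which is convenient for UI integration; when `MINIMAX_API_KEY` is not set, only the rewritten copy is generated and no voiceover file is produced.
+When `LLM_API_KEY` is not set, transcription uses a local mock, which is convenient for UI integration; when `AZURE_OPENAI_API_KEY` is not set and `LLM_API_KEY` cannot be reused, only the rewritten copy is generated and no voiceover file is produced.
 ## Dependencies
 - `ffmpeg` system binary (track splitting / frame extraction)
 - `yt-dlp` system binary (the Python package also works)
+- SQLite metadata database (default `APP_DB_URL=sqlite:///./jobs/app.db`); stores only document / job / media asset metadata, while source video, audio, extracted frames, and generated files stay in `jobs/<jobId>/`
 - OpenAI-compatible LLM gateway (ASR / translation / copy rewrite); if `/audio/transcriptions` is unavailable, `ASR_FALLBACK_MODEL` is used for Gemini multimodal audio recognition
-- MiniMax T2A HTTP (voiceover for the English product-intro copy, uses `MINIMAX_API_KEY`; default random voice pool `English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner`)
+- Azure OpenAI TTS (voiceover for the English product-intro copy, uses `AZURE_OPENAI_API_KEY` or falls back to reusing `LLM_API_KEY`; default voice pool `alloy,verse,shimmer`)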
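The two SQLite URLs that appear in this project differ only in the number of slashes, which decides whether the database path is relative or absolute. The sketch below shows that distinction by stripping the `sqlite:///` prefix. It is an illustration under assumptions: `sqlite_path_from_url` is a hypothetical name, and `PurePosixPath` is used only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath

SQLITE_PREFIX = "sqlite:///"


def sqlite_path_from_url(url: str) -> PurePosixPath:
    """Strip the `sqlite:///` prefix and return the remaining file path."""
    if not url.startswith(SQLITE_PREFIX):
        raise ValueError(f"unsupported database URL: {url!r}")
    return PurePosixPath(url[len(SQLITE_PREFIX):])


if __name__ == "__main__":
    local = sqlite_path_from_url("sqlite:///./jobs/app.db")          # three slashes: relative path
    production = sqlite_path_from_url("sqlite:////data/jobs/app.db")  # four slashes: absolute path
    print(local, local.is_absolute())
    print(production, production.is_absolute())
```

The local default therefore resolves against the directory the backend is started from, while the production default always points at `/data/jobs/app.db` inside the container.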

536
api/database.py Normal file

@@ -0,0 +1,536 @@
from __future__ import annotations

import json
import os
import sqlite3
import time
from pathlib import Path
from typing import Any

SCHEMA_VERSION = 1


def default_database_url(jobs_dir: Path) -> str:
    # APP_DB_URL wins, then DATABASE_URL, then a SQLite file inside the jobs directory.
    return os.getenv("APP_DB_URL") or os.getenv("DATABASE_URL") or f"sqlite:///{jobs_dir / 'app.db'}"


def redact_database_url(url: str) -> str:
    # Hide credentials (everything before the last "@") when reporting the URL.
    if "://" not in url or "@" not in url:
        return url
    scheme, rest = url.split("://", 1)
    _, host = rest.rsplit("@", 1)
    return f"{scheme}://***@{host}"


def infer_source_kind(url: str) -> str:
    if url.startswith("upload://"):
        return "upload"
    if url.startswith("http://") or url.startswith("https://"):
        return "tiktok_link"
    return "unknown"


def default_workflow_mode(source_kind: str) -> str:
    if source_kind == "upload":
        return "uploaded_reference"
    return "feed_recreation"


def document_title(url: str, source_kind: str, fallback: str) -> str:
    if source_kind == "upload":
        return url.replace("upload://", "", 1).strip() or fallback
    if url:
        return url.strip()[:120]
    return fallback


def storage_prefix(document_id: str, source_kind: str, workflow_mode: str) -> str:
    source = source_kind or "unknown"
    mode = workflow_mode or default_workflow_mode(source)
    return f"{mode}/{source}/{document_id}"


class AppDatabase:
    def __init__(self, url: str, jobs_dir: Path):
        self.url = url
        self.jobs_dir = jobs_dir
        self.path = self._sqlite_path(url)
        self.enabled = True
        self.error = ""

    @staticmethod
    def _sqlite_path(url: str) -> Path:
        if url == ":memory:":
            return Path(":memory:")
        if not url.startswith("sqlite:///"):
            raise RuntimeError("The built-in database layer currently supports only sqlite:/// URLs; a Postgres migration will reuse the same table semantics.")
        raw = url[len("sqlite:///"):]
        return Path(raw).expanduser().resolve()

    def connect(self) -> sqlite3.Connection:
        # Create the parent directory for file-backed databases before connecting.
        if str(self.path) != ":memory:":
            self.path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(str(self.path))
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA foreign_keys = ON")
        return conn

    def init(self) -> None:
        with self.connect() as conn:
            conn.executescript(
                """
                CREATE TABLE IF NOT EXISTS schema_meta (
                    key TEXT PRIMARY KEY,
                    value TEXT NOT NULL
                );
                CREATE TABLE IF NOT EXISTS documents (
                    id TEXT PRIMARY KEY,
                    title TEXT NOT NULL,
                    source_kind TEXT NOT NULL,
                    workflow_mode TEXT NOT NULL,
                    source_url TEXT NOT NULL DEFAULT '',
                    primary_job_id TEXT NOT NULL DEFAULT '',
                    status TEXT NOT NULL DEFAULT 'created',
                    storage_prefix TEXT NOT NULL,
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL
                );
                CREATE TABLE IF NOT EXISTS jobs (
                    id TEXT PRIMARY KEY,
                    document_id TEXT NOT NULL,
                    source_kind TEXT NOT NULL,
                    workflow_mode TEXT NOT NULL,
                    source_url TEXT NOT NULL DEFAULT '',
                    status TEXT NOT NULL,
                    progress INTEGER NOT NULL DEFAULT 0,
                    message TEXT NOT NULL DEFAULT '',
                    storage_path TEXT NOT NULL,
                    state_path TEXT NOT NULL,
                    video_url TEXT NOT NULL DEFAULT '',
                    duration REAL NOT NULL DEFAULT 0,
                    width INTEGER NOT NULL DEFAULT 0,
                    height INTEGER NOT NULL DEFAULT 0,
                    frame_count INTEGER NOT NULL DEFAULT 0,
                    video_count INTEGER NOT NULL DEFAULT 0,
                    error TEXT NOT NULL DEFAULT '',
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL,
                    FOREIGN KEY(document_id) REFERENCES documents(id) ON DELETE CASCADE
                );
                CREATE TABLE IF NOT EXISTS media_assets (
                    id TEXT PRIMARY KEY,
                    document_id TEXT NOT NULL,
                    job_id TEXT NOT NULL,
                    kind TEXT NOT NULL,
                    role TEXT NOT NULL,
                    path TEXT NOT NULL DEFAULT '',
                    url TEXT NOT NULL DEFAULT '',
                    frame_index INTEGER,
                    timestamp REAL,
                    width INTEGER NOT NULL DEFAULT 0,
                    height INTEGER NOT NULL DEFAULT 0,
                    duration REAL NOT NULL DEFAULT 0,
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL,
                    FOREIGN KEY(document_id) REFERENCES documents(id) ON DELETE CASCADE,
                    FOREIGN KEY(job_id) REFERENCES jobs(id) ON DELETE CASCADE
                );
                CREATE INDEX IF NOT EXISTS idx_documents_updated_at ON documents(updated_at DESC);
                CREATE INDEX IF NOT EXISTS idx_documents_source_kind ON documents(source_kind);
                CREATE INDEX IF NOT EXISTS idx_documents_workflow_mode ON documents(workflow_mode);
                CREATE INDEX IF NOT EXISTS idx_jobs_document_id ON jobs(document_id);
                CREATE INDEX IF NOT EXISTS idx_jobs_updated_at ON jobs(updated_at DESC);
                CREATE INDEX IF NOT EXISTS idx_assets_document_id ON media_assets(document_id);
                CREATE INDEX IF NOT EXISTS idx_assets_job_id ON media_assets(job_id);
                CREATE INDEX IF NOT EXISTS idx_assets_role ON media_assets(role);
                """
            )
            conn.execute(
                "INSERT OR REPLACE INTO schema_meta(key, value) VALUES('schema_version', ?)",
                (str(SCHEMA_VERSION),),
            )

    def normalize_job_document(self, job: dict[str, Any]) -> dict[str, Any]:
        # Fill in document-level fields, deriving any that the job state does not carry.
        job_id = str(job.get("id") or "")
        source_url = str(job.get("url") or "")
        source_kind = str(job.get("source_kind") or "") or infer_source_kind(source_url)
        workflow_mode = str(job.get("workflow_mode") or "") or default_workflow_mode(source_kind)
        document_id = str(job.get("document_id") or "") or job_id
        prefix = str(job.get("storage_prefix") or "") or storage_prefix(document_id, source_kind, workflow_mode)
        return {
            "document_id": document_id,
            "source_kind": source_kind,
            "workflow_mode": workflow_mode,
            "storage_prefix": prefix,
            "title": document_title(source_url, source_kind, document_id),
        }

    def sync_job(self, job: dict[str, Any], job_path: Path) -> None:
        if not self.enabled:
            return
        now = time.time()
        job_id = str(job.get("id") or "")
        if not job_id:
            return
        doc = self.normalize_job_document(job)
        state_path = job_path / "state.json"
        frames = list(job.get("frames") or [])
        generated_videos = list(job.get("generated_videos") or [])
        metadata = {
            "audio_segment_count": len(job.get("transcript") or []),
            "product_ref_count": len(job.get("product_refs") or []),
            "storyboard_image_count": len(job.get("storyboard_images") or []),
        }
        with self.connect() as conn:
            # Upsert the document row; created_at is not in the UPDATE list, so it survives re-syncs.
            existing = conn.execute(
                "SELECT created_at FROM documents WHERE id = ?",
                (doc["document_id"],),
            ).fetchone()
            created_at = float(existing["created_at"]) if existing else now
            conn.execute(
                """
                INSERT INTO documents(
                    id, title, source_kind, workflow_mode, source_url, primary_job_id,
                    status, storage_prefix, metadata_json, created_at, updated_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(id) DO UPDATE SET
                    title = excluded.title,
                    source_kind = excluded.source_kind,
                    workflow_mode = excluded.workflow_mode,
                    source_url = excluded.source_url,
                    primary_job_id = excluded.primary_job_id,
                    status = excluded.status,
                    storage_prefix = excluded.storage_prefix,
                    metadata_json = excluded.metadata_json,
                    updated_at = excluded.updated_at
                """,
                (
                    doc["document_id"],
                    doc["title"],
                    doc["source_kind"],
                    doc["workflow_mode"],
                    str(job.get("url") or ""),
                    job_id,
                    str(job.get("status") or "created"),
                    doc["storage_prefix"],
                    json.dumps(metadata, ensure_ascii=False),
                    created_at,
                    now,
                ),
            )
            # Upsert the job row the same way.
            existing_job = conn.execute("SELECT created_at FROM jobs WHERE id = ?", (job_id,)).fetchone()
            job_created_at = float(existing_job["created_at"]) if existing_job else now
            conn.execute(
                """
                INSERT INTO jobs(
                    id, document_id, source_kind, workflow_mode, source_url, status,
                    progress, message, storage_path, state_path, video_url, duration,
                    width, height, frame_count, video_count, error, metadata_json,
                    created_at, updated_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(id) DO UPDATE SET
                    document_id = excluded.document_id,
                    source_kind = excluded.source_kind,
                    workflow_mode = excluded.workflow_mode,
                    source_url = excluded.source_url,
                    status = excluded.status,
                    progress = excluded.progress,
                    message = excluded.message,
                    storage_path = excluded.storage_path,
                    state_path = excluded.state_path,
                    video_url = excluded.video_url,
                    duration = excluded.duration,
                    width = excluded.width,
                    height = excluded.height,
                    frame_count = excluded.frame_count,
                    video_count = excluded.video_count,
                    error = excluded.error,
                    metadata_json = excluded.metadata_json,
                    updated_at = excluded.updated_at
                """,
                (
                    job_id,
                    doc["document_id"],
                    doc["source_kind"],
                    doc["workflow_mode"],
                    str(job.get("url") or ""),
                    str(job.get("status") or "created"),
                    int(job.get("progress") or 0),
                    str(job.get("message") or ""),
                    str(job_path),
                    str(state_path),
                    str(job.get("video_url") or ""),
                    float(job.get("duration") or 0),
                    int(job.get("width") or 0),
                    int(job.get("height") or 0),
                    len(frames),
                    len(generated_videos),
                    str(job.get("error") or ""),
                    json.dumps(metadata, ensure_ascii=False),
                    job_created_at,
                    now,
                ),
            )
            # Media assets are rebuilt from scratch on every sync.
            conn.execute("DELETE FROM media_assets WHERE job_id = ?", (job_id,))
            for asset in self._job_assets(job, job_path, doc["document_id"]):
                conn.execute(
                    """
                    INSERT INTO media_assets(
                        id, document_id, job_id, kind, role, path, url, frame_index,
                        timestamp, width, height, duration, metadata_json, created_at, updated_at
                    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                    """,
                    (
                        asset["id"],
                        asset["document_id"],
                        asset["job_id"],
                        asset["kind"],
                        asset["role"],
                        asset.get("path", ""),
                        asset.get("url", ""),
                        asset.get("frame_index"),
                        asset.get("timestamp"),
                        int(asset.get("width") or 0),
                        int(asset.get("height") or 0),
                        float(asset.get("duration") or 0),
                        json.dumps(asset.get("metadata") or {}, ensure_ascii=False),
                        now,
                        now,
                    ),
                )

    def _job_assets(self, job: dict[str, Any], job_path: Path, document_id: str) -> list[dict[str, Any]]:
        job_id = str(job.get("id") or "")
        items: list[dict[str, Any]] = []

        def add(
            asset_id: str,
            kind: str,
            role: str,
            path: Path | str = "",
            url: str = "",
            frame_index: int | None = None,
            timestamp: float | None = None,
            width: int = 0,
            height: int = 0,
            duration: float = 0.0,
            metadata: dict[str, Any] | None = None,
        ) -> None:
            items.append({
                "id": asset_id,
                "document_id": document_id,
                "job_id": job_id,
                "kind": kind,
                "role": role,
                "path": str(path) if path else "",
                "url": url,
                "frame_index": frame_index,
                "timestamp": timestamp,
                "width": width,
                "height": height,
                "duration": duration,
                "metadata": metadata or {},
            })

        # Source video and audio.
        if (job_path / "source.mp4").exists() or job.get("video_url"):
            add(
                f"{job_id}:source_video",
                "video",
                "source_video",
                job_path / "source.mp4",
                str(job.get("video_url") or f"/jobs/{job_id}/video.mp4"),
                duration=float(job.get("duration") or 0),
                width=int(job.get("width") or 0),
                height=int(job.get("height") or 0),
            )
        if (job_path / "audio.wav").exists() or job.get("source_audio_url"):
            add(
                f"{job_id}:source_audio",
                "audio",
                "source_audio",
                job_path / "audio.wav",
                str(job.get("source_audio_url") or f"/jobs/{job_id}/audio.wav"),
                duration=float(job.get("duration") or 0),
            )
        # Per-frame assets: keyframe, cleaned frame, generated images, scene assets, element cutouts.
        for frame in job.get("frames") or []:
            idx = int(frame.get("index") or 0)
            add(
                f"{job_id}:frame:{idx}",
                "image",
                "keyframe",
                job_path / "frames" / f"{idx:03d}.jpg",
                str(frame.get("url") or f"/jobs/{job_id}/frames/{idx}.jpg"),
                frame_index=idx,
                timestamp=float(frame.get("timestamp") or 0),
                metadata={"quality_report": frame.get("quality_report")},
            )
            if frame.get("cleaned_url"):
                add(
                    f"{job_id}:frame:{idx}:cleaned",
                    "image",
                    "cleaned_keyframe",
                    job_path / "cleaned" / f"{idx:03d}.jpg",
                    str(frame.get("cleaned_url")),
                    frame_index=idx,
                    timestamp=float(frame.get("timestamp") or 0),
                )
            for generated in frame.get("generated_images") or []:
                gen_id = str(generated.get("id") or "")
                if gen_id:
                    add(
                        f"{job_id}:generated_image:{idx}:{gen_id}",
                        "image",
                        "generated_image",
                        job_path / "gen" / f"{idx:03d}_{gen_id}.jpg",
                        str(generated.get("url") or ""),
                        frame_index=idx,
                        metadata={"model": generated.get("model"), "mode": generated.get("mode")},
                    )
            for scene_asset in frame.get("scene_assets") or []:
                asset_id = str(scene_asset.get("id") or "")
                if asset_id:
                    add(
                        f"{job_id}:scene_asset:{asset_id}",
                        "image",
                        str(scene_asset.get("asset_role") or "scene_asset"),
                        job_path / "assets" / f"{asset_id}.jpg",
                        str(scene_asset.get("url") or ""),
                        frame_index=idx,
                        width=int(scene_asset.get("width") or 0),
                        height=int(scene_asset.get("height") or 0),
                        metadata={"label": scene_asset.get("label"), "scene_mode": scene_asset.get("scene_mode")},
                    )
            for element in frame.get("elements") or []:
                element_id = str(element.get("id") or "")
                cutout_ids = list(element.get("cutouts") or [])
                legacy_cutout = element.get("cutout_id")
                if legacy_cutout and legacy_cutout not in cutout_ids:
                    cutout_ids.append(legacy_cutout)
                for cutout_id in cutout_ids:
                    add(
                        f"{job_id}:cutout:{idx}:{element_id}:{cutout_id}",
                        "image",
                        "element_cutout",
                        job_path / "elements" / f"{idx:03d}_{element_id}_{cutout_id}.jpg",
                        f"/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutouts/{cutout_id}.jpg",
                        frame_index=idx,
                        metadata={"element_id": element_id, "name_zh": element.get("name_zh")},
                    )
                for subject_asset in element.get("subject_assets") or []:
                    asset_id = str(subject_asset.get("id") or "")
                    if asset_id:
                        add(
                            f"{job_id}:subject_asset:{asset_id}",
                            "image",
                            "subject_asset",
                            job_path / "assets" / f"{asset_id}.jpg",
                            str(subject_asset.get("url") or ""),
                            frame_index=idx,
                            width=int(subject_asset.get("width") or 0),
                            height=int(subject_asset.get("height") or 0),
                            metadata={"view": subject_asset.get("view"), "label": subject_asset.get("label")},
                        )
        # Job-level product references and generated videos.
        for ref in job.get("product_refs") or []:
            asset_id = str(ref.get("id") or ref.get("asset_id") or ref.get("url") or "")
            if asset_id:
                add(
                    f"{job_id}:product_ref:{asset_id}",
                    "image",
                    "product_ref",
                    self._path_from_job_url(job_path, job_id, str(ref.get("url") or "")),
                    str(ref.get("url") or ""),
                    metadata=ref,
                )
        for video in job.get("generated_videos") or []:
            video_id = str(video.get("id") or "")
            if video_id:
                add(
                    f"{job_id}:generated_video:{video_id}",
                    "video",
                    "generated_video",
                    job_path / "videos" / f"{video_id}.mp4",
                    str(video.get("url") or ""),
                    frame_index=video.get("frame_idx"),
                    duration=float(video.get("duration") or 0),
                    metadata={"status": video.get("status"), "model": video.get("model"), "error": video.get("error")},
                )
        return items

    def _path_from_job_url(self, job_path: Path, job_id: str, url: str) -> str:
        # Map a /jobs/<id>/... URL back to a file under job_path; other URLs have no local path.
        prefix = f"/jobs/{job_id}/"
        if not url.startswith(prefix):
            return ""
        tail = url[len(prefix):]
        if tail == "video.mp4":
            return str(job_path / "source.mp4")
        return str(job_path / tail)

    def delete_job(self, job_id: str) -> None:
        if not self.enabled:
            return
        with self.connect() as conn:
            row = conn.execute("SELECT document_id FROM jobs WHERE id = ?", (job_id,)).fetchone()
            conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
            if row:
                # Drop the parent document once its last job is gone.
                remaining = conn.execute(
                    "SELECT COUNT(*) AS c FROM jobs WHERE document_id = ?",
                    (row["document_id"],),
                ).fetchone()
                if int(remaining["c"] or 0) == 0:
                    conn.execute("DELETE FROM documents WHERE id = ?", (row["document_id"],))

    def list_documents(self, limit: int | None = None) -> list[dict[str, Any]]:
        sql = """
            SELECT
                d.*,
                COUNT(DISTINCT j.id) AS job_count,
                COUNT(DISTINCT a.id) AS asset_count
            FROM documents d
            LEFT JOIN jobs j ON j.document_id = d.id
            LEFT JOIN media_assets a ON a.document_id = d.id
            GROUP BY d.id
            ORDER BY d.updated_at DESC
        """
        params: tuple[Any, ...] = ()
        if limit is not None and limit > 0:
            sql += " LIMIT ?"
            params = (limit,)
        with self.connect() as conn:
            rows = conn.execute(sql, params).fetchall()
        return [dict(row) for row in rows]

    def health(self) -> dict[str, Any]:
        if not self.enabled:
            return {"enabled": False, "url": redact_database_url(self.url), "error": self.error}
        try:
            with self.connect() as conn:
                docs = conn.execute("SELECT COUNT(*) AS c FROM documents").fetchone()["c"]
                jobs = conn.execute("SELECT COUNT(*) AS c FROM jobs").fetchone()["c"]
                assets = conn.execute("SELECT COUNT(*) AS c FROM media_assets").fetchone()["c"]
            return {
                "enabled": True,
                "url": redact_database_url(self.url),
                "schema_version": SCHEMA_VERSION,
                "documents": int(docs or 0),
                "jobs": int(jobs or 0),
                "assets": int(assets or 0),
            }
        except Exception as e:
            return {"enabled": False, "url": redact_database_url(self.url), "error": str(e)}


def create_database(url: str, jobs_dir: Path) -> AppDatabase:
    db = AppDatabase(url, jobs_dir)
    db.init()
    return db
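
The upsert pattern used by `sync_job` can be tried in isolation. The following is a minimal sketch, assuming a cut-down `documents` table with only four columns and a hypothetical helper named `upsert_document` (neither is part of `api/database.py`). It illustrates that leaving `created_at` out of the `DO UPDATE SET` list is enough to keep the original creation time across repeated syncs.

```python
import sqlite3


def upsert_document(conn: sqlite3.Connection, doc_id: str, title: str, now: float) -> None:
    # Hypothetical helper: insert a row, or refresh title/updated_at if the id already exists.
    # created_at is deliberately absent from the UPDATE list, so the first value is kept.
    conn.execute(
        """
        INSERT INTO documents(id, title, created_at, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            title = excluded.title,
            updated_at = excluded.updated_at
        """,
        (doc_id, title, now, now),
    )


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE documents ("
    "id TEXT PRIMARY KEY, title TEXT NOT NULL, "
    "created_at REAL NOT NULL, updated_at REAL NOT NULL)"
)

upsert_document(conn, "doc-1", "first title", 100.0)
upsert_document(conn, "doc-1", "second title", 200.0)

row = conn.execute("SELECT * FROM documents WHERE id = ?", ("doc-1",)).fetchone()
print(dict(row))
```

The sketch keeps a single connection open on purpose: with `sqlite3`, every new connection to `:memory:` starts from an empty database, so a connect-per-call design only persists data when the URL points at a file.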


@@ -25,10 +25,19 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse
 from pydantic import BaseModel, Field
+from database import create_database, default_database_url, default_workflow_mode, infer_source_kind, storage_prefix
 load_dotenv()
 JOBS_DIR = Path(os.getenv("JOBS_DIR", "./jobs")).resolve()
 JOBS_DIR.mkdir(parents=True, exist_ok=True)
+DATABASE_URL = default_database_url(JOBS_DIR)
+DB_INIT_ERROR = ""
+try:
+    DB = create_database(DATABASE_URL, JOBS_DIR)
+except Exception as e:
+    DB = None
+    DB_INIT_ERROR = str(e)
 CORS_ORIGINS = [o.strip() for o in os.getenv("CORS_ORIGINS", "http://localhost:4290,http://127.0.0.1:4290").split(",") if o.strip()]
 PRODUCT_LIBRARY_DIR = Path(
     os.getenv("PRODUCT_LIBRARY_DIR", Path(__file__).resolve().parent / "product_library" / "skg-products")
@@ -48,8 +57,18 @@ LOCAL_ASR_BIN = os.getenv("LOCAL_ASR_BIN", "").strip()
 LOCAL_ASR_MODEL = os.getenv("LOCAL_ASR_MODEL", "mlx-community/whisper-tiny").strip() or "mlx-community/whisper-tiny"
 LOCAL_ASR_TIMEOUT_SECONDS = max(30, int(os.getenv("LOCAL_ASR_TIMEOUT_SECONDS", "180")))
 TRANSLATE_MODEL = os.getenv("TRANSLATE_MODEL", "gemini-2.5-flash")
-REWRITE_MODEL = os.getenv("REWRITE_MODEL", "gemini-2.5-pro")
-VISION_MODEL = os.getenv("VISION_MODEL", "gemini-2.5-flash")
+DEFAULT_GPT_TEXT_MODEL = os.getenv("GPT_TEXT_MODEL", "gpt-4o").strip() or "gpt-4o"
+def gpt_model_env(name: str, default: str | None = None) -> str:
+    value = os.getenv(name, default or DEFAULT_GPT_TEXT_MODEL).strip()
+    if not value or value.lower().startswith("gemini-"):
+        return default or DEFAULT_GPT_TEXT_MODEL
+    return value
+REWRITE_MODEL = gpt_model_env("REWRITE_MODEL")
+VISION_MODEL = gpt_model_env("VISION_MODEL")
 IMAGE_BASE_URL = os.getenv("IMAGE_BASE_URL", LLM_BASE_URL).strip()
 IMAGE_API_KEY = os.getenv("IMAGE_API_KEY", LLM_API_KEY).strip()
 AI_HTTP_PROXY = (
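
The effect of `gpt_model_env` is easiest to see with a few inputs. This is a minimal sketch that mirrors the logic in the hunk above; the `environ` parameter is an assumption added here so the function can be exercised without touching the real process environment, and the fixed `"gpt-4o"` default stands in for the `GPT_TEXT_MODEL` lookup.

```python
import os
from typing import Mapping, Optional

DEFAULT_GPT_TEXT_MODEL = "gpt-4o"  # stands in for the GPT_TEXT_MODEL lookup in the diff


def gpt_model_env(
    name: str,
    default: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,  # assumption: injected for the demo only
) -> str:
    env = os.environ if environ is None else environ
    fallback = default or DEFAULT_GPT_TEXT_MODEL
    value = env.get(name, fallback).strip()
    # Blank values and any leftover "gemini-*" setting are replaced by the GPT fallback.
    if not value or value.lower().startswith("gemini-"):
        return fallback
    return value


stale = gpt_model_env("REWRITE_MODEL", environ={"REWRITE_MODEL": "gemini-2.5-pro"})
explicit = gpt_model_env("REWRITE_MODEL", environ={"REWRITE_MODEL": "gpt-4.1"})
unset = gpt_model_env("VISION_MODEL", environ={})
blank = gpt_model_env("VISION_MODEL", environ={"VISION_MODEL": "   "})
print(stale, explicit, unset, blank)
```

In other words, an old `.env` that still names a Gemini model is silently overridden rather than rejected, which matches the "force gpt routing" intent of the commit message but may surprise anyone who sets a Gemini model on purpose.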
@@ -73,29 +92,14 @@ PRODUCT_ASSET_MIN_LONG_SIDE = max(512, int(os.getenv("PRODUCT_ASSET_MIN_LONG_SID
 PRODUCT_ASSET_MIN_SHORT_SIDE = max(320, int(os.getenv("PRODUCT_ASSET_MIN_SHORT_SIDE", "600")))
 PRODUCT_ASSET_JPEG_QUALITY = max(80, min(95, int(os.getenv("PRODUCT_ASSET_JPEG_QUALITY", "92"))))
 VIDEO_MODEL = os.getenv("VIDEO_MODEL", "seedance").strip() or "seedance"
+YTDLP_COOKIES_FILE = os.getenv("YTDLP_COOKIES_FILE", "").strip()
+YTDLP_COOKIES_FROM_BROWSER = os.getenv("YTDLP_COOKIES_FROM_BROWSER", "").strip()
 AUDIO_PRODUCT_BRIEF = os.getenv(
     "AUDIO_PRODUCT_BRIEF",
     "SKG smart massage products for everyday relaxation of the shoulders and neck, lower back, eyes, knees, or feet; ad copy should feel premium, clean, and credible, with no medical efficacy claims.",
 ).strip()
-AUDIO_REWRITE_MODEL = os.getenv("AUDIO_REWRITE_MODEL", REWRITE_MODEL).strip() or REWRITE_MODEL
-MINIMAX_API_KEY = os.getenv("MINIMAX_API_KEY", "").strip()
-MINIMAX_TTS_BASE_URL = os.getenv("MINIMAX_TTS_BASE_URL", "https://api.minimax.io").strip().rstrip("/")
-MINIMAX_TTS_MODEL = os.getenv("MINIMAX_TTS_MODEL", "speech-2.8-turbo").strip() or "speech-2.8-turbo"
-MINIMAX_TTS_VOICE_ID = os.getenv(
-    "MINIMAX_TTS_VOICE_ID",
-    "English_expressive_narrator",
-).strip() or "English_expressive_narrator"
-DEFAULT_MINIMAX_TTS_VOICE_POOL = [
-    "English_magnetic_voiced_man",
-    "English_Upbeat_Woman",
-    "English_MaturePartner",
-]
-MINIMAX_TTS_VOICE_POOL = [
-    v.strip()
-    for v in os.getenv("MINIMAX_TTS_VOICE_POOL", ",".join(DEFAULT_MINIMAX_TTS_VOICE_POOL)).split(",")
-    if v.strip()
-]
-VOICE_PROVIDER = os.getenv("VOICE_PROVIDER", "azure_openai").strip().lower() or "azure_openai"
+AUDIO_REWRITE_MODEL = gpt_model_env("AUDIO_REWRITE_MODEL", REWRITE_MODEL)
+VOICE_PROVIDER = "azure_openai"
 AZURE_OPENAI_BASE_URL = os.getenv("AZURE_OPENAI_BASE_URL", "https://ai.skg.com/azure").strip().rstrip("/")
 AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", LLM_API_KEY).strip()
 AZURE_TTS_MODEL = os.getenv("AZURE_TTS_MODEL", "gpt-4o-mini-tts").strip() or "gpt-4o-mini-tts"
@@ -107,6 +111,11 @@ AZURE_TTS_VOICE_POOL = [
     if v.strip()
 ]
 AZURE_TTS_PATH = os.getenv("AZURE_TTS_PATH", "/audio/speech").strip() or "/audio/speech"
+AZURE_TTS_PATHS = [
+    p.strip()
+    for p in os.getenv("AZURE_TTS_PATHS", f"{AZURE_TTS_PATH},/audio/speech,/v1/audio/speech").split(",")
+    if p.strip()
+]
 POE_API_BASE_URL = os.getenv("POE_API_BASE_URL", "https://api.poe.com/v1").strip() or "https://api.poe.com/v1"
 POE_API_KEY = os.getenv("POE_API_KEY", "").strip()
@@ -238,8 +247,8 @@ JobStatus = Literal[
     "transcribing", "transcribed", "failed",
 ]
-KEYFRAME_COUNT = int(os.getenv("KEYFRAME_COUNT", "12"))
-FrameExtractTarget = Literal["transparent_human", "balanced", "subject", "transition", "expression", "motion"]
+KEYFRAME_COUNT = int(os.getenv("KEYFRAME_COUNT", "6"))
+FrameExtractTarget = Literal["random_subject", "transparent_human", "balanced", "subject", "transition", "expression", "motion"]
 FrameExtractMode = Literal["replace", "append"]
 FrameExtractQuality = Literal["auto", "fast", "accurate", "ultra"]
 AnalyzeTask = tuple[str, int, FrameExtractTarget, FrameExtractMode, FrameExtractQuality]
@@ -252,6 +261,7 @@ SceneMode = Literal["remove_subject", "similar", "style"]
 SceneStyle = Literal["source", "premium_product", "clean_studio", "warm_lifestyle", "cinematic"]
 SceneAssetRole = Literal["scene", "first_frame", "last_frame"]
 FRAME_TARGET_LABELS: dict[FrameExtractTarget, str] = {
+    "random_subject": "Random person",
     "transparent_human": "Transparent skeleton figure",
     "balanced": "Balanced keyframes",
     "subject": "Clear subject",
@@ -541,6 +551,10 @@ class AudioScript(BaseModel):
 class Job(BaseModel):
     id: str
     url: str
+    document_id: str = ""
+    source_kind: Literal["tiktok_link", "upload", "unknown"] = "unknown"
+    workflow_mode: Literal["feed_recreation", "uploaded_reference"] = "feed_recreation"
+    storage_prefix: str = ""
     status: JobStatus = "created"
     progress: int = 0
     message: str = ""
@@ -640,8 +654,26 @@ def job_with_artifacts(job: Job) -> Job:
     return job.model_copy(update=updates)
+def ensure_job_document_fields(job: Job) -> Job:
+    source_kind = job.source_kind if job.source_kind != "unknown" else infer_source_kind(job.url)
+    workflow_mode = job.workflow_mode or default_workflow_mode(source_kind)
+    document_id = job.document_id or job.id
+    job.source_kind = source_kind if source_kind in {"tiktok_link", "upload"} else "unknown"
+    job.workflow_mode = workflow_mode if workflow_mode in {"feed_recreation", "uploaded_reference"} else "feed_recreation"
+    job.document_id = document_id
+    job.storage_prefix = job.storage_prefix or storage_prefix(document_id, job.source_kind, job.workflow_mode)
+    return job
 def save_state(job: Job) -> None:
-    (job_dir(job.id) / "state.json").write_text(job.model_dump_json(indent=2))
+    ensure_job_document_fields(job)
+    d = job_dir(job.id)
+    (d / "state.json").write_text(job.model_dump_json(indent=2))
+    if DB:
+        try:
+            DB.sync_job(job.model_dump(mode="json"), d)
+        except Exception as e:
+            print(f"[database sync failed] job={job.id} error={e}", flush=True)
 def update(job: Job, **kw) -> None:
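
How the document fields fall out of a job URL can be sketched with the three helpers that this file imports from `database`. The function bodies below are copied from `api/database.py`; the `describe` wrapper and the sample URLs are illustrative assumptions, not part of the commit.

```python
def infer_source_kind(url: str) -> str:
    if url.startswith("upload://"):
        return "upload"
    if url.startswith("http://") or url.startswith("https://"):
        return "tiktok_link"
    return "unknown"


def default_workflow_mode(source_kind: str) -> str:
    if source_kind == "upload":
        return "uploaded_reference"
    return "feed_recreation"


def storage_prefix(document_id: str, source_kind: str, workflow_mode: str) -> str:
    source = source_kind or "unknown"
    mode = workflow_mode or default_workflow_mode(source)
    return f"{mode}/{source}/{document_id}"


def describe(job_id: str, url: str) -> dict:
    # Hypothetical wrapper: derive every document field from just the id and URL.
    kind = infer_source_kind(url)
    mode = default_workflow_mode(kind)
    return {
        "document_id": job_id,
        "source_kind": kind,
        "workflow_mode": mode,
        "storage_prefix": storage_prefix(job_id, kind, mode),
    }


uploaded = describe("job-1", "upload://demo.mp4")
linked = describe("job-2", "https://www.tiktok.com/@someone/video/1")
empty = describe("job-3", "")
print(uploaded["storage_prefix"], linked["storage_prefix"], empty["storage_prefix"])
```

One thing to watch, judging only from the hunks shown here: `Job.workflow_mode` defaults to `"feed_recreation"`, so the `or default_workflow_mode(...)` fallback in `ensure_job_document_fields` would not switch an upload job to `"uploaded_reference"` unless the caller sets that mode explicitly.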
@@ -884,6 +916,12 @@ async def lifespan(_: FastAPI):
                     message="Service restarted · the previous audio processing run was interrupted and can be re-run",
                 )
             JOBS[p.name] = job
+            ensure_job_document_fields(job)
+            if DB:
+                try:
+                    DB.sync_job(job.model_dump(mode="json"), p)
+                except Exception as e:
+                    print(f"[database restore sync failed] job={job.id} error={e}", flush=True)
         except Exception:
             pass
     yield
@@ -995,6 +1033,35 @@ def run(cmd: list[str], cwd: Path | None = None) -> str:
     return res.stdout
+def ytdlp_cookie_args() -> list[str]:
+    if YTDLP_COOKIES_FILE:
+        cookies = Path(YTDLP_COOKIES_FILE).expanduser()
+        if not cookies.exists():
+            raise RuntimeError("TikTok cookies file is unavailable; check the YTDLP_COOKIES_FILE setting.")
+        return ["--cookies", str(cookies)]
+    if YTDLP_COOKIES_FROM_BROWSER:
+        return ["--cookies-from-browser", YTDLP_COOKIES_FROM_BROWSER]
+    return []
+def normalize_download_error(error: Exception) -> str:
+    raw = str(error)
+    lower = raw.lower()
+    # Parentheses make the and/or precedence explicit; behavior is unchanged.
+    auth_required = (
+        "log in for access" in lower
+        or ("login" in lower and "cookies" in lower)
+        or "cookies-from-browser" in lower
+        or ("sign in" in lower and "tiktok" in lower)
+    )
+    if auth_required:
+        return (
+            "TikTok download requires a logged-in session. Upload the video file instead, or configure "
+            "YTDLP_COOKIES_FILE / YTDLP_COOKIES_FROM_BROWSER on the backend and retry. "
+            f"Original error: {raw}"
+        )
+    return raw
 # ---- Heuristic frame-selection utilities ----
 import imagehash
 import numpy as np
@@ -1408,7 +1475,10 @@ def _target_score(item: dict, target: FrameExtractTarget) -> float:
     scene = float(item.get("scene_score_n", 0.0))
     motion = float(item.get("motion_n", 0.0))
-    if target == "transparent_human":
+    if target == "random_subject":
+        # Person-targeted random extraction: build a candidate pool from center-subject strength and sharpness, then sample randomly within the pool.
+        score = center * 0.52 + sharp * 0.24 + contrast * 0.14 + color * 0.10
+    elif target == "transparent_human":
         # The extraction stage currently runs on local compute: prefer a sharp centered subject, high contrast, moderate color, and temporal coverage.
         # Semantic judgment of the transparent skeleton figure is left to later review/recognition; Vision is not called per frame during extraction.
         score = center * 0.45 + sharp * 0.30 + contrast * 0.15 + color * 0.10
@@ -1460,6 +1530,15 @@ def _select_keyframes(candidates: list[dict], n: int, target: FrameExtractTarget
         elif it["score"] > dup["score"]:
             deduped[deduped.index(dup)] = it
+    if target == "random_subject":
+        # Person-targeted random: sample from the pool of sharper, more strongly centered candidates instead of ranking by motion peaks.
+        ranked = sorted(deduped, key=lambda x: -float(x.get("score", 0.0)))
+        pool_size = min(len(ranked), max(n * 6, n + 8))
+        pool = ranked[:pool_size] if pool_size > 0 else ranked
+        selected = random.sample(pool, k=min(n, len(pool))) if len(pool) > n else list(pool)
+        selected.sort(key=lambda x: x["idx"])
+        return selected
     # Temporal bucketing: split the candidate timeline into n equal segments and take the best frame for the current target in each.
     total = len(candidates)
     buckets: list[list[dict]] = [[] for _ in range(n)]
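
The `random_subject` branch is a "rank, truncate, sample" scheme: score every candidate, keep the top `max(n * 6, n + 8)`, draw `n` of those at random, and return them in timeline order. Here is a minimal sketch of just that step, assuming candidates are plain dicts with `idx` and `score` keys; the function name `pick_random_subject_frames` and the `rng` parameter are hypothetical, the latter added so the draw can be seeded.

```python
import random
from typing import Optional


def pick_random_subject_frames(candidates: list[dict], n: int, rng: Optional[random.Random] = None) -> list[dict]:
    rng = rng if rng is not None else random.Random()
    # Best-scoring candidates first.
    ranked = sorted(candidates, key=lambda x: -float(x.get("score", 0.0)))
    # Keep a pool several times larger than n so the draw has real variety.
    pool_size = min(len(ranked), max(n * 6, n + 8))
    pool = ranked[:pool_size]
    # Sample only when the pool is larger than n; otherwise take everything.
    selected = rng.sample(pool, k=n) if len(pool) > n else list(pool)
    # Return frames in timeline order regardless of draw order.
    selected.sort(key=lambda x: x["idx"])
    return selected


# 100 synthetic candidates whose score rises with the frame index.
frames = [{"idx": i, "score": i / 100} for i in range(100)]
picked = pick_random_subject_frames(frames, 6, random.Random(0))
few = pick_random_subject_frames(frames[:4], 6, random.Random(0))
print([f["idx"] for f in picked], len(few))
```

With `n = 6` the pool holds 36 frames, so two runs on the same video will usually return different reference frames. Unlike the bucketed path that follows, nothing here forces the picks to spread across the timeline.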
@@ -1648,13 +1727,15 @@ def pipeline_download(job_id: str) -> None:
             update(job, status="downloading", message="Local upload · skipping download", progress=15)
         else:
             update(job, status="downloading", message="Downloading with yt-dlp…", progress=5)
-            run([
+            cmd = [
                 "yt-dlp", "-f", "best[ext=mp4]/best",
                 "-o", str(mp4),
                 "--no-warnings", "--no-playlist",
                 "--retries", "3",
+                *ytdlp_cookie_args(),
                 job.url,
-            ])
+            ]
+            run(cmd)
         if not mp4.exists():
             raise RuntimeError("Download finished but source.mp4 was not found")
@@ -1677,13 +1758,13 @@ def pipeline_download(job_id: str) -> None:
         )
     except Exception as e:
         message = "Failed to parse video metadata" if stage == "metadata" else "Download failed"
-        update(job, status="failed", error=str(e), message=message)
+        update(job, status="failed", error=normalize_download_error(e), message=message)
 def pipeline_analyze(
     job_id: str,
     frame_count: int = KEYFRAME_COUNT,
-    target: FrameExtractTarget = "transparent_human",
+    target: FrameExtractTarget = "random_subject",
     mode: FrameExtractMode = "replace",
     quality: FrameExtractQuality = "auto",
 ) -> None:
@@ -1849,7 +1930,7 @@ def analyze_queue_worker() -> None:
     ANALYZE_WORKER_RUNNING = False
-# ---------- Audio transcription + translation + SKG rewrite + MiniMax voiceover ----------
+# ---------- Audio transcription + translation + SKG rewrite + Azure OpenAI voiceover ----------
 class TranscriptionUnavailable(RuntimeError):
     pass
@@ -2305,18 +2386,6 @@ def _rewrite_audio_script_sync(segments: list[TranscriptSegment], target_seconds
     return fallback, f"Rewrite failed; using the local template: {e}"
-def _minimax_tts_url() -> str:
-    if MINIMAX_TTS_BASE_URL.endswith("/v1/t2a_v2"):
-        return MINIMAX_TTS_BASE_URL
-    return f"{MINIMAX_TTS_BASE_URL}/v1/t2a_v2"
-def _choose_minimax_voice_id() -> str:
-    if MINIMAX_TTS_VOICE_POOL:
-        return random.choice(MINIMAX_TTS_VOICE_POOL)
-    return MINIMAX_TTS_VOICE_ID
 def _choose_azure_voice_id() -> str:
     if AZURE_TTS_VOICE_POOL:
         return random.choice(AZURE_TTS_VOICE_POOL)
@@ -2324,9 +2393,7 @@ def _choose_azure_voice_id() -> str:
 def _choose_tts_voice_id() -> str:
-    if VOICE_PROVIDER == "azure_openai":
     return _choose_azure_voice_id()
-    return _choose_minimax_voice_id()
 def _voice_speed_for(voice_id: str, target_seconds: float, text: str) -> float:
@@ -2343,60 +2410,22 @@ def _voice_speed_for(voice_id: str, target_seconds: float, text: str) -> float:
     return 0.99
-def _minimax_tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> str:
-    if not MINIMAX_API_KEY:
-        raise RuntimeError("MINIMAX_API_KEY is not configured; no voiceover generated")
-    if not text.strip():
-        raise RuntimeError("Rewritten copy is empty; no voiceover generated")
-    payload = {
-        "model": MINIMAX_TTS_MODEL,
-        "text": text.strip()[:9500],
-        "stream": False,
-        "language_boost": "English",
-        "output_format": "hex",
-        "voice_setting": {
-            "voice_id": voice_id,
-            "speed": _voice_speed_for(voice_id, target_seconds, text),
-            "vol": 1,
-            "pitch": 0,
-        },
-        "audio_setting": {
-            "sample_rate": 32000,
-            "bitrate": 128000,
-            "format": "mp3",
-            "channel": 1,
-        },
-    }
-    resp = httpx.post(
-        _minimax_tts_url(),
-        headers={"Authorization": f"Bearer {MINIMAX_API_KEY}", "Content-Type": "application/json"},
-        json=payload,
-        timeout=90,
-    )
-    resp.raise_for_status()
-    data = resp.json()
-    base_resp = data.get("base_resp") or {}
-    if int(base_resp.get("status_code", 0) or 0) != 0:
-        raise RuntimeError(base_resp.get("status_msg") or "MiniMax TTS returned a failure")
-    audio_hex = ((data.get("data") or {}).get("audio") or "").strip()
-    if not audio_hex:
-        raise RuntimeError("MiniMax TTS returned no audio hex")
-    try:
-        audio_bytes = bytes.fromhex(audio_hex)
-    except ValueError as e:
-        raise RuntimeError(f"MiniMax TTS audio hex could not be parsed: {e}") from e
-    out = job_dir(job_id) / "audio_script.mp3"
-    out.write_bytes(audio_bytes)
-    return f"/jobs/{job_id}/audio-script.mp3"
-def _azure_tts_url() -> str:
-    path = AZURE_TTS_PATH if AZURE_TTS_PATH.startswith("/") else f"/{AZURE_TTS_PATH}"
+def _azure_tts_url_for(path_value: str) -> str:
+    path = path_value if path_value.startswith("/") else f"/{path_value}"
     if AZURE_OPENAI_BASE_URL.endswith(path):
         return AZURE_OPENAI_BASE_URL
     return f"{AZURE_OPENAI_BASE_URL}{path}"
+def _azure_tts_urls() -> list[str]:
+    urls: list[str] = []
+    for path in AZURE_TTS_PATHS or [AZURE_TTS_PATH]:
+        url = _azure_tts_url_for(path)
+        if url not in urls:
+            urls.append(url)
+    return urls
 def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> str:
     if not AZURE_OPENAI_API_KEY:
         raise RuntimeError("Neither AZURE_OPENAI_API_KEY nor LLM_API_KEY is configured; no voiceover generated")
@@ -2409,18 +2438,32 @@ def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds
         "response_format": "mp3",
         "speed": _voice_speed_for(voice_id, target_seconds, text),
     }
-    resp = httpx.post(
-        _azure_tts_url(),
-        headers = {
-            "Authorization": f"Bearer {AZURE_OPENAI_API_KEY}",
-            "api-key": AZURE_OPENAI_API_KEY,
-            "Content-Type": "application/json",
-        },
-        json=payload,
-        timeout=120,
-    )
+    headers = {
+        "Authorization": f"Bearer {AZURE_OPENAI_API_KEY}",
+        "api-key": AZURE_OPENAI_API_KEY,
+        "Content-Type": "application/json",
+    }
+    resp: httpx.Response | None = None
+    errors: list[str] = []
+    with ai_http_client(timeout=120) as client:
+        for url in _azure_tts_urls():
+            try:
+                current = client.post(url, headers=headers, json=payload)
+            except Exception as e:
+                errors.append(f"{url}: {type(e).__name__}: {e}")
+                continue
+            if current.status_code < 400:
+                resp = current
+                break
+            errors.append(f"{url}: HTTP {current.status_code}: {current.text[:180]}")
+            if current.status_code not in {404, 405}:
+                resp = current
+                break
+    if resp is None:
+        raise RuntimeError("Azure OpenAI TTS 不可用；已尝试 " + " | ".join(errors))
     if resp.status_code >= 400:
-        raise RuntimeError(f"Azure OpenAI TTS HTTP {resp.status_code}: {resp.text[:300]}")
+        detail = " | ".join(errors) or resp.text[:300]
+        raise RuntimeError(f"Azure OpenAI TTS HTTP {resp.status_code}: {detail[:600]}")
     audio_bytes = resp.content
     if not audio_bytes:
         raise RuntimeError("Azure OpenAI TTS 未返回音频内容")
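The retry loop in this hunk can be read as: try each candidate endpoint in order, move on after a network error or a 404/405 (the path probably does not exist on this gateway), and stop at the first success or at any other HTTP error. The sketch below isolates that control flow so it can be exercised without a network. `post_with_fallback` and the `Post` callable are hypothetical names introduced for illustration; the real code calls `client.post` on an `httpx` client.

```python
from typing import Callable, Iterable

# Hypothetical transport: takes a URL, returns (status_code, body).
Post = Callable[[str], tuple[int, bytes]]


def post_with_fallback(urls: Iterable[str], post: Post) -> bytes:
    """Return the body of the first successful response, trying URLs in order."""
    errors: list[str] = []
    decisive: tuple[int, bytes] | None = None
    for url in urls:
        try:
            status, body = post(url)
        except Exception as e:
            # Transport failure: record it and try the next endpoint.
            errors.append(f"{url}: {type(e).__name__}: {e}")
            continue
        if status < 400:
            decisive = (status, body)
            break
        errors.append(f"{url}: HTTP {status}")
        if status not in {404, 405}:
            # Any error other than "wrong path" is final: do not keep probing.
            decisive = (status, body)
            break
    if decisive is None:
        raise RuntimeError("no endpoint usable; tried " + " | ".join(errors))
    status, body = decisive
    if status >= 400:
        raise RuntimeError(f"HTTP {status}: " + " | ".join(errors))
    return body
```

Treating only 404 and 405 as "try the next path" keeps authentication or quota errors from being masked by further attempts.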
@@ -2437,9 +2480,7 @@ def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds
 def _tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> tuple[str, str, str]:
-    if VOICE_PROVIDER == "azure_openai":
-        return _azure_openai_tts_sync(job_id, text, voice_id, target_seconds), "azure_openai", AZURE_TTS_MODEL
-    return _minimax_tts_sync(job_id, text, voice_id, target_seconds), "minimax", MINIMAX_TTS_MODEL
+    return _azure_openai_tts_sync(job_id, text, voice_id, target_seconds), "azure_openai", AZURE_TTS_MODEL
 def _build_audio_script_sync(job_id: str, segments: list[TranscriptSegment], target_seconds: float = 12.0) -> AudioScript:
@@ -2451,8 +2492,8 @@ def _build_audio_script_sync(job_id: str, segments: list[TranscriptSegment], tar
     speaker_profile, rhythm_profile = _audio_delivery_profile(segments, duration, selected_voice_id)
     voice_url = ""
     voice_error = ""
-    voice_provider = "azure_openai" if VOICE_PROVIDER == "azure_openai" else "minimax"
-    voice_model = AZURE_TTS_MODEL if voice_provider == "azure_openai" else MINIMAX_TTS_MODEL
+    voice_provider = "azure_openai"
+    voice_model = AZURE_TTS_MODEL
     try:
         voice_url, voice_provider, voice_model = _tts_sync(job_id, rewritten, selected_voice_id, duration)
     except Exception as e:
@@ -3050,7 +3091,8 @@ def health() -> dict:
         "auth_configured": WEB_AUTH_CONFIGURED,
         "base_url": LLM_BASE_URL or "openai-default",
         "image_base_url": IMAGE_BASE_URL or LLM_BASE_URL or "openai-default",
-        "voice_base_url": AZURE_OPENAI_BASE_URL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_BASE_URL,
+        "voice_base_url": AZURE_OPENAI_BASE_URL,
+        "database": DB.health() if DB else {"enabled": False, "url": DATABASE_URL, "error": DB_INIT_ERROR},
         "models": {
             "asr": ASR_MODEL,
             "local_asr": LOCAL_ASR_MODEL,
@@ -3067,15 +3109,12 @@ def health() -> dict:
             "subject_image": SUBJECT_ASSET_IMAGE_MODEL,
             "subject_image_fallbacks": SUBJECT_ASSET_IMAGE_MODELS,
             "voice_provider": VOICE_PROVIDER,
-            "voice_base_url": AZURE_OPENAI_BASE_URL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_BASE_URL,
-            "voice_tts": AZURE_TTS_MODEL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_MODEL,
-            "voice_id": AZURE_TTS_VOICE_ID if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_VOICE_ID,
-            "voice_pool": AZURE_TTS_VOICE_POOL if VOICE_PROVIDER == "azure_openai" else (MINIMAX_TTS_VOICE_POOL or [MINIMAX_TTS_VOICE_ID]),
-            "voice_configured": bool(AZURE_OPENAI_API_KEY) if VOICE_PROVIDER == "azure_openai" else bool(MINIMAX_API_KEY),
-            "minimax_tts": MINIMAX_TTS_MODEL,
-            "minimax_voice": MINIMAX_TTS_VOICE_ID,
-            "minimax_voice_pool": MINIMAX_TTS_VOICE_POOL or [MINIMAX_TTS_VOICE_ID],
-            "minimax_configured": bool(MINIMAX_API_KEY),
+            "voice_base_url": AZURE_OPENAI_BASE_URL,
+            "voice_tts": AZURE_TTS_MODEL,
+            "voice_tts_paths": AZURE_TTS_PATHS,
+            "voice_id": AZURE_TTS_VOICE_ID,
+            "voice_pool": AZURE_TTS_VOICE_POOL,
+            "voice_configured": bool(AZURE_OPENAI_API_KEY),
             "video": VIDEO_MODEL,
             "video_aliases": VIDEO_MODEL_ALIASES,
             "video_provider": video_provider_name(),
@@ -3088,6 +3127,9 @@ def health() -> dict:
 class JobSummary(BaseModel):
     id: str
+    document_id: str = ""
+    source_kind: str = "unknown"
+    workflow_mode: str = "feed_recreation"
     url: str
     status: JobStatus
     progress: int = 0
@@ -3103,6 +3145,29 @@ class JobSummary(BaseModel):
     mtime: float = 0.0
+class DocumentSummary(BaseModel):
+    id: str
+    title: str
+    source_kind: str
+    workflow_mode: str
+    source_url: str = ""
+    primary_job_id: str = ""
+    status: str = "created"
+    storage_prefix: str = ""
+    job_count: int = 0
+    asset_count: int = 0
+    created_at: float = 0.0
+    updated_at: float = 0.0
+@app.get("/documents", response_model=list[DocumentSummary])
+def list_documents(limit: int | None = None) -> list[DocumentSummary]:
+    if not DB:
+        return []
+    rows = DB.list_documents(limit)
+    return [DocumentSummary(**row) for row in rows]
 @app.get("/jobs", response_model=list[JobSummary])
 def list_jobs(limit: int | None = None) -> list[JobSummary]:
     """所有 job 的精简列表，按磁盘 state.json mtime 倒序（最新优先）。前端无 ?job= 时用它回填历史。"""
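The new `/documents` endpoint maps database rows onto `DocumentSummary` and returns an empty list when no database is configured. The sketch below mirrors that mapping with a standard-library dataclass instead of pydantic so it runs without extra dependencies. `DocumentSummarySketch` and `rows_to_summaries` are hypothetical names, and it assumes `DB.list_documents` yields dict-like rows keyed by field name, which the diff does not show. Unlike the diff's `DocumentSummary(**row)`, the sketch drops unknown columns explicitly, because a dataclass would otherwise raise on them.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DocumentSummarySketch:
    # Required fields first, then the defaults shown in the diff.
    id: str
    title: str
    source_kind: str
    workflow_mode: str
    source_url: str = ""
    primary_job_id: str = ""
    status: str = "created"
    job_count: int = 0
    asset_count: int = 0


def rows_to_summaries(rows: Optional[list[dict[str, Any]]]) -> list[DocumentSummarySketch]:
    """Convert rows to summaries; None stands for 'database disabled'."""
    if rows is None:
        return []
    known = {f.name for f in fields(DocumentSummarySketch)}
    return [
        DocumentSummarySketch(**{k: v for k, v in row.items() if k in known})
        for row in rows
    ]
```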
@@ -3111,8 +3176,12 @@ def list_jobs(limit: int | None = None) -> list[JobSummary]:
         state_path = JOBS_DIR / job_id / "state.json"
         mtime = state_path.stat().st_mtime if state_path.exists() else 0.0
         thumb = f"/jobs/{job_id}/frames/{job.frames[0].index}.jpg" if job.frames else ""
+        ensure_job_document_fields(job)
         items.append(JobSummary(
             id=job.id,
+            document_id=job.document_id,
+            source_kind=job.source_kind,
+            workflow_mode=job.workflow_mode,
             url=job.url,
             status=job.status,
             progress=job.progress,
@@ -3138,13 +3207,38 @@ async def create_job(req: CreateJobReq, bg: BackgroundTasks) -> Job:
     if not req.url.strip():
         raise HTTPException(400, "url required")
     job_id = uuid.uuid4().hex[:12]
-    job = Job(id=job_id, url=req.url.strip())
+    job = Job(id=job_id, url=req.url.strip(), document_id=job_id, source_kind="tiktok_link", workflow_mode="feed_recreation")
     JOBS[job_id] = job
     save_state(job)
     bg.add_task(pipeline_download, job_id)
     return job
+@app.post("/jobs/{job_id}/download/retry", response_model=Job)
+async def retry_job_download(job_id: str, bg: BackgroundTasks) -> Job:
+    job = JOBS.get(job_id)
+    if not job:
+        raise HTTPException(404, "job not found")
+    if job.source_kind == "upload" or job.url.startswith("upload://"):
+        raise HTTPException(409, "uploaded videos cannot be redownloaded; upload the file again")
+    if job.status in {"downloading", "splitting", "transcribing"}:
+        raise HTTPException(409, f"job is busy: {job.status}")
+    mp4 = job_dir(job_id) / "source.mp4"
+    if mp4.exists() and mp4.stat().st_size == 0:
+        mp4.unlink()
+    update(
+        job,
+        status="downloading",
+        progress=1,
+        error="",
+        message="重新提交下载…",
+        video_url="",
+    )
+    bg.add_task(pipeline_download, job_id)
+    return job
 @app.post("/jobs/upload", response_model=Job)
 async def create_job_from_upload(bg: BackgroundTasks, file: UploadFile = File(...)) -> Job:
     if not file.filename:
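The guards in `retry_job_download` reduce to a small decision: uploaded sources can never be redownloaded, and a job that is already downloading, splitting, or transcribing is rejected as busy. A sketch of that decision as a pure function follows. `retry_rejection` is a hypothetical helper, not part of the diff; it returns the rejection message that the endpoint would send with HTTP 409, or `None` when the retry may proceed.

```python
from typing import Optional

# Statuses the diff treats as "work in progress".
BUSY_STATUSES = {"downloading", "splitting", "transcribing"}


def retry_rejection(source_kind: str, url: str, status: str) -> Optional[str]:
    """Return a rejection reason, or None if a download retry is allowed."""
    # Uploads have no remote source to fetch again.
    if source_kind == "upload" or url.startswith("upload://"):
        return "uploaded videos cannot be redownloaded; upload the file again"
    # Avoid queuing a second pipeline run on top of an active one.
    if status in BUSY_STATUSES:
        return f"job is busy: {status}"
    return None
```

Checking the `upload://` prefix as well as `source_kind` covers jobs saved before the `source_kind` field existed, which default to `"unknown"`.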
@@ -3162,7 +3256,7 @@ async def create_job_from_upload(bg: BackgroundTasks, file: UploadFile = File(..
     if not mp4.exists() or mp4.stat().st_size == 0:
         raise HTTPException(500, "upload failed")
-    job = Job(id=job_id, url=f"upload://{file.filename}")
+    job = Job(id=job_id, url=f"upload://{file.filename}", document_id=job_id, source_kind="upload", workflow_mode="uploaded_reference")
     JOBS[job_id] = job
     save_state(job)
     bg.add_task(pipeline_download, job_id)
@@ -3174,7 +3268,7 @@ async def trigger_analyze(
     job_id: str,
     bg: BackgroundTasks,
     frames: int = KEYFRAME_COUNT,
-    target: FrameExtractTarget = "transparent_human",
+    target: FrameExtractTarget = "random_subject",
     mode: FrameExtractMode = "replace",
     quality: FrameExtractQuality = "auto",
 ) -> Job:
@@ -3252,6 +3346,11 @@ def delete_job(job_id: str) -> dict[str, bool | str]:
     job = JOBS.pop(job_id, None)
     if not job and not d.exists():
         raise HTTPException(404, "job not found")
+    if DB:
+        try:
+            DB.delete_job(job_id)
+        except Exception as e:
+            print(f"[database delete failed] job={job_id} error={e}", flush=True)
     if d.exists():
         shutil.rmtree(d)
     return {"ok": True, "id": job_id}
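The `delete_job` change makes the database delete best-effort: a failure is logged and the on-disk job directory is still removed. A minimal sketch of that ordering follows. `delete_job_records`, `remove_files`, and `log` are hypothetical names used for illustration, and `db` is assumed to be any object with a `delete_job(job_id)` method.

```python
from typing import Any, Callable, Optional


def delete_job_records(
    job_id: str,
    db: Optional[Any],
    remove_files: Callable[[str], None],
    log: Callable[[str], None] = print,
) -> dict:
    """Delete DB rows if possible, then always delete the files."""
    if db is not None:
        try:
            db.delete_job(job_id)
        except Exception as e:
            # A broken database must not block cleanup of the job directory.
            log(f"[database delete failed] job={job_id} error={e}")
    remove_files(job_id)
    return {"ok": True, "id": job_id}
```

One trade-off of this design is that a failed database delete can leave an orphaned row behind; the startup backfill mentioned in the commit list may or may not reconcile that, and the diff does not say.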


@@ -3,7 +3,8 @@
 # Runtime
 JOBS_DIR=/data/jobs
-KEYFRAME_COUNT=12
+APP_DB_URL=sqlite:////data/jobs/app.db
+KEYFRAME_COUNT=6
 CORS_ORIGINS=https://marketing.skg.com
 API_PORT=4291
@@ -22,7 +23,9 @@ LLM_API_KEY=
 ASR_MODEL=whisper-1
 ASR_FALLBACK_MODEL=gemini-2.5-flash
 TRANSLATE_MODEL=gemini-2.5-flash
-REWRITE_MODEL=gemini-2.5-pro
+GPT_TEXT_MODEL=gpt-4o
+REWRITE_MODEL=gpt-4o
+VISION_MODEL=gpt-4o
 PRODUCT_VIEW_MODEL=gpt-image-2
 IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1
 IMAGE_API_KEY=
@@ -33,9 +36,14 @@ SUBJECT_ASSET_IMAGE_MODELS=gpt-image-2
 # Optional outbound proxy for AI gateway calls. Leave blank on normal VPS networking.
 AI_HTTP_PROXY=
+# Optional TikTok download login state for yt-dlp. Keep cookies files private.
+YTDLP_COOKIES_FILE=
+YTDLP_COOKIES_FROM_BROWSER=
 # Audio rewrite and Azure OpenAI TTS
-AUDIO_REWRITE_MODEL=gemini-2.5-pro
+AUDIO_REWRITE_MODEL=gpt-4o
 AUDIO_PRODUCT_BRIEF="SKG smart massage products for daily neck, shoulder, back, eye, knee, and foot relaxation. Keep claims premium, clean, credible, and non-medical."
+# Voice is fixed to Azure OpenAI in the backend.
 VOICE_PROVIDER=azure_openai
 AZURE_OPENAI_BASE_URL=https://ai.skg.com/azure
 AZURE_OPENAI_API_KEY=
@@ -43,13 +51,7 @@ AZURE_TTS_MODEL=gpt-4o-mini-tts
 AZURE_TTS_VOICE_ID=alloy
 AZURE_TTS_VOICE_POOL=alloy,verse,shimmer
 AZURE_TTS_PATH=/audio/speech
+AZURE_TTS_PATHS=/audio/speech,/v1/audio/speech
-# Legacy MiniMax TTS fallback; not the default voice provider.
-MINIMAX_API_KEY=
-MINIMAX_TTS_BASE_URL=https://api.minimax.io
-MINIMAX_TTS_MODEL=speech-2.8-turbo
-MINIMAX_TTS_VOICE_ID=English_expressive_narrator
-MINIMAX_TTS_VOICE_POOL=English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner
 # Video generation. Use SKG Doubao / Seedance gateway in production.
 POE_API_BASE_URL=https://api.poe.com/v1
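`AZURE_TTS_PATHS` is a comma-separated list, with `AZURE_TTS_PATH` remaining as the single-path fallback. The backend's actual parsing is not shown in this diff; the sketch below is one plausible way to turn such a value into an ordered list, under the assumption that blank entries are ignored and a leading slash is added when missing. `parse_path_list` is a hypothetical helper.

```python
import os


def parse_path_list(raw: str, default: str) -> list[str]:
    """Split a comma-separated path list, normalise, and de-duplicate."""
    paths: list[str] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty values
        if not item.startswith("/"):
            item = "/" + item
        if item not in paths:
            paths.append(item)
    # Fall back to the single configured path when the list is empty.
    return paths or [default]


# Example of reading the two variables from the environment.
tts_paths = parse_path_list(
    os.environ.get("AZURE_TTS_PATHS", ""),
    os.environ.get("AZURE_TTS_PATH", "/audio/speech"),
)
```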

File diff suppressed because one or more lines are too long


@@ -19,6 +19,7 @@ import { AdRecreationBoard } from "@/components/ad-recreation-board"
 import {
   addManualFrame, analyzeJob, createJob, getJob, listJobs, uploadJob, deleteJob, deleteFrame, deleteGeneratedImage,
   deleteGeneratedVideo, deleteCutout, generateStoryboardVideo, triggerTranscribe, describeFrame, updateStoryboard, copyProductLibraryAsset,
+  formatJobError, retryJobDownload,
   type Job, type ImageRef, type KeyFrame, type ProductFusionShot, type StoryboardScene, type FrameExtractMode, type FrameExtractQuality, type FrameExtractTarget,
 } from "@/lib/api"
 import { TRANSPARENT_HUMAN_NEGATIVE_PROMPT, TRANSPARENT_HUMAN_VIDEO_PROMPT } from "@/lib/workflow-target"
@@ -40,6 +41,7 @@ const VIDEO_FRAME_PANEL_ID = "video-frame-panel"
 const FLOATING_PANEL_IDS = new Set([KEYFRAME_PANEL_ID, VIDEO_FRAME_PANEL_ID])
 const DIRECT_VIDEO_GENERATION_PAUSED = true
 const FRAME_TARGET_LABELS: Record<FrameExtractTarget, string> = {
+  random_subject: "人物随机",
   transparent_human: "透明骨架人",
   balanced: "综合关键帧",
   subject: "清晰主体",
@@ -242,8 +244,8 @@ export default function Home() {
   const handleAnalyzeJob = useCallback(async (jobId: string, options?: { mode?: FrameExtractMode }) => {
     const targetJob = jobs.find((item) => item.id === jobId)
     if (!targetJob) return
-    const frameTarget = frameTargets[jobId] ?? "transparent_human"
-    const frameCount = frameCounts[jobId] ?? 12
+    const frameTarget = frameTargets[jobId] ?? "random_subject"
+    const frameCount = frameCounts[jobId] ?? 6
     const frameQuality = frameQualities[jobId] ?? "auto"
     const mode = options?.mode ?? (targetJob.frames.length > 0 ? "append" : "replace")
     setActiveJobId(jobId)
@@ -487,8 +489,8 @@ export default function Home() {
       const visualRunning = target.status === "splitting"
       if (!hasVisualResult && !visualRunning && !autoTriggeredRef.current.has(visualKey)) {
         autoTriggeredRef.current.add(visualKey)
-        const frameTarget = frameTargets[target.id] ?? "motion"
-        const frameCount = frameCounts[target.id] ?? 12
+        const frameTarget = frameTargets[target.id] ?? "random_subject"
+        const frameCount = frameCounts[target.id] ?? 6
         const frameQuality = frameQualities[target.id] ?? "accurate"
         try {
           const updated = await analyzeJob(target.id, frameCount, frameTarget, "replace", frameQuality)
@@ -572,15 +574,30 @@ export default function Home() {
   const handleStartProduction = useCallback(async (inputUrl?: string) => {
     const trimmed = inputUrl?.trim()
     const created = trimmed ? await handleSubmit(trimmed) : undefined
-    const target = created ?? job
+    let target = created ?? job
     if (!target) {
       toast.info("先粘贴视频链接或选择一个素材任务")
       return
     }
+    if (!created && target.status === "failed") {
+      autoTriggeredRef.current.delete(`${target.id}:audio`)
+      autoTriggeredRef.current.delete(`${target.id}:visual`)
+    }
+    if (!created && target.status === "failed" && !target.video_url) {
+      try {
+        target = await retryJobDownload(target.id)
+        updateJobInList(target)
+        toast.info("已重新提交下载；下载完成后会自动跑音频文案路和视觉抽帧路")
+      } catch (e) {
+        toast.error("重新下载失败：" + (e instanceof Error ? e.message : String(e)))
+        return
+      }
+    }
     setProductionJobIds((prev) => new Set(prev).add(target.id))
-    toast.success("已进入并行素材分析：下载完成后自动跑音频文案路和视觉抽帧路")
+    if (target.video_url) toast.success("已进入并行素材分析：音频文案路和视觉抽帧路会同步推进")
+    else toast.success("已进入并行素材分析：下载完成后自动跑音频文案路和视觉抽帧路")
     void startProductionLanesForJob(target)
-  }, [handleSubmit, job, startProductionLanesForJob])
+  }, [handleSubmit, job, startProductionLanesForJob, updateJobInList])
   useEffect(() => {
     if (productionJobIds.size === 0) return
@@ -863,6 +880,9 @@ export default function Home() {
     if (job?.status === "downloaded" && prevStatusRef.current !== "downloaded") {
       toast.info("视频已下载，音频解析会自动开始；也可以在右侧手动重试", { duration: 6000 })
     }
+    if (job?.status === "failed" && prevStatusRef.current !== "failed") {
+      toast.error(formatJobError(job.error) || "任务失败", { duration: 10000 })
+    }
     prevStatusRef.current = job?.status ?? null
     const TERMINAL: Job["status"][] = ["downloaded", "frames_extracted", "transcribed", "failed"]


@@ -32,6 +32,7 @@ import {
   cutoutElement,
   deleteSubjectAsset,
   effectiveFrameUrl,
+  formatJobError,
   generateSceneAsset,
   generateProductAngleAsset,
   generateSubjectAssets,
@@ -52,6 +53,7 @@ import { type NodeData } from "@/components/nodes"
 import { MediaAssetTile } from "@/components/media-asset-tile"
 const TARGETS: Array<{ value: FrameExtractTarget; label: string }> = [
+  { value: "random_subject", label: "人物随机" },
   { value: "balanced", label: "综合" },
   { value: "subject", label: "主体" },
   { value: "motion", label: "动作" },
@@ -1449,6 +1451,9 @@ function MaterialColumn({
   onSubmitUrl: () => void
   onStartProduction: () => void
 }) {
+  const actionLabel = !url.trim() && job?.status === "failed"
+    ? job.video_url ? "重新解析" : "重新下载"
+    : "开始分析"
   return (
     <section className="flex min-h-0 flex-col gap-3 rounded-lg border border-white/10 bg-white/[0.035] p-3 shadow-2xl">
       <header className="shrink-0 border-b border-white/10 pb-3">
@@ -1474,7 +1479,7 @@ function MaterialColumn({
disabled={data.submitting || (!url.trim() && !job)} disabled={data.submitting || (!url.trim() && !job)}
className="inline-flex h-10 items-center justify-center rounded-md bg-rose-600 px-3 text-[13px] font-semibold text-white transition hover:bg-rose-500 disabled:cursor-not-allowed disabled:opacity-45" className="inline-flex h-10 items-center justify-center rounded-md bg-rose-600 px-3 text-[13px] font-semibold text-white transition hover:bg-rose-500 disabled:cursor-not-allowed disabled:opacity-45"
> >
{actionLabel}
</button> </button>
<button <button
type="button" type="button"
@@ -1875,11 +1880,11 @@ function SourceReferenceBuildPanel({
       for (const frame of job.frames) {
         if (selectedFrames.has(frame.index)) onToggleFrame(frame.index)
       }
-      const updated = await analyzeJob(job.id, 12, "motion", "replace", "accurate")
+      const updated = await analyzeJob(job.id, 6, "random_subject", "replace", "accurate")
       onJobUpdate(updated)
-      toast.info("已按动作峰值逻辑重新抽取 12 张参考帧，完成后在这里人工选择主角参考。")
+      toast.info("已按人物定向随机逻辑重新抽取 6 张参考帧，完成后在这里人工选择主角参考。")
     } catch (e) {
-      toast.error("12 张关键帧抽取失败：" + (e instanceof Error ? e.message : String(e)))
+      toast.error("6 张关键帧抽取失败：" + (e instanceof Error ? e.message : String(e)))
     } finally {
       setExtracting(false)
     }
@@ -1887,7 +1892,7 @@ function SourceReferenceBuildPanel({
   const generateSimilarActor = async () => {
     if (!frames.length) {
-      toast.warning("请先自动抽帧 12 张，或在原版视频上手动补帧。")
+      toast.warning("请先自动抽帧 6 张，或在原版视频上手动补帧。")
       return
     }
     const baseFrame = subjectReferenceFrames[0]
@@ -2000,11 +2005,11 @@ function SourceReferenceBuildPanel({
type="button" type="button"
onClick={() => void extractKeyframes()} onClick={() => void extractKeyframes()}
disabled={!job.video_url || extracting || job.status === "splitting"} disabled={!job.video_url || extracting || job.status === "splitting"}
title="自动按动作峰值抽 12 张参考帧,更偏向手势、表情变化、节奏点和镜头变化" title="自动按人物定向随机逻辑抽 6 张参考帧,保留手动当前点补帧"
className="inline-flex h-8 items-center justify-center gap-1 rounded-md bg-white px-3 text-[11px] font-semibold text-black transition hover:bg-white/90 disabled:cursor-not-allowed disabled:opacity-40" className="inline-flex h-8 items-center justify-center gap-1 rounded-md bg-white px-3 text-[11px] font-semibold text-black transition hover:bg-white/90 disabled:cursor-not-allowed disabled:opacity-40"
> >
{extracting || job.status === "splitting" ? <Loader2 className="h-3.5 w-3.5 animate-spin" /> : <Scissors className="h-3.5 w-3.5" />} {extracting || job.status === "splitting" ? <Loader2 className="h-3.5 w-3.5 animate-spin" /> : <Scissors className="h-3.5 w-3.5" />}
12 6
</button> </button>
</div> </div>
</div> </div>
@@ -2039,7 +2044,7 @@ function SourceReferenceBuildPanel({
})} })}
{!frames.length && ( {!frames.length && (
<div className="col-span-full flex h-[106px] items-center justify-center rounded border border-dashed border-white/12 text-[11px] text-white/34"> <div className="col-span-full flex h-[106px] items-center justify-center rounded border border-dashed border-white/12 text-[11px] text-white/34">
12 6
</div> </div>
)} )}
</div> </div>
@@ -3405,7 +3410,7 @@ function FrameExtractControls({
       </div>
       <div className="grid grid-cols-[1fr_1fr_72px] gap-2">
         <select
-          value={job ? data.frameTargets[job.id] ?? "transparent_human" : "balanced"}
+          value={job ? data.frameTargets[job.id] ?? "random_subject" : "random_subject"}
           onChange={(e) => job && data.onFrameTargetChange(job.id, e.target.value as FrameExtractTarget)}
           disabled={!job}
           className={controlClass}
@@ -3424,8 +3429,8 @@ function FrameExtractControls({
           type="number"
           min={1}
           max={20}
-          value={job ? data.frameCounts[job.id] ?? 12 : 12}
-          onChange={(e) => job && data.onFrameCountChange(job.id, Number(e.target.value) || 12)}
+          value={job ? data.frameCounts[job.id] ?? 6 : 6}
+          onChange={(e) => job && data.onFrameCountChange(job.id, Number(e.target.value) || 6)}
           disabled={!job}
           className={`${controlClass} text-center`}
         />
@@ -3858,6 +3863,7 @@ function MaterialCard({
   onDelete?: () => void
 }) {
   const tone = statusTone(job)
+  const errorText = formatJobError(job.error)
   return (
     <button
       type="button"
@@ -3879,6 +3885,12 @@ function MaterialCard({
         <Metric label="文案" value={job.audio_script?.source_text || job.transcript.length ? "ready" : "-"} compact />
         <Metric label="段落" value={`${job.transcript.length}`} compact />
       </div>
+      {job.status === "failed" && errorText && (
+        <div className="mt-2 flex gap-1.5 rounded-md border border-rose-300/18 bg-rose-500/[0.08] px-2 py-1.5 text-[11px] leading-snug text-rose-100/82">
+          <AlertTriangle className="mt-0.5 h-3.5 w-3.5 shrink-0" />
+          <span className="line-clamp-3">{errorText}</span>
+        </div>
+      )}
       {onDelete && (
         <span
           role="button"


@@ -641,15 +641,15 @@ export const Dashboard = forwardRef<DashboardHandle, Props>(function Dashboard({
           </div>
         </KanbanCard>
-        <KanbanCard tone="green" tags={["配音"]} title={job?.audio_script?.voice_model || "MiniMax T2A"}>
+        <KanbanCard tone="green" tags={["配音"]} title={job?.audio_script?.voice_model || "Azure OpenAI TTS"}>
           {job?.audio_script?.voice_url ? (
             <audio controls className="h-8 w-full" src={apiAssetUrl(job.audio_script.voice_url)} />
           ) : (
             <div className="text-[11px] text-[var(--text-soft)]">
-              {job?.audio_script?.error || "配置 MiniMax 后自动生成配音文件"}
+              {job?.audio_script?.error || "配置 Azure OpenAI TTS 后自动生成配音文件"}
             </div>
           )}
-          <div className="kanban-meta">{job?.audio_script?.voice_id || "random English voice"}</div>
+          <div className="kanban-meta">{job?.audio_script?.voice_id || "Azure voice"}</div>
         </KanbanCard>
       </>
     )}


@@ -133,6 +133,7 @@ function clamp(value: number, min: number, max: number) {
 const THUMBNAIL_HEIGHT = 192
 const FLOATING_PANEL_EDGE_INSET = 8
 const FRAME_TARGET_OPTIONS: Array<{ value: FrameExtractTarget; label: string; hint: string }> = [
+  { value: "random_subject", label: "人物随机", hint: "从清晰人物候选里随机抽取" },
   { value: "transparent_human", label: "透明骨架人", hint: "本地算力筛清晰主体，不逐帧调用 Vision" },
   { value: "balanced", label: "综合关键帧", hint: "清晰、去重、变化、时间覆盖" },
   { value: "subject", label: "清晰主体", hint: "人物 / 产品主体更清楚" },
@@ -140,7 +141,7 @@ const FRAME_TARGET_OPTIONS: Array<{ value: FrameExtractTarget; label: string; hi
   { value: "expression", label: "表情瞬间", hint: "人物 / 动物表情倾向" },
   { value: "motion", label: "动作峰值", hint: "动作变化更明显" },
 ]
-const FRAME_COUNT_OPTIONS = [12, 8, 5, 3]
+const FRAME_COUNT_OPTIONS = [6, 12, 8, 5, 3]
 const FRAME_QUALITY_OPTIONS: Array<{ value: FrameExtractQuality; label: string; hint: string }> = [
   { value: "auto", label: "自动", hint: "展示友好：按电脑性能选择，最高只到精细" },
   { value: "fast", label: "快速", hint: "2fps / 360px长视频省电" },
@@ -575,8 +576,8 @@ export function InputNode({ data, selected }: NodeProps<{ data: NodeData }> | an
   const aspectStr = ready ? `${j.width}/${j.height}` : "9/16"
   const thumbNaturalWidth = ready && j.height ? Math.max(96, Math.round(THUMBNAIL_HEIGHT * j.width / j.height)) : 96
   const toolWidth = Math.max(148, thumbNaturalWidth)
-  const target = d.frameTargets[j.id] ?? "transparent_human"
-  const count = d.frameCounts[j.id] ?? 12
+  const target = d.frameTargets[j.id] ?? "random_subject"
+  const count = d.frameCounts[j.id] ?? 6
   const quality = d.frameQualities[j.id] ?? "auto"
   const jHasFrames = j.frames.length > 0
   const jRunning = ["splitting", "transcribing"].includes(j.status)
@@ -815,8 +816,8 @@ export function VideoFramePanelNode({ data }: any) {
   const duration = panelJob.duration ?? 0
   const frames = [...panelJob.frames].sort((a, b) => a.timestamp - b.timestamp)
   const aspect = panelJob.width && panelJob.height ? `${panelJob.width}/${panelJob.height}` : "9/16"
-  const panelTarget = d.frameTargets[panelJob.id] ?? "transparent_human"
-  const panelCount = d.frameCounts[panelJob.id] ?? 12
+  const panelTarget = d.frameTargets[panelJob.id] ?? "random_subject"
+  const panelCount = d.frameCounts[panelJob.id] ?? 6
   const panelQuality = d.frameQualities[panelJob.id] ?? "auto"
   const panelRunning = ["splitting", "transcribing"].includes(panelJob.status)
   const dockText: Record<CanvasPanelDock, string> = {
@@ -2102,7 +2103,7 @@ export function RewriteNode({ data, selected }: any) {
 }
 /* ============================================================
-   5b. AudioNode — 合并 ASR + 翻译 + 改写 + MiniMax 配音
+   5b. AudioNode — 合并 ASR + 翻译 + 改写 + Azure OpenAI 配音
    ============================================================ */
 export function AudioNode({ data, selected }: any) {
   const d: NodeData = data
@@ -2152,9 +2153,9 @@ export function AudioNode({ data, selected }: any) {
 }}
 >
 <div>
-  / SKG MiniMax <br />
+  / SKG Azure OpenAI <br />
   <span className="text-[var(--text-faint)] font-mono">
-    {audioScript?.rewrite_model || "AUDIO_REWRITE_MODEL"} {audioScript?.voice_model || "MiniMax T2A"}
+    {audioScript?.rewrite_model || "AUDIO_REWRITE_MODEL"} {audioScript?.voice_model || "Azure OpenAI TTS"}
   </span>
 </div>
 {job && (
@@ -2195,7 +2196,7 @@ export function AudioNode({ data, selected }: any) {
 )}
 </div>
 )}
-{voiceUrl && <div className="text-[10.5px] text-emerald-200/85">MiniMax natural English voice ready · </div>}
+{voiceUrl && <div className="text-[10.5px] text-emerald-200/85">Azure OpenAI English voice ready · </div>}
 {isRewriting && (
   <div className="text-[10.5px] text-[var(--text-faint)]"></div>
 )}


@@ -172,10 +172,7 @@ export interface RuntimeModels {
   voice_id?: string
   voice_pool?: string[]
   voice_configured?: boolean
-  minimax_tts?: string
-  minimax_voice?: string
-  minimax_voice_pool?: string[]
-  minimax_configured?: boolean
+  voice_tts_paths?: string[]
   video?: string
   video_aliases?: Record<string, string>
   video_provider?: string
@@ -189,6 +186,15 @@ export interface RuntimeHealth {
   llm_configured?: boolean
   auth_configured?: boolean
   base_url?: string
+  database?: {
+    enabled: boolean
+    url?: string
+    schema_version?: number
+    documents?: number
+    jobs?: number
+    assets?: number
+    error?: string
+  }
   models?: RuntimeModels
 }
@@ -419,7 +425,7 @@ export interface KeyFrame {
   generated_images?: GeneratedImage[]
 }
-export type FrameExtractTarget = "transparent_human" | "balanced" | "subject" | "transition" | "expression" | "motion"
+export type FrameExtractTarget = "random_subject" | "transparent_human" | "balanced" | "subject" | "transition" | "expression" | "motion"
 export type FrameExtractMode = "replace" | "append"
 export type FrameExtractQuality = "auto" | "fast" | "accurate" | "ultra"
 export type AssetBackground = "white" | "black"
@@ -574,6 +580,10 @@ export interface ProductRefStateItem {
 export interface Job {
   id: string
   url: string
+  document_id?: string
+  source_kind?: "tiktok_link" | "upload" | "unknown"
+  workflow_mode?: "feed_recreation" | "uploaded_reference"
+  storage_prefix?: string
   status: JobStatus
   progress: number
   message?: string
@@ -596,14 +606,13 @@ export interface BackendHealth {
   llm_configured: boolean
   auth_configured?: boolean
   base_url: string
+  database?: RuntimeHealth["database"]
   models?: {
     asr?: string
     translate?: string
     rewrite?: string
     audio_rewrite?: string
-    minimax_tts?: string
-    minimax_voice?: string
-    minimax_configured?: boolean
+    voice_tts_paths?: string[]
     video?: string
     video_aliases?: Record<string, string>
     video_base_url?: string
@@ -617,6 +626,25 @@ export function apiAssetUrl(path?: string | null): string {
   return `${API_BASE}${path.startsWith("/") ? "" : "/"}${path}`
 }
+export function isRestrictedDownloadError(error?: string | null): boolean {
+  const text = (error ?? "").toLowerCase()
+  return (
+    text.includes("tiktok 下载需要登录态") ||
+    text.includes("log in for access") ||
+    text.includes("cookies-from-browser") ||
+    text.includes("ytdlp_cookies_file") ||
+    (text.includes("tiktok") && text.includes("cookies"))
+  )
+}
+export function formatJobError(error?: string | null): string {
+  if (!error) return ""
+  if (isRestrictedDownloadError(error)) {
+    return "这个 TikTok 视频需要登录态。请上传 MP4或让后端配置 YTDLP_COOKIES_FROM_BROWSER / YTDLP_COOKIES_FILE 后重试。"
+  }
+  return error
+}
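The two helpers added above classify a download failure by substring match and then swap in a user-facing hint. A caller might branch on that classification roughly as follows. This is a minimal sketch, not code from this commit: `suggestAction`, the `FailureAction` type, and the sample error strings are hypothetical, and `isRestrictedDownloadError` is re-declared only so the block runs on its own.

```typescript
// Re-declared from the diff above so the sketch is self-contained.
function isRestrictedDownloadError(error?: string | null): boolean {
  const text = (error ?? "").toLowerCase()
  return (
    text.includes("tiktok 下载需要登录态") ||
    text.includes("log in for access") ||
    text.includes("cookies-from-browser") ||
    text.includes("ytdlp_cookies_file") ||
    (text.includes("tiktok") && text.includes("cookies"))
  )
}

// Hypothetical (not in the diff): which follow-up the UI could offer.
type FailureAction = "upload_or_configure_cookies" | "retry"

function suggestAction(error?: string | null): FailureAction | null {
  if (!error) return null // nothing failed, nothing to offer
  // Login-gated videos will not succeed on a plain retry, so steer the
  // user toward an MP4 upload or a backend cookie configuration instead.
  return isRestrictedDownloadError(error) ? "upload_or_configure_cookies" : "retry"
}

// Sample error strings are invented for illustration.
console.log(suggestAction("ERROR: [TikTok] 123: Log in for access"))
console.log(suggestAction("connection reset by peer"))
```

Because the match is case-insensitive and substring-based, any backend message that mentions both "tiktok" and "cookies" is treated as login-gated, so unrelated errors that contain both words would be classified the same way.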
 export async function getHealth(): Promise<BackendHealth> {
   const res = await fetch(`${API_BASE}/health`)
   if (!res.ok) throw new Error(`health ${res.status}`)
@@ -633,6 +661,15 @@ export async function createJob(tkUrl: string): Promise<Job> {
   return res.json()
 }
+export async function retryJobDownload(id: string): Promise<Job> {
+  const res = await fetch(`${API_BASE}/jobs/${id}/download/retry`, { method: "POST" })
+  if (!res.ok) {
+    const text = await res.text().catch(() => "")
+    throw apiError("retryJobDownload", res.status, text)
+  }
+  return res.json()
+}
 export async function uploadJob(file: File): Promise<Job> {
   const fd = new FormData()
   fd.append("file", file)
@@ -664,6 +701,9 @@ export async function deleteJob(id: string): Promise<{ ok: boolean; id: string }
 export interface JobSummary {
   id: string
+  document_id?: string
+  source_kind?: string
+  workflow_mode?: string
   url: string
   status: JobStatus
   progress: number
@@ -679,6 +719,28 @@ export interface JobSummary {
   mtime: number
 }
+export interface DocumentSummary {
+  id: string
+  title: string
+  source_kind: string
+  workflow_mode: string
+  source_url: string
+  primary_job_id: string
+  status: string
+  storage_prefix: string
+  job_count: number
+  asset_count: number
+  created_at: number
+  updated_at: number
+}
+export async function listDocuments(limit?: number): Promise<DocumentSummary[]> {
+  const qs = limit && limit > 0 ? `?limit=${limit}` : ""
+  const res = await fetch(`${API_BASE}/documents${qs}`)
+  if (!res.ok) throw new Error(`listDocuments ${res.status}`)
+  return res.json()
+}
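`listDocuments` appends `?limit=` only for a positive limit and returns `DocumentSummary` rows. The sketch below isolates that query-string rule and shows one way a client might order the rows. It rests on assumptions: `buildLimitQuery`, `mostRecentlyUpdated`, and the trimmed `DocumentSummaryLite` type are hypothetical names, and `updated_at` is only assumed to be a numeric timestamp where larger means newer.

```typescript
// Trimmed, hypothetical subset of DocumentSummary for illustration.
interface DocumentSummaryLite {
  id: string
  status: string
  job_count: number
  updated_at: number
}

// Mirrors the expression used in listDocuments: no query string unless
// the limit is a positive number (undefined, 0, and negatives are skipped).
function buildLimitQuery(limit?: number): string {
  return limit && limit > 0 ? `?limit=${limit}` : ""
}

// Hypothetical client-side helper: newest first, without mutating the input.
function mostRecentlyUpdated(docs: DocumentSummaryLite[], count: number): DocumentSummaryLite[] {
  return [...docs].sort((a, b) => b.updated_at - a.updated_at).slice(0, count)
}

// Made-up sample rows.
const sample: DocumentSummaryLite[] = [
  { id: "doc-a", status: "done", job_count: 2, updated_at: 100 },
  { id: "doc-b", status: "failed", job_count: 1, updated_at: 300 },
  { id: "doc-c", status: "done", job_count: 4, updated_at: 200 },
]

console.log(buildLimitQuery(5))
console.log(mostRecentlyUpdated(sample, 2).map((d) => d.id))
```

Whether the backend already returns documents in a particular order is not visible in this diff, so the client-side sort is a defensive choice, not a requirement.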
 export async function listJobs(limit?: number): Promise<JobSummary[]> {
   const qs = limit && limit > 0 ? `?limit=${limit}` : ""
   const res = await fetch(`${API_BASE}/jobs${qs}`)
@@ -694,8 +756,8 @@ export async function triggerTranscribe(id: string): Promise<Job> {
 export async function analyzeJob(
   id: string,
-  frames = 12,
-  target: FrameExtractTarget = "balanced",
+  frames = 6,
+  target: FrameExtractTarget = "random_subject",
   mode: FrameExtractMode = "replace",
   quality: FrameExtractQuality = "auto",
 ): Promise<Job> {