Compare commits
23 Commits
22421eb117
...
backend-wo
| Author | SHA1 | Date |
|---|---|---|
|  | c60cb47ee1 |  |
|  | 061eb7d867 |  |
|  | 07384c5e19 |  |
|  | 4280624810 |  |
|  | 028718df0b |  |
|  | a6eddf1c14 |  |
|  | 9e307e307c |  |
|  | c2e9558f5b |  |
|  | c626ec51d6 |  |
|  | 1ac9b1bde3 |  |
|  | 1c451c6ab3 |  |
|  | 408c5fca47 |  |
|  | 2a1aa4c994 |  |
|  | ebac2e86b5 |  |
|  | 47653ee319 |  |
|  | 4d2a4a0299 |  |
|  | e6387cf7af |  |
|  | fde94f4698 |  |
|  | dddf410dcb |  |
|  | 301ec4fc3b |  |
|  | 2cfd7de5d5 |  |
|  | a2897ef2be |  |
|  | e6a5ea46a6 |  |
@@ -1,146 +1,110 @@
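# SKG TK Remix Validation: Current Status (2026-05-13)

# SKG TK Remix Validation: Current Status (2026-05-18)

## In One Sentence

The second approach for SKG's AI asset-production pipeline: TK link/upload → split tracks → extract keyframes (5 + manual additions) → Vision recognition → rewrite copy → generate images → generate video → compose. **The MVP works up to image generation; the remaining 3 nodes are placeholders**.

The product direction has narrowed to "fast replication of in-feed ads". After a TK link or uploaded video comes in, the source video is downloaded first, and then the audio-copy path and the video-visual path run in parallel. The visual path automatically extracts 6 person-targeted random reference frames. Product assets form a separate pool, where views are recognized automatically and missing angles are filled in. The storyboard workbench uses a sentence-level timeline to plan new voiceover, person/product requirements and first/last frames. Direct submission to the video model is currently paused: first and last frames are generated and reviewed one shot at a time before any video is submitted.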
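The download-then-fan-out ordering described above can be sketched with `asyncio`. This is a minimal illustration, not the backend's code. `download_source`, `run_audio_path` and `run_visual_path` are hypothetical stand-ins for the real stages, and their return values are placeholders.

```python
import asyncio


# Hypothetical stand-ins for the real pipeline stages. The backend's own
# function names and signatures are not shown in this document.
async def download_source(job_id: str) -> str:
    await asyncio.sleep(0)  # placeholder for the yt-dlp download
    return f"jobs/{job_id}/video.mp4"


async def run_audio_path(video_path: str) -> dict:
    await asyncio.sleep(0)  # placeholder: split audio, ASR, translation, sound analysis
    return {"path": "audio", "source": video_path}


async def run_visual_path(video_path: str, frames: int = 6) -> dict:
    await asyncio.sleep(0)  # placeholder: person-targeted reference-frame extraction
    return {"path": "visual", "source": video_path, "frames": frames}


async def analyze(job_id: str) -> tuple[dict, dict]:
    # The download must finish first, because both paths read the same source file.
    video_path = await download_source(job_id)
    # After that, the audio-copy path and the video-visual path run concurrently.
    audio, visual = await asyncio.gather(
        run_audio_path(video_path),
        run_visual_path(video_path),
    )
    return audio, visual
```

Using `asyncio.gather` makes the total wait roughly the slower of the two paths rather than their sum. A failure in either path propagates to the caller.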

## Paths / Ports

- Path: `~/Projects/business/20260512-20260512-skg-tk-二创验证/`
- web dev: `cd web && pnpm dev` (port **4290**)
- api dev: `cd api && source .venv/bin/activate && uvicorn main:app --port 4291 --reload`
- Test job: `?job=c6767f3a166b` (chrisorb, a 71 s vertical TK video)
- Current worktree: `/Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证-backend/`
- Main project path: `/Users/kangwan/Projects/business/20260512-20260512-skg-tk-二创验证/`
- Background start: `./scripts/start-dev-background.sh` (frontend on 4290, backend on 4291, managed by launchd)
- Background stop: `./scripts/stop-dev-background.sh`
- web dev: `cd web && npm run dev`
- api dev: `cd api && uvicorn main:app --host 127.0.0.1 --port 4291`
- Note: do not run the backend with `--reload` for long tasks such as downloading, frame extraction, audio and image generation.
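
## SKG Gateway Capabilities (tested · critical!)

`base_url: https://ai.skg.com/ezlink/v1`

The key is stored as `LLM_API_KEY` in `api/.env`.

## Current Model Assignments

`LLM_BASE_URL` defaults to `https://ai.skg.com/ezlink/v1`. Images also default to `IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1`, voice defaults to `https://ai.skg.com/azure`, and production video defaults to `https://ai.skg.com/doubao`.

| Endpoint / field | Status | Purpose |
| Task | Current model / channel | Notes |
|---|---|---|
| `/v1/chat/completions` text-only | ✅ Works | translate / rewrite |
| `/v1/chat/completions` + image_url | ✅ **Works** (earlier misjudged as broken; the dog.jpg test image was corrupted) | Vision image recognition (gemini-2.5-flash recommended) |
| `/v1/chat/completions` + input_audio | ❌ Doesn't work | ASR can't use this route |
| `/v1/audio/transcriptions` (whisper) | ❌ 404 | The whole audio endpoint family is not exposed |
| `/v1/audio/speech` (tts) | ❌ 404 | |
| `/v1/images/generations` (text→image) | ✅ Works | Image generation (gemini-3-pro-image-preview = nano-banana-pro) |
| `/v1/images/generations` + image parameter | ✅ **Works** (image-to-image) | Confirmed a reference image can be passed; a key finding |
| `/v1/images/edits` | ❌ 404 | |
| `/v1/videos/*` (sora-2) | ❌ 404 | Video generation needs IT to enable it, or an external key |
| `/v1/files` | ❌ 403 "必须指定渠道" ("a channel must be specified") | |
| TK download | `yt-dlp` + optional cookies | Public videos download without a login. For restricted videos, configure `YTDLP_COOKIES_FILE` or `YTDLP_COOKIES_FROM_BROWSER`, or upload the MP4 directly. |
| Remote ASR | `ASR_MODEL=whisper-1` | On failure, falls back to local ASR and then to the multimodal fallback. |
| Local ASR | `LOCAL_ASR_MODEL=mlx-community/whisper-tiny` | Default second-tier fallback; prioritizes producing a real sentence-level timeline. |
| ASR fallback / audio analysis | `ASR_FALLBACK_MODEL=gemini-2.5-flash` | Multimodal audio fallback. The backend rejects fake subtitles, repeated text and results with too little coverage. |
| Subtitle translation | `TRANSLATE_MODEL=gemini-2.5-flash` | Stays on Gemini. |
| Visual understanding | `VISION_MODEL=gpt-4o` | Keyframe Vision has moved to GPT. If an older environment sets `gemini-*`, it is automatically normalized to `GPT_TEXT_MODEL`. |
| General rewriting / storyboard descriptions | `REWRITE_MODEL=gpt-4o` | Moved to GPT; old Gemini overrides are normalized automatically. |
| New voiceover rewriting | `AUDIO_REWRITE_MODEL=gpt-4o` | Follows `REWRITE_MODEL` by default; old Gemini overrides are normalized automatically. |
| Product view recognition | `PRODUCT_VIEW_MODEL=gpt-image-2` | Batch-recognizes each product image's view, left/right, top/bottom and inner/outer side, plus usage and risks. |
| All image generation / editing | `gpt-image-2` | Hard-locked on the server with no image-model fallback. Covers keyframe image generation, watermark cleanup, element extraction, subject asset packs, product angle fill-in and first/last frames. |
| Voiceover | `VOICE_PROVIDER=azure_openai` + `AZURE_TTS_MODEL=gpt-4o-mini-tts` | Voice is fixed to Azure OpenAI TTS. The backend tries the paths in `AZURE_TTS_PATHS` in order, which makes it easier to tell a wrong path from an unavailable voice service. |
| Video | `VIDEO_MODEL=seedance` | Direct submission is paused in the current main flow. The production channel defaults to `ai.skg.com/doubao`, and the actual Seedance model ID is set by `VIDEO_MODEL_SEEDANCE`. |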
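The voiceover row above says the backend tries each path in `AZURE_TTS_PATHS` in order so that a wrong path can be told apart from an unavailable voice service. The sketch below shows one way to get that behavior, under several assumptions. The `post` callable is injected, so no particular HTTP client is assumed. The rule that a 404 means "wrong path" while any other failure means "service problem" is this sketch's own heuristic, not confirmed backend logic. `synthesize_with_path_fallback` is a hypothetical name.

```python
from typing import Callable, Optional


def synthesize_with_path_fallback(
    base_url: str,
    paths_env: str,
    post: Callable[[str], tuple[int, bytes]],
) -> tuple[Optional[bytes], list[str]]:
    """Try each path from a comma-separated AZURE_TTS_PATHS value in order.

    `post(url)` is a hypothetical injected callable that returns (status, body).
    Returns (audio bytes or None, diagnostics collected along the way).
    """
    diagnostics: list[str] = []
    paths = [p.strip() for p in paths_env.split(",") if p.strip()]
    for path in paths:
        url = base_url.rstrip("/") + path
        status, body = post(url)
        if status == 200:
            return body, diagnostics
        if status == 404:
            # A 404 suggests only this path is wrong, so try the next one.
            diagnostics.append(f"{path}: path not found (404)")
            continue
        # Any other status points at the voice service itself, so stop early.
        diagnostics.append(f"{path}: service error ({status})")
        return None, diagnostics
    return None, diagnostics
```

Stopping at the first non-404 error keeps the diagnostics short. When every path returns 404, the list shows that the paths, not the service, are the problem.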

**Gateway backend = one-hub multi-channel proxy**. The current key group is named「纯OpenAI+AWSClaude+Gemini官方」("pure OpenAI + AWS Claude + official Gemini"). It lacks an audio channel (`gpt-4o-audio-preview` returns 503 "无可用渠道", i.e. "no available channel") and a video channel.

## Model Selection (written to api/.env)

```
ASR_MODEL=whisper-1                      # ⚠️ endpoint returns 404; ASR is not actually working yet
TRANSLATE_MODEL=gemini-2.5-flash         # ✅ text works
REWRITE_MODEL=gemini-2.5-pro             # placeholder
VISION_MODEL=gemini-2.5-flash            # ✅ recognition works
IMAGE_MODEL=gemini-3-pro-image-preview   # ✅ nano-banana-pro, i2i works
```
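
## Pipeline Status (merged 8-node version)

The original 10 nodes have been merged: input + download + split are now one node, translate was folded into transcript, and videogen and compose are placeholders.

| Step | Node | Status | Notes |

## Current Main Flow

| Step | Module | Status | Notes |
|---|---|---|---|
| 1 | **Input** (download + split merged) | ✅ | Real yt-dlp download + ffmpeg split to wav |
| 2 | **Keyframes** | ✅ | Heuristic D: 30 candidates → pHash dedup + Laplacian-variance scoring + temporal bucketing → 5 frames; manual frame addition works |
| 3 | **ASR** | ❌ Blocked | SKG gateway audio doesn't work; waiting for IT to open an audio channel or for an external key |
| 4 | **Translate** | ❌ Blocked | Depends on ASR |
| 5 | **Rewrite** | ⏳ Placeholder | Waiting for the user to provide a product-info template |
| 6 | **Image Gen** | ✅ **Just finished** | nano-banana-pro i2i + positive/negative prompts |
| 7 | **Video Gen** | ⏳ Placeholder | sora-2 endpoint doesn't work |
| 8 | **Compose** | ⏳ Placeholder | Local ffmpeg + subtitles + TTS |

## UI Architecture (important)

- **Left sidebar** (a very narrow 108px): 8 stage tiles stacked vertically, with branching DAG paths
- **Main area, ReactFlow**: an 8-node DAG (input → keyframe/asr → ... → compose)
- **Clicking a sidebar tile**: a drawer panel slides out from the left (pink/purple/orange Kanban style)
- **Keyframe lightbox**: **embedded in the keyframe drawer** (not full-screen) via `<FrameLightbox embedded ... />`; the drawer width is 760 when expandedFrame is set and 400 otherwise
- **Above the Input node**: a floating strip of thumbnails for multiple videos, plus a "+" to add a new video
- **Above the keyframe node**: 5+ thumbnails at the video's original aspect ratio (aspect-ratio: width/height)
- **Thumbnail hover**: pops up a large static image (keyframes are reference assets, so no video playback)
- **Thumbnail click**: opens the lightbox inside the keyframe drawer (large image on the left, recognition panel on the right)

## Data Model (key TypeScript / pydantic)

```typescript
KeyFrame {
  index: number        // stable ID (not contiguous! the frames array is sorted by timestamp)
  timestamp: number
  url: string
  description?: {
    scene, objects: [{name, position, color, extract_prompt}],
    style, suggested_prompt
  }
  generated_images?: [{ id, prompt, model, mode, url, selected, created_at }]
}

Job { frames: KeyFrame[] ... }
```

**The frontend must look up frames with `frames.find(x => x.index === activeIndex)`, not by array index** (an earlier bug).
| 1 | Input / download | Working | A TK link or uploaded video creates a job; once the download finishes, the job enters the analysis queue. |
| 2 | Audio-copy path | Working | Splits out `audio.wav` and runs ASR, translation and speaker / pacing / background-sound analysis; results are collapsed by default. |
| 3 | Video-visual path | Working | Automatically extracts 6 person-targeted random reference frames; the current workspace lets you add frames manually by playback second on the 9:16 source video. |
| 4 | Similar-subject assets | Working | Uses keyframes and optional built-in characters to generate 10 white-background views of the same subject. |
| 5 | Product asset pool | Working | Uploaded and built-in product images go into one pool; views, structural points, usage and risks are recognized automatically, and missing angles can be filled in. |
| 6 | Storyboard workbench | Working | Edit new voiceover, shot type, person / product toggles and first-frame / last-frame plans on the sentence-level timeline. |
| 7 | First/last-frame gate | Working | Each storyboard shot first generates its first / last frame from the similar-subject views and product assets; frames are saved after review. |
| 8 | Video candidates | Direct submission paused | Historical candidates stay visible. One-click Seedance submission is no longer offered; single-shot submission reopens only after the first/last frames are reviewed. |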
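The keyframe heuristic in the old table (30 candidates → pHash dedup + Laplacian-variance scoring + temporal bucketing → 5 frames) can be sketched as follows. Several assumptions apply. The perceptual hash and the Laplacian-variance sharpness score are taken as already computed for each candidate, since computing them would need an imaging library. The Hamming threshold of 10 is an arbitrary placeholder, not a tuned project value. `Candidate` and `select_keyframes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    timestamp: float  # seconds into the video
    phash: int        # 64-bit perceptual hash, assumed precomputed
    sharpness: float  # Laplacian variance, assumed precomputed (higher = sharper)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def select_keyframes(
    candidates: list[Candidate],
    duration: float,
    k: int = 5,
    min_distance: int = 10,
) -> list[Candidate]:
    """Pick at most one frame per equal-length time bucket.

    Within a bucket the sharpest candidate wins, unless its pHash is within
    `min_distance` bits of a frame already chosen (a near-duplicate).
    """
    if k <= 0 or duration <= 0:
        return []
    bucket_len = duration / k
    buckets: list[list[Candidate]] = [[] for _ in range(k)]
    for c in candidates:
        # Clamp so that t == duration lands in the last bucket.
        idx = min(max(int(c.timestamp // bucket_len), 0), k - 1)
        buckets[idx].append(c)

    chosen: list[Candidate] = []
    for bucket in buckets:  # buckets are in time order, so `chosen` is too
        for cand in sorted(bucket, key=lambda c: c.sharpness, reverse=True):
            if all(hamming(cand.phash, s.phash) >= min_distance for s in chosen):
                chosen.append(cand)
                break
    return chosen
```

Bucketing guarantees temporal spread. The dedup check stops a static shot that spans two buckets from producing two near-identical frames.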

## Key Files

- `web/app/page.tsx`: multi-job state management (jobs[] + activeJobId) and the 8-node LAYOUT
- `web/components/dashboard.tsx`: sidebar + drawer + 9 Kanban sections (input/keyframe/asr/translate/rewrite/imagegen/videogen/compose), including the `ImageGenCard` subcomponent
- `web/components/lightbox.tsx`: `FrameLightbox` supports an `embedded` prop
- `web/components/video-lightbox.tsx`: the player that opens when you click a video thumbnail on the Input node
- `web/components/nodes/index.tsx`: ReactFlow definitions for the 8 nodes
- `web/lib/api.ts`: API client
- `api/main.py`: all FastAPI endpoints, plus the KeyFrame/GeneratedImage models
- `api/main.py`: FastAPI backend, model routing, task state, ASR/translation/audio analysis, image generation, product recognition, first/last frames and video endpoints.
- `api/database.py`: backend database layer. It currently uses SQLite to store document / job / media asset metadata, while media files stay in `jobs/<jobId>/`.
- `api/.env.example`: local model and gateway template; already includes `GPT_TEXT_MODEL=gpt-4o`.
- `deploy/.env.production.example`: production environment template; video defaults to the SKG Doubao / Seedance gateway.
- `RULES.md`: startup, deployment facts, model environment variables and project rules.
- `docs/source-analysis.html`: source-code analysis page. Any change that affects product understanding, APIs, model assignments or operating paths must be mirrored here.
- `web/components/ad-recreation-board.tsx`: the current main workbench for in-feed ad replication.
- `web/components/media-asset-tile.tsx`: unified media-asset thumbnail component with hover enlarge, delete and status overlay.
- `web/lib/api.ts`: frontend API client and runtime-model annotation types.

## Working API Endpoints
## Main APIs
```
POST /jobs                                      create a job (link)
POST /jobs/upload                               upload a video
GET /jobs/{id}                                  job status
POST /jobs/{id}/analyze?frames=5                split tracks + extract frames + ASR, all in one automatic pass
POST /jobs/{id}/frames?t=<sec>                  add a frame manually by timestamp
POST /jobs/{id}/frames/{idx}/describe           ✅ Vision recognition (3 retries + reasoning_content fallback)
POST /jobs/{id}/frames/{idx}/generate           ✅ image generation (i2i / text-only, with negative_prompt)
GET /jobs/{id}/frames/{idx}/gen/{gen_id}.jpg    generated image binary
POST /jobs/{id}/frames/{idx}/gen/{gen_id}/select  select a generation for downstream use
GET /jobs/{id}/video.mp4                        source video
GET /jobs/{id}/frames/{idx}.jpg                 keyframe jpg
GET /health
GET /documents
POST /jobs
POST /jobs/{id}/download/retry
POST /jobs/upload
GET /jobs
GET /jobs/{id}
DELETE /jobs/{id}
POST /jobs/{id}/analyze
POST /jobs/{id}/transcribe
POST /jobs/{id}/frames?t=<sec>
DELETE /jobs/{id}/frames/{idx}
POST /jobs/{id}/frames/{idx}/describe
POST /jobs/{id}/frames/{idx}/cleanup
POST /jobs/{id}/frames/{idx}/cleanup/apply
POST /jobs/{id}/frames/{idx}/generate
POST /jobs/{id}/frames/{idx}/scene-asset
POST /jobs/{id}/frames/{idx}/elements
POST /jobs/{id}/frames/{idx}/elements/{element_id}/cutout
POST /jobs/{id}/frames/{idx}/elements/{element_id}/subject-assets
POST /jobs/{id}/assets
PUT /jobs/{id}/product-refs
POST /jobs/{id}/assets/product-views/analyze
POST /jobs/{id}/assets/product-angle
POST /jobs/{id}/script/rewrite
PUT /jobs/{id}/frames/{idx}/storyboard
POST /jobs/{id}/frames/{idx}/storyboard/video
```
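The `describe` endpoint's "3 retries + reasoning_content fallback" works like this: read `content`, and if it holds no JSON, dig a JSON object out of `reasoning_content`. The sketch below assumes an OpenAI-style chat-completion message dict. `call_model` is a hypothetical injected function. The sketch scans for JSON with `json.JSONDecoder.raw_decode` instead of a regex; that is its own choice, not the backend's actual pattern.

```python
import json
from typing import Any, Callable, Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded anywhere in `text`, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text[i:])
        except json.JSONDecodeError:
            continue  # this brace did not start valid JSON; keep scanning
        if isinstance(obj, dict):
            return obj
    return None


def parse_vision_reply(message: dict[str, Any]) -> Optional[dict]:
    # Thinking models may leave `content` empty, so try it first, then reasoning_content.
    for field in ("content", "reasoning_content"):
        parsed = extract_json_object(message.get(field) or "")
        if parsed is not None:
            return parsed
    return None


def describe_with_retries(call_model: Callable[[], dict[str, Any]], attempts: int = 3) -> Optional[dict]:
    """`call_model` is a hypothetical function that returns one chat-completion message dict."""
    for _ in range(attempts):
        parsed = parse_vision_reply(call_model())
        if parsed is not None:
            return parsed
    return None
```

The brace-by-brace scan is quadratic in the worst case. For short model replies that cost is negligible, and the scan tolerates prose before and after the JSON.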
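
## Known Pitfalls / Don't Repeat These

1. **Keyframe index is not contiguous**: after a manual frame is added, the frames array is re-sorted by timestamp, and index is a stable ID. The lightbox must use `frames.find(x => x.index === activeIndex)`, **not** `frames[activeIndex]`.
2. **An earlier SKG gateway vision test gave a wrong result**: the `dog.jpg` test image (a 200px Wikipedia thumbnail) was corrupted or had abnormal metadata, which made image input look broken. Testing with a standard PNG or a real JPEG works.
3. **Gemini 2.5 Flash thinks by default**, so the `content` field is often empty (all tokens go to reasoning). Fall back to extracting JSON from `reasoning_content` with a regex.
4. **Thumbnail aspect-ratio**: it must adapt via `aspectRatio: ${job.width}/${job.height}`. Do not force `aspect-video` 16:9, or vertical videos get cropped.
5. **ReactFlow `type="input"` / `"output"` are reserved**: they come with a default white-background style, which must be overridden in CSS: `.react-flow .react-flow__node-input { background: transparent !important; ... }`.
6. **ReactFlow 12 colorMode is independent of next-themes**: you must wire it up with `<ReactFlow colorMode={resolvedTheme}>`, otherwise nodes render with a white background.
7. **FastAPI BackgroundTasks usage**: pass the function and its arguments, `bg.add_task(func, arg)`. Do not pass an already-created coroutine object such as `bg.add_task(func(arg))`.
8. **ffmpeg 8's mjpeg encoder rejects yuv420p**: frame extraction must add `-pix_fmt yuvj420p`, and `-vsync` must be replaced with `-fps_mode`.
9. **Frame-extraction speed**: scene-change detection (`select='gt(scene,0.4)'`) is very slow (30 s+ for a 71 s video). It was replaced with uniform sampling using fast seek (5 frames in under 3 s).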
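Pitfalls 8 and 9 together describe how a single frame should be grabbed: seek before the input for speed, force a full-range pixel format for the MJPEG encoder, and use `-fps_mode` instead of the deprecated `-vsync`. The helpers below only build an argument list and a sampling schedule; they do not run ffmpeg. The function names, `-q:v 2` and `-fps_mode passthrough` are assumptions of this sketch, not flags copied from the project.

```python
def frame_grab_args(video: str, t: float, out_jpg: str, ffmpeg_bin: str = "ffmpeg") -> list[str]:
    """Build ffmpeg arguments that extract one JPEG frame at time t (seconds)."""
    return [
        ffmpeg_bin,
        "-hide_banner",
        "-y",                        # overwrite the output file if it exists
        "-ss", f"{t:.3f}",           # -ss before -i = fast input seek
        "-i", video,
        "-frames:v", "1",            # emit exactly one video frame
        "-fps_mode", "passthrough",  # replaces the deprecated -vsync option
        "-pix_fmt", "yuvj420p",      # full-range YUV, which the mjpeg encoder accepts
        "-q:v", "2",                 # JPEG quality (lower is better)
        out_jpg,
    ]


def uniform_timestamps(duration: float, n: int) -> list[float]:
    """Evenly spaced sample points that skip the very first and last instant."""
    if n <= 0 or duration <= 0:
        return []
    step = duration / (n + 1)
    return [round(step * (i + 1), 3) for i in range(n)]
```

Each list can be passed to `subprocess.run(args, check=True)`. Because every call seeks directly to its timestamp, the extraction cost is roughly constant per frame and does not grow with video length the way full-decode scene detection does.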
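
## Current Constraints / Don't Step on These

1. Thumbnails for images / videos / extracted frames / product images / generated images / first-last frames / video candidates reuse `web/components/media-asset-tile.tsx` by default.
2. Every image-generation entry point allows only `gpt-image-2` on the server. Do not re-add Gemini image models or any other fallback.
3. Visual understanding and copy rewriting default to GPT: `VISION_MODEL`, `REWRITE_MODEL` and `AUDIO_REWRITE_MODEL` intercept old `gemini-*` overrides.
4. Gemini is still used in the ASR fallback / audio analysis / translation chain; don't remove it by mistake.
5. Voice goes only through Azure OpenAI TTS. Do not add or depend on any other voiceover channel configuration.
6. The current main flow does not batch-submit videos directly; it goes through "storyboard plan → first/last frames → human review" first.
7. The product asset pool assumes "the same product" by default and does not try to tell different products apart. View recognition must describe left / right, top / bottom and inner / outer sides from the wearer's perspective.
8. Automatic frame extraction defaults to `frames=6` + `target=random_subject` + `quality=accurate` + `mode=replace`. If you need a specific action or expression, add it manually with "extract at current point".
9. Documents are the top-level business grouping: each TK link or uploaded video gets one `document` by default, and each `job` belongs to a `document_id`. The DB stores metadata and file indexes; video / image / audio files are not stored in the DB.
10. Do not use `--reload` for long backend tasks.
11. Keyframe `index` is a stable ID, not the array position; the frontend looks up frames with `frames.find(x => x.index === idx)`.
12. TikTok cookies are account login state and may only live in a private environment on the local machine or server. Never commit cookies files or account passwords.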
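Constraint 11 (and pitfall 1) is easy to get wrong, so here is the failure mode in miniature. The project's frontend is TypeScript and uses `frames.find(x => x.index === idx)`. This Python sketch only mirrors that logic with a hypothetical `Frame` type to show why positional indexing breaks once a manual frame is inserted and the list is re-sorted by timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    index: int        # stable ID assigned at creation, never reused
    timestamp: float  # seconds


def add_manual_frame(frames: list[Frame], timestamp: float) -> list[Frame]:
    # A new frame gets the next unused ID, then the list is re-sorted by time.
    next_id = max((f.index for f in frames), default=-1) + 1
    return sorted(frames + [Frame(next_id, timestamp)], key=lambda f: f.timestamp)


def find_frame(frames: list[Frame], idx: int) -> Optional[Frame]:
    # Equivalent of frames.find(x => x.index === idx): match on the ID, not the position.
    return next((f for f in frames if f.index == idx), None)
```

After adding a frame at 3.0 s to frames with IDs 0, 1 and 2 (at 1.0, 5.0 and 9.0 s), `frames[1]` holds the new frame with ID 3. Only `find_frame(frames, 1)` still returns the frame at 5.0 s.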
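
## To-Do (by priority)

1. **ASR blocked**: ask SKG IT to open an audio channel, or get an external ASR key (Deepgram / iFlytek / direct OpenAI)
2. **Image-gen test feedback**: just finished; waiting for the user to try it in the browser, then tune the negative prompt / model choice
3. **Regional retouching (inpainting)**: discussed with the user, with options A pure prompt / B rectangle / C brush mask / D SAM; shelved for now
4. **Rewrite**: waiting for the user's product-info card template
5. **Video generation**: sora-2 through the SKG endpoint doesn't work; consider external keys (Runway/Kling/Veo3)
6. **Compose**: fully local ffmpeg + subtitles + TTS

## Workflow (Dev Session)

```bash
# 1. Start the backend if it isn't running (uvicorn stays in the foreground, so give it its own terminal)
cd ~/Projects/business/20260512-20260512-skg-tk-二创验证/api
source .venv/bin/activate
uvicorn main:app --port 4291 --reload

# 2. Start the frontend if it isn't running (in a second terminal)
cd ~/Projects/business/20260512-20260512-skg-tk-二创验证/web
pnpm dev

# 3. Browser
open http://localhost:4290/?job=c6767f3a166b
```

## User Preference Reminders (feedback memory)

- feedback_image-gen-model: use nano-banana-pro for all image generation ✅
- feedback_keep-scope-small: keep small requests small
- feedback_flow-dont-stop: keep executing through to delivery; only ask at a real fork
- feedback_demand-before-infra: before building infrastructure, first ask who needs it, what the pain point is and how often it occurs
- feedback_no-guessing-ports: verify before acting

## Recent Changes

- 2026-05-18: TK link downloads now support `YTDLP_COOKIES_FILE` / `YTDLP_COOKIES_FROM_BROWSER`. When a restricted video fails, the frontend suggests uploading the MP4 or configuring a cookie login state on the backend.
- 2026-05-18: Failed tasks on the asset-input side can now be re-downloaded / re-analyzed. Selecting a failed TK asset with no `video_url` calls the backend retry endpoint; a failed task that already has a video clears its auto-trigger flag and reruns the audio / visual paths.
- 2026-05-18: Removed leftover personal voice-channel code; `/health`, frontend types, environment templates and docs no longer expose the related fields or configuration.
- 2026-05-18: Added a backend database layer. SQLite defaults to `APP_DB_URL` / `DATABASE_URL` or `JOBS_DIR/app.db`; `/documents` returns the document grouping list, and `/health.database` returns the DB status.
- 2026-05-18: `VISION_MODEL`, `REWRITE_MODEL` and `AUDIO_REWRITE_MODEL` switched to the GPT default model `gpt-4o`, with normalization that guards against old Gemini environment variables.
- 2026-05-18: The voice channel is fixed to Azure OpenAI TTS and tries the voice paths listed in `AZURE_TTS_PATHS`.
- 2026-05-18: The current main path pauses direct video submission in favor of a per-shot first/last-frame gate.
- 2026-05-18: Media-asset interactions are unified in `MediaAssetTile`.
- 2026-05-18: Product-image view recognition and missing-angle fill-in now both use `gpt-image-2`.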
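The cookie precedence in the first change above (cookies file first, then browser cookies) can be expressed as a small argument builder. `--cookies FILE` and `--cookies-from-browser BROWSER` are real yt-dlp options. The function name, the `-o` output path and reading the settings from a plain mapping are assumptions of this sketch.

```python
from typing import Mapping


def ytdlp_args(url: str, out_path: str, env: Mapping[str, str]) -> list[str]:
    """Build a yt-dlp command in which the cookies file wins over browser cookies."""
    args = ["yt-dlp", "-o", out_path]
    cookies_file = env.get("YTDLP_COOKIES_FILE", "").strip()
    browser = env.get("YTDLP_COOKIES_FROM_BROWSER", "").strip()
    if cookies_file:
        args += ["--cookies", cookies_file]
    elif browser:
        args += ["--cookies-from-browser", browser]
    # With neither set, public videos are downloaded without a login state.
    args.append(url)
    return args
```

The resulting list can be handed to `subprocess.run(..., check=True)`. The cookies file path should point outside the repository, since it carries account login state.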

@@ -1,105 +1,5 @@
{
  "entries": [
    {
      "files_changed": 5,
      "hash": "d802701",
      "message": "auto-save 2026-05-15 17:22 (~4, -1)",
      "ts": "2026-05-15T17:22:54+08:00",
      "type": "commit"
    },
    {
      "files_changed": 2,
      "message": "Codex session active · last command: codex · 2 uncommitted changes · last commit: auto-save 2026-05-15 17:22 (~4, -1)",
      "ts": "2026-05-15T09:24:48Z",
      "type": "session-heartbeat"
    },
    {
      "files_changed": 3,
      "hash": "dcd8560",
      "message": "auto-save 2026-05-15 17:28 (~3)",
      "ts": "2026-05-15T17:28:27+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "hash": "25c4723",
      "message": "auto-save 2026-05-15 17:33 (~1)",
      "ts": "2026-05-15T17:33:59+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "message": "Codex session active · last command: codex · 1 uncommitted change · last commit: auto-save 2026-05-15 17:33 (~1)",
      "ts": "2026-05-15T09:34:48Z",
      "type": "session-heartbeat"
    },
    {
      "files_changed": 1,
      "hash": "1110500",
      "message": "auto-save 2026-05-15 17:39 (~1)",
      "ts": "2026-05-15T17:39:32+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "message": "Codex session active · last command: codex · 1 uncommitted change · last commit: auto-save 2026-05-15 17:39 (~1)",
      "ts": "2026-05-15T09:44:48Z",
      "type": "session-heartbeat"
    },
    {
      "files_changed": 1,
      "hash": "0b97d03",
      "message": "auto-save 2026-05-15 17:44 (~1)",
      "ts": "2026-05-15T17:45:02+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "hash": "eeeaebd",
      "message": "auto-save 2026-05-15 17:50 (~1)",
      "ts": "2026-05-15T17:50:32+08:00",
      "type": "commit"
    },
    {
      "files_changed": 3,
      "message": "Codex session active · last command: codex · 3 uncommitted changes · last commit: auto-save 2026-05-15 17:50 (~1)",
      "ts": "2026-05-15T09:54:48Z",
      "type": "session-heartbeat"
    },
    {
      "files_changed": 4,
      "hash": "a662130",
      "message": "auto-save 2026-05-15 17:55 (+1, ~3)",
      "ts": "2026-05-15T17:56:05+08:00",
      "type": "commit"
    },
    {
      "files_changed": 2,
      "hash": "fae3fb3",
      "message": "auto-save 2026-05-15 18:01 (~2)",
      "ts": "2026-05-15T18:01:35+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "message": "Codex session active · last command: codex · 1 uncommitted change · last commit: auto-save 2026-05-15 18:01 (~2)",
      "ts": "2026-05-15T10:04:49Z",
      "type": "session-heartbeat"
    },
    {
      "files_changed": 1,
      "hash": "84143bc",
      "message": "auto-save 2026-05-15 18:06 (~1)",
      "ts": "2026-05-15T18:07:06+08:00",
      "type": "commit"
    },
    {
      "files_changed": 1,
      "hash": "6c8bc42",
      "message": "auto-save 2026-05-15 18:12 (~1)",
      "ts": "2026-05-15T18:12:39+08:00",
      "type": "commit"
    },
    {
      "files_changed": 4,
      "message": "Codex session active · last command: codex · 4 uncommitted changes · last commit: auto-save 2026-05-15 18:12 (~1)",
@@ -3254,6 +3154,111 @@
      "message": "auto-save 2026-05-18 07:27 (~6)",
      "hash": "9790e5b",
      "files_changed": 6
    },
    {
      "ts": "2026-05-18T14:30:08+08:00",
      "type": "commit",
      "message": "auto-save 2026-05-18 14:30 (~5)",
      "hash": "e6a5ea4",
      "files_changed": 5
    },
    {
      "ts": "2026-05-18T14:31:59+08:00",
      "type": "commit",
      "message": "chore: switch vision and rewrite models to gpt",
      "hash": "a2897ef",
      "files_changed": 0
    },
    {
      "ts": "2026-05-18T14:34:36+08:00",
      "type": "commit",
      "message": "chore: force gpt routing for vision and rewrite",
      "hash": "2cfd7de",
      "files_changed": 5
    },
    {
      "ts": "2026-05-18T14:38:02+08:00",
      "type": "commit",
      "message": "docs: refresh current project status",
      "hash": "301ec4f",
      "files_changed": 1
    },
    {
      "ts": "2026-05-18T14:39:23+08:00",
      "type": "commit",
      "message": "chore: update development worklog",
      "hash": "dddf410",
      "files_changed": 1
    },
    {
      "ts": "2026-05-18T14:46:24+08:00",
      "type": "commit",
      "message": "auto-save 2026-05-18 14:46 (~7)",
      "hash": "e6387cf",
      "files_changed": 7
    },
    {
      "ts": "2026-05-18T14:49:53+08:00",
      "type": "commit",
      "message": "fix: force azure openai tts voice path",
      "hash": "4d2a4a0",
      "files_changed": 4
    },
    {
      "ts": "2026-05-18T15:08:05+08:00",
      "type": "commit",
      "message": "auto-save 2026-05-18 15:07 (~5)",
      "hash": "ebac2e8",
      "files_changed": 5
    },
    {
      "ts": "2026-05-18T15:13:30+08:00",
      "type": "commit",
      "message": "auto-save 2026-05-18 15:13 (~8)",
      "hash": "2a1aa4c",
      "files_changed": 8
    },
    {
      "ts": "2026-05-18T15:29:47+08:00",
      "type": "commit",
      "message": "auto-save 2026-05-18 15:29 (+1, ~5)",
      "hash": "1c451c6",
      "files_changed": 6
    },
    {
      "ts": "2026-05-18T15:34:15+08:00",
      "type": "commit",
      "message": "feat: add backend document database",
      "hash": "1ac9b1b",
      "files_changed": 4
    },
    {
      "ts": "2026-05-18T15:40:58+08:00",
      "type": "commit",
      "message": "fix: backfill database on startup",
      "hash": "c2e9558",
      "files_changed": 1
    },
    {
      "ts": "2026-05-18T15:51:30+08:00",
      "type": "commit",
      "message": "chore: remove personal voice channel remnants",
      "hash": "a6eddf1",
      "files_changed": 7
    },
    {
      "ts": "2026-05-18T16:35:29+08:00",
      "type": "commit",
      "message": "feat: support tiktok download cookies",
      "hash": "4280624",
      "files_changed": 9
    },
    {
      "ts": "2026-05-18T16:49:51+08:00",
      "type": "commit",
      "message": "fix: allow retrying failed source analysis",
      "hash": "061eb7d",
      "files_changed": 6
    }
  ]
}

19
RULES.md
@@ -11,7 +11,7 @@
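- See the project-decision section of `CLAUDE.md` and the seven-step pipeline breakdown in `.memory/plan.md`
- Style: `04-Dark-Gallery-Ambient` (path: `~/Projects/research/20260305-网页风格库/04-Dark-Gallery-Ambient.md`)
- First sprint: steps 1-4 (download / split tracks / keyframes / ASR + translation)
- Current product direction (reconfirmed 2026-05-18): solve the first step of fast in-feed ad replication first, and drop the old approach of "once started, finish frame extraction, storyboarding, element generation and compositing in a straight line". The main interface is "asset-input column on the left + in-feed replication worksheet on the right". After the user pastes a TK link or uploads a video and clicks "Start analysis", the system downloads the source video automatically. Once the download completes, two paths start in parallel. The audio-copy path extracts the original audio copy / subtitles and analyzes the speaker, speaking pace and rhythm, and background music / ambient sound / sound effects. The video-visual path automatically extracts reference frames so that a person can pick usable subjects and generate white-background views of a similar subject. Uploaded product images form a separate product asset pack, whose views / structure / proportions are recognized automatically and whose missing angles are filled in. The storyboard workbench plans the new voiceover, shot type, first/last frames, person requirements and how the product appears on a sentence-level timeline. Direct calls to the video model are currently paused: each shot first gets its first/last frames generated and reviewed from "similar-subject views + product asset pool + first/last-frame text plan", and only after the plan is saved is it decided which shots move on to single-shot video candidates.
- Current product direction (reconfirmed 2026-05-18): solve the first step of fast in-feed ad replication first, and drop the old approach of "once started, finish frame extraction, storyboarding, element generation and compositing in a straight line". The main interface is "asset-input column on the left + in-feed replication worksheet on the right". After the user pastes a TK link or uploads a video and clicks "Start analysis", the system downloads the source video automatically. Once the download completes, two paths start in parallel. The audio-copy path extracts the original audio copy / subtitles and analyzes the speaker, speaking pace and rhythm, and background music / ambient sound / sound effects. The video-visual path automatically extracts 6 person-targeted random reference frames so that a person can pick usable subjects and generate white-background views of a similar subject. Uploaded product images form a separate product asset pack, whose views / structure / proportions are recognized automatically and whose missing angles are filled in. The storyboard workbench plans the new voiceover, shot type, first/last frames, person requirements and how the product appears on a sentence-level timeline. Direct calls to the video model are currently paused: each shot first gets its first/last frames generated and reviewed from "similar-subject views + product asset pool + first/last-frame text plan", and only after the plan is saved is it decided which shots move on to single-shot video candidates.

## Deployment Facts

- Platform: VPS `76.13.31.179` (Ubuntu 24.04 / Docker Compose / Coolify Traefik)
@@ -24,7 +24,7 @@
- Server directory: `/opt/skg-marketing-studio`
- Production start: `docker compose -f docker-compose.prod.yml --env-file deploy/.env.production up -d --build`
- Production architecture: the `web` container serves the Next static export through Nginx. Static assets the login page needs, such as `/login/`, `/_next/`, `/assets/`, `/skg-logo-black.svg` and `/oasis-source/`, are publicly accessible. Unauthenticated access to the workbench redirects to `/login/`. `/api/` uses Nginx `auth_request` to validate the FastAPI session cookie before reverse-proxying to `skg-marketing-api:4291`. Traefik connects on 80/443 through the external `coolify` network.
- Persistent directory: the server's `./data/jobs` is mounted at the backend's `/data/jobs`
- Persistent directory: the server's `./data/jobs` is mounted at the backend's `/data/jobs`. The default backend database is `APP_DB_URL=sqlite:////data/jobs/app.db`, which stores only document / job / media-asset metadata and file indexes; source videos, audio, extracted frames, generated images and video candidates stay in `/data/jobs/<jobId>/`
- Login credentials: the username is in the quick-login section below. A plaintext backup of the password lives only on the server at `/root/skg-marketing-studio-login.txt`, and the production environment variables `WEB_AUTH_PASSWORD` / `WEB_AUTH_SESSION_SECRET` live only in the server's `deploy/.env.production`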
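The four slashes in `sqlite:////data/jobs/app.db` are easy to misread. `sqlite:///` is the prefix and everything after it is the path, so four slashes mean an absolute path, while three slashes followed by `./` (as in the local `sqlite:///./jobs/app.db`) mean a path relative to the working directory. The sketch below mirrors the prefix-stripping idea of `_sqlite_path` in `api/database.py` but skips `.resolve()` so the result does not depend on the current directory. `sqlite_url_to_path` is a hypothetical name.

```python
from pathlib import PurePosixPath


def sqlite_url_to_path(url: str) -> PurePosixPath:
    """Turn a sqlite:/// URL into the filesystem path it points at."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"not a sqlite:/// URL: {url!r}")
    # Everything after the three-slash prefix is the path itself.
    return PurePosixPath(url[len(prefix):])
```

By the same rule, `sqlite:///data/jobs/app.db` (three slashes) would point at a relative `data/jobs/app.db` rather than the mounted volume.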
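
## Quick Login

@@ -56,20 +56,21 @@
- `ASR_TIMEOUT_SECONDS`: per-request timeout for remote ASR / audio analysis, default 45 seconds, so that step one does not stall in transcription for long
- `LOCAL_ASR_BIN` / `LOCAL_ASR_MODEL` / `LOCAL_ASR_TIMEOUT_SECONDS`: local ASR fallback, defaulting to `/opt/homebrew/bin/mlx_whisper` + `mlx-community/whisper-tiny`; used to produce a real sentence-level timeline while the SKG gateway's `/audio/transcriptions` is unavailable
- `TRANSLATE_MODEL`: subtitle translation model, default `gemini-2.5-flash`
- `REWRITE_MODEL`: general rewrite / storyboard description model, default `gemini-2.5-pro`
- `AUDIO_REWRITE_MODEL`: model for later audio voiceover rewriting, defaults to `REWRITE_MODEL`. Step one currently does not call voiceover rewriting by default and only keeps the original copy and the sound analysis
- `GPT_TEXT_MODEL`: default GPT text / vision model, default `gpt-4o`; used as the fallback that corrects old Gemini overrides
- `REWRITE_MODEL`: general rewrite / storyboard description model, default `gpt-4o`; if an older environment still sets `gemini-*`, the backend switches to `GPT_TEXT_MODEL` automatically
- `VISION_MODEL`: keyframe visual-understanding model, default `gpt-4o`; if an older environment still sets `gemini-*`, the backend switches to `GPT_TEXT_MODEL` automatically
- `AUDIO_REWRITE_MODEL`: model for later audio voiceover rewriting, follows `REWRITE_MODEL` by default; if an older environment still sets `gemini-*`, the backend switches to `REWRITE_MODEL` automatically
- `AUDIO_PRODUCT_BRIEF`: SKG product selling points injected when rewriting audio voiceover
- `PRODUCT_VIEW_MODEL`: view labeling / auto-recognition model for the same-product asset pool; currently forced to `gpt-image-2` per project requirements
- `IMAGE_BASE_URL` / `IMAGE_API_KEY` / `IMAGE_MODEL`: OpenAI-compatible image-generation gateway; every image-generation entry point is currently forced to `gpt-image-2`, with no fallback to other image models
- `GPT_IMAGE_MODEL` / `SUBJECT_ASSET_IMAGE_MODEL` / `SUBJECT_ASSET_IMAGE_MODELS`: kept for compatibility with older environment variable names, but the server forces the subject 6-view generation and every other image-generation entry point to use only `gpt-image-2`
- `AI_HTTP_PROXY` / `IMAGE_HTTP_PROXY`: optional outbound proxy for the AI gateway. Local launchd background processes do not necessarily inherit the shell's `http_proxy/https_proxy`, so if image generation reports DNS / ConnectError, set it in the local `api/.env` and restart the backend. `/health` reports only whether a proxy is configured, never the proxy address.
- `VOICE_PROVIDER`: voiceover channel, currently fixed to `azure_openai`
- `YTDLP_COOKIES_FILE` / `YTDLP_COOKIES_FROM_BROWSER`: optional TikTok download login state; the cookies file takes precedence, then the local browser's cookies. The cookies file is sensitive login state and may only live in a private local or server path; it must never be committed.
- `VOICE_PROVIDER`: voiceover channel, fixed to `azure_openai` on the server
- `AZURE_OPENAI_BASE_URL` / `AZURE_OPENAI_API_KEY`: Microsoft Azure OpenAI-protocol voiceover gateway; when no separate key is configured locally, it falls back to reusing `LLM_API_KEY`
- `AZURE_TTS_MODEL` / `AZURE_TTS_VOICE_ID` / `AZURE_TTS_VOICE_POOL` / `AZURE_TTS_PATH`: Azure OpenAI TTS model, default voice, voice pool and OpenAI-protocol speech path
- `MINIMAX_API_KEY`: MiniMax T2A voiceover key; may only live in the local `api/.env` and must not be committed; not called by default in step one for now
- `MINIMAX_TTS_BASE_URL` / `MINIMAX_TTS_MODEL` / `MINIMAX_TTS_VOICE_ID`: legacy MiniMax voiceover endpoint, model and fallback voice, kept only for compatibility; not the default voice channel
- `MINIMAX_TTS_VOICE_POOL`: MiniMax English random voice pool. The current defaults are the male voice `English_magnetic_voiced_man`, the female voice `English_Upbeat_Woman` and the mature voice `English_MaturePartner`, for the later new-voiceover stage
- `AZURE_TTS_MODEL` / `AZURE_TTS_VOICE_ID` / `AZURE_TTS_VOICE_POOL` / `AZURE_TTS_PATH` / `AZURE_TTS_PATHS`: Azure OpenAI TTS model, default voice, voice pool and OpenAI-protocol speech path. The backend tries the paths in `AZURE_TTS_PATHS` in order, which makes it easier to tell a wrong path from an unavailable voice service
- `POE_API_KEY` / `VIDEO_API_KEY`: video-generation channel keys; local environment variables only
- `APP_DB_URL` / `DATABASE_URL`: backend metadata database. The built-in implementation currently supports `sqlite:///`, and production defaults to `sqlite:////data/jobs/app.db`. `documents` is the top level of the grouping: one TK link or one upload gets one document by default, and `jobs` and `media_assets` belong to a `document_id`.
- `WEB_AUTH_USERNAME` / `WEB_AUTH_PASSWORD` / `WEB_AUTH_SESSION_SECRET`: production web login and session-signing configuration; the password and session secret live only in server environment variables and are never committed
- `FFMPEG_BIN` / `FFPROBE_BIN`: optional local media binary paths. If the local Homebrew ffmpeg has broken dynamic libraries, the backend automatically skips the unusable PATH version and tries a local static ffmpeg instead; production should still use the system ffmpeg/ffprobe
- Production environment variables: the server uses only `deploy/.env.production`, with `deploy/.env.production.example` as the template; real keys are never committed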
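The `REWRITE_MODEL` / `VISION_MODEL` / `AUDIO_REWRITE_MODEL` entries above describe a normalization rule for stale `gemini-*` overrides. A minimal sketch of that rule follows, assuming the settings arrive as a plain mapping. `resolve_text_models` is a hypothetical name, and the backend's real implementation is not shown in this document.

```python
from typing import Mapping, Optional


def resolve_text_models(env: Mapping[str, str]) -> dict[str, str]:
    gpt = env.get("GPT_TEXT_MODEL") or "gpt-4o"

    def gpt_only(value: Optional[str], fallback: str) -> str:
        # Empty values and stale gemini-* overrides are replaced by the fallback.
        if not value or value.strip().lower().startswith("gemini-"):
            return fallback
        return value.strip()

    rewrite = gpt_only(env.get("REWRITE_MODEL"), gpt)
    return {
        "GPT_TEXT_MODEL": gpt,
        "REWRITE_MODEL": rewrite,
        "VISION_MODEL": gpt_only(env.get("VISION_MODEL"), gpt),
        # AUDIO_REWRITE_MODEL follows the already-normalized REWRITE_MODEL.
        "AUDIO_REWRITE_MODEL": gpt_only(env.get("AUDIO_REWRITE_MODEL"), rewrite),
    }
```

Resolving `REWRITE_MODEL` first matters. If the old `.env.example` line `AUDIO_REWRITE_MODEL=gemini-2.5-pro` survives, it lands on the normalized rewrite model rather than on another Gemini value.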

@@ -17,7 +17,9 @@ LOCAL_ASR_BIN=/opt/homebrew/bin/mlx_whisper
LOCAL_ASR_MODEL=mlx-community/whisper-tiny
LOCAL_ASR_TIMEOUT_SECONDS=180
TRANSLATE_MODEL=gemini-2.5-flash
REWRITE_MODEL=gemini-2.5-pro
GPT_TEXT_MODEL=gpt-4o
REWRITE_MODEL=gpt-4o
VISION_MODEL=gpt-4o
PRODUCT_VIEW_MODEL=gpt-image-2
IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1
IMAGE_API_KEY=
@@ -27,14 +29,17 @@ SUBJECT_ASSET_IMAGE_MODEL=gpt-image-2
SUBJECT_ASSET_IMAGE_MODELS=gpt-image-2
# Optional: set this when the local network needs a proxy to reach ai.skg.com; launchd does not necessarily inherit the shell's proxy variables.
AI_HTTP_PROXY=
YTDLP_COOKIES_FILE=
YTDLP_COOKIES_FROM_BROWSER=
VIDEO_MODEL=seedance
VIDEO_MODEL_SEEDANCE=seedance-2-fast
VIDEO_MODEL_KLING=kling-omni
VIDEO_MODEL_VEO3=veo-3.1-fast

# Audio copy rewriting + Azure OpenAI voiceover
AUDIO_REWRITE_MODEL=gemini-2.5-pro
AUDIO_REWRITE_MODEL=gpt-4o
AUDIO_PRODUCT_BRIEF="SKG smart massage products, focused on everyday relaxation of the neck and shoulders, waist and back, eyes, knees or feet; ad messaging should be premium, clean and credible, with no medical-efficacy claims."
# The voice channel is fixed to Azure OpenAI on the server.
VOICE_PROVIDER=azure_openai
AZURE_OPENAI_BASE_URL=https://ai.skg.com/azure
AZURE_OPENAI_API_KEY=
@@ -42,13 +47,7 @@ AZURE_TTS_MODEL=gpt-4o-mini-tts
AZURE_TTS_VOICE_ID=alloy
AZURE_TTS_VOICE_POOL=alloy,verse,shimmer
AZURE_TTS_PATH=/audio/speech

# Legacy MiniMax voiceover channel, kept for compatibility; not used by default
MINIMAX_API_KEY=
MINIMAX_TTS_BASE_URL=https://api.minimax.io
MINIMAX_TTS_MODEL=speech-2.8-turbo
MINIMAX_TTS_VOICE_ID=English_expressive_narrator
MINIMAX_TTS_VOICE_POOL=English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner
AZURE_TTS_PATHS=/audio/speech,/v1/audio/speech

# Poe video API (preferred for Seedance / Kling / Veo)
POE_API_BASE_URL=https://api.poe.com/v1
@@ -80,7 +79,8 @@ VIDEO_DURATION_FIELD=seconds
VIDEO_POLL_TIMEOUT_SECONDS=900

# Working directory
KEYFRAME_COUNT=12
APP_DB_URL=sqlite:///./jobs/app.db
KEYFRAME_COUNT=6
JOBS_DIR=./jobs

# CORS

@@ -1,6 +1,6 @@
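# SKG TK Remix API

FastAPI backend that runs the yt-dlp + ffmpeg + ASR/translation/English SKG product-intro copy + MiniMax English voiceover pipeline.
FastAPI backend that runs the yt-dlp + ffmpeg + ASR/translation/English SKG product-intro copy + Azure OpenAI English voiceover pipeline.

## Startup

@@ -9,7 +9,7 @@ cd api
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # fill in LLM_API_KEY / MINIMAX_API_KEY as needed
cp .env.example .env   # fill in LLM_API_KEY / AZURE_OPENAI_API_KEY as needed
uvicorn main:app --host 127.0.0.1 --port 4291
```

@@ -18,21 +18,23 @@ uvicorn main:app --host 127.0.0.1 --port 4291
## Routes

- `GET /health`: health check + configuration status
- `GET /documents`: document grouping list from the backend database; one TK link or one uploaded video gets one document by default
- `POST /jobs` `{url}`: creates a job and downloads the source video in the background; once the video is ready, it can be analyzed or have its audio extracted manually
- `GET /jobs/{id}`: current status + artifacts; if the original audio track has been split out, the response includes `source_audio_url`
- `POST /jobs/{id}/transcribe`: triggers audio extraction + ASR + translation + SKG English product-intro copy. The copy length is estimated from the original audio duration, and once MiniMax is configured a voiceover is generated from the English random voice pool. The frontend Audio node has an "Extract audio / Re-extract audio" button; it can run in parallel with frame extraction and is not triggered automatically
- `POST /jobs/{id}/transcribe`: triggers audio extraction + ASR + translation + SKG English product-intro copy. The copy length is estimated from the original audio duration, and once Azure OpenAI TTS is configured a voiceover is generated from the Azure voice pool. The frontend Audio node has an "Extract audio / Re-extract audio" button; it can run in parallel with frame extraction and is not triggered automatically
- `GET /jobs/{id}/video.mp4`: source video
- `GET /jobs/{id}/audio.wav`: the original audio after track splitting, used by the frontend's bottom audio bar to draw the waveform
- `GET /jobs/{id}/audio-script.mp3`: MiniMax voiceover of the rewritten English copy
- `GET /jobs/{id}/audio-script.mp3`: Azure OpenAI TTS voiceover of the rewritten English copy
- `GET /jobs/{id}/frames/{i}.jpg`: keyframe i (0-9)

## Mock Mode

When `LLM_API_KEY` is not set, transcription uses a local mock for UI integration work; when `MINIMAX_API_KEY` is not set, only the rewritten copy is produced, with no voiceover file.
When `LLM_API_KEY` is not set, transcription uses a local mock for UI integration work; when `AZURE_OPENAI_API_KEY` is not set and `LLM_API_KEY` cannot be reused, only the rewritten copy is produced, with no voiceover file.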
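The mock-mode rules above come down to two decisions: whether transcription is real, and which key, if any, the voiceover uses. The sketch below assumes the backend reads these values from environment variables. `voice_key`, `run_mode` and the returned labels are hypothetical.

```python
from typing import Mapping, Optional


def voice_key(env: Mapping[str, str]) -> Optional[str]:
    # AZURE_OPENAI_API_KEY wins; otherwise reuse LLM_API_KEY; otherwise there is no voiceover.
    return env.get("AZURE_OPENAI_API_KEY") or env.get("LLM_API_KEY") or None


def run_mode(env: Mapping[str, str]) -> dict[str, object]:
    return {
        "transcription": "real" if env.get("LLM_API_KEY") else "mock",
        "voiceover": voice_key(env) is not None,  # False means copy only, no audio file
    }
```

A single `LLM_API_KEY` is therefore enough for both real transcription and voiceover. `AZURE_OPENAI_API_KEY` only matters when the voice gateway needs a different key.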

## Dependencies

- `ffmpeg` system binary (track splitting / frame extraction)
- `yt-dlp` system binary (the Python package also works)
- SQLite metadata database (default `APP_DB_URL=sqlite:///./jobs/app.db`); it stores only document / job / media asset metadata, while source videos, audio, extracted frames and generated files stay in `jobs/<jobId>/`
- OpenAI-compatible LLM gateway (ASR / translation / copy rewriting); if `/audio/transcriptions` is unavailable, `ASR_FALLBACK_MODEL` is used for Gemini multimodal audio recognition
- MiniMax T2A HTTP (voiceover for the English product-intro copy, using `MINIMAX_API_KEY`; default random voice pool `English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner`)
- Azure OpenAI TTS (voiceover for the English product-intro copy, using `AZURE_OPENAI_API_KEY` or falling back to `LLM_API_KEY`; default voice pool `alloy,verse,shimmer`)
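The ASR rows in the status doc's model table describe a three-tier chain: remote `whisper-1`, then local `mlx_whisper`, then Gemini multimodal audio. Fake subtitles, repeated text and low-coverage results are rejected along the way. The sketch below shows only that control flow. The tier callables are injected placeholders. The specific checks (a 50% coverage floor, and a repetition test that fails when one line makes up more than half the segments) are illustrative thresholds this sketch assumes, not values from the project.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


Transcriber = Callable[[str], list[Segment]]


def looks_valid(segments: list[Segment], audio_duration: float, min_coverage: float = 0.5) -> bool:
    texts = [s.text.strip() for s in segments if s.text.strip()]
    if not texts or audio_duration <= 0:
        return False
    # Reject output dominated by one repeated line (a common hallucination pattern).
    most_common = max(texts.count(t) for t in set(texts))
    if len(texts) > 1 and most_common > len(texts) / 2:
        return False
    # Reject timelines that cover too little of the audio.
    covered = sum(max(0.0, s.end - s.start) for s in segments)
    return covered / audio_duration >= min_coverage


def transcribe_with_fallback(
    audio_path: str, audio_duration: float, tiers: list[tuple[str, Transcriber]]
) -> tuple[Optional[str], list[Segment]]:
    """Try each (name, transcriber) tier in order and return the first valid result."""
    for name, transcribe in tiers:
        try:
            segments = transcribe(audio_path)
        except Exception:
            continue  # e.g. the gateway's /audio/transcriptions returning 404
        if looks_valid(segments, audio_duration):
            return name, segments
    return None, []
```

Returning the tier name alongside the segments lets the UI show which engine produced the timeline. That is useful when the local `whisper-tiny` tier quietly takes over.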

536
api/database.py
Normal file
@@ -0,0 +1,536 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sqlite3
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
SCHEMA_VERSION = 1
|
||||
|
||||
|
||||
def default_database_url(jobs_dir: Path) -> str:
|
||||
return os.getenv("APP_DB_URL") or os.getenv("DATABASE_URL") or f"sqlite:///{jobs_dir / 'app.db'}"
|
||||
|
||||
|
||||
def redact_database_url(url: str) -> str:
|
||||
if "://" not in url or "@" not in url:
|
||||
return url
|
||||
scheme, rest = url.split("://", 1)
|
||||
_, host = rest.rsplit("@", 1)
|
||||
return f"{scheme}://***@{host}"
|
||||
|
||||
|
||||
def infer_source_kind(url: str) -> str:
|
||||
if url.startswith("upload://"):
|
||||
return "upload"
|
||||
if url.startswith("http://") or url.startswith("https://"):
|
||||
return "tiktok_link"
|
||||
return "unknown"
|
||||
|
||||
|
||||
def default_workflow_mode(source_kind: str) -> str:
|
||||
if source_kind == "upload":
|
||||
return "uploaded_reference"
|
||||
return "feed_recreation"
|
||||
|
||||
|
||||
def document_title(url: str, source_kind: str, fallback: str) -> str:
|
||||
if source_kind == "upload":
|
||||
return url.replace("upload://", "", 1).strip() or fallback
|
||||
if url:
|
||||
return url.strip()[:120]
|
||||
return fallback
|
||||
|
||||
|
||||
def storage_prefix(document_id: str, source_kind: str, workflow_mode: str) -> str:
|
||||
source = source_kind or "unknown"
|
||||
mode = workflow_mode or default_workflow_mode(source)
|
||||
return f"{mode}/{source}/{document_id}"
|
||||
|
||||
|
||||
class AppDatabase:
    def __init__(self, url: str, jobs_dir: Path):
        self.url = url
        self.jobs_dir = jobs_dir
        self.path = self._sqlite_path(url)
        self.enabled = True
        self.error = ""

    @staticmethod
    def _sqlite_path(url: str) -> Path:
        if url == ":memory:":
            return Path(":memory:")
        if not url.startswith("sqlite:///"):
            raise RuntimeError("当前内置数据库层只支持 sqlite:/// URL;Postgres 迁移会复用同一张表语义。")
        raw = url[len("sqlite:///"):]
        return Path(raw).expanduser().resolve()

    def connect(self) -> sqlite3.Connection:
        if str(self.path) != ":memory:":
            self.path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(str(self.path))
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA foreign_keys = ON")
        return conn

    def init(self) -> None:
        with self.connect() as conn:
            conn.executescript(
                """
                CREATE TABLE IF NOT EXISTS schema_meta (
                    key TEXT PRIMARY KEY,
                    value TEXT NOT NULL
                );

                CREATE TABLE IF NOT EXISTS documents (
                    id TEXT PRIMARY KEY,
                    title TEXT NOT NULL,
                    source_kind TEXT NOT NULL,
                    workflow_mode TEXT NOT NULL,
                    source_url TEXT NOT NULL DEFAULT '',
                    primary_job_id TEXT NOT NULL DEFAULT '',
                    status TEXT NOT NULL DEFAULT 'created',
                    storage_prefix TEXT NOT NULL,
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL
                );

                CREATE TABLE IF NOT EXISTS jobs (
                    id TEXT PRIMARY KEY,
                    document_id TEXT NOT NULL,
                    source_kind TEXT NOT NULL,
                    workflow_mode TEXT NOT NULL,
                    source_url TEXT NOT NULL DEFAULT '',
                    status TEXT NOT NULL,
                    progress INTEGER NOT NULL DEFAULT 0,
                    message TEXT NOT NULL DEFAULT '',
                    storage_path TEXT NOT NULL,
                    state_path TEXT NOT NULL,
                    video_url TEXT NOT NULL DEFAULT '',
                    duration REAL NOT NULL DEFAULT 0,
                    width INTEGER NOT NULL DEFAULT 0,
                    height INTEGER NOT NULL DEFAULT 0,
                    frame_count INTEGER NOT NULL DEFAULT 0,
                    video_count INTEGER NOT NULL DEFAULT 0,
                    error TEXT NOT NULL DEFAULT '',
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL,
                    FOREIGN KEY(document_id) REFERENCES documents(id) ON DELETE CASCADE
                );

                CREATE TABLE IF NOT EXISTS media_assets (
                    id TEXT PRIMARY KEY,
                    document_id TEXT NOT NULL,
                    job_id TEXT NOT NULL,
                    kind TEXT NOT NULL,
                    role TEXT NOT NULL,
                    path TEXT NOT NULL DEFAULT '',
                    url TEXT NOT NULL DEFAULT '',
                    frame_index INTEGER,
                    timestamp REAL,
                    width INTEGER NOT NULL DEFAULT 0,
                    height INTEGER NOT NULL DEFAULT 0,
                    duration REAL NOT NULL DEFAULT 0,
                    metadata_json TEXT NOT NULL DEFAULT '{}',
                    created_at REAL NOT NULL,
                    updated_at REAL NOT NULL,
                    FOREIGN KEY(document_id) REFERENCES documents(id) ON DELETE CASCADE,
                    FOREIGN KEY(job_id) REFERENCES jobs(id) ON DELETE CASCADE
                );

                CREATE INDEX IF NOT EXISTS idx_documents_updated_at ON documents(updated_at DESC);
                CREATE INDEX IF NOT EXISTS idx_documents_source_kind ON documents(source_kind);
                CREATE INDEX IF NOT EXISTS idx_documents_workflow_mode ON documents(workflow_mode);
                CREATE INDEX IF NOT EXISTS idx_jobs_document_id ON jobs(document_id);
                CREATE INDEX IF NOT EXISTS idx_jobs_updated_at ON jobs(updated_at DESC);
                CREATE INDEX IF NOT EXISTS idx_assets_document_id ON media_assets(document_id);
                CREATE INDEX IF NOT EXISTS idx_assets_job_id ON media_assets(job_id);
                CREATE INDEX IF NOT EXISTS idx_assets_role ON media_assets(role);
                """
            )
            conn.execute(
                "INSERT OR REPLACE INTO schema_meta(key, value) VALUES('schema_version', ?)",
                (str(SCHEMA_VERSION),),
            )

    def normalize_job_document(self, job: dict[str, Any]) -> dict[str, Any]:
        job_id = str(job.get("id") or "")
        source_url = str(job.get("url") or "")
        source_kind = str(job.get("source_kind") or "") or infer_source_kind(source_url)
        workflow_mode = str(job.get("workflow_mode") or "") or default_workflow_mode(source_kind)
        document_id = str(job.get("document_id") or "") or job_id
        prefix = str(job.get("storage_prefix") or "") or storage_prefix(document_id, source_kind, workflow_mode)
        return {
            "document_id": document_id,
            "source_kind": source_kind,
            "workflow_mode": workflow_mode,
            "storage_prefix": prefix,
            "title": document_title(source_url, source_kind, document_id),
        }

    def sync_job(self, job: dict[str, Any], job_path: Path) -> None:
        if not self.enabled:
            return
        now = time.time()
        job_id = str(job.get("id") or "")
        if not job_id:
            return
        doc = self.normalize_job_document(job)
        state_path = job_path / "state.json"
        frames = list(job.get("frames") or [])
        generated_videos = list(job.get("generated_videos") or [])
        metadata = {
            "audio_segment_count": len(job.get("transcript") or []),
            "product_ref_count": len(job.get("product_refs") or []),
            "storyboard_image_count": len(job.get("storyboard_images") or []),
        }
        with self.connect() as conn:
            existing = conn.execute(
                "SELECT created_at FROM documents WHERE id = ?",
                (doc["document_id"],),
            ).fetchone()
            created_at = float(existing["created_at"]) if existing else now
            conn.execute(
                """
                INSERT INTO documents(
                    id, title, source_kind, workflow_mode, source_url, primary_job_id,
                    status, storage_prefix, metadata_json, created_at, updated_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(id) DO UPDATE SET
                    title = excluded.title,
                    source_kind = excluded.source_kind,
                    workflow_mode = excluded.workflow_mode,
                    source_url = excluded.source_url,
                    primary_job_id = excluded.primary_job_id,
                    status = excluded.status,
                    storage_prefix = excluded.storage_prefix,
                    metadata_json = excluded.metadata_json,
                    updated_at = excluded.updated_at
                """,
                (
                    doc["document_id"],
                    doc["title"],
                    doc["source_kind"],
                    doc["workflow_mode"],
                    str(job.get("url") or ""),
                    job_id,
                    str(job.get("status") or "created"),
                    doc["storage_prefix"],
                    json.dumps(metadata, ensure_ascii=False),
                    created_at,
                    now,
                ),
            )
            existing_job = conn.execute("SELECT created_at FROM jobs WHERE id = ?", (job_id,)).fetchone()
            job_created_at = float(existing_job["created_at"]) if existing_job else now
            conn.execute(
                """
                INSERT INTO jobs(
                    id, document_id, source_kind, workflow_mode, source_url, status,
                    progress, message, storage_path, state_path, video_url, duration,
                    width, height, frame_count, video_count, error, metadata_json,
                    created_at, updated_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                ON CONFLICT(id) DO UPDATE SET
                    document_id = excluded.document_id,
                    source_kind = excluded.source_kind,
                    workflow_mode = excluded.workflow_mode,
                    source_url = excluded.source_url,
                    status = excluded.status,
                    progress = excluded.progress,
                    message = excluded.message,
                    storage_path = excluded.storage_path,
                    state_path = excluded.state_path,
                    video_url = excluded.video_url,
                    duration = excluded.duration,
                    width = excluded.width,
                    height = excluded.height,
                    frame_count = excluded.frame_count,
                    video_count = excluded.video_count,
                    error = excluded.error,
                    metadata_json = excluded.metadata_json,
                    updated_at = excluded.updated_at
                """,
                (
                    job_id,
                    doc["document_id"],
                    doc["source_kind"],
                    doc["workflow_mode"],
                    str(job.get("url") or ""),
                    str(job.get("status") or "created"),
                    int(job.get("progress") or 0),
                    str(job.get("message") or ""),
                    str(job_path),
                    str(state_path),
                    str(job.get("video_url") or ""),
                    float(job.get("duration") or 0),
                    int(job.get("width") or 0),
                    int(job.get("height") or 0),
                    len(frames),
                    len(generated_videos),
                    str(job.get("error") or ""),
                    json.dumps(metadata, ensure_ascii=False),
                    job_created_at,
                    now,
                ),
            )
            conn.execute("DELETE FROM media_assets WHERE job_id = ?", (job_id,))
            for asset in self._job_assets(job, job_path, doc["document_id"]):
                conn.execute(
                    """
                    INSERT INTO media_assets(
                        id, document_id, job_id, kind, role, path, url, frame_index,
                        timestamp, width, height, duration, metadata_json, created_at, updated_at
                    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
                    """,
                    (
                        asset["id"],
                        asset["document_id"],
                        asset["job_id"],
                        asset["kind"],
                        asset["role"],
                        asset.get("path", ""),
                        asset.get("url", ""),
                        asset.get("frame_index"),
                        asset.get("timestamp"),
                        int(asset.get("width") or 0),
                        int(asset.get("height") or 0),
                        float(asset.get("duration") or 0),
                        json.dumps(asset.get("metadata") or {}, ensure_ascii=False),
                        now,
                        now,
                    ),
                )

    def _job_assets(self, job: dict[str, Any], job_path: Path, document_id: str) -> list[dict[str, Any]]:
        job_id = str(job.get("id") or "")
        items: list[dict[str, Any]] = []

        def add(
            asset_id: str,
            kind: str,
            role: str,
            path: Path | str = "",
            url: str = "",
            frame_index: int | None = None,
            timestamp: float | None = None,
            width: int = 0,
            height: int = 0,
            duration: float = 0.0,
            metadata: dict[str, Any] | None = None,
        ) -> None:
            items.append({
                "id": asset_id,
                "document_id": document_id,
                "job_id": job_id,
                "kind": kind,
                "role": role,
                "path": str(path) if path else "",
                "url": url,
                "frame_index": frame_index,
                "timestamp": timestamp,
                "width": width,
                "height": height,
                "duration": duration,
                "metadata": metadata or {},
            })

        if (job_path / "source.mp4").exists() or job.get("video_url"):
            add(
                f"{job_id}:source_video",
                "video",
                "source_video",
                job_path / "source.mp4",
                str(job.get("video_url") or f"/jobs/{job_id}/video.mp4"),
                duration=float(job.get("duration") or 0),
                width=int(job.get("width") or 0),
                height=int(job.get("height") or 0),
            )
        if (job_path / "audio.wav").exists() or job.get("source_audio_url"):
            add(
                f"{job_id}:source_audio",
                "audio",
                "source_audio",
                job_path / "audio.wav",
                str(job.get("source_audio_url") or f"/jobs/{job_id}/audio.wav"),
                duration=float(job.get("duration") or 0),
            )

        for frame in job.get("frames") or []:
            idx = int(frame.get("index") or 0)
            add(
                f"{job_id}:frame:{idx}",
                "image",
                "keyframe",
                job_path / "frames" / f"{idx:03d}.jpg",
                str(frame.get("url") or f"/jobs/{job_id}/frames/{idx}.jpg"),
                frame_index=idx,
                timestamp=float(frame.get("timestamp") or 0),
                metadata={"quality_report": frame.get("quality_report")},
            )
            if frame.get("cleaned_url"):
                add(
                    f"{job_id}:frame:{idx}:cleaned",
                    "image",
                    "cleaned_keyframe",
                    job_path / "cleaned" / f"{idx:03d}.jpg",
                    str(frame.get("cleaned_url")),
                    frame_index=idx,
                    timestamp=float(frame.get("timestamp") or 0),
                )
            for generated in frame.get("generated_images") or []:
                gen_id = str(generated.get("id") or "")
                if gen_id:
                    add(
                        f"{job_id}:generated_image:{idx}:{gen_id}",
                        "image",
                        "generated_image",
                        job_path / "gen" / f"{idx:03d}_{gen_id}.jpg",
                        str(generated.get("url") or ""),
                        frame_index=idx,
                        metadata={"model": generated.get("model"), "mode": generated.get("mode")},
                    )
            for scene_asset in frame.get("scene_assets") or []:
                asset_id = str(scene_asset.get("id") or "")
                if asset_id:
                    add(
                        f"{job_id}:scene_asset:{asset_id}",
                        "image",
                        str(scene_asset.get("asset_role") or "scene_asset"),
                        job_path / "assets" / f"{asset_id}.jpg",
                        str(scene_asset.get("url") or ""),
                        frame_index=idx,
                        width=int(scene_asset.get("width") or 0),
                        height=int(scene_asset.get("height") or 0),
                        metadata={"label": scene_asset.get("label"), "scene_mode": scene_asset.get("scene_mode")},
                    )
            for element in frame.get("elements") or []:
                element_id = str(element.get("id") or "")
                cutout_ids = list(element.get("cutouts") or [])
                legacy_cutout = element.get("cutout_id")
                if legacy_cutout and legacy_cutout not in cutout_ids:
                    cutout_ids.append(legacy_cutout)
                for cutout_id in cutout_ids:
                    add(
                        f"{job_id}:cutout:{idx}:{element_id}:{cutout_id}",
                        "image",
                        "element_cutout",
                        job_path / "elements" / f"{idx:03d}_{element_id}_{cutout_id}.jpg",
                        f"/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutouts/{cutout_id}.jpg",
                        frame_index=idx,
                        metadata={"element_id": element_id, "name_zh": element.get("name_zh")},
                    )
                for subject_asset in element.get("subject_assets") or []:
                    asset_id = str(subject_asset.get("id") or "")
                    if asset_id:
                        add(
                            f"{job_id}:subject_asset:{asset_id}",
                            "image",
                            "subject_asset",
                            job_path / "assets" / f"{asset_id}.jpg",
                            str(subject_asset.get("url") or ""),
                            frame_index=idx,
                            width=int(subject_asset.get("width") or 0),
                            height=int(subject_asset.get("height") or 0),
                            metadata={"view": subject_asset.get("view"), "label": subject_asset.get("label")},
                        )

        for ref in job.get("product_refs") or []:
            asset_id = str(ref.get("id") or ref.get("asset_id") or ref.get("url") or "")
            if asset_id:
                add(
                    f"{job_id}:product_ref:{asset_id}",
                    "image",
                    "product_ref",
                    self._path_from_job_url(job_path, job_id, str(ref.get("url") or "")),
                    str(ref.get("url") or ""),
                    metadata=ref,
                )

        for video in job.get("generated_videos") or []:
            video_id = str(video.get("id") or "")
            if video_id:
                add(
                    f"{job_id}:generated_video:{video_id}",
                    "video",
                    "generated_video",
                    job_path / "videos" / f"{video_id}.mp4",
                    str(video.get("url") or ""),
                    frame_index=video.get("frame_idx"),
                    duration=float(video.get("duration") or 0),
                    metadata={"status": video.get("status"), "model": video.get("model"), "error": video.get("error")},
                )
        return items

    def _path_from_job_url(self, job_path: Path, job_id: str, url: str) -> str:
        prefix = f"/jobs/{job_id}/"
        if not url.startswith(prefix):
            return ""
        tail = url[len(prefix):]
        if tail == "video.mp4":
            return str(job_path / "source.mp4")
        return str(job_path / tail)

    def delete_job(self, job_id: str) -> None:
        if not self.enabled:
            return
        with self.connect() as conn:
            row = conn.execute("SELECT document_id FROM jobs WHERE id = ?", (job_id,)).fetchone()
            conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
            if row:
                remaining = conn.execute(
                    "SELECT COUNT(*) AS c FROM jobs WHERE document_id = ?",
                    (row["document_id"],),
                ).fetchone()
                if int(remaining["c"] or 0) == 0:
                    conn.execute("DELETE FROM documents WHERE id = ?", (row["document_id"],))

    def list_documents(self, limit: int | None = None) -> list[dict[str, Any]]:
        sql = """
            SELECT
                d.*,
                COUNT(DISTINCT j.id) AS job_count,
                COUNT(DISTINCT a.id) AS asset_count
            FROM documents d
            LEFT JOIN jobs j ON j.document_id = d.id
            LEFT JOIN media_assets a ON a.document_id = d.id
            GROUP BY d.id
            ORDER BY d.updated_at DESC
        """
        params: tuple[Any, ...] = ()
        if limit is not None and limit > 0:
            sql += " LIMIT ?"
            params = (limit,)
        with self.connect() as conn:
            rows = conn.execute(sql, params).fetchall()
        return [dict(row) for row in rows]

    def health(self) -> dict[str, Any]:
        if not self.enabled:
            return {"enabled": False, "url": redact_database_url(self.url), "error": self.error}
        try:
            with self.connect() as conn:
                docs = conn.execute("SELECT COUNT(*) AS c FROM documents").fetchone()["c"]
                jobs = conn.execute("SELECT COUNT(*) AS c FROM jobs").fetchone()["c"]
                assets = conn.execute("SELECT COUNT(*) AS c FROM media_assets").fetchone()["c"]
            return {
                "enabled": True,
                "url": redact_database_url(self.url),
                "schema_version": SCHEMA_VERSION,
                "documents": int(docs or 0),
                "jobs": int(jobs or 0),
                "assets": int(assets or 0),
            }
        except Exception as e:
            return {"enabled": False, "url": redact_database_url(self.url), "error": str(e)}


def create_database(url: str, jobs_dir: Path) -> AppDatabase:
    db = AppDatabase(url, jobs_dir)
    db.init()
    return db
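`sync_job` reads `created_at` before each upsert. However, the `ON CONFLICT(id) DO UPDATE SET` lists already leave `created_at` out, so SQLite keeps the stored value on conflict either way. The in-memory sketch below checks this against a trimmed, hypothetical four-column `documents` table.

It also wraps the connection in `contextlib.closing`. A `sqlite3.Connection` used as a context manager only commits or rolls back the transaction; it does not close the connection. UPSERT syntax needs SQLite 3.24 or later.

```python
import sqlite3
from contextlib import closing

SCHEMA = """
CREATE TABLE documents (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL
);
"""

# Same shape as the documents upsert in sync_job: created_at is not in the SET list.
UPSERT = """
INSERT INTO documents(id, title, created_at, updated_at) VALUES (?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    title = excluded.title,
    updated_at = excluded.updated_at
"""


def upsert_twice() -> dict:
    # closing() guarantees conn.close(); "with conn:" only scopes a transaction.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.row_factory = sqlite3.Row
        conn.executescript(SCHEMA)
        with conn:
            conn.execute(UPSERT, ("doc-1", "first title", 1.0, 1.0))
        with conn:
            # The second write passes a new created_at, which the conflict branch ignores.
            conn.execute(UPSERT, ("doc-1", "second title", 2.0, 2.0))
        row = conn.execute("SELECT * FROM documents WHERE id = ?", ("doc-1",)).fetchone()
        return dict(row)
```

Calling `upsert_twice()` returns the row with `title` updated to `"second title"` and `updated_at` set to `2.0`, while `created_at` stays at `1.0`.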
345
api/main.py
@@ -25,10 +25,19 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse
 from pydantic import BaseModel, Field
 
+from database import create_database, default_database_url, default_workflow_mode, infer_source_kind, storage_prefix
+
 load_dotenv()
 
 JOBS_DIR = Path(os.getenv("JOBS_DIR", "./jobs")).resolve()
 JOBS_DIR.mkdir(parents=True, exist_ok=True)
+DATABASE_URL = default_database_url(JOBS_DIR)
+DB_INIT_ERROR = ""
+try:
+    DB = create_database(DATABASE_URL, JOBS_DIR)
+except Exception as e:
+    DB = None
+    DB_INIT_ERROR = str(e)
 CORS_ORIGINS = [o.strip() for o in os.getenv("CORS_ORIGINS", "http://localhost:4290,http://127.0.0.1:4290").split(",") if o.strip()]
 PRODUCT_LIBRARY_DIR = Path(
     os.getenv("PRODUCT_LIBRARY_DIR", Path(__file__).resolve().parent / "product_library" / "skg-products")
@@ -48,8 +57,18 @@ LOCAL_ASR_BIN = os.getenv("LOCAL_ASR_BIN", "").strip()
 LOCAL_ASR_MODEL = os.getenv("LOCAL_ASR_MODEL", "mlx-community/whisper-tiny").strip() or "mlx-community/whisper-tiny"
 LOCAL_ASR_TIMEOUT_SECONDS = max(30, int(os.getenv("LOCAL_ASR_TIMEOUT_SECONDS", "180")))
 TRANSLATE_MODEL = os.getenv("TRANSLATE_MODEL", "gemini-2.5-flash")
-REWRITE_MODEL = os.getenv("REWRITE_MODEL", "gemini-2.5-pro")
-VISION_MODEL = os.getenv("VISION_MODEL", "gemini-2.5-flash")
+DEFAULT_GPT_TEXT_MODEL = os.getenv("GPT_TEXT_MODEL", "gpt-4o").strip() or "gpt-4o"
+
+
+def gpt_model_env(name: str, default: str | None = None) -> str:
+    value = os.getenv(name, default or DEFAULT_GPT_TEXT_MODEL).strip()
+    if not value or value.lower().startswith("gemini-"):
+        return default or DEFAULT_GPT_TEXT_MODEL
+    return value
+
+
+REWRITE_MODEL = gpt_model_env("REWRITE_MODEL")
+VISION_MODEL = gpt_model_env("VISION_MODEL")
 IMAGE_BASE_URL = os.getenv("IMAGE_BASE_URL", LLM_BASE_URL).strip()
 IMAGE_API_KEY = os.getenv("IMAGE_API_KEY", LLM_API_KEY).strip()
 AI_HTTP_PROXY = (
@@ -73,29 +92,14 @@ PRODUCT_ASSET_MIN_LONG_SIDE = max(512, int(os.getenv("PRODUCT_ASSET_MIN_LONG_SID
 PRODUCT_ASSET_MIN_SHORT_SIDE = max(320, int(os.getenv("PRODUCT_ASSET_MIN_SHORT_SIDE", "600")))
 PRODUCT_ASSET_JPEG_QUALITY = max(80, min(95, int(os.getenv("PRODUCT_ASSET_JPEG_QUALITY", "92"))))
 VIDEO_MODEL = os.getenv("VIDEO_MODEL", "seedance").strip() or "seedance"
+YTDLP_COOKIES_FILE = os.getenv("YTDLP_COOKIES_FILE", "").strip()
+YTDLP_COOKIES_FROM_BROWSER = os.getenv("YTDLP_COOKIES_FROM_BROWSER", "").strip()
 AUDIO_PRODUCT_BRIEF = os.getenv(
     "AUDIO_PRODUCT_BRIEF",
     "SKG 智能按摩产品,主打日常肩颈、腰背、眼部、膝盖或足部放松;广告表达要高级、干净、可信,不做医疗疗效承诺。",
 ).strip()
-AUDIO_REWRITE_MODEL = os.getenv("AUDIO_REWRITE_MODEL", REWRITE_MODEL).strip() or REWRITE_MODEL
-MINIMAX_API_KEY = os.getenv("MINIMAX_API_KEY", "").strip()
-MINIMAX_TTS_BASE_URL = os.getenv("MINIMAX_TTS_BASE_URL", "https://api.minimax.io").strip().rstrip("/")
-MINIMAX_TTS_MODEL = os.getenv("MINIMAX_TTS_MODEL", "speech-2.8-turbo").strip() or "speech-2.8-turbo"
-MINIMAX_TTS_VOICE_ID = os.getenv(
-    "MINIMAX_TTS_VOICE_ID",
-    "English_expressive_narrator",
-).strip() or "English_expressive_narrator"
-DEFAULT_MINIMAX_TTS_VOICE_POOL = [
-    "English_magnetic_voiced_man",
-    "English_Upbeat_Woman",
-    "English_MaturePartner",
-]
-MINIMAX_TTS_VOICE_POOL = [
-    v.strip()
-    for v in os.getenv("MINIMAX_TTS_VOICE_POOL", ",".join(DEFAULT_MINIMAX_TTS_VOICE_POOL)).split(",")
-    if v.strip()
-]
-VOICE_PROVIDER = os.getenv("VOICE_PROVIDER", "azure_openai").strip().lower() or "azure_openai"
+AUDIO_REWRITE_MODEL = gpt_model_env("AUDIO_REWRITE_MODEL", REWRITE_MODEL)
+VOICE_PROVIDER = "azure_openai"
 AZURE_OPENAI_BASE_URL = os.getenv("AZURE_OPENAI_BASE_URL", "https://ai.skg.com/azure").strip().rstrip("/")
 AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", LLM_API_KEY).strip()
 AZURE_TTS_MODEL = os.getenv("AZURE_TTS_MODEL", "gpt-4o-mini-tts").strip() or "gpt-4o-mini-tts"
@@ -107,6 +111,11 @@ AZURE_TTS_VOICE_POOL = [
     if v.strip()
 ]
 AZURE_TTS_PATH = os.getenv("AZURE_TTS_PATH", "/audio/speech").strip() or "/audio/speech"
+AZURE_TTS_PATHS = [
+    p.strip()
+    for p in os.getenv("AZURE_TTS_PATHS", f"{AZURE_TTS_PATH},/audio/speech,/v1/audio/speech").split(",")
+    if p.strip()
+]
 
 POE_API_BASE_URL = os.getenv("POE_API_BASE_URL", "https://api.poe.com/v1").strip() or "https://api.poe.com/v1"
 POE_API_KEY = os.getenv("POE_API_KEY", "").strip()
@@ -238,8 +247,8 @@ JobStatus = Literal[
     "transcribing", "transcribed", "failed",
 ]
 
-KEYFRAME_COUNT = int(os.getenv("KEYFRAME_COUNT", "12"))
-FrameExtractTarget = Literal["transparent_human", "balanced", "subject", "transition", "expression", "motion"]
+KEYFRAME_COUNT = int(os.getenv("KEYFRAME_COUNT", "6"))
+FrameExtractTarget = Literal["random_subject", "transparent_human", "balanced", "subject", "transition", "expression", "motion"]
 FrameExtractMode = Literal["replace", "append"]
 FrameExtractQuality = Literal["auto", "fast", "accurate", "ultra"]
 AnalyzeTask = tuple[str, int, FrameExtractTarget, FrameExtractMode, FrameExtractQuality]
@@ -252,6 +261,7 @@ SceneMode = Literal["remove_subject", "similar", "style"]
 SceneStyle = Literal["source", "premium_product", "clean_studio", "warm_lifestyle", "cinematic"]
 SceneAssetRole = Literal["scene", "first_frame", "last_frame"]
 FRAME_TARGET_LABELS: dict[FrameExtractTarget, str] = {
+    "random_subject": "人物随机",
     "transparent_human": "透明骨架人",
     "balanced": "综合关键帧",
     "subject": "清晰主体",
@@ -541,6 +551,10 @@ class AudioScript(BaseModel):
 class Job(BaseModel):
     id: str
     url: str
+    document_id: str = ""
+    source_kind: Literal["tiktok_link", "upload", "unknown"] = "unknown"
+    workflow_mode: Literal["feed_recreation", "uploaded_reference"] = "feed_recreation"
+    storage_prefix: str = ""
     status: JobStatus = "created"
     progress: int = 0
     message: str = ""
@@ -640,8 +654,26 @@ def job_with_artifacts(job: Job) -> Job:
     return job.model_copy(update=updates)
 
 
+def ensure_job_document_fields(job: Job) -> Job:
+    source_kind = job.source_kind if job.source_kind != "unknown" else infer_source_kind(job.url)
+    workflow_mode = job.workflow_mode or default_workflow_mode(source_kind)
+    document_id = job.document_id or job.id
+    job.source_kind = source_kind if source_kind in {"tiktok_link", "upload"} else "unknown"
+    job.workflow_mode = workflow_mode if workflow_mode in {"feed_recreation", "uploaded_reference"} else "feed_recreation"
+    job.document_id = document_id
+    job.storage_prefix = job.storage_prefix or storage_prefix(document_id, job.source_kind, job.workflow_mode)
+    return job
+
+
 def save_state(job: Job) -> None:
-    (job_dir(job.id) / "state.json").write_text(job.model_dump_json(indent=2))
+    ensure_job_document_fields(job)
+    d = job_dir(job.id)
+    (d / "state.json").write_text(job.model_dump_json(indent=2))
+    if DB:
+        try:
+            DB.sync_job(job.model_dump(mode="json"), d)
+        except Exception as e:
+            print(f"[database sync failed] job={job.id} error={e}", flush=True)
 
 
 def update(job: Job, **kw) -> None:
@@ -884,6 +916,12 @@ async def lifespan(_: FastAPI):
                         message="服务重启 · 上次音频处理已中断,可重新处理",
                     )
                 JOBS[p.name] = job
+                ensure_job_document_fields(job)
+                if DB:
+                    try:
+                        DB.sync_job(job.model_dump(mode="json"), p)
+                    except Exception as e:
+                        print(f"[database restore sync failed] job={job.id} error={e}", flush=True)
             except Exception:
                 pass
     yield
@@ -995,6 +1033,35 @@ def run(cmd: list[str], cwd: Path | None = None) -> str:
     return res.stdout
 
 
+def ytdlp_cookie_args() -> list[str]:
+    if YTDLP_COOKIES_FILE:
+        cookies = Path(YTDLP_COOKIES_FILE).expanduser()
+        if not cookies.exists():
+            raise RuntimeError("TikTok cookies 文件不可用,请检查 YTDLP_COOKIES_FILE 配置。")
+        return ["--cookies", str(cookies)]
+    if YTDLP_COOKIES_FROM_BROWSER:
+        return ["--cookies-from-browser", YTDLP_COOKIES_FROM_BROWSER]
+    return []
+
+
+def normalize_download_error(error: Exception) -> str:
+    raw = str(error)
+    lower = raw.lower()
+    auth_required = (
+        "log in for access" in lower
+        or "login" in lower and "cookies" in lower
+        or "cookies-from-browser" in lower
+        or "sign in" in lower and "tiktok" in lower
+    )
+    if auth_required:
+        return (
+            "TikTok 下载需要登录态。请上传视频文件,或在后端配置 "
+            "YTDLP_COOKIES_FILE / YTDLP_COOKIES_FROM_BROWSER 后重试。"
+            f"原始错误:{raw}"
+        )
+    return raw
+
+
 # ---- Heuristic frame-selection helpers ----
 import imagehash
 import numpy as np
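In `normalize_download_error`, `and` binds tighter than `or`. The clause `"login" in lower and "cookies" in lower` is therefore evaluated as one unit, which is the intended grouping. The sketch below writes the same predicate with explicit parentheses and checks it against a few made-up error strings; they are illustrative only, not captured yt-dlp output. `needs_tiktok_login` is a hypothetical name.

```python
def needs_tiktok_login(error_text: str) -> bool:
    """Same keyword rule as normalize_download_error, with the and/or grouping made explicit."""
    lower = error_text.lower()
    return (
        "log in for access" in lower
        or ("login" in lower and "cookies" in lower)
        or "cookies-from-browser" in lower
        or ("sign in" in lower and "tiktok" in lower)
    )
```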
@@ -1408,7 +1475,10 @@ def _target_score(item: dict, target: FrameExtractTarget) -> float:
     scene = float(item.get("scene_score_n", 0.0))
     motion = float(item.get("motion_n", 0.0))
 
-    if target == "transparent_human":
+    if target == "random_subject":
+        # Person-targeted random extraction: build a candidate pool from centered-subject and sharpness scores first, then sample randomly within the pool.
+        score = center * 0.52 + sharp * 0.24 + contrast * 0.14 + color * 0.10
+    elif target == "transparent_human":
         # Frame extraction currently runs on local compute: prefer a sharp centered subject, high contrast, moderate color, and time coverage.
         # Semantic checks for the transparent skeleton figure are left to later review/recognition; Vision is not called per frame during extraction.
         score = center * 0.45 + sharp * 0.30 + contrast * 0.15 + color * 0.10
@@ -1460,6 +1530,15 @@ def _select_keyframes(candidates: list[dict], n: int, target: FrameExtractTarget
         elif it["score"] > dup["score"]:
             deduped[deduped.index(dup)] = it
 
+    if target == "random_subject":
+        # Person-targeted random: sample from the pool of sharper candidates with a stronger centered subject, instead of ranking by motion peaks.
+        ranked = sorted(deduped, key=lambda x: -float(x.get("score", 0.0)))
+        pool_size = min(len(ranked), max(n * 6, n + 8))
+        pool = ranked[:pool_size] if pool_size > 0 else ranked
+        selected = random.sample(pool, k=min(n, len(pool))) if len(pool) > n else list(pool)
+        selected.sort(key=lambda x: x["idx"])
+        return selected
+
     # Temporal bucketing: split the candidate timeline into n equal segments and take the best frame for the current target from each.
     total = len(candidates)
     buckets: list[list[dict]] = [[] for _ in range(n)]
@@ -1648,13 +1727,15 @@ def pipeline_download(job_id: str) -> None:
             update(job, status="downloading", message="本地上传 · 跳过下载", progress=15)
         else:
             update(job, status="downloading", message="yt-dlp 下载中…", progress=5)
-            run([
+            cmd = [
                 "yt-dlp", "-f", "best[ext=mp4]/best",
                 "-o", str(mp4),
                 "--no-warnings", "--no-playlist",
                 "--retries", "3",
+                *ytdlp_cookie_args(),
                 job.url,
-            ])
+            ]
+            run(cmd)
             if not mp4.exists():
                 raise RuntimeError("下载完成但找不到 source.mp4")
 
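The download step now splices `ytdlp_cookie_args()` into the yt-dlp argument list. When configured, either `--cookies <file>` or `--cookies-from-browser <browser>` (both real yt-dlp options) is passed. A configured but missing cookies file fails before yt-dlp runs. The sketch below reproduces the assembly as pure functions, assuming the two settings are passed as arguments rather than read from module globals. `cookie_args` and `build_download_cmd` are hypothetical names.

```python
from pathlib import Path


def cookie_args(cookies_file: str = "", cookies_from_browser: str = "") -> list[str]:
    # A cookies file takes precedence over reading cookies from a browser profile.
    if cookies_file:
        cookies = Path(cookies_file).expanduser()
        if not cookies.exists():
            raise RuntimeError(f"cookies file not found: {cookies}")
        return ["--cookies", str(cookies)]
    if cookies_from_browser:
        return ["--cookies-from-browser", cookies_from_browser]
    return []


def build_download_cmd(url: str, out_path: Path, cookies_file: str = "", cookies_from_browser: str = "") -> list[str]:
    # Same flag order as the diff; the target URL stays the last argument.
    return [
        "yt-dlp", "-f", "best[ext=mp4]/best",
        "-o", str(out_path),
        "--no-warnings", "--no-playlist",
        "--retries", "3",
        *cookie_args(cookies_file, cookies_from_browser),
        url,
    ]
```

Because the cookie check runs inside the download `try` block in the diff, its `RuntimeError` is reported through `normalize_download_error` like any other download failure.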
@@ -1677,13 +1758,13 @@ def pipeline_download(job_id: str) -> None:
         )
     except Exception as e:
         message = "视频元数据解析失败" if stage == "metadata" else "下载失败"
-        update(job, status="failed", error=str(e), message=message)
+        update(job, status="failed", error=normalize_download_error(e), message=message)
 
 
 def pipeline_analyze(
     job_id: str,
     frame_count: int = KEYFRAME_COUNT,
-    target: FrameExtractTarget = "transparent_human",
+    target: FrameExtractTarget = "random_subject",
     mode: FrameExtractMode = "replace",
     quality: FrameExtractQuality = "auto",
 ) -> None:
@@ -1849,7 +1930,7 @@ def analyze_queue_worker() -> None:
         ANALYZE_WORKER_RUNNING = False
 
 
-# ---------- Audio transcription + translation + SKG rewrite + MiniMax voiceover ----------
+# ---------- Audio transcription + translation + SKG rewrite + Azure OpenAI voiceover ----------
 
 class TranscriptionUnavailable(RuntimeError):
     pass
@@ -2305,18 +2386,6 @@ def _rewrite_audio_script_sync(segments: list[TranscriptSegment], target_seconds
         return fallback, f"改写失败,使用本地模板:{e}"
 
 
-def _minimax_tts_url() -> str:
-    if MINIMAX_TTS_BASE_URL.endswith("/v1/t2a_v2"):
-        return MINIMAX_TTS_BASE_URL
-    return f"{MINIMAX_TTS_BASE_URL}/v1/t2a_v2"
-
-
-def _choose_minimax_voice_id() -> str:
-    if MINIMAX_TTS_VOICE_POOL:
-        return random.choice(MINIMAX_TTS_VOICE_POOL)
-    return MINIMAX_TTS_VOICE_ID
-
-
 def _choose_azure_voice_id() -> str:
     if AZURE_TTS_VOICE_POOL:
         return random.choice(AZURE_TTS_VOICE_POOL)
@@ -2324,9 +2393,7 @@ def _choose_azure_voice_id() -> str:
 
 
 def _choose_tts_voice_id() -> str:
-    if VOICE_PROVIDER == "azure_openai":
-        return _choose_azure_voice_id()
-    return _choose_minimax_voice_id()
+    return _choose_azure_voice_id()
 
 
 def _voice_speed_for(voice_id: str, target_seconds: float, text: str) -> float:
@@ -2343,60 +2410,22 @@ def _voice_speed_for(voice_id: str, target_seconds: float, text: str) -> float:
     return 0.99
 
 
-def _minimax_tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> str:
-    if not MINIMAX_API_KEY:
-        raise RuntimeError("MINIMAX_API_KEY 未配置,未生成配音")
-    if not text.strip():
-        raise RuntimeError("改写文案为空,未生成配音")
-    payload = {
-        "model": MINIMAX_TTS_MODEL,
-        "text": text.strip()[:9500],
-        "stream": False,
-        "language_boost": "English",
-        "output_format": "hex",
-        "voice_setting": {
-            "voice_id": voice_id,
-            "speed": _voice_speed_for(voice_id, target_seconds, text),
-            "vol": 1,
-            "pitch": 0,
-        },
-        "audio_setting": {
-            "sample_rate": 32000,
-            "bitrate": 128000,
-            "format": "mp3",
-            "channel": 1,
-        },
-    }
-    resp = httpx.post(
-        _minimax_tts_url(),
-        headers={"Authorization": f"Bearer {MINIMAX_API_KEY}", "Content-Type": "application/json"},
-        json=payload,
-        timeout=90,
-    )
-    resp.raise_for_status()
-    data = resp.json()
-    base_resp = data.get("base_resp") or {}
-    if int(base_resp.get("status_code", 0) or 0) != 0:
-        raise RuntimeError(base_resp.get("status_msg") or "MiniMax TTS 返回失败")
-    audio_hex = ((data.get("data") or {}).get("audio") or "").strip()
-    if not audio_hex:
-        raise RuntimeError("MiniMax TTS 未返回 audio hex")
-    try:
-        audio_bytes = bytes.fromhex(audio_hex)
-    except ValueError as e:
-        raise RuntimeError(f"MiniMax TTS audio hex 无法解析:{e}") from e
-    out = job_dir(job_id) / "audio_script.mp3"
-    out.write_bytes(audio_bytes)
-    return f"/jobs/{job_id}/audio-script.mp3"
-
-
-def _azure_tts_url() -> str:
-    path = AZURE_TTS_PATH if AZURE_TTS_PATH.startswith("/") else f"/{AZURE_TTS_PATH}"
+def _azure_tts_url_for(path_value: str) -> str:
+    path = path_value if path_value.startswith("/") else f"/{path_value}"
     if AZURE_OPENAI_BASE_URL.endswith(path):
         return AZURE_OPENAI_BASE_URL
     return f"{AZURE_OPENAI_BASE_URL}{path}"
 
 
+def _azure_tts_urls() -> list[str]:
+    urls: list[str] = []
+    for path in AZURE_TTS_PATHS or [AZURE_TTS_PATH]:
+        url = _azure_tts_url_for(path)
+        if url not in urls:
+            urls.append(url)
+    return urls
+
+
 def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> str:
     if not AZURE_OPENAI_API_KEY:
         raise RuntimeError("AZURE_OPENAI_API_KEY 或 LLM_API_KEY 未配置,未生成配音")
@@ -2409,18 +2438,32 @@ def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds
"response_format": "mp3",
"speed": _voice_speed_for(voice_id, target_seconds, text),
}
resp = httpx.post(
_azure_tts_url(),
headers={
"Authorization": f"Bearer {AZURE_OPENAI_API_KEY}",
"api-key": AZURE_OPENAI_API_KEY,
"Content-Type": "application/json",
},
json=payload,
timeout=120,
)
headers = {
"Authorization": f"Bearer {AZURE_OPENAI_API_KEY}",
"api-key": AZURE_OPENAI_API_KEY,
"Content-Type": "application/json",
}
resp: httpx.Response | None = None
errors: list[str] = []
with ai_http_client(timeout=120) as client:
for url in _azure_tts_urls():
try:
current = client.post(url, headers=headers, json=payload)
except Exception as e:
errors.append(f"{url}: {type(e).__name__}: {e}")
continue
if current.status_code < 400:
resp = current
break
errors.append(f"{url}: HTTP {current.status_code}: {current.text[:180]}")
if current.status_code not in {404, 405}:
resp = current
break
if resp is None:
raise RuntimeError("Azure OpenAI TTS 不可用;已尝试 " + " | ".join(errors))
if resp.status_code >= 400:
raise RuntimeError(f"Azure OpenAI TTS HTTP {resp.status_code}: {resp.text[:300]}")
detail = " | ".join(errors) or resp.text[:300]
raise RuntimeError(f"Azure OpenAI TTS HTTP {resp.status_code}: {detail[:600]}")
audio_bytes = resp.content
if not audio_bytes:
raise RuntimeError("Azure OpenAI TTS 未返回音频内容")
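
In this hunk, the single `_azure_tts_url()` is replaced by an ordered list of candidate endpoints built from `AZURE_TTS_PATHS`. The request moves on to the next candidate only after a transport error or an HTTP 404/405. Any other error status stops the loop. Below is a minimal, network-free sketch of that policy. It assumes the HTTP call is injected as a `post(url) -> (status, body)` callable. `tts_url_for`, `tts_urls` and `post_with_fallback` are hypothetical names, not functions in the codebase.

```python
from typing import Callable

# Hypothetical helpers mirroring _azure_tts_url_for / _azure_tts_urls / the retry loop above.
# The transport is injected so the fallback policy can be exercised without a network.


def tts_url_for(base_url: str, path_value: str) -> str:
    """Append path_value to base_url unless base_url already ends with it."""
    path = path_value if path_value.startswith("/") else f"/{path_value}"
    if base_url.endswith(path):
        return base_url
    return f"{base_url}{path}"


def tts_urls(base_url: str, paths: list[str], default_path: str) -> list[str]:
    """Ordered, de-duplicated candidate URLs; uses default_path when paths is empty."""
    urls: list[str] = []
    for path in paths or [default_path]:
        url = tts_url_for(base_url, path)
        if url not in urls:
            urls.append(url)
    return urls


def post_with_fallback(urls: list[str], post: Callable[[str], tuple[int, bytes]]) -> bytes:
    """Return the first successful body; only transport errors or 404/405 try the next URL."""
    errors: list[str] = []
    for url in urls:
        try:
            status, body = post(url)
        except Exception as e:  # e.g. connection refused: record it and try the next candidate
            errors.append(f"{url}: {type(e).__name__}: {e}")
            continue
        if status < 400:
            return body
        errors.append(f"{url}: HTTP {status}")
        if status not in {404, 405}:
            break  # auth, quota or server errors will not be fixed by a different path
    raise RuntimeError("TTS unavailable; tried " + " | ".join(errors))


if __name__ == "__main__":
    base = "https://ai.skg.com/azure"
    urls = tts_urls(base, ["/audio/speech", "v1/audio/speech", "/audio/speech"], "/audio/speech")
    fake = {urls[0]: (404, b""), urls[1]: (200, b"mp3-bytes")}
    print(urls)
    print(post_with_fallback(urls, lambda u: fake[u]))
```

Stopping on anything other than 404/405 means a wrong API key is not retried against every candidate path. The loop still tolerates gateways that only expose the `/v1/...` variant.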
@@ -2437,9 +2480,7 @@ def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds


def _tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> tuple[str, str, str]:
if VOICE_PROVIDER == "azure_openai":
return _azure_openai_tts_sync(job_id, text, voice_id, target_seconds), "azure_openai", AZURE_TTS_MODEL
return _minimax_tts_sync(job_id, text, voice_id, target_seconds), "minimax", MINIMAX_TTS_MODEL
return _azure_openai_tts_sync(job_id, text, voice_id, target_seconds), "azure_openai", AZURE_TTS_MODEL


def _build_audio_script_sync(job_id: str, segments: list[TranscriptSegment], target_seconds: float = 12.0) -> AudioScript:
@@ -2451,8 +2492,8 @@ def _build_audio_script_sync(job_id: str, segments: list[TranscriptSegment], tar
speaker_profile, rhythm_profile = _audio_delivery_profile(segments, duration, selected_voice_id)
voice_url = ""
voice_error = ""
voice_provider = "azure_openai" if VOICE_PROVIDER == "azure_openai" else "minimax"
voice_model = AZURE_TTS_MODEL if voice_provider == "azure_openai" else MINIMAX_TTS_MODEL
voice_provider = "azure_openai"
voice_model = AZURE_TTS_MODEL
try:
voice_url, voice_provider, voice_model = _tts_sync(job_id, rewritten, selected_voice_id, duration)
except Exception as e:
@@ -3050,7 +3091,8 @@ def health() -> dict:
"auth_configured": WEB_AUTH_CONFIGURED,
"base_url": LLM_BASE_URL or "openai-default",
"image_base_url": IMAGE_BASE_URL or LLM_BASE_URL or "openai-default",
"voice_base_url": AZURE_OPENAI_BASE_URL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_BASE_URL,
"voice_base_url": AZURE_OPENAI_BASE_URL,
"database": DB.health() if DB else {"enabled": False, "url": DATABASE_URL, "error": DB_INIT_ERROR},
"models": {
"asr": ASR_MODEL,
"local_asr": LOCAL_ASR_MODEL,
@@ -3067,15 +3109,12 @@ def health() -> dict:
"subject_image": SUBJECT_ASSET_IMAGE_MODEL,
"subject_image_fallbacks": SUBJECT_ASSET_IMAGE_MODELS,
"voice_provider": VOICE_PROVIDER,
"voice_base_url": AZURE_OPENAI_BASE_URL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_BASE_URL,
"voice_tts": AZURE_TTS_MODEL if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_MODEL,
"voice_id": AZURE_TTS_VOICE_ID if VOICE_PROVIDER == "azure_openai" else MINIMAX_TTS_VOICE_ID,
"voice_pool": AZURE_TTS_VOICE_POOL if VOICE_PROVIDER == "azure_openai" else (MINIMAX_TTS_VOICE_POOL or [MINIMAX_TTS_VOICE_ID]),
"voice_configured": bool(AZURE_OPENAI_API_KEY) if VOICE_PROVIDER == "azure_openai" else bool(MINIMAX_API_KEY),
"minimax_tts": MINIMAX_TTS_MODEL,
"minimax_voice": MINIMAX_TTS_VOICE_ID,
"minimax_voice_pool": MINIMAX_TTS_VOICE_POOL or [MINIMAX_TTS_VOICE_ID],
"minimax_configured": bool(MINIMAX_API_KEY),
"voice_base_url": AZURE_OPENAI_BASE_URL,
"voice_tts": AZURE_TTS_MODEL,
"voice_tts_paths": AZURE_TTS_PATHS,
"voice_id": AZURE_TTS_VOICE_ID,
"voice_pool": AZURE_TTS_VOICE_POOL,
"voice_configured": bool(AZURE_OPENAI_API_KEY),
"video": VIDEO_MODEL,
"video_aliases": VIDEO_MODEL_ALIASES,
"video_provider": video_provider_name(),
@@ -3088,6 +3127,9 @@ def health() -> dict:

class JobSummary(BaseModel):
id: str
document_id: str = ""
source_kind: str = "unknown"
workflow_mode: str = "feed_recreation"
url: str
status: JobStatus
progress: int = 0
@@ -3103,6 +3145,29 @@ class JobSummary(BaseModel):
mtime: float = 0.0


class DocumentSummary(BaseModel):
id: str
title: str
source_kind: str
workflow_mode: str
source_url: str = ""
primary_job_id: str = ""
status: str = "created"
storage_prefix: str = ""
job_count: int = 0
asset_count: int = 0
created_at: float = 0.0
updated_at: float = 0.0


@app.get("/documents", response_model=list[DocumentSummary])
def list_documents(limit: int | None = None) -> list[DocumentSummary]:
if not DB:
return []
rows = DB.list_documents(limit)
return [DocumentSummary(**row) for row in rows]


@app.get("/jobs", response_model=list[JobSummary])
def list_jobs(limit: int | None = None) -> list[JobSummary]:
"""所有 job 的精简列表,按磁盘 state.json mtime 倒序(最新优先)。前端无 ?job= 时用它回填历史。"""
@@ -3111,8 +3176,12 @@ def list_jobs(limit: int | None = None) -> list[JobSummary]:
state_path = JOBS_DIR / job_id / "state.json"
mtime = state_path.stat().st_mtime if state_path.exists() else 0.0
thumb = f"/jobs/{job_id}/frames/{job.frames[0].index}.jpg" if job.frames else ""
ensure_job_document_fields(job)
items.append(JobSummary(
id=job.id,
document_id=job.document_id,
source_kind=job.source_kind,
workflow_mode=job.workflow_mode,
url=job.url,
status=job.status,
progress=job.progress,
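
According to its docstring, `list_jobs` returns jobs newest first, ordered by the mtime of each job's `state.json`. The hunk shows that a missing file counts as `0.0`. The sketch below applies that ordering to the `JOBS_DIR/<job_id>/state.json` layout. It assumes a directory scan because the endpoint's actual loop header lies outside this hunk. The `limit` handling (ignored unless positive) is also an assumption, and `jobs_by_recency` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def jobs_by_recency(jobs_dir: Path, limit: Optional[int] = None) -> list[str]:
    """Job ids ordered by state.json mtime, newest first; a missing state.json counts as 0.0."""
    items: list[tuple[float, str]] = []
    for d in jobs_dir.iterdir():
        if not d.is_dir():
            continue  # skip stray files such as a SQLite database next to the job folders
        state_path = d / "state.json"
        mtime = state_path.stat().st_mtime if state_path.exists() else 0.0
        items.append((mtime, d.name))
    items.sort(key=lambda item: item[0], reverse=True)
    ids = [job_id for _, job_id in items]
    return ids[:limit] if limit and limit > 0 else ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for job_id, mtime in [("old", 1_000), ("new", 3_000), ("mid", 2_000)]:
            (root / job_id).mkdir()
            state = root / job_id / "state.json"
            state.write_text("{}")
            os.utime(state, (mtime, mtime))  # pin mtimes so the ordering is deterministic
        (root / "no_state").mkdir()
        print(jobs_by_recency(root))
        print(jobs_by_recency(root, limit=2))
```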
@@ -3138,13 +3207,38 @@ async def create_job(req: CreateJobReq, bg: BackgroundTasks) -> Job:
if not req.url.strip():
raise HTTPException(400, "url required")
job_id = uuid.uuid4().hex[:12]
job = Job(id=job_id, url=req.url.strip())
job = Job(id=job_id, url=req.url.strip(), document_id=job_id, source_kind="tiktok_link", workflow_mode="feed_recreation")
JOBS[job_id] = job
save_state(job)
bg.add_task(pipeline_download, job_id)
return job


@app.post("/jobs/{job_id}/download/retry", response_model=Job)
async def retry_job_download(job_id: str, bg: BackgroundTasks) -> Job:
job = JOBS.get(job_id)
if not job:
raise HTTPException(404, "job not found")
if job.source_kind == "upload" or job.url.startswith("upload://"):
raise HTTPException(409, "uploaded videos cannot be redownloaded; upload the file again")
if job.status in {"downloading", "splitting", "transcribing"}:
raise HTTPException(409, f"job is busy: {job.status}")

mp4 = job_dir(job_id) / "source.mp4"
if mp4.exists() and mp4.stat().st_size == 0:
mp4.unlink()
update(
job,
status="downloading",
progress=1,
error="",
message="重新提交下载…",
video_url="",
)
bg.add_task(pipeline_download, job_id)
return job


@app.post("/jobs/upload", response_model=Job)
async def create_job_from_upload(bg: BackgroundTasks, file: UploadFile = File(...)) -> Job:
if not file.filename:
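
Before re-queuing `pipeline_download`, `retry_job_download` refuses three cases:

- an unknown job (404);
- an uploaded source (409, since there is no URL to fetch again);
- a job that is still downloading, splitting or transcribing (409).

It also deletes a zero-byte `source.mp4` left behind by a failed download. The sketch below expresses those guards as pure functions so they can be tested without FastAPI. `JobView` is a hypothetical stand-in for the real `Job` model, and the function names are illustrative only.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

BUSY_STATUSES = {"downloading", "splitting", "transcribing"}


@dataclass
class JobView:
    """Hypothetical subset of the Job model used by the retry guards."""
    status: str
    source_kind: str
    url: str


def retry_refusal(job: Optional[JobView]) -> Optional[tuple[int, str]]:
    """Return (HTTP status, detail) if a redownload must be refused, else None."""
    if job is None:
        return 404, "job not found"
    if job.source_kind == "upload" or job.url.startswith("upload://"):
        return 409, "uploaded videos cannot be redownloaded; upload the file again"
    if job.status in BUSY_STATUSES:
        return 409, f"job is busy: {job.status}"
    return None


def drop_empty_source(mp4: Path) -> bool:
    """Delete a zero-byte source.mp4 so the next download starts clean; True if removed."""
    if mp4.exists() and mp4.stat().st_size == 0:
        mp4.unlink()
        return True
    return False


if __name__ == "__main__":
    print(retry_refusal(JobView("failed", "tiktok_link", "https://www.tiktok.com/@x/video/1")))
    with tempfile.TemporaryDirectory() as tmp:
        mp4 = Path(tmp) / "source.mp4"
        mp4.touch()
        print(drop_empty_source(mp4), mp4.exists())
```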
@@ -3162,7 +3256,7 @@ async def create_job_from_upload(bg: BackgroundTasks, file: UploadFile = File(..
if not mp4.exists() or mp4.stat().st_size == 0:
raise HTTPException(500, "upload failed")

job = Job(id=job_id, url=f"upload://{file.filename}")
job = Job(id=job_id, url=f"upload://{file.filename}", document_id=job_id, source_kind="upload", workflow_mode="uploaded_reference")
JOBS[job_id] = job
save_state(job)
bg.add_task(pipeline_download, job_id)
@@ -3174,7 +3268,7 @@ async def trigger_analyze(
job_id: str,
bg: BackgroundTasks,
frames: int = KEYFRAME_COUNT,
target: FrameExtractTarget = "transparent_human",
target: FrameExtractTarget = "random_subject",
mode: FrameExtractMode = "replace",
quality: FrameExtractQuality = "auto",
) -> Job:
@@ -3252,6 +3346,11 @@ def delete_job(job_id: str) -> dict[str, bool | str]:
job = JOBS.pop(job_id, None)
if not job and not d.exists():
raise HTTPException(404, "job not found")
if DB:
try:
DB.delete_job(job_id)
except Exception as e:
print(f"[database delete failed] job={job_id} error={e}", flush=True)
if d.exists():
shutil.rmtree(d)
return {"ok": True, "id": job_id}

@@ -3,7 +3,8 @@

# Runtime
JOBS_DIR=/data/jobs
KEYFRAME_COUNT=12
APP_DB_URL=sqlite:////data/jobs/app.db
KEYFRAME_COUNT=6
CORS_ORIGINS=https://marketing.skg.com
API_PORT=4291

@@ -22,7 +23,9 @@ LLM_API_KEY=
ASR_MODEL=whisper-1
ASR_FALLBACK_MODEL=gemini-2.5-flash
TRANSLATE_MODEL=gemini-2.5-flash
REWRITE_MODEL=gemini-2.5-pro
GPT_TEXT_MODEL=gpt-4o
REWRITE_MODEL=gpt-4o
VISION_MODEL=gpt-4o
PRODUCT_VIEW_MODEL=gpt-image-2
IMAGE_BASE_URL=https://ai.skg.com/ezlink/v1
IMAGE_API_KEY=
@@ -33,9 +36,14 @@ SUBJECT_ASSET_IMAGE_MODELS=gpt-image-2
# Optional outbound proxy for AI gateway calls. Leave blank on normal VPS networking.
AI_HTTP_PROXY=

# Optional TikTok download login state for yt-dlp. Keep cookies files private.
YTDLP_COOKIES_FILE=
YTDLP_COOKIES_FROM_BROWSER=

# Audio rewrite and Azure OpenAI TTS
AUDIO_REWRITE_MODEL=gemini-2.5-pro
AUDIO_REWRITE_MODEL=gpt-4o
AUDIO_PRODUCT_BRIEF="SKG smart massage products for daily neck, shoulder, back, eye, knee, and foot relaxation. Keep claims premium, clean, credible, and non-medical."
# Voice is fixed to Azure OpenAI in the backend.
VOICE_PROVIDER=azure_openai
AZURE_OPENAI_BASE_URL=https://ai.skg.com/azure
AZURE_OPENAI_API_KEY=
@@ -43,13 +51,7 @@ AZURE_TTS_MODEL=gpt-4o-mini-tts
AZURE_TTS_VOICE_ID=alloy
AZURE_TTS_VOICE_POOL=alloy,verse,shimmer
AZURE_TTS_PATH=/audio/speech

# Legacy MiniMax TTS fallback; not the default voice provider.
MINIMAX_API_KEY=
MINIMAX_TTS_BASE_URL=https://api.minimax.io
MINIMAX_TTS_MODEL=speech-2.8-turbo
MINIMAX_TTS_VOICE_ID=English_expressive_narrator
MINIMAX_TTS_VOICE_POOL=English_magnetic_voiced_man,English_Upbeat_Woman,English_MaturePartner
AZURE_TTS_PATHS=/audio/speech,/v1/audio/speech

# Video generation. Use SKG Doubao / Seedance gateway in production.
POE_API_BASE_URL=https://api.poe.com/v1

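
The env template now adds `AZURE_TTS_PATHS=/audio/speech,/v1/audio/speech` alongside the single `AZURE_TTS_PATH`, and the backend iterates `AZURE_TTS_PATHS or [AZURE_TTS_PATH]`. This diff does not show how the backend parses the variable. The sketch below is one plausible approach, assuming plain comma splitting with whitespace trimming. The `env_list` / `tts_paths` helpers and the `/audio/speech` default are hypothetical.

```python
from typing import Mapping


def env_list(env: Mapping[str, str], name: str) -> list[str]:
    """Split a comma-separated variable into trimmed, non-empty items (hypothetical helper)."""
    raw = env.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def tts_paths(env: Mapping[str, str]) -> list[str]:
    """AZURE_TTS_PATHS if set, otherwise the single AZURE_TTS_PATH (assumed default /audio/speech)."""
    return env_list(env, "AZURE_TTS_PATHS") or [env.get("AZURE_TTS_PATH", "/audio/speech")]


if __name__ == "__main__":
    import os

    print(tts_paths(os.environ))
    print(env_list(os.environ, "AZURE_TTS_VOICE_POOL"))
```

Taking a mapping instead of reading `os.environ` directly keeps the helper testable with plain dicts. The same splitter would also cover list-valued settings such as `AZURE_TTS_VOICE_POOL`.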
File diff suppressed because one or more lines are too long
@@ -19,6 +19,7 @@ import { AdRecreationBoard } from "@/components/ad-recreation-board"
import {
addManualFrame, analyzeJob, createJob, getJob, listJobs, uploadJob, deleteJob, deleteFrame, deleteGeneratedImage,
deleteGeneratedVideo, deleteCutout, generateStoryboardVideo, triggerTranscribe, describeFrame, updateStoryboard, copyProductLibraryAsset,
formatJobError, retryJobDownload,
type Job, type ImageRef, type KeyFrame, type ProductFusionShot, type StoryboardScene, type FrameExtractMode, type FrameExtractQuality, type FrameExtractTarget,
} from "@/lib/api"
import { TRANSPARENT_HUMAN_NEGATIVE_PROMPT, TRANSPARENT_HUMAN_VIDEO_PROMPT } from "@/lib/workflow-target"
@@ -40,6 +41,7 @@ const VIDEO_FRAME_PANEL_ID = "video-frame-panel"
const FLOATING_PANEL_IDS = new Set([KEYFRAME_PANEL_ID, VIDEO_FRAME_PANEL_ID])
const DIRECT_VIDEO_GENERATION_PAUSED = true
const FRAME_TARGET_LABELS: Record<FrameExtractTarget, string> = {
random_subject: "人物随机",
transparent_human: "透明骨架人",
balanced: "综合关键帧",
subject: "清晰主体",
@@ -242,8 +244,8 @@ export default function Home() {
const handleAnalyzeJob = useCallback(async (jobId: string, options?: { mode?: FrameExtractMode }) => {
const targetJob = jobs.find((item) => item.id === jobId)
if (!targetJob) return
const frameTarget = frameTargets[jobId] ?? "transparent_human"
const frameCount = frameCounts[jobId] ?? 12
const frameTarget = frameTargets[jobId] ?? "random_subject"
const frameCount = frameCounts[jobId] ?? 6
const frameQuality = frameQualities[jobId] ?? "auto"
const mode = options?.mode ?? (targetJob.frames.length > 0 ? "append" : "replace")
setActiveJobId(jobId)
@@ -487,8 +489,8 @@ export default function Home() {
const visualRunning = target.status === "splitting"
if (!hasVisualResult && !visualRunning && !autoTriggeredRef.current.has(visualKey)) {
autoTriggeredRef.current.add(visualKey)
const frameTarget = frameTargets[target.id] ?? "motion"
const frameCount = frameCounts[target.id] ?? 12
const frameTarget = frameTargets[target.id] ?? "random_subject"
const frameCount = frameCounts[target.id] ?? 6
const frameQuality = frameQualities[target.id] ?? "accurate"
try {
const updated = await analyzeJob(target.id, frameCount, frameTarget, "replace", frameQuality)
@@ -572,15 +574,30 @@ export default function Home() {
const handleStartProduction = useCallback(async (inputUrl?: string) => {
const trimmed = inputUrl?.trim()
const created = trimmed ? await handleSubmit(trimmed) : undefined
const target = created ?? job
let target = created ?? job
if (!target) {
toast.info("先粘贴视频链接或选择一个素材任务")
return
}
if (!created && target.status === "failed") {
autoTriggeredRef.current.delete(`${target.id}:audio`)
autoTriggeredRef.current.delete(`${target.id}:visual`)
}
if (!created && target.status === "failed" && !target.video_url) {
try {
target = await retryJobDownload(target.id)
updateJobInList(target)
toast.info("已重新提交下载;下载完成后会自动跑音频文案路和视觉抽帧路")
} catch (e) {
toast.error("重新下载失败:" + (e instanceof Error ? e.message : String(e)))
return
}
}
setProductionJobIds((prev) => new Set(prev).add(target.id))
toast.success("已进入并行素材分析:下载完成后自动跑音频文案路和视觉抽帧路")
if (target.video_url) toast.success("已进入并行素材分析:音频文案路和视觉抽帧路会同步推进")
else toast.success("已进入并行素材分析:下载完成后自动跑音频文案路和视觉抽帧路")
void startProductionLanesForJob(target)
}, [handleSubmit, job, startProductionLanesForJob])
}, [handleSubmit, job, startProductionLanesForJob, updateJobInList])

useEffect(() => {
if (productionJobIds.size === 0) return
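
On the front end, the primary action for a failed job now depends on whether a source video exists:

- Without `video_url`, the button reads 重新下载 (re-download), and `handleStartProduction` calls `retryJobDownload` before restarting the lanes.
- With `video_url`, the button reads 重新解析 (re-analyze), and only the `:audio` / `:visual` auto-trigger keys are cleared.

The decision is sketched below in Python for consistency with the other examples. The function names are hypothetical, and the step names only label what the TypeScript does. A status of `None` stands in for "no job selected".

```python
from typing import Optional


def action_label(url_input: str, status: Optional[str], video_url: str) -> str:
    """Mirror of MaterialColumn's actionLabel."""
    if not url_input.strip() and status == "failed":
        return "重新解析" if video_url else "重新下载"  # re-analyze / re-download
    return "开始分析"  # start analysis


def production_steps(created: bool, status: Optional[str], video_url: str) -> list[str]:
    """Ordered steps handleStartProduction takes for its target job."""
    if status is None:
        return ["prompt_for_input"]  # TS: toast asking for a link or a selected job
    steps: list[str] = []
    if not created and status == "failed":
        steps.append("clear_auto_trigger_keys")  # deletes `${id}:audio` and `${id}:visual`
        if not video_url:
            steps.append("retry_download")  # on failure the TS shows an error toast and returns
    steps.append("start_lanes")
    return steps


if __name__ == "__main__":
    print(action_label("", "failed", ""), production_steps(False, "failed", ""))
```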
@@ -863,6 +880,9 @@ export default function Home() {
if (job?.status === "downloaded" && prevStatusRef.current !== "downloaded") {
toast.info("视频已下载,音频解析会自动开始;也可以在右侧手动重试", { duration: 6000 })
}
if (job?.status === "failed" && prevStatusRef.current !== "failed") {
toast.error(formatJobError(job.error) || "任务失败", { duration: 10000 })
}
prevStatusRef.current = job?.status ?? null

const TERMINAL: Job["status"][] = ["downloaded", "frames_extracted", "transcribed", "failed"]

@@ -32,6 +32,7 @@ import {
cutoutElement,
deleteSubjectAsset,
effectiveFrameUrl,
formatJobError,
generateSceneAsset,
generateProductAngleAsset,
generateSubjectAssets,
@@ -52,6 +53,7 @@ import { type NodeData } from "@/components/nodes"
import { MediaAssetTile } from "@/components/media-asset-tile"

const TARGETS: Array<{ value: FrameExtractTarget; label: string }> = [
{ value: "random_subject", label: "人物随机" },
{ value: "balanced", label: "综合" },
{ value: "subject", label: "主体" },
{ value: "motion", label: "动作" },
@@ -1449,6 +1451,9 @@ function MaterialColumn({
onSubmitUrl: () => void
onStartProduction: () => void
}) {
const actionLabel = !url.trim() && job?.status === "failed"
? job.video_url ? "重新解析" : "重新下载"
: "开始分析"
return (
<section className="flex min-h-0 flex-col gap-3 rounded-lg border border-white/10 bg-white/[0.035] p-3 shadow-2xl">
<header className="shrink-0 border-b border-white/10 pb-3">
@@ -1474,7 +1479,7 @@ function MaterialColumn({
disabled={data.submitting || (!url.trim() && !job)}
className="inline-flex h-10 items-center justify-center rounded-md bg-rose-600 px-3 text-[13px] font-semibold text-white transition hover:bg-rose-500 disabled:cursor-not-allowed disabled:opacity-45"
>
开始分析
{actionLabel}
</button>
<button
type="button"
@@ -1875,11 +1880,11 @@ function SourceReferenceBuildPanel({
for (const frame of job.frames) {
if (selectedFrames.has(frame.index)) onToggleFrame(frame.index)
}
const updated = await analyzeJob(job.id, 12, "motion", "replace", "accurate")
const updated = await analyzeJob(job.id, 6, "random_subject", "replace", "accurate")
onJobUpdate(updated)
toast.info("已按动作峰值逻辑重新抽取 12 张参考帧,完成后在这里人工选择主角参考。")
toast.info("已按人物定向随机逻辑重新抽取 6 张参考帧,完成后在这里人工选择主角参考。")
} catch (e) {
toast.error("12 张关键帧抽取失败:" + (e instanceof Error ? e.message : String(e)))
toast.error("6 张关键帧抽取失败:" + (e instanceof Error ? e.message : String(e)))
} finally {
setExtracting(false)
}
@@ -1887,7 +1892,7 @@ function SourceReferenceBuildPanel({

const generateSimilarActor = async () => {
if (!frames.length) {
toast.warning("请先自动抽帧 12 张,或在原版视频上手动补帧。")
toast.warning("请先自动抽帧 6 张,或在原版视频上手动补帧。")
return
}
const baseFrame = subjectReferenceFrames[0]
@@ -2000,11 +2005,11 @@ function SourceReferenceBuildPanel({
type="button"
onClick={() => void extractKeyframes()}
disabled={!job.video_url || extracting || job.status === "splitting"}
title="自动按动作峰值抽 12 张参考帧,更偏向手势、表情变化、节奏点和镜头变化"
title="自动按人物定向随机逻辑抽 6 张参考帧,保留手动当前点补帧"
className="inline-flex h-8 items-center justify-center gap-1 rounded-md bg-white px-3 text-[11px] font-semibold text-black transition hover:bg-white/90 disabled:cursor-not-allowed disabled:opacity-40"
>
{extracting || job.status === "splitting" ? <Loader2 className="h-3.5 w-3.5 animate-spin" /> : <Scissors className="h-3.5 w-3.5" />}
自动抽帧 12 张
自动抽帧 6 张
</button>
</div>
</div>
@@ -2039,7 +2044,7 @@ function SourceReferenceBuildPanel({
})}
{!frames.length && (
<div className="col-span-full flex h-[106px] items-center justify-center rounded border border-dashed border-white/12 text-[11px] text-white/34">
点击“自动抽帧 12 张”,或在原版视频播放器上用“当前点抽帧”补充人物参考。
点击“自动抽帧 6 张”,或在原版视频播放器上用“当前点抽帧”补充人物参考。
</div>
)}
</div>
@@ -3405,7 +3410,7 @@ function FrameExtractControls({
</div>
<div className="grid grid-cols-[1fr_1fr_72px] gap-2">
<select
value={job ? data.frameTargets[job.id] ?? "transparent_human" : "balanced"}
value={job ? data.frameTargets[job.id] ?? "random_subject" : "random_subject"}
onChange={(e) => job && data.onFrameTargetChange(job.id, e.target.value as FrameExtractTarget)}
disabled={!job}
className={controlClass}
@@ -3424,8 +3429,8 @@ function FrameExtractControls({
type="number"
min={1}
max={20}
value={job ? data.frameCounts[job.id] ?? 12 : 12}
onChange={(e) => job && data.onFrameCountChange(job.id, Number(e.target.value) || 12)}
value={job ? data.frameCounts[job.id] ?? 6 : 6}
onChange={(e) => job && data.onFrameCountChange(job.id, Number(e.target.value) || 6)}
disabled={!job}
className={`${controlClass} text-center`}
/>
@@ -3858,6 +3863,7 @@ function MaterialCard({
onDelete?: () => void
}) {
const tone = statusTone(job)
const errorText = formatJobError(job.error)
return (
<button
type="button"
@@ -3879,6 +3885,12 @@ function MaterialCard({
<Metric label="文案" value={job.audio_script?.source_text || job.transcript.length ? "ready" : "-"} compact />
<Metric label="段落" value={`${job.transcript.length}`} compact />
</div>
{job.status === "failed" && errorText && (
<div className="mt-2 flex gap-1.5 rounded-md border border-rose-300/18 bg-rose-500/[0.08] px-2 py-1.5 text-[11px] leading-snug text-rose-100/82">
<AlertTriangle className="mt-0.5 h-3.5 w-3.5 shrink-0" />
<span className="line-clamp-3">{errorText}</span>
</div>
)}
{onDelete && (
<span
role="button"

@@ -641,15 +641,15 @@ export const Dashboard = forwardRef<DashboardHandle, Props>(function Dashboard({
</div>
</KanbanCard>

<KanbanCard tone="green" tags={["配音"]} title={job?.audio_script?.voice_model || "MiniMax T2A"}>
<KanbanCard tone="green" tags={["配音"]} title={job?.audio_script?.voice_model || "Azure OpenAI TTS"}>
{job?.audio_script?.voice_url ? (
<audio controls className="h-8 w-full" src={apiAssetUrl(job.audio_script.voice_url)} />
) : (
<div className="text-[11px] text-[var(--text-soft)]">
{job?.audio_script?.error || "配置 MiniMax 后自动生成配音文件"}
{job?.audio_script?.error || "配置 Azure OpenAI TTS 后自动生成配音文件"}
</div>
)}
<div className="kanban-meta">{job?.audio_script?.voice_id || "random English voice"}</div>
<div className="kanban-meta">{job?.audio_script?.voice_id || "Azure voice"}</div>
</KanbanCard>
</>
)}

@@ -133,6 +133,7 @@ function clamp(value: number, min: number, max: number) {
const THUMBNAIL_HEIGHT = 192
const FLOATING_PANEL_EDGE_INSET = 8
const FRAME_TARGET_OPTIONS: Array<{ value: FrameExtractTarget; label: string; hint: string }> = [
{ value: "random_subject", label: "人物随机", hint: "从清晰人物候选里随机抽取" },
{ value: "transparent_human", label: "透明骨架人", hint: "本地算力筛清晰主体,不逐帧调用 Vision" },
{ value: "balanced", label: "综合关键帧", hint: "清晰、去重、变化、时间覆盖" },
{ value: "subject", label: "清晰主体", hint: "人物 / 产品主体更清楚" },
@@ -140,7 +141,7 @@ const FRAME_TARGET_OPTIONS: Array<{ value: FrameExtractTarget; label: string; hi
{ value: "expression", label: "表情瞬间", hint: "人物 / 动物表情倾向" },
{ value: "motion", label: "动作峰值", hint: "动作变化更明显" },
]
const FRAME_COUNT_OPTIONS = [12, 8, 5, 3]
const FRAME_COUNT_OPTIONS = [6, 12, 8, 5, 3]
const FRAME_QUALITY_OPTIONS: Array<{ value: FrameExtractQuality; label: string; hint: string }> = [
{ value: "auto", label: "自动", hint: "展示友好:按电脑性能选择,最高只到精细" },
{ value: "fast", label: "快速", hint: "2fps / 360px,长视频省电" },
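
This commit changes the default target to `random_subject` and drops the default frame count from 12 to 6. The new target is described only by its UI hint, 从清晰人物候选里随机抽取 ("sample at random from clear-person candidates"), and the backend selection is not part of this diff. The sketch below therefore assumes each candidate frame already carries a hypothetical `person_score` from some local detector. It filters by a threshold, samples without replacement and returns the frames in timeline order. `Candidate`, `min_score` and `pick_random_subject_frames` are illustrative names.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical scored frame: timestamp in seconds, person_score in [0, 1]."""
    timestamp: float
    person_score: float


def pick_random_subject_frames(
    candidates: Sequence[Candidate],
    k: int = 6,
    min_score: float = 0.5,
    rng: Optional[random.Random] = None,
) -> list[Candidate]:
    """Sample up to k frames among candidates whose person_score clears min_score."""
    rng = rng or random.Random()
    pool = [c for c in candidates if c.person_score >= min_score]
    chosen = pool if len(pool) <= k else rng.sample(pool, k)  # sample without replacement
    return sorted(chosen, key=lambda c: c.timestamp)  # present in timeline order


if __name__ == "__main__":
    cands = [Candidate(t * 0.5, 0.9 if t % 2 else 0.2) for t in range(40)]
    for c in pick_random_subject_frames(cands, rng=random.Random(0)):
        print(f"{c.timestamp:5.1f}s  score={c.person_score}")
```

Passing a seeded `random.Random` makes a run reproducible. Leaving it unset gives a fresh draw on each re-extraction, which matches the "random" label.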
@@ -575,8 +576,8 @@ export function InputNode({ data, selected }: NodeProps<{ data: NodeData }> | an
const aspectStr = ready ? `${j.width}/${j.height}` : "9/16"
const thumbNaturalWidth = ready && j.height ? Math.max(96, Math.round(THUMBNAIL_HEIGHT * j.width / j.height)) : 96
const toolWidth = Math.max(148, thumbNaturalWidth)
const target = d.frameTargets[j.id] ?? "transparent_human"
const count = d.frameCounts[j.id] ?? 12
const target = d.frameTargets[j.id] ?? "random_subject"
const count = d.frameCounts[j.id] ?? 6
const quality = d.frameQualities[j.id] ?? "auto"
const jHasFrames = j.frames.length > 0
const jRunning = ["splitting", "transcribing"].includes(j.status)
@@ -815,8 +816,8 @@ export function VideoFramePanelNode({ data }: any) {
const duration = panelJob.duration ?? 0
const frames = [...panelJob.frames].sort((a, b) => a.timestamp - b.timestamp)
const aspect = panelJob.width && panelJob.height ? `${panelJob.width}/${panelJob.height}` : "9/16"
const panelTarget = d.frameTargets[panelJob.id] ?? "transparent_human"
const panelCount = d.frameCounts[panelJob.id] ?? 12
const panelTarget = d.frameTargets[panelJob.id] ?? "random_subject"
const panelCount = d.frameCounts[panelJob.id] ?? 6
const panelQuality = d.frameQualities[panelJob.id] ?? "auto"
const panelRunning = ["splitting", "transcribing"].includes(panelJob.status)
const dockText: Record<CanvasPanelDock, string> = {
@@ -2102,7 +2103,7 @@ export function RewriteNode({ data, selected }: any) {
}

/* ============================================================
5b. AudioNode — 合并 ASR + 翻译 + 改写 + MiniMax 配音
5b. AudioNode — 合并 ASR + 翻译 + 改写 + Azure OpenAI 配音
============================================================ */
export function AudioNode({ data, selected }: any) {
const d: NodeData = data
@@ -2152,9 +2153,9 @@ export function AudioNode({ data, selected }: any) {
}}
>
<div>
音轨 → 取时长/节奏 → SKG 英文产品口播 → MiniMax 随机英文配音<br />
音轨 → 取时长/节奏 → SKG 英文产品口播 → Azure OpenAI 英文配音<br />
<span className="text-[var(--text-faint)] font-mono">
{audioScript?.rewrite_model || "AUDIO_REWRITE_MODEL"} → {audioScript?.voice_model || "MiniMax T2A"}
{audioScript?.rewrite_model || "AUDIO_REWRITE_MODEL"} → {audioScript?.voice_model || "Azure OpenAI TTS"}
</span>
</div>
{job && (
@@ -2195,7 +2196,7 @@ export function AudioNode({ data, selected }: any) {
)}
</div>
)}
{voiceUrl && <div className="text-[10.5px] text-emerald-200/85">MiniMax natural English voice ready · 底部音频条播放</div>}
{voiceUrl && <div className="text-[10.5px] text-emerald-200/85">Azure OpenAI English voice ready · 底部音频条播放</div>}
{isRewriting && (
<div className="text-[10.5px] text-[var(--text-faint)]">正在按原音频时长生成英文产品口播和配音…</div>
)}

@@ -172,10 +172,7 @@ export interface RuntimeModels {
voice_id?: string
voice_pool?: string[]
voice_configured?: boolean
minimax_tts?: string
minimax_voice?: string
minimax_voice_pool?: string[]
minimax_configured?: boolean
voice_tts_paths?: string[]
video?: string
video_aliases?: Record<string, string>
video_provider?: string
@@ -189,6 +186,15 @@ export interface RuntimeHealth {
llm_configured?: boolean
auth_configured?: boolean
base_url?: string
database?: {
enabled: boolean
url?: string
schema_version?: number
documents?: number
jobs?: number
assets?: number
error?: string
}
models?: RuntimeModels
}

@@ -419,7 +425,7 @@ export interface KeyFrame {
generated_images?: GeneratedImage[]
}

export type FrameExtractTarget = "transparent_human" | "balanced" | "subject" | "transition" | "expression" | "motion"
export type FrameExtractTarget = "random_subject" | "transparent_human" | "balanced" | "subject" | "transition" | "expression" | "motion"
export type FrameExtractMode = "replace" | "append"
export type FrameExtractQuality = "auto" | "fast" | "accurate" | "ultra"
export type AssetBackground = "white" | "black"
@@ -574,6 +580,10 @@ export interface ProductRefStateItem {
export interface Job {
id: string
url: string
document_id?: string
source_kind?: "tiktok_link" | "upload" | "unknown"
workflow_mode?: "feed_recreation" | "uploaded_reference"
storage_prefix?: string
status: JobStatus
progress: number
message?: string
@@ -596,14 +606,13 @@ export interface BackendHealth {
llm_configured: boolean
auth_configured?: boolean
base_url: string
database?: RuntimeHealth["database"]
models?: {
asr?: string
translate?: string
rewrite?: string
audio_rewrite?: string
minimax_tts?: string
minimax_voice?: string
minimax_configured?: boolean
voice_tts_paths?: string[]
video?: string
video_aliases?: Record<string, string>
video_base_url?: string
@@ -617,6 +626,25 @@ export function apiAssetUrl(path?: string | null): string {
return `${API_BASE}${path.startsWith("/") ? "" : "/"}${path}`
}

export function isRestrictedDownloadError(error?: string | null): boolean {
const text = (error ?? "").toLowerCase()
return (
text.includes("tiktok 下载需要登录态") ||
text.includes("log in for access") ||
text.includes("cookies-from-browser") ||
text.includes("ytdlp_cookies_file") ||
(text.includes("tiktok") && text.includes("cookies"))
)
}

export function formatJobError(error?: string | null): string {
if (!error) return ""
if (isRestrictedDownloadError(error)) {
return "这个 TikTok 视频需要登录态。请上传 MP4,或让后端配置 YTDLP_COOKIES_FROM_BROWSER / YTDLP_COOKIES_FILE 后重试。"
}
return error
}

export async function getHealth(): Promise<BackendHealth> {
const res = await fetch(`${API_BASE}/health`)
if (!res.ok) throw new Error(`health ${res.status}`)
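
`isRestrictedDownloadError` lower-cases the stored error and checks a few marker substrings to classify a TikTok download failure as "login required". `formatJobError` then replaces it with an actionable message: upload an MP4, or configure `YTDLP_COOKIES_FROM_BROWSER` / `YTDLP_COOKIES_FILE`. Below is a Python transliteration of the same heuristic, for example for reuse in backend tests. It is a sketch and does not exist in `api/`. The function and constant names are hypothetical.

```python
from typing import Optional

RESTRICTED_MESSAGE = (
    "这个 TikTok 视频需要登录态。请上传 MP4,或让后端配置 "
    "YTDLP_COOKIES_FROM_BROWSER / YTDLP_COOKIES_FILE 后重试。"
)


def is_restricted_download_error(error: Optional[str]) -> bool:
    """Same substring checks as the TypeScript helper, on a lower-cased copy of the error."""
    text = (error or "").lower()
    return (
        "tiktok 下载需要登录态" in text
        or "log in for access" in text
        or "cookies-from-browser" in text
        or "ytdlp_cookies_file" in text
        or ("tiktok" in text and "cookies" in text)
    )


def format_job_error(error: Optional[str]) -> str:
    """Empty string for no error, a friendly hint for login-gated videos, otherwise unchanged."""
    if not error:
        return ""
    if is_restricted_download_error(error):
        return RESTRICTED_MESSAGE
    return error
```

Because the last clause matches any error that mentions both "tiktok" and "cookies", the check errs toward showing the login hint. That is likely the intended trade-off for a download step.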
@@ -633,6 +661,15 @@ export async function createJob(tkUrl: string): Promise<Job> {
return res.json()
}

export async function retryJobDownload(id: string): Promise<Job> {
const res = await fetch(`${API_BASE}/jobs/${id}/download/retry`, { method: "POST" })
if (!res.ok) {
const text = await res.text().catch(() => "")
throw apiError("retryJobDownload", res.status, text)
}
return res.json()
}

export async function uploadJob(file: File): Promise<Job> {
const fd = new FormData()
fd.append("file", file)
@@ -664,6 +701,9 @@ export async function deleteJob(id: string): Promise<{ ok: boolean; id: string }

export interface JobSummary {
id: string
document_id?: string
source_kind?: string
workflow_mode?: string
url: string
status: JobStatus
progress: number
@@ -679,6 +719,28 @@ export interface JobSummary {
mtime: number
}

export interface DocumentSummary {
id: string
title: string
source_kind: string
workflow_mode: string
source_url: string
primary_job_id: string
status: string
storage_prefix: string
job_count: number
asset_count: number
created_at: number
updated_at: number
}

export async function listDocuments(limit?: number): Promise<DocumentSummary[]> {
const qs = limit && limit > 0 ? `?limit=${limit}` : ""
const res = await fetch(`${API_BASE}/documents${qs}`)
if (!res.ok) throw new Error(`listDocuments ${res.status}`)
return res.json()
}

export async function listJobs(limit?: number): Promise<JobSummary[]> {
const qs = limit && limit > 0 ? `?limit=${limit}` : ""
const res = await fetch(`${API_BASE}/jobs${qs}`)
@@ -694,8 +756,8 @@ export async function triggerTranscribe(id: string): Promise<Job> {

export async function analyzeJob(
id: string,
frames = 12,
target: FrameExtractTarget = "balanced",
frames = 6,
target: FrameExtractTarget = "random_subject",
mode: FrameExtractMode = "replace",
quality: FrameExtractQuality = "auto",
): Promise<Job> {