from __future__ import annotations import asyncio import base64 import hashlib import hmac import io import json import os import random import re import secrets import shutil import subprocess import threading import time import uuid from contextlib import asynccontextmanager from dataclasses import dataclass from pathlib import Path from typing import Literal from urllib.parse import urlencode import httpx from dotenv import load_dotenv from fastapi import BackgroundTasks, FastAPI, File, Form, HTTPException, Request, Response, UploadFile from fastapi.middleware.cors import CORSMiddleware from fastapi.responses import FileResponse, JSONResponse, RedirectResponse from pydantic import BaseModel, ConfigDict, Field import db load_dotenv() JOBS_DIR = Path(os.getenv("JOBS_DIR", "./jobs")).resolve() JOBS_DIR.mkdir(parents=True, exist_ok=True) AGENT_RUNS_DIR = Path(os.getenv("AGENT_RUNS_DIR", JOBS_DIR.parent / "agent_runs")).resolve() AGENT_RUNS_DIR.mkdir(parents=True, exist_ok=True) CORS_ORIGINS = [o.strip() for o in os.getenv("CORS_ORIGINS", "http://localhost:4290,http://127.0.0.1:4290").split(",") if o.strip()] PRODUCT_LIBRARY_DIR = Path( os.getenv("PRODUCT_LIBRARY_DIR", Path(__file__).resolve().parent / "product_library" / "skg-products") ).resolve() PRODUCT_LIBRARY_MANIFEST = PRODUCT_LIBRARY_DIR / "manifest.json" CHARACTER_LIBRARY_DIR = Path( os.getenv("CHARACTER_LIBRARY_DIR", Path(__file__).resolve().parent / "character_library" / "skg-characters") ).resolve() CHARACTER_LIBRARY_MANIFEST = CHARACTER_LIBRARY_DIR / "manifest.json" SUBJECT_TEMPLATE_DIR = Path(os.getenv("SUBJECT_TEMPLATE_DIR", JOBS_DIR / "_subject_templates")).resolve() SUBJECT_TEMPLATE_IMAGE_DIR = SUBJECT_TEMPLATE_DIR / "images" SUBJECT_TEMPLATE_MANIFEST = SUBJECT_TEMPLATE_DIR / "manifest.json" SUBJECT_TEMPLATE_IMAGE_DIR.mkdir(parents=True, exist_ok=True) ASSET_LIBRARY_DIR = Path(os.getenv("ASSET_LIBRARY_DIR", JOBS_DIR.parent / "asset_library")).resolve() PROMPT_LIBRARY_DIR = Path(os.getenv("PROMPT_LIBRARY_DIR", JOBS_DIR.parent / "prompt_library")).resolve() PROMPT_LIBRARY_ITEMS_DIR = PROMPT_LIBRARY_DIR / "items" LIBRARY_TRASH_DIR = JOBS_DIR.parent / "_trash" for _library_dir in [ ASSET_LIBRARY_DIR / "subjects", ASSET_LIBRARY_DIR / "products", ASSET_LIBRARY_DIR / "scenes", ASSET_LIBRARY_DIR / "videos", PROMPT_LIBRARY_ITEMS_DIR, LIBRARY_TRASH_DIR, ]: _library_dir.mkdir(parents=True, exist_ok=True) LLM_BASE_URL = os.getenv("LLM_BASE_URL", "").strip() LLM_API_KEY = os.getenv("LLM_API_KEY", "").strip() ASR_BASE_URL = os.getenv("ASR_BASE_URL", LLM_BASE_URL).strip() ASR_API_KEY = (os.getenv("ASR_API_KEY") or LLM_API_KEY).strip() ASR_MODEL = os.getenv("ASR_MODEL", "whisper-1") ASR_LANGUAGE = os.getenv("ASR_LANGUAGE", "").strip() ASR_REMOTE_ENABLED = os.getenv("ASR_REMOTE_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} ASR_LOCAL_FALLBACK_ENABLED = os.getenv("ASR_LOCAL_FALLBACK_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} ASR_AUDIO_FALLBACK_ENABLED = os.getenv("ASR_AUDIO_FALLBACK_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} ASR_FALLBACK_MODEL = os.getenv("ASR_FALLBACK_MODEL", "gemini-2.5-flash").strip() or "gemini-2.5-flash" ASR_TIMEOUT_SECONDS = max(15, int(os.getenv("ASR_TIMEOUT_SECONDS", "45"))) FASTER_WHISPER_MODEL = os.getenv("FASTER_WHISPER_MODEL", "base").strip() or "base" FASTER_WHISPER_DEVICE = os.getenv("FASTER_WHISPER_DEVICE", "cpu").strip() or "cpu" FASTER_WHISPER_COMPUTE_TYPE = os.getenv("FASTER_WHISPER_COMPUTE_TYPE", "int8").strip() or "int8" LOCAL_ASR_BIN = os.getenv("LOCAL_ASR_BIN", "").strip() LOCAL_ASR_MODEL = os.getenv("LOCAL_ASR_MODEL", "mlx-community/whisper-tiny").strip() or "mlx-community/whisper-tiny" LOCAL_ASR_TIMEOUT_SECONDS = max(30, int(os.getenv("LOCAL_ASR_TIMEOUT_SECONDS", "180"))) TRANSLATE_MODEL = os.getenv("TRANSLATE_MODEL", "gemini-2.5-flash") DEFAULT_GPT_TEXT_MODEL = os.getenv("GPT_TEXT_MODEL", "gpt-4o").strip() or "gpt-4o" ASR_AUTO_LANGUAGE_VALUES = {"", "auto", "detect", "multilingual", "multi"} def _asr_language_hint() -> str: language = ASR_LANGUAGE.strip() if language.lower() in ASR_AUTO_LANGUAGE_VALUES: return "" return language def _asr_language_label() -> str: return _asr_language_hint() or "auto" def gpt_model_env(name: str, default: str | None = None) -> str: value = os.getenv(name, default or DEFAULT_GPT_TEXT_MODEL).strip() if not value or value.lower().startswith("gemini-"): return default or DEFAULT_GPT_TEXT_MODEL return value REWRITE_MODEL = gpt_model_env("REWRITE_MODEL") VISION_MODEL = gpt_model_env("VISION_MODEL") IMAGE_BASE_URL = os.getenv("IMAGE_BASE_URL", LLM_BASE_URL).strip() IMAGE_API_KEY = os.getenv("IMAGE_API_KEY", LLM_API_KEY).strip() AI_HTTP_PROXY = ( os.getenv("AI_HTTP_PROXY") or os.getenv("IMAGE_HTTP_PROXY") or os.getenv("HTTPS_PROXY") or os.getenv("https_proxy") or os.getenv("HTTP_PROXY") or os.getenv("http_proxy") or "" ).strip() # Product decision: gpt-image-2 remains the primary image model. Gemini is only # allowed as an outage fallback when the primary gateway times out or returns # transient upstream failures. GPT_IMAGE_MODEL = "gpt-image-2" IMAGE_FALLBACK_MODEL = os.getenv("IMAGE_FALLBACK_MODEL", "gemini-3-pro-image-preview").strip() or "" IMAGE_FALLBACK_ENABLED = os.getenv("IMAGE_FALLBACK_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} IMAGE_MODEL = GPT_IMAGE_MODEL PRODUCT_VIEW_MODEL = GPT_IMAGE_MODEL SUBJECT_ASSET_IMAGE_MODEL = GPT_IMAGE_MODEL IMAGE_SIZE_CHOICES = [ { "id": "auto", "label": "自动", "value": "auto", "description": "由图片模型自行决定输出尺寸", }, { "id": "1024x1536", "label": "竖图 2:3", "value": "1024x1536", "description": "适合信息流营销图、人物和产品竖版构图", }, { "id": "1024x1024", "label": "方图 1:1", "value": "1024x1024", "description": "适合头像、方形素材和电商图", }, { "id": "1536x1024", "label": "横图 3:2", "value": "1536x1024", "description": "适合横版封面和详情页配图", }, ] VIDEO_SIZE_CHOICES = [ { "id": "720x1280", "label": "竖屏 9:16", "value": "720x1280", "description": "适合抖音、短视频和飞书内预览", }, { "id": "1280x720", "label": "横屏 16:9", "value": "1280x720", "description": "适合横版展示和网页视频", }, { "id": "1024x1024", "label": "方形 1:1", "value": "1024x1024", "description": "适合方形广告位", }, { "id": "960x1280", "label": "竖屏 3:4", "value": "960x1280", "description": "适合更接近图文卡片的竖版素材", }, ] SubjectModelBundle = Literal["gpt", "gemini"] SubjectAgentMode = Literal["realistic", "cartoon", "elements", "custom"] SUBJECT_AGENT_GPT_MODEL = gpt_model_env("SUBJECT_AGENT_GPT_MODEL", VISION_MODEL) SUBJECT_AGENT_GEMINI_MODEL = os.getenv("SUBJECT_AGENT_GEMINI_MODEL", "gemini-2.5-flash").strip() or "gemini-2.5-flash" SUBJECT_ASSET_IMAGE_MODELS = [GPT_IMAGE_MODEL] + ( [IMAGE_FALLBACK_MODEL] if IMAGE_FALLBACK_ENABLED and IMAGE_FALLBACK_MODEL and IMAGE_FALLBACK_MODEL != GPT_IMAGE_MODEL else [] ) IMAGE_REQUEST_TIMEOUT_SECONDS = max(15, min(180, int(os.getenv("IMAGE_REQUEST_TIMEOUT_SECONDS", "60")))) IMAGE_CIRCUIT_FAILURE_THRESHOLD = max(1, int(os.getenv("IMAGE_CIRCUIT_FAILURE_THRESHOLD", "2"))) IMAGE_CIRCUIT_COOLDOWN_SECONDS = max(60, int(os.getenv("IMAGE_CIRCUIT_COOLDOWN_SECONDS", "600"))) _IMAGE_CIRCUIT_LOCK = threading.Lock() _IMAGE_PRIMARY_FAILURES = 0 _IMAGE_PRIMARY_OPEN_UNTIL = 0.0 PRODUCT_ASSET_MAX_SIDE = max(1024, int(os.getenv("PRODUCT_ASSET_MAX_SIDE", "1600"))) PRODUCT_ASSET_MIN_LONG_SIDE = max(512, int(os.getenv("PRODUCT_ASSET_MIN_LONG_SIDE", "900"))) PRODUCT_ASSET_MIN_SHORT_SIDE = max(320, int(os.getenv("PRODUCT_ASSET_MIN_SHORT_SIDE", "600"))) PRODUCT_ASSET_JPEG_QUALITY = max(80, min(95, int(os.getenv("PRODUCT_ASSET_JPEG_QUALITY", "92")))) VIDEO_MODEL = os.getenv("VIDEO_MODEL", "seedance").strip() or "seedance" YTDLP_COOKIES_FILE = os.getenv("YTDLP_COOKIES_FILE", "").strip() YTDLP_COOKIES_FROM_BROWSER = os.getenv("YTDLP_COOKIES_FROM_BROWSER", "").strip() AUDIO_PRODUCT_BRIEF = os.getenv( "AUDIO_PRODUCT_BRIEF", "SKG smart massage products for everyday neck-and-shoulder, back, eye, knee, or foot relaxation. Ads should feel premium, clean, trustworthy, and must not make medical efficacy claims.", ).strip() AUDIO_REWRITE_MODEL = gpt_model_env("AUDIO_REWRITE_MODEL", REWRITE_MODEL) VOICE_PROVIDER = "azure_openai" AZURE_OPENAI_BASE_URL = os.getenv("AZURE_OPENAI_BASE_URL", "https://ai.skg.com/azure").strip().rstrip("/") AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", LLM_API_KEY).strip() AZURE_TTS_MODEL = os.getenv("AZURE_TTS_MODEL", "gpt-4o-mini-tts").strip() or "gpt-4o-mini-tts" AZURE_TTS_VOICE_ID = os.getenv("AZURE_TTS_VOICE_ID", "alloy").strip() or "alloy" DEFAULT_AZURE_TTS_VOICE_POOL = ["alloy", "verse", "shimmer"] AZURE_TTS_VOICE_POOL = [ v.strip() for v in os.getenv("AZURE_TTS_VOICE_POOL", ",".join(DEFAULT_AZURE_TTS_VOICE_POOL)).split(",") if v.strip() ] AZURE_TTS_PATH = os.getenv("AZURE_TTS_PATH", "/audio/speech").strip() or "/audio/speech" AZURE_TTS_PATHS = [ p.strip() for p in os.getenv("AZURE_TTS_PATHS", f"{AZURE_TTS_PATH},/audio/speech,/v1/audio/speech").split(",") if p.strip() ] POE_API_BASE_URL = os.getenv("POE_API_BASE_URL", "https://api.poe.com/v1").strip() or "https://api.poe.com/v1" POE_API_KEY = os.getenv("POE_API_KEY", "").strip() def env_video_model(name: str, default: str) -> str: value = os.getenv(name, "").strip() if not value: return default # Older local envs used business aliases as model IDs. Keep those aliases usable # while mapping them to concrete Poe video model IDs by default. if value.lower() in {"seedance", "kling", "veo", "veo3", "voe"}: return default return value VIDEO_MODEL_ALIASES = { "seedance": env_video_model("VIDEO_MODEL_SEEDANCE", "seedance-2-fast"), "kling": env_video_model("VIDEO_MODEL_KLING", "kling-omni"), "veo3": env_video_model("VIDEO_MODEL_VEO3", "veo-3.1-fast"), "veo": env_video_model("VIDEO_MODEL_VEO3", "veo-3.1-fast"), "voe": env_video_model("VIDEO_MODEL_VEO3", "veo-3.1-fast"), } VIDEO_API_BASE_URL = os.getenv("VIDEO_API_BASE_URL", "").strip() VIDEO_API_KEY = os.getenv("VIDEO_API_KEY", "").strip() WEB_AUTH_USERNAME = os.getenv("WEB_AUTH_USERNAME", "").strip() WEB_AUTH_PASSWORD = os.getenv("WEB_AUTH_PASSWORD", "").strip() WEB_AUTH_SESSION_SECRET = os.getenv("WEB_AUTH_SESSION_SECRET", "").strip() WEB_AUTH_COOKIE_NAME = os.getenv("WEB_AUTH_COOKIE_NAME", "skg_marketing_session").strip() or "skg_marketing_session" WEB_AUTH_COOKIE_SECURE = os.getenv("WEB_AUTH_COOKIE_SECURE", "true").strip().lower() not in {"0", "false", "no"} PASSWORD_AUTH_ENABLED = os.getenv("PASSWORD_AUTH_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} FEISHU_APP_ID = (os.getenv("FEISHU_APP_ID") or os.getenv("FEISHU_CLIENT_ID") or "").strip() FEISHU_APP_SECRET = (os.getenv("FEISHU_APP_SECRET") or os.getenv("FEISHU_CLIENT_SECRET") or "").strip() FEISHU_REDIRECT_URI = os.getenv("FEISHU_REDIRECT_URI", "").strip() FEISHU_OAUTH_SCOPE = os.getenv("FEISHU_OAUTH_SCOPE", "").strip() FEISHU_AUTHORIZE_URL = os.getenv( "FEISHU_AUTHORIZE_URL", "https://accounts.feishu.cn/open-apis/authen/v1/authorize", ).strip() FEISHU_TOKEN_URL = os.getenv( "FEISHU_TOKEN_URL", "https://open.feishu.cn/open-apis/authen/v2/oauth/token", ).strip() FEISHU_USER_INFO_URL = os.getenv( "FEISHU_USER_INFO_URL", "https://open.feishu.cn/open-apis/authen/v1/user_info", ).strip() FEISHU_STATE_COOKIE_NAME = os.getenv("FEISHU_STATE_COOKIE_NAME", "skg_feishu_oauth_state").strip() or "skg_feishu_oauth_state" FEISHU_ALLOWED_EMAIL_DOMAINS = os.getenv("FEISHU_ALLOWED_EMAIL_DOMAINS", "").strip() FEISHU_ALLOWED_EMAILS = os.getenv("FEISHU_ALLOWED_EMAILS", "").strip() FEISHU_ALLOWED_TENANT_KEYS = os.getenv("FEISHU_ALLOWED_TENANT_KEYS", "").strip() AUTH_DATA_ISOLATION_ENABLED = os.getenv("AUTH_DATA_ISOLATION_ENABLED", "true").strip().lower() not in {"0", "false", "no", "off"} VIDEO_QUEUE_MAX_CONCURRENT = max(1, int(os.getenv("VIDEO_QUEUE_MAX_CONCURRENT", "2").strip() or "2")) VIDEO_QUEUE_MAX_CONCURRENT_PER_USER = max(1, int(os.getenv("VIDEO_QUEUE_MAX_CONCURRENT_PER_USER", "1").strip() or "1")) PASSWORD_AUTH_CONFIGURED = PASSWORD_AUTH_ENABLED and bool(WEB_AUTH_USERNAME and WEB_AUTH_PASSWORD and WEB_AUTH_SESSION_SECRET) FEISHU_AUTH_CONFIGURED = bool(FEISHU_APP_ID and FEISHU_APP_SECRET and WEB_AUTH_SESSION_SECRET) WEB_AUTH_CONFIGURED = bool(PASSWORD_AUTH_CONFIGURED or FEISHU_AUTH_CONFIGURED) def default_video_gateway_paths(base_url: str) -> tuple[str, str, str]: base = base_url.strip().rstrip("/").lower() if "ai.skg.com/doubao" in base: return ( "/api/v3/contents/generations/tasks", "/api/v3/contents/generations/tasks/{id}", "/api/v3/contents/generations/tasks/{id}/content", ) if "ark.cn-beijing.volces.com" in base: return ( "/contents/generations/tasks", "/contents/generations/tasks/{id}", "/contents/generations/tasks/{id}/content", ) return ("/videos", "/videos/{id}", "/videos/{id}/content") DEFAULT_VIDEO_CREATE_PATH, DEFAULT_VIDEO_STATUS_PATH, DEFAULT_VIDEO_CONTENT_PATH = default_video_gateway_paths(VIDEO_API_BASE_URL) VIDEO_CREATE_PATH = os.getenv("VIDEO_CREATE_PATH", DEFAULT_VIDEO_CREATE_PATH).strip() or DEFAULT_VIDEO_CREATE_PATH VIDEO_CREATE_PATHS = [ p.strip() for p in os.getenv( "VIDEO_CREATE_PATHS", VIDEO_CREATE_PATH if VIDEO_CREATE_PATH != "/videos" else f"{VIDEO_CREATE_PATH},/videos/generations,/video/generations", ).split(",") if p.strip() ] VIDEO_STATUS_PATH = os.getenv("VIDEO_STATUS_PATH", DEFAULT_VIDEO_STATUS_PATH).strip() or DEFAULT_VIDEO_STATUS_PATH VIDEO_CONTENT_PATH = os.getenv("VIDEO_CONTENT_PATH", DEFAULT_VIDEO_CONTENT_PATH).strip() or DEFAULT_VIDEO_CONTENT_PATH VIDEO_DURATION_FIELD = os.getenv("VIDEO_DURATION_FIELD", "seconds").strip() or "seconds" VIDEO_POLL_TIMEOUT_SECONDS = max(60, int(os.getenv("VIDEO_POLL_TIMEOUT_SECONDS", "900"))) FFMPEG_BIN = os.getenv("FFMPEG_BIN", "").strip() FFPROBE_BIN = os.getenv("FFPROBE_BIN", "").strip() LOCAL_FFMPEG_CANDIDATES = [ "/Applications/Downie 4.app/Contents/Resources/ffmpeg", "/Applications/Permute 3.app/Contents/Resources/ffmpeg", "/Applications/VideoFusion-macOS.app/Contents/Resources/ffmpeg", ] _MEDIA_BIN_CACHE: dict[str, str] = {} # OpenAI 客户端(OpenAI 兼容网关,含 SKG ezlink) from openai import OpenAI _llm_client: OpenAI | None = None _asr_client: OpenAI | None = None _image_client: OpenAI | None = None def ai_http_client(timeout: float = 120) -> httpx.Client: """HTTP client for SKG AI gateway calls. launchd does not reliably inherit interactive-shell proxy variables, so the app also supports an explicit AI_HTTP_PROXY / IMAGE_HTTP_PROXY in api/.env. """ kwargs: dict = {"timeout": timeout} if AI_HTTP_PROXY: kwargs["proxy"] = AI_HTTP_PROXY return httpx.Client(**kwargs) def openai_http_client(timeout: float = 120) -> httpx.Client | None: return ai_http_client(timeout=timeout) if AI_HTTP_PROXY else None def llm() -> OpenAI: global _llm_client if _llm_client is None: if not LLM_API_KEY: raise RuntimeError("LLM_API_KEY 未配置") kwargs = {"base_url": LLM_BASE_URL or None, "api_key": LLM_API_KEY} http_client = openai_http_client() if http_client: kwargs["http_client"] = http_client _llm_client = OpenAI(**kwargs) return _llm_client def asr_llm() -> OpenAI: global _asr_client if _asr_client is None: if not ASR_API_KEY: raise RuntimeError("ASR_API_KEY 或 LLM_API_KEY 未配置") kwargs = {"base_url": ASR_BASE_URL or LLM_BASE_URL or None, "api_key": ASR_API_KEY} http_client = openai_http_client() if http_client: kwargs["http_client"] = http_client _asr_client = OpenAI(**kwargs) return _asr_client def image_llm() -> OpenAI: global _image_client if _image_client is None: if not IMAGE_API_KEY: raise RuntimeError("IMAGE_API_KEY 或 LLM_API_KEY 未配置") kwargs = {"base_url": IMAGE_BASE_URL or None, "api_key": IMAGE_API_KEY} http_client = openai_http_client() if http_client: kwargs["http_client"] = http_client _image_client = OpenAI(**kwargs) return _image_client def product_view_llm() -> OpenAI: return image_llm() if PRODUCT_VIEW_MODEL == GPT_IMAGE_MODEL else llm() # Pipeline 状态: # created → downloading → downloaded(前端“开始”会继续触发音频解析) # → splitting → frames_extracted # → transcribing → transcribed | failed JobStatus = Literal[ "created", "downloading", "downloaded", "splitting", "frames_extracted", "transcribing", "transcribed", "failed", ] KEYFRAME_COUNT = int(os.getenv("KEYFRAME_COUNT", "12")) FrameExtractTarget = Literal["transparent_human", "balanced", "subject", "transition", "expression", "motion"] FrameExtractMode = Literal["replace", "append"] FrameExtractQuality = Literal["auto", "fast", "accurate", "ultra"] AnalyzeTask = tuple[str, int, FrameExtractTarget, FrameExtractMode, FrameExtractQuality] AssetBackground = Literal["white", "black"] AssetSize = Literal["source", "1024", "1536", "2048"] AssetQuality = Literal["hd"] SubjectKind = Literal["object", "living"] SubjectView = str SubjectAssetStatus = Literal["queued", "in_progress", "completed", "failed"] SceneMode = Literal["remove_subject", "similar", "style"] SceneStyle = Literal["source", "premium_product", "clean_studio", "warm_lifestyle", "cinematic"] SceneAssetRole = Literal["scene", "first_frame", "last_frame"] FRAME_TARGET_LABELS: dict[FrameExtractTarget, str] = { "transparent_human": "透明骨架人", "balanced": "综合关键帧", "subject": "清晰主体", "transition": "转场变化", "expression": "表情瞬间", "motion": "动作峰值", } TRANSPARENT_HUMAN_POSITIVE_PROMPT = ( "Target subject: transparent human character, translucent human body, glass-like human body, clear acrylic skin, " "transparent vinyl skin, visible clean white skeleton inside, skeleton visible inside transparent body, " "white bones inside clear body, non-horror skeleton character, friendly transparent humanoid, 3D commercial character, " "premium wellness character, transparent body with visible spine, transparent body with visible rib cage. " "中文目标:透明人体、半透明人体、玻璃人体、亚克力人体、果冻质感人体、外层透明皮肤、身体内部可见骨架、" "透明身体里的白色骨骼、干净白色骨架、非恐怖骷髅人、3D广告角色、透明骨架人、可见脊柱、可见肋骨、" "可见颈椎、可见骨盆、可见四肢骨骼、透明皮肤包裹骨架。" ) TRANSPARENT_HUMAN_NEGATIVE_PROMPT = ( "Avoid: normal human, ordinary skeleton, skeleton only without transparent body, horror skeleton, gore, blood, corpse, " "zombie, organs, veins, autopsy, surgery, hospital, dark horror scene, blurry person, heavily occluded person, " "person too small, product only, background only, no visible skeleton, no transparent body, transparent clothing only. " "反向排除:普通真人、普通骷髅、只有骨架没有透明外壳、恐怖骷髅、血腥、腐烂、僵尸、尸体、器官、血管、" "解剖、医院、手术、黑暗恐怖场景、模糊人物、遮挡严重、人物太远、只有产品没有人、只有背景没有人、" "看不到骨架、看不到透明身体、透明衣服但不是透明身体。" ) TRANSPARENT_HUMAN_QUALIFIED_STANDARD = ( "A qualified frame must satisfy all core conditions: 1) there is a humanoid character; " "2) the outer body is transparent or translucent; 3) a clean white skeleton is clearly visible inside the body; " "4) the transparent body and inner skeleton belong to the same character, not a background overlay; " "5) the character should occupy at least about 35% of frame height and be easy to inspect; " "6) no severe blur, occlusion, or deformation; 7) clean premium commercial wellness style, non-horror." ) FRAME_QUALITY_LABELS: dict[FrameExtractQuality, str] = { "auto": "自动", "fast": "快速", "accurate": "精细", "ultra": "极准", } class GeneratedImage(BaseModel): id: str # uuid hex 12 prompt: str model: str mode: str = "edit" # "edit"(带参考图) | "text"(纯文字) url: str # /jobs/{job_id}/frames/{idx}/gen/{id}.jpg selected: bool = False created_at: float = 0.0 class GeneratedVideo(BaseModel): id: str provider_id: str = "" frame_idx: int storyboard_row_idx: int | None = None prompt: str model: str = "" status: Literal["queued", "in_progress", "completed", "failed"] = "queued" url: str = "" poster_url: str = "" duration: float = 4.0 progress: int = 0 error: str = "" created_at: float = 0.0 queue_position: int = 0 queue_size: int = 0 queue_message: str = "" class VideoSourceRef(BaseModel): kind: Literal["image", "source_video"] = "image" url: str = "" class StoryboardScene(BaseModel): """分镜头编排:每个 selected 分镜对应一个 scene 描述 v2: 4 图槽 + 时长(复制粘贴模式)— 主体 / 场景 / 产品 / 动作 各一张图 v1 字段保留兼容(subject/product/scene/action/reference_ids)""" duration: float = 0 first_image: dict | None = None last_image: dict | None = None product_images: list[dict] = Field(default_factory=list) subject_images: list[dict] = Field(default_factory=list) product_fusion_shots: list[dict] = Field(default_factory=list) visual_mode: Literal["person_only", "person_product", "product_only", "environment"] = "person_product" needs_product: bool = True needs_subject: bool = True storyboard_row_idx: int | None = None subject_brief: str = "" skg_copy_en: str = "" skg_copy_zh: str = "" scene_one_line_en: str = "" scene_one_line_zh: str = "" action_one_line_en: str = "" action_one_line_zh: str = "" selected_video_id: str = "" first_frame_plan: str = "" last_frame_plan: str = "" product_placement: str = "" # 4 图槽:dict 含 {kind, frame_idx, element_id?, cutout_id?, label} subject_image: dict | None = None scene_image: dict | None = None product_image: dict | None = None action_image: dict | None = None # v1 兼容 subject: str = "" product: str = "" scene: str = "" action: str = "" reference_ids: list[str] = [] class StoryboardImage(BaseModel): """用户从各处"上推"到分镜头编排区的图片""" ref_id: str # uuid hex 8 kind: Literal["keyframe", "cutout", "asset"] # asset = 场景 / 主体视角等组图素材 frame_idx: int element_id: str | None = None # cutout 时 cutout_id: str | None = None # cutout 时(versioned id;老数据可能 == element_id) label: str = "" # 显示用名字 created_at: float = 0.0 class QualityReport(BaseModel): width: int = 0 height: int = 0 short_side: int = 0 sharpness: float = 0.0 risk: Literal["ok", "warn", "bad"] = "ok" warnings: list[str] = Field(default_factory=list) class TransparentHumanFrameScore(BaseModel): transparent_body_score: int = 0 skeleton_visible_score: int = 0 human_prominence_score: int = 0 clarity_score: int = 0 commercial_style_score: int = 0 product_usefulness_score: int = 0 total_score: int = 0 qualified: bool = False reject_reason: str = "" notes: str = "" class SceneAsset(BaseModel): id: str label: str = "" url: str = "" width: int = 0 height: int = 0 quality: AssetQuality = "hd" size: AssetSize = "source" scene_mode: SceneMode = "remove_subject" scene_style: SceneStyle = "source" asset_role: SceneAssetRole = "scene" quality_report: QualityReport | None = None created_at: float = 0.0 class SubjectAsset(BaseModel): id: str view: SubjectView label: str = "" url: str = "" width: int = 0 height: int = 0 background: AssetBackground = "white" quality: AssetQuality = "hd" size: AssetSize = "source" source_frame_indices: list[int] = Field(default_factory=list) ai_completed: bool = True status: SubjectAssetStatus = "completed" progress: int = 100 error: str = "" pack_id: str = "" pack_label: str = "" pack_mode: str = "" pack_created_at: float = 0.0 created_at: float = 0.0 class ProductLibraryItem(BaseModel): id: str handle: str title: str product_type: str = "" image_type: str = "gallery" image_index: int = 0 filename: str url: str width: int = 0 height: int = 0 source_path: str = "" white_score: float = 0.0 near_white_score: float = 0.0 has_people: bool = False tags: list[str] = Field(default_factory=list) class CharacterLibraryImage(BaseModel): id: str view: str label: str filename: str width: int = 0 height: int = 0 source_path: str = "" url: str = "" class CharacterLibraryItem(BaseModel): id: str name: str folder: str = "" description: str = "" prompt_brief: str = "" prompt_brief_zh: str = "" primary_image: str = "" images: list[CharacterLibraryImage] = Field(default_factory=list) class SubjectTemplateImage(BaseModel): id: str view: str label: str = "" filename: str url: str = "" width: int = 0 height: int = 0 background: AssetBackground = "white" quality: AssetQuality = "hd" size: AssetSize = "source" source_asset_id: str = "" source_frame_indices: list[int] = Field(default_factory=list) created_at: float = 0.0 class SubjectTemplateItem(BaseModel): id: str name: str description: str = "" note: str = "" prompt_brief: str = "" prompt_brief_zh: str = "" source: Literal["database"] = "database" source_job_id: str = "" source_frame_idx: int = -1 source_element_id: str = "" subject_style: Literal["transparent_human", "source_actor", "cartoon_subject"] = "transparent_human" primary_image: str = "" images: list[SubjectTemplateImage] = Field(default_factory=list) created_at: float = 0.0 updated_at: float = 0.0 PromptLibraryCategory = Literal["scene_desc", "video_desc", "subject_desc", "skg_script", "product_angle"] AssetLibraryKind = Literal["subjects", "products", "scenes", "videos"] class PromptLibraryItem(BaseModel): id: str category: PromptLibraryCategory name: str tags: list[str] = Field(default_factory=list) prompt_en: str = "" prompt_zh: str = "" use_count: int = 0 source_job_id: str = "" created_at: float = 0.0 updated_at: float = 0.0 class PromptLibraryWriteReq(BaseModel): category: PromptLibraryCategory name: str tags: list[str] = Field(default_factory=list) prompt_en: str = "" prompt_zh: str = "" source_job_id: str = "" class PromptLibraryPatchReq(BaseModel): category: PromptLibraryCategory | None = None name: str | None = None tags: list[str] | None = None prompt_en: str | None = None prompt_zh: str | None = None source_job_id: str | None = None class AssetLibraryImage(BaseModel): id: str view: str = "" label: str = "" filename: str url: str = "" width: int = 0 height: int = 0 created_at: float = 0.0 class AssetLibraryItem(BaseModel): id: str kind: AssetLibraryKind name: str name_zh: str = "" note: str = "" tags: list[str] = Field(default_factory=list) source_job_id: str = "" use_count: int = 0 created_at: float = 0.0 updated_at: float = 0.0 is_official: bool = False prompt_brief: str = "" prompt_brief_zh: str = "" subject_style: Literal["transparent_human", "source_actor", "cartoon_subject"] = "transparent_human" product_type: str = "" views: list[AssetLibraryImage] = Field(default_factory=list) images: list[AssetLibraryImage] = Field(default_factory=list) asset_role: str = "" aspect_ratio: str = "" image: AssetLibraryImage | None = None duration: float = 0.0 poster: AssetLibraryImage | None = None video_url: str = "" class AssetLibraryPatchReq(BaseModel): name: str | None = None name_zh: str | None = None note: str | None = None tags: list[str] | None = None source_job_id: str | None = None prompt_brief: str | None = None prompt_brief_zh: str | None = None subject_style: Literal["transparent_human", "source_actor", "cartoon_subject"] | None = None product_type: str | None = None asset_role: str | None = None aspect_ratio: str | None = None is_official: bool | None = None class ProductFusionRegion(BaseModel): x: float = 0 y: float = 0 w: float = 0 h: float = 0 class ProductFusionShot(BaseModel): id: str = "" first_image: dict | None = None last_image: dict | None = None product_images: list[dict] = Field(default_factory=list) product_image: dict | None = None character_id: str = "" character_name: str = "" subject_image: dict | None = None subject_images: list[dict] = Field(default_factory=list) person_image: dict | None = None product_region: ProductFusionRegion | None = None scene_image: dict | None = None action_text: str = "" duration: float = 5 image_model: str = "gpt-image-2" video_model: str = "seedance" guide_image: dict | None = None class KeyElement(BaseModel): """关键帧里识别 / 用户提取的元素 · 多次提取累积多张图,让用户挑选满意的""" id: str # uuid hex 8 name_zh: str name_en: str = "" position: str = "" source: Literal["auto", "manual", "region"] = "manual" region: dict | None = None # 多张提取图 id(每次 cutout 端点累积新 id)→ /jobs/.../elements/{element_id}/cutouts/{cutout_id}.jpg cutouts: list[str] = [] # 旧字段兼容(v1 单图)· 渲染时 fallback 用,新提取不再写入 cutout_id: str | None = None cutout_background: Literal["white", "black"] = "white" subject_kind: SubjectKind = "object" subject_assets: list[SubjectAsset] = Field(default_factory=list) subject_consensus_brief: str = "" subject_consensus_brief_zh: str = "" created_at: float = 0.0 class KeyFrame(BaseModel): index: int timestamp: float url: str description: dict | None = None # vision 模型识别结果 {scene, objects, style, suggested_prompt} transparent_human_score: TransparentHumanFrameScore | None = None cleaned_url: str | None = None # 清洗后干净版(待应用)→ /jobs/{id}/frames/{idx}/cleaned.jpg cleaned_applied: bool = False # 是否已用清洗版替换原图(替换后 cleaned_url=null) quality_report: QualityReport | None = None scene_assets: list[SceneAsset] = Field(default_factory=list) elements: list[KeyElement] = [] # 提取的元素清单(持久化) storyboard: StoryboardScene | None = None # 分镜头编排字段 generated_images: list[GeneratedImage] = [] class TranscriptSegment(BaseModel): index: int start: float end: float en: str zh: str = "" class AudioScript(BaseModel): status: Literal["idle", "rewriting", "completed", "failed"] = "idle" source_text: str = "" source_zh: str = "" rewritten_text: str = "" rewritten_text_zh: str = "" speaker_profile: str = "" rhythm_profile: str = "" background_audio_profile: str = "" product_brief: str = "" rewrite_model: str = "" voice_provider: str = "" voice_model: str = "" voice_id: str = "" voice_url: str = "" error: str = "" created_at: float = 0.0 class SubjectAgentAnalysis(BaseModel): model_config = ConfigDict(protected_namespaces=()) model_bundle: SubjectModelBundle = "gpt" model: str = "" source_frame_indices: list[int] = Field(default_factory=list) summary_zh: str = "" summary_en: str = "" generation_brief_en: str = "" trait_chips: list[str] = Field(default_factory=list) mode_options: list[str] = Field(default_factory=list) questions: list[str] = Field(default_factory=list) warnings: list[str] = Field(default_factory=list) created_at: float = 0.0 class SubjectAgentMessage(BaseModel): role: Literal["user", "assistant"] = "assistant" content: str = "" created_at: float = 0.0 class SubjectAgentState(BaseModel): model_config = ConfigDict(protected_namespaces=()) model_bundle: SubjectModelBundle = "gpt" source_frame_indices: list[int] = Field(default_factory=list) analysis: SubjectAgentAnalysis | None = None messages: list[SubjectAgentMessage] = Field(default_factory=list) selected_mode: SubjectAgentMode = "custom" selected_traits: list[str] = Field(default_factory=list) requirements_zh: str = "" generation_prompt_en: str = "" quantity: int = 6 updated_at: float = 0.0 class Job(BaseModel): id: str url: str owner_id: str = "" owner_name: str = "" owner_email: str = "" owner_provider: str = "" tenant_key: str = "" status: JobStatus = "created" progress: int = 0 message: str = "" video_url: str = "" duration: float = 0.0 width: int = 0 height: int = 0 source_audio_url: str = "" frames: list[KeyFrame] = Field(default_factory=list) transcript: list[TranscriptSegment] = Field(default_factory=list) audio_script: AudioScript = Field(default_factory=AudioScript) storyboard_images: list[StoryboardImage] = Field(default_factory=list) generated_videos: list[GeneratedVideo] = Field(default_factory=list) product_refs: list[dict] = Field(default_factory=list) subject_agent: SubjectAgentState = Field(default_factory=SubjectAgentState) error: str = "" class AuthLoginPayload(BaseModel): username: str password: str remember: bool = False JOBS: dict[str, Job] = {} ANALYZE_QUEUE: list[AnalyzeTask] = [] ANALYZE_WORKER_RUNNING = False AUDIO_WORKERS_RUNNING: set[str] = set() AUDIO_WORKERS_LOCK = threading.Lock() @dataclass class VideoQueueTask: job_id: str video_id: str owner_id: str args: tuple created_at: float VIDEO_QUEUE: list[VideoQueueTask] = [] VIDEO_QUEUE_RUNNING: dict[str, VideoQueueTask] = {} VIDEO_QUEUE_LOCK = threading.Lock() def ensure_auth_configured() -> None: if not WEB_AUTH_CONFIGURED: raise HTTPException(503, "WEB_AUTH_SESSION_SECRET 以及账号密码或飞书 OAuth 未配置") def ensure_password_auth_configured() -> None: if not PASSWORD_AUTH_CONFIGURED: raise HTTPException(503, "账号密码登录未配置") def ensure_feishu_auth_configured() -> None: if not FEISHU_AUTH_CONFIGURED: raise HTTPException(503, "飞书免登录未配置") def _auth_signature(body: str) -> str: return hmac.new(WEB_AUTH_SESSION_SECRET.encode("utf-8"), body.encode("utf-8"), hashlib.sha256).hexdigest() def _encode_auth_body(payload: dict) -> str: raw = json.dumps(payload, ensure_ascii=False, separators=(",", ":")).encode("utf-8") return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=") def _decode_auth_body(body: str) -> dict: padded = body + "=" * (-len(body) % 4) raw = base64.urlsafe_b64decode(padded.encode("ascii")) data = json.loads(raw.decode("utf-8")) return data if isinstance(data, dict) else {} def _csv_values(raw: str) -> set[str]: return {item.strip().lower() for item in raw.split(",") if item.strip()} def _normalize_next_url(value: str | None) -> str: value = (value or "/").strip() or "/" if not value.startswith("/") or value.startswith("//"): return "/" return value def _public_base_url(request: Request) -> str: proto = request.headers.get("x-forwarded-proto") or request.url.scheme host = request.headers.get("host") or request.url.netloc return f"{proto}://{host}".rstrip("/") def _feishu_redirect_uri(request: Request) -> str: if FEISHU_REDIRECT_URI: return FEISHU_REDIRECT_URI return f"{_public_base_url(request)}/api/auth/feishu/callback" def _session_user_id(payload: dict | None) -> str: payload = payload or {} explicit = str(payload.get("uid") or "").strip() if explicit: return explicit provider = str(payload.get("provider") or "").strip().lower() if provider == "feishu": for key in ("open_id", "union_id", "email", "u"): value = str(payload.get(key) or "").strip() if value: return f"feishu:{value.lower() if key == 'email' else value}" username = str(payload.get("u") or "").strip() or "anonymous" return f"password:{username}" def _public_session(payload: dict) -> dict: return { "uid": _session_user_id(payload), "provider": str(payload.get("provider") or "password"), "username": str(payload.get("u") or payload.get("name") or ""), "name": str(payload.get("name") or payload.get("u") or ""), "email": str(payload.get("email") or ""), "open_id": str(payload.get("open_id") or ""), "union_id": str(payload.get("union_id") or ""), "tenant_key": str(payload.get("tenant_key") or ""), "avatar_url": str(payload.get("avatar_url") or ""), } def make_auth_token(user: str | dict, ttl_seconds: int) -> str: if isinstance(user, str): payload = { "u": user, "name": user, "provider": "password", "uid": f"password:{user}", } else: payload = dict(user) payload["uid"] = _session_user_id(payload) payload.setdefault("u", payload.get("name") or payload.get("email") or payload["uid"]) payload.setdefault("name", payload.get("u") or payload["uid"]) payload.update({ "exp": int(time.time()) + ttl_seconds, "n": secrets.token_hex(8), }) body = _encode_auth_body(payload) return f"{body}.{_auth_signature(body)}" def verify_auth_token(token: str) -> dict | None: if not WEB_AUTH_CONFIGURED or "." not in token: return None body, supplied_sig = token.rsplit(".", 1) if not hmac.compare_digest(_auth_signature(body), supplied_sig): return None try: payload = _decode_auth_body(body) username = str(payload.get("u") or "") expires_at = int(payload.get("exp") or 0) except Exception: return None if expires_at < int(time.time()): return None provider = str(payload.get("provider") or "").strip().lower() if not provider: provider = "password" if username else "" if provider == "password": if not PASSWORD_AUTH_CONFIGURED or username != WEB_AUTH_USERNAME: return None payload["provider"] = "password" payload["uid"] = f"password:{username}" payload.setdefault("name", username) return _public_session(payload) if provider == "feishu": if not FEISHU_AUTH_CONFIGURED: return None payload["provider"] = "feishu" payload["uid"] = _session_user_id(payload) return _public_session(payload) return None def auth_session_from_request(request: Request) -> dict | None: token = request.cookies.get(WEB_AUTH_COOKIE_NAME, "") return verify_auth_token(token) def auth_username_from_request(request: Request) -> str | None: session = auth_session_from_request(request) return str(session.get("username") or session.get("name") or session.get("uid")) if session else None def data_user_from_request(request: Request) -> dict: session = auth_session_from_request(request) if session: return session if not WEB_AUTH_CONFIGURED: return {"uid": "local:dev", "provider": "local", "username": "local-dev", "name": "local-dev", "email": "", "tenant_key": ""} raise HTTPException(401, "unauthorized") def _is_password_session(user: dict | None) -> bool: return bool(user and str(user.get("provider") or "") == "password") def assign_owner(model: Job | "AgentRun", user: dict) -> None: model.owner_id = _session_user_id(user) model.owner_name = str(user.get("name") or user.get("username") or model.owner_id) model.owner_email = str(user.get("email") or "") model.owner_provider = str(user.get("provider") or "") model.tenant_key = str(user.get("tenant_key") or "") def user_can_access_job(job: Job | None, user: dict | None) -> bool: if not job: return False if not AUTH_DATA_ISOLATION_ENABLED or not WEB_AUTH_CONFIGURED: return True owner_id = str(getattr(job, "owner_id", "") or "").strip() if owner_id: return bool(user and owner_id == _session_user_id(user)) return _is_password_session(user) def _load_agent_run_for_access(run_id: str): run = AGENT_RUNS.get(run_id) if not run and agent_run_path(run_id).exists(): try: run = AgentRun.model_validate_json(agent_run_path(run_id).read_text(encoding="utf-8")) AGENT_RUNS[run_id] = run except Exception: return None return run def user_can_access_agent_run(run_id: str, user: dict | None) -> bool: if not AUTH_DATA_ISOLATION_ENABLED or not WEB_AUTH_CONFIGURED: return True run = _load_agent_run_for_access(run_id) if not run: return False owner_id = str(getattr(run, "owner_id", "") or "").strip() if owner_id: return bool(user and owner_id == _session_user_id(user)) return user_can_access_job(JOBS.get(run.job_id), user) or _is_password_session(user) JOB_PATH_RE = re.compile(r"^/jobs/([0-9a-f]{8,32})(?:/|$)") COPY_TO_JOB_PATH_RE = re.compile(r"^/asset-library/[^/]+/[^/]+/copy-to-job/([0-9a-f]{8,32})(?:/|$)") AGENT_RUN_PATH_RE = re.compile(r"^/agent-runs/([0-9a-f]{8,32})(?:/|$)") def _extract_protected_job_id(path: str) -> str: for pattern in (JOB_PATH_RE, COPY_TO_JOB_PATH_RE): match = pattern.match(path) if match: return match.group(1) return "" def _feishu_oauth_state(next_url: str) -> str: body = _encode_auth_body({ "kind": "feishu_oauth_state", "next": _normalize_next_url(next_url), "exp": int(time.time()) + 600, "n": secrets.token_hex(12), }) return f"{body}.{_auth_signature(body)}" def _verify_feishu_oauth_state(token: str) -> dict | None: if not token or "." not in token: return None body, supplied_sig = token.rsplit(".", 1) if not hmac.compare_digest(_auth_signature(body), supplied_sig): return None try: payload = _decode_auth_body(body) except Exception: return None if payload.get("kind") != "feishu_oauth_state" or int(payload.get("exp") or 0) < int(time.time()): return None payload["next"] = _normalize_next_url(str(payload.get("next") or "/")) return payload def _feishu_authorize_url(request: Request, state: str) -> str: params = { "client_id": FEISHU_APP_ID, "redirect_uri": _feishu_redirect_uri(request), "response_type": "code", "state": state, } if FEISHU_OAUTH_SCOPE: params["scope"] = FEISHU_OAUTH_SCOPE return f"{FEISHU_AUTHORIZE_URL}?{urlencode(params)}" def _exchange_feishu_code(code: str, redirect_uri: str) -> str: payload = { "grant_type": "authorization_code", "client_id": FEISHU_APP_ID, "client_secret": FEISHU_APP_SECRET, "code": code, "redirect_uri": redirect_uri, } with httpx.Client(timeout=20) as client: response = client.post(FEISHU_TOKEN_URL, json=payload) response.raise_for_status() data = response.json() if data.get("code") not in (None, 0, "0"): raise HTTPException(401, f"飞书授权失败:{data.get('msg') or data.get('message') or data.get('code')}") token_data = data.get("data") if isinstance(data.get("data"), dict) else data token = str( token_data.get("access_token") or token_data.get("user_access_token") or token_data.get("accessToken") or "" ).strip() if not token: raise HTTPException(401, "飞书授权未返回 user_access_token") return token def _fetch_feishu_user(access_token: str) -> dict: with httpx.Client(timeout=20) as client: response = client.get(FEISHU_USER_INFO_URL, headers={"Authorization": f"Bearer {access_token}"}) response.raise_for_status() data = response.json() if data.get("code") not in (None, 0, "0"): raise HTTPException(401, f"飞书用户信息获取失败:{data.get('msg') or data.get('message') or data.get('code')}") user = data.get("data") if isinstance(data.get("data"), dict) else data if not isinstance(user, dict): raise HTTPException(401, "飞书用户信息格式异常") return user def _build_feishu_session(user: dict) -> dict: email = str(user.get("email") or user.get("enterprise_email") or "").strip().lower() open_id = str(user.get("open_id") or "").strip() union_id = str(user.get("union_id") or "").strip() tenant_key = str(user.get("tenant_key") or "").strip() name = str(user.get("name") or user.get("en_name") or user.get("nickname") or email or open_id or union_id or "Feishu User").strip() avatar_url = str( user.get("avatar_url") or user.get("avatar_thumb") or user.get("avatar_middle") or user.get("avatar_big") or "" ).strip() session = { "provider": "feishu", "u": name, "name": name, "email": email, "open_id": open_id, "union_id": union_id, "tenant_key": tenant_key, "avatar_url": avatar_url, } session["uid"] = _session_user_id(session) return session def _validate_feishu_session(session: dict) -> None: allowed_emails = _csv_values(FEISHU_ALLOWED_EMAILS) allowed_domains = {item.lstrip("@") for item in _csv_values(FEISHU_ALLOWED_EMAIL_DOMAINS)} allowed_tenants = _csv_values(FEISHU_ALLOWED_TENANT_KEYS) email = str(session.get("email") or "").lower() domain = email.rsplit("@", 1)[1] if "@" in email else "" tenant_key = str(session.get("tenant_key") or "").lower() if allowed_emails and email not in allowed_emails: raise HTTPException(403, "当前飞书账号不在允许登录名单") if allowed_domains and domain not in allowed_domains: raise HTTPException(403, "当前飞书账号邮箱域不允许登录") if allowed_tenants and tenant_key not in allowed_tenants: raise HTTPException(403, "当前飞书租户不允许登录") def job_dir(job_id: str) -> Path: d = JOBS_DIR / job_id d.mkdir(parents=True, exist_ok=True) return d def source_audio_url_for(job_id: str) -> str: return f"/jobs/{job_id}/audio.wav" if (JOBS_DIR / job_id / "audio.wav").exists() else "" def job_with_artifacts(job: Job) -> Job: updates = {"source_audio_url": source_audio_url_for(job.id)} if not job.video_url and (JOBS_DIR / job.id / "source.mp4").exists(): updates["video_url"] = f"/jobs/{job.id}/video.mp4" return job.model_copy(update=updates) def save_state(job: Job) -> None: state_path = job_dir(job.id) / "state.json" state_path.write_text(job.model_dump_json(indent=2)) db.index_job(job.model_dump(), str(state_path)) def update(job: Job, **kw) -> None: for k, v in kw.items(): setattr(job, k, v) save_state(job) def public_api_base() -> str: return (LLM_BASE_URL or "https://api.openai.com/v1").rstrip("/") def video_uses_poe() -> bool: if VIDEO_API_BASE_URL: return VIDEO_API_BASE_URL.rstrip("/") == POE_API_BASE_URL.rstrip("/") return bool(POE_API_KEY) def video_uses_ark() -> bool: base = video_api_base() return "ark.cn-beijing.volces.com" in base or "ai.skg.com/doubao" in base def video_provider_name() -> str: base = video_api_base() if video_uses_poe(): return "poe" if "ai.skg.com/doubao" in base: return "doubao" if "ark.cn-beijing.volces.com" in base: return "ark" return "custom" def video_api_base() -> str: if VIDEO_API_BASE_URL: return VIDEO_API_BASE_URL.rstrip("/") if POE_API_KEY: return POE_API_BASE_URL.rstrip("/") return (LLM_BASE_URL or "https://api.openai.com/v1").rstrip("/") def video_api_key() -> str: if VIDEO_API_KEY: return VIDEO_API_KEY if video_uses_poe(): return POE_API_KEY return LLM_API_KEY def video_path(template: str, **values: str) -> str: path = template.format(**values) return path if path.startswith("/") else f"/{path}" def ensure_video_api_configured() -> None: if not video_api_key(): raise HTTPException(503, "POE_API_KEY、VIDEO_API_KEY 或 LLM_API_KEY 未配置,无法调用生视频 API") def storyboard_ref_path(job_id: str, ref: dict | None) -> Path | None: if not ref: return None try: kind = ref.get("kind") frame_idx = int(ref.get("frame_idx")) except Exception: return None if kind == "keyframe": clean = job_dir(job_id) / "cleaned" / f"{frame_idx:03d}.jpg" if clean.exists(): return clean p = job_dir(job_id) / "frames" / f"{frame_idx:03d}.jpg" return p if p.exists() else None if kind == "cutout": element_id = (ref.get("element_id") or "").strip() cutout_id = (ref.get("cutout_id") or "").strip() if not element_id: return None candidates = [] if cutout_id and cutout_id != element_id: candidates.append(job_dir(job_id) / "elements" / f"{frame_idx:03d}_{element_id}_{cutout_id}.jpg") candidates.append(job_dir(job_id) / "elements" / f"{frame_idx:03d}_{element_id}.jpg") candidates.append(job_dir(job_id) / "elements" / f"{frame_idx:03d}_{element_id}.png") for p in candidates: if p.exists(): return p if kind == "asset": asset_id = (ref.get("element_id") or ref.get("cutout_id") or "").strip() if not asset_id: return None p = job_dir(job_id) / "assets" / f"{asset_id}.jpg" return p if p.exists() else None return None def load_product_library_items() -> list[ProductLibraryItem]: if not PRODUCT_LIBRARY_MANIFEST.exists(): return [] try: data = json.loads(PRODUCT_LIBRARY_MANIFEST.read_text(encoding="utf-8")) return [ProductLibraryItem(**item) for item in data.get("items", [])] except Exception as e: raise HTTPException(500, f"product library manifest invalid: {e}") def find_product_library_item(product_id: str) -> ProductLibraryItem: product_id = product_id.strip() for item in load_product_library_items(): if item.id == product_id: return item raise HTTPException(404, "product library item not found") def product_library_file(item: ProductLibraryItem) -> Path: p = (PRODUCT_LIBRARY_DIR / item.filename).resolve() try: p.relative_to(PRODUCT_LIBRARY_DIR) except ValueError: raise HTTPException(400, "invalid product library path") if not p.exists(): raise HTTPException(404, "product library image missing") return p def load_character_library_items() -> list[CharacterLibraryItem]: if not CHARACTER_LIBRARY_MANIFEST.exists(): return [] try: data = json.loads(CHARACTER_LIBRARY_MANIFEST.read_text(encoding="utf-8")) items: list[CharacterLibraryItem] = [] for raw in data.get("characters", []): item = CharacterLibraryItem(**raw) for image in item.images: image.url = f"/character-library/skg/images/{image.filename}" items.append(item) return items except Exception as e: raise HTTPException(500, f"character library manifest invalid: {e}") def find_character_library_item(character_id: str) -> CharacterLibraryItem: character_id = character_id.strip() for item in load_character_library_items(): if item.id == character_id: return item raise HTTPException(404, "character library item not found") def character_library_file(filename: str) -> Path: p = (CHARACTER_LIBRARY_DIR / filename).resolve() try: p.relative_to(CHARACTER_LIBRARY_DIR) except ValueError: raise HTTPException(400, "invalid character library path") if not p.exists(): raise HTTPException(404, "character library image missing") return p def load_subject_template_items() -> list[SubjectTemplateItem]: if not SUBJECT_TEMPLATE_MANIFEST.exists(): return [] try: data = json.loads(SUBJECT_TEMPLATE_MANIFEST.read_text(encoding="utf-8")) items: list[SubjectTemplateItem] = [] for raw in data.get("templates", []): item = SubjectTemplateItem(**raw) for image in item.images: image.url = f"/subject-templates/images/{image.filename}" items.append(item) items.sort(key=lambda item: item.updated_at or item.created_at, reverse=True) return items except Exception as e: raise HTTPException(500, f"subject template manifest invalid: {e}") def save_subject_template_items(items: list[SubjectTemplateItem]) -> None: SUBJECT_TEMPLATE_MANIFEST.parent.mkdir(parents=True, exist_ok=True) SUBJECT_TEMPLATE_MANIFEST.write_text( json.dumps({"templates": [item.model_dump() for item in items]}, ensure_ascii=False, indent=2), encoding="utf-8", ) def find_subject_template_item(template_id: str) -> SubjectTemplateItem: template_id = template_id.strip() for item in load_subject_template_items(): if item.id == template_id: return item raise HTTPException(404, "subject template not found") def subject_template_image_file(filename: str) -> Path: p = (SUBJECT_TEMPLATE_IMAGE_DIR / filename).resolve() try: p.relative_to(SUBJECT_TEMPLATE_IMAGE_DIR) except ValueError: raise HTTPException(400, "invalid subject template image path") if not p.exists(): raise HTTPException(404, "subject template image missing") return p def _now_ts() -> float: return time.time() def _safe_tags(value: object) -> list[str]: if not isinstance(value, list): return [] return [str(item).strip() for item in value if str(item).strip()][:20] def _library_media_size(path: Path) -> tuple[int, int]: try: with Image.open(path) as im: return im.width, im.height except Exception: return 0, 0 def _library_kind_dir(kind: AssetLibraryKind | str) -> Path: if kind not in {"subjects", "products", "scenes", "videos"}: raise HTTPException(404, "asset library kind not found") return ASSET_LIBRARY_DIR / str(kind) def _asset_library_item_dir(kind: AssetLibraryKind | str, item_id: str) -> Path: item_id = item_id.strip() if not item_id or "/" in item_id or ".." in item_id: raise HTTPException(400, "invalid asset library id") return _library_kind_dir(kind) / item_id def _prompt_library_item_path(item_id: str) -> Path: item_id = item_id.strip() if not item_id or "/" in item_id or ".." in item_id: raise HTTPException(400, "invalid prompt library id") return PROMPT_LIBRARY_ITEMS_DIR / f"{item_id}.json" def _prompt_item_file(item: PromptLibraryItem) -> Path: return _prompt_library_item_path(item.id) def _asset_item_file(item: AssetLibraryItem) -> Path: return _asset_library_item_dir(item.kind, item.id) / "manifest.json" def _write_prompt_item(item: PromptLibraryItem) -> None: PROMPT_LIBRARY_ITEMS_DIR.mkdir(parents=True, exist_ok=True) _prompt_item_file(item).write_text(item.model_dump_json(indent=2), encoding="utf-8") _write_prompt_library_index() db.index_prompt_item(item.model_dump()) def _write_asset_item(item: AssetLibraryItem) -> None: p = _asset_item_file(item) p.parent.mkdir(parents=True, exist_ok=True) p.write_text(item.model_dump_json(indent=2), encoding="utf-8") _write_asset_library_index() db.index_asset_item(item.model_dump()) def _read_prompt_item(path: Path) -> PromptLibraryItem | None: try: return PromptLibraryItem.model_validate_json(path.read_text(encoding="utf-8")) except Exception: return None def _read_asset_item(path: Path) -> AssetLibraryItem | None: try: item = AssetLibraryItem.model_validate_json(path.read_text(encoding="utf-8")) _hydrate_asset_library_urls(item) return item except Exception: return None def load_prompt_library_items() -> list[PromptLibraryItem]: items = [_read_prompt_item(path) for path in PROMPT_LIBRARY_ITEMS_DIR.glob("*.json")] result = [item for item in items if item] result.sort(key=lambda item: item.updated_at or item.created_at, reverse=True) return result def load_asset_library_items(kind: AssetLibraryKind | str | None = None) -> list[AssetLibraryItem]: kinds = [kind] if kind else ["subjects", "products", "scenes", "videos"] items: list[AssetLibraryItem] = [] for current in kinds: root = _library_kind_dir(str(current)) for manifest in root.glob("*/manifest.json"): item = _read_asset_item(manifest) if item: items.append(item) items.sort(key=lambda item: item.updated_at or item.created_at, reverse=True) return items def _write_prompt_library_index() -> None: PROMPT_LIBRARY_DIR.mkdir(parents=True, exist_ok=True) items = [item.model_dump() for item in load_prompt_library_items()] (PROMPT_LIBRARY_DIR / "index.json").write_text(json.dumps({"items": items}, ensure_ascii=False, indent=2), encoding="utf-8") def _write_asset_library_index() -> None: ASSET_LIBRARY_DIR.mkdir(parents=True, exist_ok=True) items = [item.model_dump() for item in load_asset_library_items()] (ASSET_LIBRARY_DIR / "index.json").write_text(json.dumps({"items": items}, ensure_ascii=False, indent=2), encoding="utf-8") def _rebuild_library_index() -> None: _write_prompt_library_index() _write_asset_library_index() def _hydrate_asset_library_urls(item: AssetLibraryItem) -> None: for image in item.images: image.url = f"/asset-library/{item.kind}/{item.id}/file/{image.filename}" for image in item.views: image.url = f"/asset-library/{item.kind}/{item.id}/file/{image.filename}" if item.image: item.image.url = f"/asset-library/{item.kind}/{item.id}/file/{item.image.filename}" if item.poster: item.poster.url = f"/asset-library/{item.kind}/{item.id}/file/{item.poster.filename}" if item.kind == "videos": item.video_url = f"/asset-library/videos/{item.id}/file/video.mp4" def find_prompt_library_item(item_id: str) -> PromptLibraryItem: item = _read_prompt_item(_prompt_library_item_path(item_id)) if not item: raise HTTPException(404, "prompt library item not found") return item def find_asset_library_item(kind: AssetLibraryKind | str, item_id: str) -> AssetLibraryItem: item = _read_asset_item(_asset_library_item_dir(kind, item_id) / "manifest.json") if not item: raise HTTPException(404, "asset library item not found") return item def _library_item_file_path(item: AssetLibraryItem, filename: str) -> Path: filename = filename.strip().lstrip("/") base = _asset_library_item_dir(item.kind, item.id).resolve() p = (base / filename).resolve() try: p.relative_to(base) except ValueError: raise HTTPException(400, "invalid library file path") if not p.exists(): raise HTTPException(404, "library file not found") return p def _asset_library_search(items: list[AssetLibraryItem], q: str) -> list[AssetLibraryItem]: needle = q.strip().lower() if not needle: return items return [ item for item in items if needle in " ".join([ item.name, item.name_zh, item.note, item.prompt_brief, item.prompt_brief_zh, item.product_type, item.asset_role, " ".join(item.tags), ]).lower() ] def _prompt_library_search(items: list[PromptLibraryItem], q: str) -> list[PromptLibraryItem]: needle = q.strip().lower() if not needle: return items return [ item for item in items if needle in " ".join([item.name, item.prompt_en, item.prompt_zh, " ".join(item.tags)]).lower() ] def _library_ref_usage(kind: str, item_id: str) -> dict: refs: set[str] = set() library_kinds = { "subjects": "library_subject", "products": "library_product", "scenes": "library_scene", "videos": "library_video", } token = item_id ref_kind = library_kinds.get(kind, "") for job_id, job in JOBS.items(): raw = job.model_dump_json() if token in raw and (not ref_kind or ref_kind in raw): refs.add(job_id) for state_path in JOBS_DIR.glob("*/state.json"): if state_path.parent.name in refs: continue try: raw = state_path.read_text(encoding="utf-8") except Exception: continue if token in raw and (not ref_kind or ref_kind in raw): refs.add(state_path.parent.name) return {"count": len(refs), "jobs": sorted(refs)} def _copy_library_file_to_job(src: Path, job_id: str, label: str = "") -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") asset_id = f"asset_{uuid.uuid4().hex[:12]}" out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) out = out_dir / f"{asset_id}.jpg" if src.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp"}: try: with Image.open(src) as im: im.convert("RGB").save(out, "JPEG", quality=94) except Exception: shutil.copy2(src, out) else: shutil.copy2(src, out) width, height = _library_media_size(out) return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": label, "asset_meta": {"width": width, "height": height, "standard": "library-copy"}, } def _copy_library_to_job(kind: AssetLibraryKind | str, item_id: str, job_id: str) -> dict: item = find_asset_library_item(kind, item_id) item_dir = _asset_library_item_dir(kind, item_id) if item.kind == "videos": if job_id not in JOBS: raise HTTPException(404, "job not found") src = item_dir / "video.mp4" if not src.exists(): raise HTTPException(404, "library video missing") out_dir = job_dir(job_id) / "storyboard-videos" out_dir.mkdir(parents=True, exist_ok=True) video_id = f"library_{uuid.uuid4().hex[:12]}" out = out_dir / f"{video_id}.mp4" shutil.copy2(src, out) item.use_count += 1 item.updated_at = _now_ts() _write_asset_item(item) return {"kind": "video", "video_id": video_id, "url": f"/jobs/{job_id}/storyboard-videos/{video_id}.mp4", "label": item.name} image = item.image or item.poster or next(iter(item.views or item.images), None) if not image: raise HTTPException(404, "library image missing") result = _copy_library_file_to_job(item_dir / image.filename, job_id, item.name) item.use_count += 1 item.updated_at = _now_ts() _write_asset_item(item) return result def storyboard_ref_url(job_id: str, ref: dict | None) -> str: if not ref: return "" kind = ref.get("kind") frame_idx = ref.get("frame_idx") if kind == "keyframe" and frame_idx is not None: return f"/jobs/{job_id}/frames/{int(frame_idx)}.jpg" if kind == "cutout" and frame_idx is not None and ref.get("element_id"): element_id = ref.get("element_id") cutout_id = ref.get("cutout_id") if cutout_id and cutout_id != element_id: return f"/jobs/{job_id}/frames/{int(frame_idx)}/elements/{element_id}/cutouts/{cutout_id}.jpg" return f"/jobs/{job_id}/frames/{int(frame_idx)}/elements/{element_id}/cutout.jpg" if kind == "asset" and ref.get("element_id"): return f"/jobs/{job_id}/assets/{ref.get('element_id')}.jpg" if kind in {"library_subject", "library_product", "library_scene"}: return "" return "" def prepare_video_reference(src: Path, dst: Path, size: tuple[int, int] = (720, 1280)) -> None: dst.parent.mkdir(parents=True, exist_ok=True) img = Image.open(src).convert("RGB") img.thumbnail(size, Image.Resampling.LANCZOS) canvas = Image.new("RGB", size, (8, 8, 10)) x = (size[0] - img.width) // 2 y = (size[1] - img.height) // 2 canvas.paste(img, (x, y)) canvas.save(dst, "JPEG", quality=94) def update_generated_video(job_id: str, video_id: str, **kw) -> None: job = JOBS.get(job_id) if not job: return updated = [] for v in job.generated_videos: if v.id == video_id: data = v.model_dump() data.update(kw) updated.append(GeneratedVideo(**data)) else: updated.append(v) update(job, generated_videos=updated) def generated_video_exists(job_id: str, video_id: str) -> bool: job = JOBS.get(job_id) return bool(job and any(v.id == video_id for v in job.generated_videos)) def _video_queue_owner(job: Job) -> str: return (job.owner_id or f"job:{job.id}").strip() def _video_queue_owner_running_locked(owner_id: str) -> bool: return any(task.owner_id == owner_id for task in VIDEO_QUEUE_RUNNING.values()) def _refresh_video_queue_positions_locked() -> None: queue_size = len(VIDEO_QUEUE) for position, task in enumerate(VIDEO_QUEUE, start=1): if not generated_video_exists(task.job_id, task.video_id): continue if _video_queue_owner_running_locked(task.owner_id): message = "排队中 · 你的上一个视频生成中" elif position == 1: message = "排队中 · 即将开始" else: message = f"排队中 · 前方 {position - 1} 个任务" update_generated_video( task.job_id, task.video_id, status="queued", progress=0, queue_position=position, queue_size=queue_size, queue_message=message, ) def _video_queue_running_by_owner_locked() -> dict[str, int]: counts: dict[str, int] = {} for task in VIDEO_QUEUE_RUNNING.values(): counts[task.owner_id] = counts.get(task.owner_id, 0) + 1 return counts def dispatch_video_queue() -> None: tasks_to_start: list[VideoQueueTask] = [] with VIDEO_QUEUE_LOCK: running_by_owner = _video_queue_running_by_owner_locked() while len(VIDEO_QUEUE_RUNNING) < VIDEO_QUEUE_MAX_CONCURRENT: selected_index = -1 for index, task in enumerate(VIDEO_QUEUE): if not generated_video_exists(task.job_id, task.video_id): selected_index = index break if running_by_owner.get(task.owner_id, 0) < VIDEO_QUEUE_MAX_CONCURRENT_PER_USER: selected_index = index break if selected_index < 0: break task = VIDEO_QUEUE.pop(selected_index) if not generated_video_exists(task.job_id, task.video_id): continue VIDEO_QUEUE_RUNNING[task.video_id] = task running_by_owner[task.owner_id] = running_by_owner.get(task.owner_id, 0) + 1 update_generated_video( task.job_id, task.video_id, status="in_progress", progress=1, queue_position=0, queue_size=len(VIDEO_QUEUE), queue_message="准备素材…", ) tasks_to_start.append(task) _refresh_video_queue_positions_locked() for task in tasks_to_start: threading.Thread(target=_run_video_queue_task, args=(task,), daemon=True).start() def _run_video_queue_task(task: VideoQueueTask) -> None: try: if generated_video_exists(task.job_id, task.video_id): render_storyboard_video(*task.args) finally: with VIDEO_QUEUE_LOCK: VIDEO_QUEUE_RUNNING.pop(task.video_id, None) _refresh_video_queue_positions_locked() dispatch_video_queue() def enqueue_video_task(job: Job, video_id: str, task_args: tuple) -> None: task = VideoQueueTask( job_id=job.id, video_id=video_id, owner_id=_video_queue_owner(job), args=task_args, created_at=time.time(), ) with VIDEO_QUEUE_LOCK: VIDEO_QUEUE.append(task) _refresh_video_queue_positions_locked() dispatch_video_queue() def cancel_queued_video_task(job_id: str, video_id: str) -> bool: removed = False with VIDEO_QUEUE_LOCK: before = len(VIDEO_QUEUE) VIDEO_QUEUE[:] = [task for task in VIDEO_QUEUE if not (task.job_id == job_id and task.video_id == video_id)] removed = len(VIDEO_QUEUE) != before if removed: _refresh_video_queue_positions_locked() if removed: dispatch_video_queue() return removed @asynccontextmanager async def lifespan(_: FastAPI): db_ready = db.init_schema() try: _rebuild_library_index() except Exception as e: print(f"[resource library] index rebuild failed: {e}", flush=True) # 启动时从磁盘恢复 jobs(简化版:只列目录) for p in JOBS_DIR.iterdir(): if p.is_dir() and (p / "state.json").exists(): try: job = Job.model_validate_json((p / "state.json").read_text()) source_exists = (p / "source.mp4").exists() if job.status in {"created", "downloading"}: if source_exists: update(job, status="downloaded", progress=25, error="", message="服务重启 · 视频已恢复,可重新解析") else: update(job, status="failed", message="服务重启 · 下载任务已中断,请重新提交") elif job.status == "splitting": update( job, status="frames_extracted" if job.frames else "downloaded", progress=70 if job.frames else 25, error="", message="服务重启 · 上次抽帧已中断,可重新抽帧", ) elif job.status == "transcribing": audio_script = job.audio_script if audio_script.status == "rewriting": audio_script = audio_script.model_copy(update={ "status": "failed", "error": "服务重启 · 上次音频改写/配音已中断,可重新处理", "created_at": audio_script.created_at or time.time(), }) update( job, status="frames_extracted", progress=70, error="", audio_script=audio_script, message="服务重启 · 上次音频处理已中断,可重新处理", ) subject_generation_interrupted = False recovered_frames = [] for f in job.frames: for e in f.elements or []: recovered_assets = [] for asset in e.subject_assets or []: if asset.status in {"queued", "in_progress"}: recovered_assets.append(asset.model_copy(update={ "status": "failed", "progress": 100, "error": "服务重启 · 上次主体生成已中断,可重新生成", "ai_completed": False, })) subject_generation_interrupted = True else: recovered_assets.append(asset) e.subject_assets = recovered_assets recovered_frames.append(f) if subject_generation_interrupted: update(job, frames=recovered_frames, message="服务重启 · 上次主体生成已中断,可重新生成") video_generation_interrupted = False recovered_videos = [] for video in job.generated_videos: if video.status in {"queued", "in_progress"}: recovered_videos.append(video.model_copy(update={ "status": "failed", "progress": 100, "error": "服务重启 · 上次视频生成已中断,请重新生成", "queue_position": 0, "queue_size": 0, "queue_message": "", })) video_generation_interrupted = True else: recovered_videos.append(video) if video_generation_interrupted: update(job, generated_videos=recovered_videos, message="服务重启 · 上次视频生成已中断,请重新生成") JOBS[p.name] = job except Exception: pass if db_ready: for job in JOBS.values(): db.index_job(job.model_dump(), str(job_dir(job.id) / "state.json")) try: for item in load_prompt_library_items(): db.index_prompt_item(item.model_dump()) for item in load_asset_library_items(): db.index_asset_item(item.model_dump()) except Exception as e: print(f"[db] initial library sync failed: {e}", flush=True) yield app = FastAPI(title="SKG 营销内容工作台 API", lifespan=lifespan) app.add_middleware( CORSMiddleware, allow_origins=CORS_ORIGINS, allow_credentials=True, allow_methods=["*"], allow_headers=["*"], ) @app.middleware("http") async def enforce_data_isolation(request: Request, call_next): path = request.url.path if AUTH_DATA_ISOLATION_ENABLED and WEB_AUTH_CONFIGURED: try: user = data_user_from_request(request) except HTTPException: user = None job_id = _extract_protected_job_id(path) if job_id and not user_can_access_job(JOBS.get(job_id), user): return JSONResponse({"detail": "job not found"}, status_code=404) run_match = AGENT_RUN_PATH_RE.match(path) if run_match and not user_can_access_agent_run(run_match.group(1), user): return JSONResponse({"detail": "agent run not found"}, status_code=404) return await call_next(request) @app.get("/auth/check") def auth_check(request: Request) -> Response: ensure_auth_configured() if not auth_session_from_request(request): raise HTTPException(401, "unauthorized") return Response(status_code=204) @app.get("/auth/config") def auth_config() -> dict: return { "ok": True, "auth_configured": WEB_AUTH_CONFIGURED, "password_enabled": PASSWORD_AUTH_CONFIGURED, "feishu_enabled": FEISHU_AUTH_CONFIGURED, "data_isolation_enabled": AUTH_DATA_ISOLATION_ENABLED, } @app.get("/auth/me") def auth_me(request: Request) -> dict: session = auth_session_from_request(request) if not session: raise HTTPException(401, "unauthorized") db.upsert_user(session, request) return {"ok": True, "user": session} @app.post("/auth/login") def auth_login(payload: AuthLoginPayload, request: Request, response: Response) -> dict: ensure_password_auth_configured() username = payload.username.strip() password = payload.password valid_user = hmac.compare_digest(username, WEB_AUTH_USERNAME) valid_password = hmac.compare_digest(password, WEB_AUTH_PASSWORD) if not (valid_user and valid_password): raise HTTPException(401, "用户名或密码不正确") ttl_seconds = 60 * 60 * 24 * 30 if payload.remember else 60 * 60 * 12 response.set_cookie( key=WEB_AUTH_COOKIE_NAME, value=make_auth_token(WEB_AUTH_USERNAME, ttl_seconds), max_age=ttl_seconds, httponly=True, secure=WEB_AUTH_COOKIE_SECURE, samesite="lax", path="/", ) session = _public_session({"u": WEB_AUTH_USERNAME, "name": WEB_AUTH_USERNAME, "provider": "password", "uid": f"password:{WEB_AUTH_USERNAME}"}) db.upsert_user(session, request) db.audit(session, "login.password", "user", session["uid"], request=request) return {"ok": True, "username": WEB_AUTH_USERNAME} @app.get("/auth/feishu/start") def auth_feishu_start(request: Request) -> RedirectResponse: ensure_feishu_auth_configured() next_url = _normalize_next_url(request.query_params.get("next")) state = _feishu_oauth_state(next_url) response = RedirectResponse(_feishu_authorize_url(request, state), status_code=302) response.set_cookie( key=FEISHU_STATE_COOKIE_NAME, value=state, max_age=600, httponly=True, secure=WEB_AUTH_COOKIE_SECURE, samesite="lax", path="/", ) return response @app.get("/auth/feishu/callback") def auth_feishu_callback(request: Request) -> RedirectResponse: ensure_feishu_auth_configured() if request.query_params.get("error"): raise HTTPException(401, f"飞书授权取消或失败:{request.query_params.get('error')}") code = str(request.query_params.get("code") or "").strip() supplied_state = str(request.query_params.get("state") or "").strip() cookie_state = request.cookies.get(FEISHU_STATE_COOKIE_NAME, "") if not code: raise HTTPException(400, "missing feishu code") if not supplied_state or not cookie_state or not hmac.compare_digest(supplied_state, cookie_state): raise HTTPException(401, "invalid feishu state") state_payload = _verify_feishu_oauth_state(supplied_state) if not state_payload: raise HTTPException(401, "expired feishu state") access_token = _exchange_feishu_code(code, _feishu_redirect_uri(request)) session = _build_feishu_session(_fetch_feishu_user(access_token)) _validate_feishu_session(session) db.upsert_user(session, request) db.audit(session, "login.feishu", "user", session["uid"], request=request) ttl_seconds = 60 * 60 * 24 * 30 response = RedirectResponse(_normalize_next_url(str(state_payload.get("next") or "/")), status_code=302) response.set_cookie( key=WEB_AUTH_COOKIE_NAME, value=make_auth_token(session, ttl_seconds), max_age=ttl_seconds, httponly=True, secure=WEB_AUTH_COOKIE_SECURE, samesite="lax", path="/", ) response.delete_cookie( key=FEISHU_STATE_COOKIE_NAME, path="/", secure=WEB_AUTH_COOKIE_SECURE, samesite="lax", ) return response @app.post("/auth/logout") def auth_logout(response: Response) -> dict: response.delete_cookie( key=WEB_AUTH_COOKIE_NAME, path="/", secure=WEB_AUTH_COOKIE_SECURE, samesite="lax", ) return {"ok": True} class CanvasProjectWriteReq(BaseModel): id: str = "" name: str = "未命名项目" thumbnail: str = "" visibility: Literal["private", "team", "company"] = "private" canvas_data: dict = Field(default_factory=dict) created_at: float = 0.0 updated_at: float = 0.0 source: str = "canvas" class CanvasProjectImportReq(BaseModel): projects: list[CanvasProjectWriteReq] = Field(default_factory=list) class CanvasWorkflowWriteReq(BaseModel): id: str = "" name: str = "未命名工作流" description: str = "" thumbnail: str = "" workflow_data: dict = Field(default_factory=dict) created_at: float = 0.0 updated_at: float = 0.0 source: str = "canvas" source_project_id: str = "" def _ts(value) -> float: if hasattr(value, "timestamp"): return float(value.timestamp()) try: return float(value or 0) except (TypeError, ValueError): return 0.0 def _require_db() -> None: if not db.enabled(): raise HTTPException(503, "database not configured") def _canvas_project_public(row: dict) -> dict: return { "id": str(row.get("id") or ""), "name": str(row.get("name") or ""), "thumbnail": str(row.get("thumbnail") or ""), "visibility": str(row.get("visibility") or "private"), "canvas_data": row.get("canvas_data") or {}, "created_at": _ts(row.get("created_at")), "updated_at": _ts(row.get("updated_at")), "version": int(row.get("version") or 1), "owner_id": str(row.get("owner_id") or ""), "owner_name": str(row.get("owner_name") or ""), "owner_email": str(row.get("owner_email") or ""), "owner_provider": str(row.get("owner_provider") or ""), } def _canvas_workflow_public(row: dict) -> dict: return { "id": str(row.get("id") or ""), "name": str(row.get("name") or ""), "description": str(row.get("description") or ""), "thumbnail": str(row.get("thumbnail") or ""), "workflow_data": row.get("workflow_data") or {}, "created_at": _ts(row.get("created_at")), "updated_at": _ts(row.get("updated_at")), "version": int(row.get("version") or 1), "owner_id": str(row.get("owner_id") or ""), "owner_name": str(row.get("owner_name") or ""), "owner_email": str(row.get("owner_email") or ""), "owner_provider": str(row.get("owner_provider") or ""), } @app.get("/canvas-projects") def list_canvas_projects(request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) return { "ok": True, "items": [_canvas_project_public(row) for row in db.list_canvas_projects(user)], } @app.post("/canvas-projects") def create_canvas_project(req: CanvasProjectWriteReq, request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) row = db.upsert_canvas_project(user, req.model_dump()) if not row: raise HTTPException(500, "canvas project save failed") if str(row.get("owner_id") or "") != _session_user_id(user): raise HTTPException(403, "canvas project belongs to another user") db.audit(user, "canvas_project.create", "canvas_project", str(row.get("id") or ""), req.model_dump(exclude={"canvas_data"}), request, str(row.get("visibility") or "private")) return {"ok": True, "item": _canvas_project_public(row)} @app.put("/canvas-projects/{project_id}") def put_canvas_project(project_id: str, req: CanvasProjectWriteReq, request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) payload = req.model_dump() payload["id"] = project_id row = db.upsert_canvas_project(user, payload) if not row: raise HTTPException(500, "canvas project save failed") if str(row.get("owner_id") or "") != _session_user_id(user): raise HTTPException(403, "canvas project belongs to another user") db.audit(user, "canvas_project.save", "canvas_project", project_id, {"name": req.name, "visibility": req.visibility}, request, str(row.get("visibility") or "private")) return {"ok": True, "item": _canvas_project_public(row)} @app.get("/canvas-projects/{project_id}") def get_canvas_project(project_id: str, request: Request) -> dict: _require_db() user = data_user_from_request(request) row = db.get_canvas_project(project_id, user) if not row: raise HTTPException(404, "canvas project not found") return {"ok": True, "item": _canvas_project_public(row)} @app.delete("/canvas-projects/{project_id}") def delete_canvas_project(project_id: str, request: Request) -> dict: _require_db() user = data_user_from_request(request) ok = db.soft_delete_canvas_project(user, project_id) if not ok: raise HTTPException(404, "canvas project not found") db.audit(user, "canvas_project.delete", "canvas_project", project_id, request=request) return {"ok": True, "id": project_id} @app.post("/canvas-projects/import") def import_canvas_projects(req: CanvasProjectImportReq, request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) imported = [] for item in req.projects[:200]: payload = item.model_dump() payload["source"] = "localStorage" row = db.upsert_canvas_project(user, payload) if row: imported.append(_canvas_project_public(row)) db.audit(user, "canvas_project.import", "canvas_project", "", {"count": len(imported)}, request) return {"ok": True, "items": imported} @app.get("/canvas-workflows") def list_canvas_workflows(request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) return { "ok": True, "items": [_canvas_workflow_public(row) for row in db.list_canvas_workflows(user)], } @app.post("/canvas-workflows") def create_canvas_workflow(req: CanvasWorkflowWriteReq, request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) row = db.upsert_canvas_workflow(user, req.model_dump()) if not row: raise HTTPException(500, "canvas workflow save failed") if str(row.get("owner_id") or "") != _session_user_id(user): raise HTTPException(403, "canvas workflow belongs to another user") db.audit(user, "canvas_workflow.create", "canvas_workflow", str(row.get("id") or ""), {"name": req.name, "source_project_id": req.source_project_id}, request, "private") return {"ok": True, "item": _canvas_workflow_public(row)} @app.put("/canvas-workflows/{workflow_id}") def put_canvas_workflow(workflow_id: str, req: CanvasWorkflowWriteReq, request: Request) -> dict: _require_db() user = data_user_from_request(request) db.upsert_user(user, request) payload = req.model_dump() payload["id"] = workflow_id row = db.upsert_canvas_workflow(user, payload) if not row: raise HTTPException(500, "canvas workflow save failed") if str(row.get("owner_id") or "") != _session_user_id(user): raise HTTPException(403, "canvas workflow belongs to another user") db.audit(user, "canvas_workflow.save", "canvas_workflow", workflow_id, {"name": req.name}, request, "private") return {"ok": True, "item": _canvas_workflow_public(row)} @app.delete("/canvas-workflows/{workflow_id}") def delete_canvas_workflow(workflow_id: str, request: Request) -> dict: _require_db() user = data_user_from_request(request) ok = db.soft_delete_canvas_workflow(user, workflow_id) if not ok: raise HTTPException(404, "canvas workflow not found") db.audit(user, "canvas_workflow.delete", "canvas_workflow", workflow_id, request=request, visibility="private") return {"ok": True, "id": workflow_id} def _parse_library_metadata(raw: str) -> dict: if not raw.strip(): return {} try: data = json.loads(raw) return data if isinstance(data, dict) else {} except Exception as e: raise HTTPException(400, f"metadata json invalid: {e}") async def _save_upload_to_path(upload: UploadFile, dst: Path) -> None: dst.parent.mkdir(parents=True, exist_ok=True) data = await upload.read() if not data: raise HTTPException(400, f"{upload.filename or 'file'} is empty") dst.write_bytes(data) def _save_library_image(src: Path, dst: Path) -> AssetLibraryImage: dst.parent.mkdir(parents=True, exist_ok=True) try: with Image.open(src) as im: rgb = im.convert("RGB") rgb.save(dst, "JPEG", quality=94) except Exception: shutil.copy2(src, dst) width, height = _library_media_size(dst) return AssetLibraryImage( id=dst.stem, view=dst.stem, label=dst.stem, filename=str(dst.relative_to(dst.parents[1])), width=width, height=height, created_at=_now_ts(), ) @app.get("/prompt-library", response_model=list[PromptLibraryItem]) def list_prompt_library(category: PromptLibraryCategory | None = None, q: str = "", sort: str = "") -> list[PromptLibraryItem]: items = load_prompt_library_items() if category: items = [item for item in items if item.category == category] items = _prompt_library_search(items, q) if sort == "use_count": items.sort(key=lambda item: (item.use_count, item.updated_at or item.created_at), reverse=True) return items @app.get("/prompt-library/{item_id}", response_model=PromptLibraryItem) def get_prompt_library_item(item_id: str) -> PromptLibraryItem: return find_prompt_library_item(item_id) @app.post("/prompt-library", response_model=PromptLibraryItem) def create_prompt_library_item(req: PromptLibraryWriteReq, request: Request) -> PromptLibraryItem: user = data_user_from_request(request) now = _now_ts() name = req.name.strip() prompt_en = _ensure_english(req.prompt_en.strip()) if req.prompt_en.strip() else "" if not name: raise HTTPException(400, "prompt name required") if not prompt_en and not req.prompt_zh.strip(): raise HTTPException(400, "prompt content required") item = PromptLibraryItem( id=f"lib_prompt_{uuid.uuid4().hex[:12]}", category=req.category, name=name, tags=_safe_tags(req.tags), prompt_en=prompt_en or _ensure_english(req.prompt_zh.strip()), prompt_zh=req.prompt_zh.strip(), source_job_id=req.source_job_id.strip(), created_at=now, updated_at=now, ) _write_prompt_item(item) db.audit(user, "prompt_library.create", "prompt", item.id, {"category": item.category, "name": item.name}, request, "company") return item @app.patch("/prompt-library/{item_id}", response_model=PromptLibraryItem) def patch_prompt_library_item(item_id: str, req: PromptLibraryPatchReq, request: Request) -> PromptLibraryItem: user = data_user_from_request(request) item = find_prompt_library_item(item_id) data = item.model_dump() patch = req.model_dump(exclude_unset=True) if "tags" in patch: patch["tags"] = _safe_tags(patch["tags"]) if "name" in patch: patch["name"] = str(patch["name"]).strip() if "prompt_en" in patch and str(patch["prompt_en"]).strip(): patch["prompt_en"] = _ensure_english(str(patch["prompt_en"]).strip()) data.update(patch) data["updated_at"] = _now_ts() updated = PromptLibraryItem(**data) if not updated.name.strip(): raise HTTPException(400, "prompt name required") _write_prompt_item(updated) db.audit(user, "prompt_library.update", "prompt", item_id, {"fields": sorted(patch.keys())}, request, "company") return updated @app.delete("/prompt-library/{item_id}") def delete_prompt_library_item(item_id: str, request: Request) -> dict: user = data_user_from_request(request) item = find_prompt_library_item(item_id) src = _prompt_item_file(item) trash = LIBRARY_TRASH_DIR / "prompt_library" / f"{item.id}_{int(_now_ts())}.json" trash.parent.mkdir(parents=True, exist_ok=True) shutil.move(str(src), str(trash)) _write_prompt_library_index() db.audit(user, "prompt_library.delete", "prompt", item.id, request=request, visibility="company") return {"ok": True, "id": item.id, "trashed": str(trash)} @app.post("/prompt-library/{item_id}/use", response_model=PromptLibraryItem) def use_prompt_library_item(item_id: str, request: Request) -> PromptLibraryItem: user = data_user_from_request(request) item = find_prompt_library_item(item_id) item.use_count += 1 item.updated_at = _now_ts() _write_prompt_item(item) db.audit(user, "prompt_library.use", "prompt", item.id, request=request, visibility="company") return item @app.get("/asset-library/{kind}", response_model=list[AssetLibraryItem]) def list_asset_library(kind: AssetLibraryKind, q: str = "", sort: str = "") -> list[AssetLibraryItem]: items = _asset_library_search(load_asset_library_items(kind), q) if sort == "use_count": items.sort(key=lambda item: (item.use_count, item.updated_at or item.created_at), reverse=True) return items @app.get("/asset-library/{kind}/{item_id}", response_model=AssetLibraryItem) def get_asset_library_item(kind: AssetLibraryKind, item_id: str) -> AssetLibraryItem: return find_asset_library_item(kind, item_id) @app.post("/asset-library/{kind}", response_model=AssetLibraryItem) async def create_asset_library_item( kind: AssetLibraryKind, request: Request, metadata: str = Form("{}"), files: list[UploadFile] = File(default=[]), ) -> AssetLibraryItem: user = data_user_from_request(request) meta = _parse_library_metadata(metadata) if not files: raise HTTPException(400, "at least one file required") now = _now_ts() item_id = f"lib_{kind}_{uuid.uuid4().hex[:12]}" item_dir = _asset_library_item_dir(kind, item_id) tmp_dir = item_dir / "_tmp" item_dir.mkdir(parents=True, exist_ok=True) tmp_dir.mkdir(parents=True, exist_ok=True) name = str(meta.get("name") or meta.get("name_zh") or kind).strip() item = AssetLibraryItem( id=item_id, kind=kind, name=name, name_zh=str(meta.get("name_zh") or "").strip(), note=str(meta.get("note") or "").strip(), tags=_safe_tags(meta.get("tags")), source_job_id=str(meta.get("source_job_id") or "").strip(), is_official=bool(meta.get("is_official") or False), prompt_brief=str(meta.get("prompt_brief") or "").strip(), prompt_brief_zh=str(meta.get("prompt_brief_zh") or "").strip(), subject_style=str(meta.get("subject_style") or "transparent_human") if kind == "subjects" else "transparent_human", product_type=str(meta.get("product_type") or "").strip(), asset_role=str(meta.get("asset_role") or "").strip(), aspect_ratio=str(meta.get("aspect_ratio") or "").strip(), created_at=now, updated_at=now, ) requested_views = [str(v).strip() for v in (meta.get("views") or []) if str(v).strip()] saved_images: list[AssetLibraryImage] = [] for index, upload in enumerate(files): suffix = Path(upload.filename or "").suffix.lower() or ".jpg" tmp = tmp_dir / f"{index}{suffix}" await _save_upload_to_path(upload, tmp) if kind == "videos": if suffix in {".jpg", ".jpeg", ".png", ".webp"}: image = _save_library_image(tmp, item_dir / "poster.jpg") image.id = "poster" image.view = "poster" image.label = "poster" image.filename = "poster.jpg" item.poster = image else: shutil.copy2(tmp, item_dir / "video.mp4") item.video_url = f"/asset-library/videos/{item_id}/file/video.mp4" continue view = requested_views[index] if index < len(requested_views) else (Path(upload.filename or "").stem or f"view_{index + 1}") safe_view = re.sub(r"[^a-zA-Z0-9_-]+", "_", view).strip("_") or f"view_{index + 1}" image = _save_library_image(tmp, item_dir / "images" / f"{safe_view}.jpg") image.id = safe_view image.view = safe_view image.label = view image.filename = f"images/{safe_view}.jpg" saved_images.append(image) shutil.rmtree(tmp_dir, ignore_errors=True) if kind == "subjects": item.images = saved_images item.views = saved_images item.prompt_brief = _ensure_english(item.prompt_brief) if item.prompt_brief else item.note elif kind == "products": item.views = saved_images item.images = saved_images elif kind == "scenes": item.image = saved_images[0] if saved_images else None if item.image: item.image.id = "image" item.image.view = "scene" item.image.label = "scene" elif kind == "videos" and not item.video_url: raise HTTPException(400, "video file required") _hydrate_asset_library_urls(item) _write_asset_item(item) db.audit(user, "asset_library.create", "asset_library", item.id, {"kind": kind, "name": item.name}, request, "company") return item @app.patch("/asset-library/{kind}/{item_id}", response_model=AssetLibraryItem) def patch_asset_library_item(kind: AssetLibraryKind, item_id: str, req: AssetLibraryPatchReq, request: Request) -> AssetLibraryItem: user = data_user_from_request(request) item = find_asset_library_item(kind, item_id) data = item.model_dump() patch = req.model_dump(exclude_unset=True) if "tags" in patch: patch["tags"] = _safe_tags(patch["tags"]) data.update(patch) data["updated_at"] = _now_ts() updated = AssetLibraryItem(**data) _hydrate_asset_library_urls(updated) _write_asset_item(updated) db.audit(user, "asset_library.update", "asset_library", item_id, {"kind": kind, "fields": sorted(patch.keys())}, request, "company") return updated @app.get("/asset-library/{kind}/{item_id}/refs") def asset_library_refs(kind: AssetLibraryKind, item_id: str) -> dict: find_asset_library_item(kind, item_id) return _library_ref_usage(kind, item_id) @app.delete("/asset-library/{kind}/{item_id}") def delete_asset_library_item(kind: AssetLibraryKind, item_id: str, request: Request, force: bool = False) -> dict: user = data_user_from_request(request) item = find_asset_library_item(kind, item_id) refs = _library_ref_usage(kind, item_id) if refs["count"] and not force: raise HTTPException(409, {"message": "asset library item is referenced", **refs}) src = _asset_library_item_dir(kind, item.id) trash = LIBRARY_TRASH_DIR / "asset_library" / kind / f"{item.id}_{int(_now_ts())}" trash.parent.mkdir(parents=True, exist_ok=True) shutil.move(str(src), str(trash)) _write_asset_library_index() db.audit(user, "asset_library.delete", "asset_library", item.id, {"kind": kind, "refs": refs}, request, "company") return {"ok": True, "id": item.id, "refs": refs, "trashed": str(trash)} @app.post("/asset-library/{kind}/{item_id}/copy-to-job/{job_id}") def copy_asset_library_to_job(kind: AssetLibraryKind, item_id: str, job_id: str, request: Request) -> dict: user = data_user_from_request(request) db.audit(user, "asset_library.copy_to_job", "asset_library", item_id, {"kind": kind, "job_id": job_id}, request, "company") return _copy_library_to_job(kind, item_id, job_id) @app.get("/asset-library/{kind}/{item_id}/file/{filename:path}") def get_asset_library_file(kind: AssetLibraryKind, item_id: str, filename: str): item = find_asset_library_item(kind, item_id) p = _library_item_file_path(item, filename) suffix = p.suffix.lower() if suffix == ".mp4": return FileResponse(p, media_type="video/mp4") if suffix == ".png": return FileResponse(p, media_type="image/png") if suffix == ".webp": return FileResponse(p, media_type="image/webp") return FileResponse(p, media_type="image/jpeg") @app.get("/resource-library/recent") def resource_library_recent(hours: int = 24) -> dict: cutoff = _now_ts() - max(1, min(hours, 24 * 30)) * 3600 prompts = [ {"type": "prompt", "category": item.category, "id": item.id, "name": item.name, "created_at": item.created_at, "item": item.model_dump()} for item in load_prompt_library_items() if item.created_at >= cutoff ] assets = [ {"type": "asset", "kind": item.kind, "id": item.id, "name": item.name, "created_at": item.created_at, "item": item.model_dump()} for item in load_asset_library_items() if item.created_at >= cutoff ] items = sorted(prompts + assets, key=lambda item: item.get("created_at", 0), reverse=True) return {"items": items} # ---------- Pipeline 实现 ---------- def _binary_works(path: str) -> bool: if not path: return False if os.path.sep in path and not Path(path).exists(): return False try: res = subprocess.run([path, "-version"], capture_output=True, text=True, timeout=5) return res.returncode == 0 except Exception: return False def media_binary(name: Literal["ffmpeg", "ffprobe"]) -> str: cached = _MEDIA_BIN_CACHE.get(name) if cached: return cached env_bin = FFMPEG_BIN if name == "ffmpeg" else FFPROBE_BIN candidates: list[str] = [] if env_bin: candidates.append(env_bin) found = shutil.which(name) if found: candidates.append(found) if name == "ffmpeg": candidates.extend(LOCAL_FFMPEG_CANDIDATES) for candidate in candidates: if _binary_works(candidate): _MEDIA_BIN_CACHE[name] = candidate return candidate raise RuntimeError(f"{name} 不可用,请配置 {name.upper()}_BIN 或修复本机 ffmpeg 安装") def _normalize_media_cmd(cmd: list[str]) -> list[str]: if not cmd: return cmd if cmd[0] == "ffmpeg": return [media_binary("ffmpeg"), *cmd[1:]] if cmd[0] == "ffprobe": return [media_binary("ffprobe"), *cmd[1:]] return cmd def run(cmd: list[str], cwd: Path | None = None) -> str: cmd = _normalize_media_cmd(cmd) res = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True) if res.returncode != 0: # ffmpeg 把 banner 写 stderr,挑最后几行(真错误一般在末尾) tail = "\n".join(res.stderr.splitlines()[-12:]) or res.stderr[-800:] raise RuntimeError(f"cmd failed: {' '.join(cmd[:3])}... · {tail}") return res.stdout def ytdlp_cookie_args() -> list[str]: if YTDLP_COOKIES_FILE: cookies = Path(YTDLP_COOKIES_FILE).expanduser() if not cookies.exists(): raise RuntimeError("TikTok cookies 文件不可用,请检查 YTDLP_COOKIES_FILE 配置。") return ["--cookies", str(cookies)] if YTDLP_COOKIES_FROM_BROWSER: return ["--cookies-from-browser", YTDLP_COOKIES_FROM_BROWSER] return [] def normalize_download_error(error: Exception) -> str: raw = str(error) lower = raw.lower() auth_required = ( "log in for access" in lower or "login" in lower and "cookies" in lower or "cookies-from-browser" in lower or "sign in" in lower and "tiktok" in lower ) if auth_required: return ( "TikTok 下载需要登录态。请上传视频文件,或在后端配置 " "YTDLP_COOKIES_FILE / YTDLP_COOKIES_FROM_BROWSER 后重试。" f"原始错误:{raw}" ) return raw # ---- 启发式选帧工具 ---- import imagehash import numpy as np from PIL import Image, ImageChops, ImageEnhance, ImageFilter, ImageOps def _sharpness_from_gray(g: np.ndarray) -> float: """Laplacian variance:值越大越清晰,模糊/转场帧值低。""" lap = (-4 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]) return float(lap.var()) def _frame_metrics(img_path: Path, idx: int, timestamp: float, metric_width: int = 160) -> dict | None: """低清候选帧的本地评分特征。只用于排序,最终仍从原视频抽原尺寸帧。""" try: with Image.open(img_path) as raw: img = raw.convert("RGB") h = imagehash.phash(img) src_w, src_h = img.size metric_height = max(1, round(metric_width * src_h / max(src_w, 1))) small = img.resize((metric_width, metric_height)) except Exception: return None arr = np.asarray(small, dtype=np.float32) # Rec. 601 luma,保留 0-255 范围,便于和清晰度 / 对比度阈值一起看。 gray = (0.299 * arr[:, :, 0] + 0.587 * arr[:, :, 1] + 0.114 * arr[:, :, 2]).astype(np.float32) gh, gw = gray.shape center = gray[gh // 4:max(gh // 4 + 1, gh * 3 // 4), gw // 4:max(gw // 4 + 1, gw * 3 // 4)] rg = arr[:, :, 0] - arr[:, :, 1] yb = 0.5 * (arr[:, :, 0] + arr[:, :, 1]) - arr[:, :, 2] colorfulness = float(np.sqrt(rg.var() + yb.var()) + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)) return { "path": img_path, "idx": idx, "timestamp": timestamp, "hash": h, "gray": gray, "sharp": _sharpness_from_gray(gray), "center_sharp": _sharpness_from_gray(center), "brightness": float(gray.mean()), "contrast": float(gray.std()), "colorfulness": colorfulness, "scene_score": 0.0, "motion": 0.0, } def _physical_memory_gb() -> float: try: page_size = os.sysconf("SC_PAGE_SIZE") pages = os.sysconf("SC_PHYS_PAGES") return float(page_size * pages) / (1024 ** 3) except Exception: return 0.0 def _resolve_frame_quality(duration: float, quality: FrameExtractQuality) -> FrameExtractQuality: if quality != "auto": return quality cores = os.cpu_count() or 4 memory_gb = _physical_memory_gb() strong_machine = cores >= 10 and (memory_gb == 0.0 or memory_gb >= 32) # 展示/演示时不能把本机资源打满:auto 最高只到 accurate。 # ultra 保留为手动选择项,不再由 auto 自动命中。 if strong_machine and duration <= 600: return "accurate" if cores >= 8 and duration <= 240: return "accurate" return "fast" def _scan_profile(duration: float, quality: FrameExtractQuality) -> tuple[float, int, int, int]: """返回 scan_fps / scan_width / metric_width / estimated_count。""" if quality == "ultra": base_fps, scan_width, cap, metric_width = 12.0, 960, 1800, 320 elif quality == "accurate": base_fps, scan_width, cap, metric_width = 8.0, 720, 900, 240 else: base_fps, scan_width, cap, metric_width = 2.0, 360, 240, 160 estimated = max(1, min(int(duration * base_fps), cap)) scan_fps = max(0.02, min(base_fps, estimated / max(duration, 0.1))) return scan_fps, scan_width, metric_width, estimated def _image_quality_report(img_path: Path, region: dict | None = None) -> QualityReport: warnings: list[str] = [] try: with Image.open(img_path) as raw: img = raw.convert("RGB") width, height = img.size metric_width = min(512, width) metric_height = max(1, round(metric_width * height / max(width, 1))) small = img.resize((metric_width, metric_height)) gray = np.asarray(ImageOps.grayscale(small), dtype=np.float32) sharp = _sharpness_from_gray(gray) except Exception: return QualityReport(risk="bad", warnings=["无法读取图片质量信息"]) short_side = min(width, height) if short_side < 720: warnings.append(f"短边 {short_side}px 低于 720px,生视频可能偏糊") if sharp < 30: warnings.append("清晰度偏低,高清增强后仍可能有细节损失") if region: try: rw = int(float(region.get("w", 0)) * width) rh = int(float(region.get("h", 0)) * height) if min(rw, rh) < 512: warnings.append(f"主体框约 {rw}×{rh}px,主体素材偏小") except Exception: pass risk: Literal["ok", "warn", "bad"] = "ok" if any("低于" in w or "偏小" in w for w in warnings): risk = "warn" if short_side < 480 or sharp < 12: risk = "bad" return QualityReport(width=width, height=height, short_side=short_side, sharpness=round(sharp, 2), risk=risk, warnings=warnings) def _asset_target_size(source_path: Path, size: AssetSize, square: bool = False) -> tuple[int, int]: try: with Image.open(source_path) as raw: src_w, src_h = raw.size except Exception: src_w, src_h = 1024, 1024 if size == "source": return max(1, src_w), max(1, src_h) side = int(size) if square: return side, side if src_w >= src_h: return side, max(1, round(side * src_h / max(src_w, 1))) return max(1, round(side * src_w / max(src_h, 1))), side def _normalize_asset_image( img_bytes: bytes, out_path: Path, source_path: Path, size: AssetSize, background: AssetBackground = "white", square: bool = False, fill_subject: bool = False, ) -> tuple[int, int]: import io as _io target_w, target_h = _asset_target_size(source_path, size, square=square) bg = (255, 255, 255) if background == "white" else (0, 0, 0) out_path.parent.mkdir(parents=True, exist_ok=True) with Image.open(_io.BytesIO(img_bytes)) as raw: img = raw.convert("RGB") if fill_subject: diff = ImageChops.difference(img, Image.new("RGB", img.size, bg)) mask = diff.convert("L").point(lambda px: 255 if px > 18 else 0) bbox = mask.getbbox() if bbox: left, top, right, bottom = bbox pad_x = round((right - left) * 0.04) pad_y = round((bottom - top) * 0.03) img = img.crop(( max(0, left - pad_x), max(0, top - pad_y), min(img.width, right + pad_x), min(img.height, bottom + pad_y), )) max_w = max(1, round(target_w * 0.92)) max_h = max(1, round(target_h * 0.96)) if img.width and img.height: scale = min(max_w / img.width, max_h / img.height) if scale > 0: next_size = (max(1, round(img.width * scale)), max(1, round(img.height * scale))) if next_size != img.size: img = img.resize(next_size, Image.Resampling.LANCZOS) else: img.thumbnail((target_w, target_h), Image.Resampling.LANCZOS) canvas = Image.new("RGB", (target_w, target_h), bg) canvas.paste(img, ((target_w - img.width) // 2, (target_h - img.height) // 2)) canvas.save(out_path, "JPEG", quality=95) return target_w, target_h def _asset_url(job_id: str, asset_id: str) -> str: return f"/jobs/{job_id}/assets/{asset_id}.jpg" def _delete_subject_asset_file(job_id: str, asset_id: str) -> None: if not asset_id: return p = job_dir(job_id) / "assets" / f"{asset_id}.jpg" if p.exists(): try: p.unlink() except OSError: pass def _find_frame(job: Job, idx: int) -> KeyFrame: frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") return frame def _source_frame_path(job_id: str, idx: int) -> Path: cleaned_path = job_dir(job_id) / "cleaned" / f"{idx:03d}.jpg" if cleaned_path.exists(): return cleaned_path return job_dir(job_id) / "frames" / f"{idx:03d}.jpg" def _focus_source_for_element(job_id: str, idx: int, el: KeyElement) -> tuple[Path, Path | None]: import tempfile as _tempfile src = _source_frame_path(job_id, idx) tmp_focus: Path | None = None model_src = src if not el.region: return model_src, tmp_focus try: im = Image.open(src).convert("RGB") W, H = im.size r = el.region x = max(0.0, min(1.0, float(r.get("x", 0)))) y = max(0.0, min(1.0, float(r.get("y", 0)))) w = max(0.0, min(1.0 - x, float(r.get("w", 0)))) h = max(0.0, min(1.0 - y, float(r.get("h", 0)))) cx, cy = x + w / 2, y + h / 2 ew, eh = w * 1.6, h * 1.6 x0 = max(0.0, cx - ew / 2); y0 = max(0.0, cy - eh / 2) x1 = min(1.0, cx + ew / 2); y1 = min(1.0, cy + eh / 2) left, top, right, bottom = int(x0 * W), int(y0 * H), int(x1 * W), int(y1 * H) if right - left > 8 and bottom - top > 8: cropped = im.crop((left, top, right, bottom)) tmp = _tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) cropped.save(tmp.name, format="JPEG", quality=92) tmp.close() tmp_focus = Path(tmp.name) model_src = tmp_focus except Exception as e: print(f"[focus source crop failed, fallback to full frame] {e}", flush=True) return model_src, tmp_focus def _make_reference_contact_sheet(job_id: str, frame_indices: list[int], out_path: Path, max_items: int = 6) -> Path | None: paths: list[Path] = [] seen: set[int] = set() max_items = max(2, min(12, int(max_items or 6))) for idx in frame_indices: if idx in seen: continue seen.add(idx) p = _source_frame_path(job_id, idx) if p.exists(): paths.append(p) if len(paths) >= max_items: break if len(paths) <= 1: return None thumbs: list[Image.Image] = [] for p in paths: try: im = Image.open(p).convert("RGB") im.thumbnail((420, 420), Image.Resampling.LANCZOS) canvas = Image.new("RGB", (420, 420), (245, 245, 245)) canvas.paste(im, ((420 - im.width) // 2, (420 - im.height) // 2)) thumbs.append(canvas) except Exception: continue if len(thumbs) <= 1: return None cols = 4 if len(thumbs) > 6 else (3 if len(thumbs) > 2 else 2) rows = (len(thumbs) + cols - 1) // cols sheet = Image.new("RGB", (cols * 420, rows * 420), (245, 245, 245)) for i, thumb in enumerate(thumbs): sheet.paste(thumb, ((i % cols) * 420, (i // cols) * 420)) out_path.parent.mkdir(parents=True, exist_ok=True) sheet.save(out_path, "JPEG", quality=92) return out_path def _make_paths_contact_sheet(paths: list[Path], out_path: Path, max_items: int = 10) -> Path | None: usable: list[Path] = [] seen: set[str] = set() max_items = max(2, min(12, int(max_items or 10))) for p in paths: key = str(p) if key in seen or not p.exists(): continue seen.add(key) usable.append(p) if len(usable) >= max_items: break if len(usable) <= 1: return usable[0] if usable else None thumbs: list[Image.Image] = [] for p in usable: try: im = Image.open(p).convert("RGB") im.thumbnail((420, 420), Image.Resampling.LANCZOS) canvas = Image.new("RGB", (420, 420), (245, 245, 245)) canvas.paste(im, ((420 - im.width) // 2, (420 - im.height) // 2)) thumbs.append(canvas) except Exception: continue if len(thumbs) <= 1: return usable[0] if usable else None cols = 4 if len(thumbs) > 6 else (3 if len(thumbs) > 2 else 2) rows = (len(thumbs) + cols - 1) // cols sheet = Image.new("RGB", (cols * 420, rows * 420), (245, 245, 245)) for i, thumb in enumerate(thumbs): sheet.paste(thumb, ((i % cols) * 420, (i // cols) * 420)) out_path.parent.mkdir(parents=True, exist_ok=True) sheet.save(out_path, "JPEG", quality=92) return out_path SUBJECT_VIEW_LABELS: dict[str, str] = { "front": "正面", "back": "背面", "left": "左侧", "right": "右侧", "three_quarter_left": "左前 45°", "three_quarter_right": "右前 45°", "side": "侧面", "side_walk": "侧面走路", "top": "正投影俯视图", "bottom": "正投影仰视图", "expression_neutral": "中性表情", "expression_smile": "微笑表情", "expression_happy": "开心表情", "expression_angry": "生气表情", "expression_sad": "难过表情", "expression_relaxed": "放松表情", "expression_serious": "严肃表情", "expression_surprised": "惊讶表情", "action_walk": "走路动作", "action_turn": "转身动作", "action_sit": "坐姿动作", "action_hold": "手持动作", "action_use": "使用动作", "bust_front": "肩颈半身正面近景", "bust_left_45": "肩颈左前 45° 近景", "bust_right_45": "肩颈右前 45° 近景", "back_neck_detail": "后颈/肩背特写", "bust": "半身近景", "back_detail": "背部特写", } OBJECT_PATENT_VIEW_LABELS: dict[str, str] = { "front": "正投影主视图", "back": "正投影后视图", "left": "正投影左视图", "right": "正投影右视图", "top": "正投影俯视图", "bottom": "正投影仰视图", } def _subject_view_labels(kind: SubjectKind, requested: list[str] | None = None) -> list[tuple[SubjectView, str]]: if requested: normalized: list[str] = [] for raw in requested: key = "".join(ch for ch in str(raw).strip().lower() if ch.isalnum() or ch == "_") if key and key not in normalized: normalized.append(key) labels = OBJECT_PATENT_VIEW_LABELS if kind == "object" else SUBJECT_VIEW_LABELS return [(key, labels.get(key, SUBJECT_VIEW_LABELS.get(key, key.replace("_", " ")))) for key in normalized[:10]] if kind == "living": return [ ("front", "正面站立"), ("three_quarter_left", "左前 45° 站立"), ("left", "左侧站立"), ("back", "背面站立"), ("right", "右侧站立"), ("three_quarter_right", "右前 45° 站立"), ("bust_front", "肩颈半身正面近景"), ("bust_left_45", "肩颈左前 45° 近景"), ("bust_right_45", "肩颈右前 45° 近景"), ("back_neck_detail", "后颈/肩背特写"), ] return [ ("front", OBJECT_PATENT_VIEW_LABELS["front"]), ("back", OBJECT_PATENT_VIEW_LABELS["back"]), ("left", OBJECT_PATENT_VIEW_LABELS["left"]), ("right", OBJECT_PATENT_VIEW_LABELS["right"]), ("top", OBJECT_PATENT_VIEW_LABELS["top"]), ("bottom", OBJECT_PATENT_VIEW_LABELS["bottom"]), ] def _subject_view_projection_clause(view: str) -> str: if view == "front": return ( "Patent-style orthographic main/front elevation view: look straight at the designated main face, " "with the viewing direction perpendicular to that face. No perspective, no tilt, no 3/4 angle, no isometric view. " ) if view == "back": return ( "Patent-style orthographic rear elevation view: look straight at the rear face, " "with the viewing direction perpendicular to that face. No perspective, no tilt, no 3/4 angle, no isometric view. " ) if view == "left": return ( "Patent-style orthographic left side elevation view: look straight at the product's left side, " "with the viewing direction perpendicular to that side face. No perspective, no tilt, no 3/4 angle, no isometric view. " ) if view == "right": return ( "Patent-style orthographic right side elevation view: look straight at the product's right side, " "with the viewing direction perpendicular to that side face. No perspective, no tilt, no 3/4 angle, no isometric view. " ) if view == "top": return ( "Patent-style orthographic top view: look straight down from directly above the product, " "with the viewing direction perpendicular to the top face. No perspective, no tilt, no 3/4 angle, " "no oblique overhead camera, no visible front/side depth unless it is true product thickness in orthographic projection. " ) if view == "bottom": return ( "Patent-style orthographic bottom view: look straight up at the underside/bottom face, " "with the viewing direction perpendicular to the bottom face. No perspective, no tilt, no 3/4 angle, " "no low-angle perspective camera, no visible front/side depth unless it is true product thickness in orthographic projection. " ) return "" def _attach_temporal_metrics(items: list[dict]) -> None: """相邻低清帧差异:转场 / 动作目标依赖它,不需要逐帧高分辨率扫描。""" for i, it in enumerate(items): prev_delta = 0.0 next_delta = 0.0 if i > 0: prev_delta = float(np.mean(np.abs(it["gray"] - items[i - 1]["gray"])) / 255.0) if i + 1 < len(items): next_delta = float(np.mean(np.abs(items[i + 1]["gray"] - it["gray"])) / 255.0) it["scene_score"] = max(prev_delta, next_delta) it["motion"] = (prev_delta + next_delta) / 2.0 def _normalize_item_metrics(items: list[dict]) -> None: for key in ("sharp", "center_sharp", "contrast", "colorfulness", "scene_score", "motion"): vals = [float(it.get(key, 0.0)) for it in items if float(it.get(key, 0.0)) > 0] cap = float(np.percentile(vals, 95)) if vals else 1.0 if cap <= 0: cap = 1.0 for it in items: it[f"{key}_n"] = min(float(it.get(key, 0.0)) / cap, 1.0) def _target_score(item: dict, target: FrameExtractTarget) -> float: sharp = float(item.get("sharp_n", 0.0)) center = float(item.get("center_sharp_n", 0.0)) contrast = float(item.get("contrast_n", 0.0)) color = float(item.get("colorfulness_n", 0.0)) scene = float(item.get("scene_score_n", 0.0)) motion = float(item.get("motion_n", 0.0)) if target == "transparent_human": # 当前抽帧阶段走本地算力:优先清晰中心主体、高对比、适度色彩和时间覆盖。 # 透明骨架人的语义判断留给后续审核/识别,不在抽帧阶段逐帧调用 Vision。 score = center * 0.45 + sharp * 0.30 + contrast * 0.15 + color * 0.10 elif target == "subject": score = center * 0.48 + sharp * 0.25 + contrast * 0.17 + color * 0.10 elif target == "transition": score = scene * 0.55 + sharp * 0.28 + contrast * 0.12 + color * 0.05 elif target == "expression": # 没有额外视觉模型时,表情/动物瞬间只能用中心细节 + 清晰 + 轻微动作变化做本地近似。 score = center * 0.40 + sharp * 0.24 + motion * 0.18 + contrast * 0.12 + color * 0.06 elif target == "motion": score = motion * 0.45 + sharp * 0.30 + center * 0.15 + contrast * 0.10 else: score = sharp * 0.45 + scene * 0.22 + center * 0.15 + contrast * 0.12 + color * 0.06 brightness = float(item.get("brightness", 0.0)) raw_contrast = float(item.get("contrast", 0.0)) if raw_contrast < 4 or brightness < 8 or brightness > 247: return score * 0.15 if raw_contrast < 9: return score * 0.65 return score def _select_keyframes(candidates: list[dict], n: int, target: FrameExtractTarget, dup_threshold: int = 8) -> list[dict]: """ candidates: 按时间排序的低清候选帧评分项 n: 目标帧数 dup_threshold: pHash 汉明距离 < 此值视为相似(默认 8,64bit hash 大致 ~12.5% 像素差) """ if len(candidates) <= n: return candidates _attach_temporal_metrics(candidates) _normalize_item_metrics(candidates) for it in candidates: it["score"] = _target_score(it, target) # 去重:相似帧保留当前目标下分数更高的 deduped: list[dict] = [] for it in candidates: dup = None for kept in deduped: if (it["hash"] - kept["hash"]) < dup_threshold: dup = kept break if dup is None: deduped.append(it) elif it["score"] > dup["score"]: deduped[deduped.index(dup)] = it # 时序分桶:把候选时间轴等分 n 段,每段取当前目标下最优的 total = len(candidates) buckets: list[list[dict]] = [[] for _ in range(n)] for it in deduped: b = min(int(it["idx"] * n / total), n - 1) buckets[b].append(it) selected: list[dict] = [] for b in buckets: if b: selected.append(max(b, key=lambda x: x["score"])) # 空桶补足:从未选的 deduped 里按目标分数补 chosen_paths = {it["path"] for it in selected} remaining = sorted([it for it in deduped if it["path"] not in chosen_paths], key=lambda x: -x["score"]) while len(selected) < n and remaining: selected.append(remaining.pop(0)) # 按时间排序输出 selected.sort(key=lambda x: x["idx"]) return selected def _rank_keyframe_candidates(candidates: list[dict], target: FrameExtractTarget, limit: int, dup_threshold: int = 8) -> list[dict]: if not candidates: return [] _attach_temporal_metrics(candidates) _normalize_item_metrics(candidates) for it in candidates: it["score"] = _target_score(it, target) deduped: list[dict] = [] for it in sorted(candidates, key=lambda x: -float(x.get("score", 0.0))): if any((it["hash"] - kept["hash"]) < dup_threshold for kept in deduped): continue deduped.append(it) if len(deduped) >= limit: break return deduped def _score_transparent_human_frame(img_path: Path) -> TransparentHumanFrameScore: if not LLM_API_KEY: return TransparentHumanFrameScore( qualified=False, reject_reason="LLM_API_KEY 未配置,无法进行透明骨架人语义验收", ) img_b64 = base64.b64encode(img_path.read_bytes()).decode("ascii") prompt = ( "You are a strict keyframe quality inspector for a SKG transparent-human video recreation workflow. " + TRANSPARENT_HUMAN_POSITIVE_PROMPT + " " + TRANSPARENT_HUMAN_NEGATIVE_PROMPT + " " + TRANSPARENT_HUMAN_QUALIFIED_STANDARD + "\n\n" "Score this single frame using exactly these dimensions:\n" "- transparent_body_score: 0-25, clear transparent/translucent outer human body shell.\n" "- skeleton_visible_score: 0-25, clean white skeleton clearly visible inside the body.\n" "- human_prominence_score: 0-15, character centered/large/easy to identify, ideally >=35% frame height.\n" "- clarity_score: 0-15, no severe motion blur, occlusion, or deformation.\n" "- commercial_style_score: 0-10, clean premium non-horror advertising/wellness style.\n" "- product_usefulness_score: 0-10, useful for later SKG product video generation; neck/shoulder/waist/eye/foot/knee area visible when relevant.\n" "Reject if any of these is true: normal human only; ordinary skeleton only; product/background only; transparent person too far; severe blur; more than half occluded; horror/corpse/autopsy/surgery/hospital; unable to judge.\n" "Output strict JSON only with keys: transparent_body_score, skeleton_visible_score, human_prominence_score, clarity_score, commercial_style_score, product_usefulness_score, qualified, reject_reason, notes." ) try: resp = llm().chat.completions.create( model=VISION_MODEL, messages=[{"role": "user", "content": [ {"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}, ]}], response_format={"type": "json_object"}, temperature=0.1, max_tokens=1200, ) raw = (resp.choices[0].message.content or "").strip() if not raw: raw = (getattr(resp.choices[0].message, "reasoning_content", "") or "").strip() import re as _re match = _re.search(r"\{[\s\S]*\}", raw) raw = match.group(0) if match else raw data = json.loads(raw) except Exception as e: return TransparentHumanFrameScore(qualified=False, reject_reason=f"AI 评分失败:{e}") def score(name: str, cap: int) -> int: try: value = int(round(float(data.get(name, 0)))) except Exception: value = 0 return max(0, min(cap, value)) item = TransparentHumanFrameScore( transparent_body_score=score("transparent_body_score", 25), skeleton_visible_score=score("skeleton_visible_score", 25), human_prominence_score=score("human_prominence_score", 15), clarity_score=score("clarity_score", 15), commercial_style_score=score("commercial_style_score", 10), product_usefulness_score=score("product_usefulness_score", 10), reject_reason=str(data.get("reject_reason", "") or ""), notes=str(data.get("notes", "") or ""), ) item.total_score = ( item.transparent_body_score + item.skeleton_visible_score + item.human_prominence_score + item.clarity_score + item.commercial_style_score + item.product_usefulness_score ) item.qualified = bool(data.get("qualified")) and ( item.transparent_body_score >= 18 and item.skeleton_visible_score >= 18 and item.human_prominence_score >= 8 and item.clarity_score >= 8 and item.commercial_style_score >= 6 and item.product_usefulness_score >= 4 and item.total_score >= 72 ) if not item.qualified and not item.reject_reason: item.reject_reason = f"透明骨架人评分不足,总分 {item.total_score}/100" return item def _duration_from_text(text: str) -> float: m = re.search(r"Duration:\s*(\d+):(\d+):(\d+(?:\.\d+)?)", text) if not m: return 0.0 hours, minutes, seconds = m.groups() return int(hours) * 3600 + int(minutes) * 60 + float(seconds) def _ffmpeg_probe_text(path: Path) -> str: ffmpeg = media_binary("ffmpeg") res = subprocess.run([ffmpeg, "-hide_banner", "-i", str(path)], capture_output=True, text=True) text = "\n".join(part for part in [res.stdout, res.stderr] if part) if "Input #0" not in text: tail = "\n".join(text.splitlines()[-12:]) raise RuntimeError(f"ffmpeg 读取媒体失败:{tail}") return text def _ffmpeg_meta_fallback(path: Path) -> dict: text = _ffmpeg_probe_text(path) duration = _duration_from_text(text) streams: list[dict] = [] for line in text.splitlines(): if " Video:" not in line: continue m = re.search(r"(? dict: try: out = run([ "ffprobe", "-v", "error", "-print_format", "json", "-show_streams", "-show_format", str(mp4), ]) return json.loads(out) except Exception: return _ffmpeg_meta_fallback(mp4) def media_duration(path: Path) -> float: try: out = run([ "ffprobe", "-v", "error", "-print_format", "json", "-show_format", str(path), ]) return float(json.loads(out).get("format", {}).get("duration") or 0) except Exception: try: return _duration_from_text(_ffmpeg_probe_text(path)) except Exception: return 0.0 def pipeline_download(job_id: str) -> None: """阶段 1:仅下载(或上传跳过),落 source.mp4;前端开始流程会在 downloaded 后触发音频解析。""" job = JOBS[job_id] d = job_dir(job_id) stage = "download" try: mp4 = d / "source.mp4" if mp4.exists(): update(job, status="downloading", message="本地上传 · 跳过下载", progress=15) else: update(job, status="downloading", message="yt-dlp 下载中…", progress=5) cmd = [ "yt-dlp", "-f", "best[ext=mp4]/best", "-o", str(mp4), "--no-warnings", "--no-playlist", "--retries", "3", *ytdlp_cookie_args(), job.url, ] run(cmd) if not mp4.exists(): raise RuntimeError("下载完成但找不到 source.mp4") stage = "metadata" meta = ffprobe_meta(mp4) v_stream = next((s for s in meta["streams"] if s["codec_type"] == "video"), None) duration = float(meta["format"]["duration"]) if duration <= 0: raise RuntimeError("视频时长读取失败") update( job, status="downloaded", video_url=f"/jobs/{job_id}/video.mp4", duration=duration, width=int(v_stream["width"]) if v_stream else 0, height=int(v_stream["height"]) if v_stream else 0, progress=25, error="", message=f"视频就绪 · {duration:.1f}s · 等待音频解析", ) except Exception as e: message = "视频元数据解析失败" if stage == "metadata" else "下载失败" update(job, status="failed", error=normalize_download_error(e), message=message) def pipeline_analyze( job_id: str, frame_count: int = KEYFRAME_COUNT, target: FrameExtractTarget = "transparent_human", mode: FrameExtractMode = "replace", quality: FrameExtractQuality = "auto", ) -> None: """阶段 2:拆音轨 + 抽关键帧。ASR/翻译是独立文案轨,不阻塞视觉素材流。""" job = JOBS[job_id] d = job_dir(job_id) try: mp4 = d / "source.mp4" if not mp4.exists(): raise RuntimeError("source.mp4 不存在,先完成下载") wav = d / "audio.wav" audio_running = job_id in AUDIO_WORKERS_RUNNING or job.audio_script.status == "rewriting" if wav.exists(): update(job, status="splitting", message="复用音轨 · 准备抽帧…", progress=35, source_audio_url=f"/jobs/{job_id}/audio.wav") elif audio_running: update(job, status="splitting", message="音频路并行处理中 · 准备抽帧…", progress=35) else: update(job, status="splitting", message="ffmpeg 拆分音轨…", progress=35) run([ "ffmpeg", "-y", "-i", str(mp4), "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", str(wav), ]) update(job, source_audio_url=f"/jobs/{job_id}/audio.wav") n = max(1, min(int(frame_count), 20)) target_label = FRAME_TARGET_LABELS.get(target, FRAME_TARGET_LABELS["balanced"]) duration = max(float(job.duration or 1.0), 0.1) effective_quality = _resolve_frame_quality(duration, quality) effective_quality_label = FRAME_QUALITY_LABELS.get(effective_quality, FRAME_QUALITY_LABELS["accurate"]) quality_label = f"自动·{effective_quality_label}" if quality == "auto" else effective_quality_label scan_fps, scan_width, metric_width, estimated_scan_count = _scan_profile(duration, effective_quality) update(job, message=f"本地{quality_label}扫描 · {target_label} · 约 {estimated_scan_count} 帧…", progress=45) frames_dir = d / "frames" replacing = mode == "replace" existing_frames = list(job.frames) if not replacing else [] if replacing and frames_dir.exists(): shutil.rmtree(frames_dir) frames_dir.mkdir(parents=True, exist_ok=True) scan_dir = d / "frame_scan" if scan_dir.exists(): shutil.rmtree(scan_dir) scan_dir.mkdir(parents=True) # 1) 低分辨率、低帧率扫描。扫描图只用于候选评分,最终不直接作为关键帧。 run([ "ffmpeg", "-y", "-i", str(mp4), "-vf", f"fps={scan_fps:.4f},scale={scan_width}:-2", "-q:v", "4", str(scan_dir / "s_%05d.jpg"), ]) scan_paths = sorted(scan_dir.glob("s_*.jpg")) if not scan_paths: raise RuntimeError("低清扫描没有生成候选帧") candidates: list[dict] = [] for i, p in enumerate(scan_paths): t = min(i / scan_fps, max(duration - 0.05, 0.0)) item = _frame_metrics(p, i, t, metric_width) if item: candidates.append(item) if not candidates: raise RuntimeError("候选帧评分失败") # 2) 目标化筛选:pHash 去重 + 清晰度 / 中心细节 / 转场变化 / 动作强度。 # 抽帧阶段只走本机算力,不逐帧调用 Vision;语义审核留到后续素材准备。 semantic_transparent = False selection_count = n if replacing else min(len(candidates), max(n * 4, n + len(existing_frames) + 2)) update(job, message=f"{quality_label}本地筛选 · {target_label} · {n} / {len(candidates)} 张…", progress=60) chosen = _select_keyframes(candidates, selection_count, target) # 3) 只对最终选中的时间点,从原视频抽高质量关键帧。 renamed: list[KeyFrame] = [] chosen_sorted = chosen if semantic_transparent else sorted(chosen, key=lambda it: float(it["timestamp"])) existing_timestamps = [float(f.timestamp) for f in existing_frames] next_idx = max((int(f.index) for f in existing_frames), default=-1) + 1 rejected_by_ai = 0 for attempt, item in enumerate(chosen_sorted, start=1): if len(renamed) >= n: break t = float(item["timestamp"]) if not replacing and any(abs(t - old) < 0.35 for old in existing_timestamps): continue idx = next_idx + len(renamed) dst = frames_dir / f"{idx:03d}.jpg" run([ "ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", str(mp4), "-frames:v", "1", "-pix_fmt", "yuvj420p", "-q:v", "3", str(dst), ]) transparent_score: TransparentHumanFrameScore | None = None if semantic_transparent: update( job, message=f"AI 验收透明骨架人 · 已通过 {len(renamed)}/{n} · 候选 {attempt}/{len(chosen_sorted)}…", progress=min(68, 60 + int(attempt / max(1, len(chosen_sorted)) * 8)), ) transparent_score = _score_transparent_human_frame(dst) if not transparent_score.qualified: rejected_by_ai += 1 try: dst.unlink() except OSError: pass reason = transparent_score.reject_reason or f"总分 {transparent_score.total_score}/100" update(job, message=f"AI 退回候选帧 · {reason[:48]} · 自动换下一帧", progress=65) continue renamed.append(KeyFrame( index=idx, timestamp=round(t, 2), url=f"/jobs/{job_id}/frames/{idx}.jpg", transparent_human_score=transparent_score, )) existing_timestamps.append(t) if semantic_transparent and not renamed: raise RuntimeError("AI 未找到合格透明骨架人帧:需要透明/半透明人体外壳 + 清楚白色骨架 + 非恐怖广告感") # 4) 清理扫描目录 shutil.rmtree(scan_dir, ignore_errors=True) merged_frames = sorted(existing_frames + renamed, key=lambda f: f.timestamp) action_label = "追加" if not replacing else "抽取" final_message = ( f"已按「{quality_label} · {target_label}」AI验收 {action_label} {len(renamed)} 张" + (f" · 退回 {rejected_by_ai} 张" if semantic_transparent else "") + f" · 共 {len(merged_frames)} 张" ) if semantic_transparent else ( f"已按「{quality_label} · {target_label}」{action_label} {len(renamed)} 张关键帧 · 共 {len(merged_frames)} 张" ) update( job, status="transcribed" if job.transcript else "frames_extracted", frames=merged_frames, progress=70, error="", message=final_message, ) except Exception as e: update(job, status="failed", error=str(e), message="解析失败") def analyze_queue_worker() -> None: global ANALYZE_WORKER_RUNNING ANALYZE_WORKER_RUNNING = True try: while ANALYZE_QUEUE: job_id, frames, target, mode, quality = ANALYZE_QUEUE.pop(0) if job_id not in JOBS: continue pipeline_analyze(job_id, frames, target, mode, quality) if ANALYZE_QUEUE: for pos, (queued_job_id, *_rest) in enumerate(ANALYZE_QUEUE, start=1): queued_job = JOBS.get(queued_job_id) if queued_job: update(queued_job, status="splitting", progress=30, message=f"排队等待抽帧 · 前方 {pos - 1} 个任务") finally: ANALYZE_WORKER_RUNNING = False # ---------- 音频转写 + 翻译 + SKG 改写 + Azure OpenAI 配音 ---------- class TranscriptionUnavailable(RuntimeError): pass def _parse_asr_segments(content: str, duration: float) -> list[dict]: raw = (content or "").strip() if raw.startswith("```"): import re as _re match = _re.search(r"(\[[\s\S]*\]|\{[\s\S]*\})", raw) raw = match.group(0) if match else raw try: data = json.loads(raw) except json.JSONDecodeError: text = raw.strip() return [{"start": 0.0, "end": duration, "text": text}] if text else [] if isinstance(data, dict): if data.get("can_hear") is False: raise TranscriptionUnavailable("fallback ASR could not hear the audio") for key in ("segments", "data", "items", "result"): if isinstance(data.get(key), list): data = data[key] break else: text = str(data.get("text") or data.get("transcript") or "").strip() return [{"start": 0.0, "end": duration, "text": text}] if text else [] if not isinstance(data, list): return [] segments: list[dict] = [] for i, item in enumerate(data): if isinstance(item, str): text = item.strip() start = 0.0 if len(data) == 1 else duration * i / max(1, len(data)) end = duration if len(data) == 1 else duration * (i + 1) / max(1, len(data)) elif isinstance(item, dict): text = str(item.get("text") or item.get("en") or item.get("transcript") or "").strip() start = float(item.get("start") or item.get("start_time") or 0) end = float(item.get("end") or item.get("end_time") or duration) else: continue if text: segments.append({"start": max(0.0, start), "end": max(start, end), "text": text}) return segments def _clean_asr_segments(segments: list[dict], duration: float) -> list[dict]: clean: list[dict] = [] cursor = 0.0 for item in segments: text = str(item.get("text") or item.get("en") or item.get("transcript") or "").strip() if not text: continue try: start = float(item.get("start") if item.get("start") is not None else item.get("start_time") or 0) end = float(item.get("end") if item.get("end") is not None else item.get("end_time") or 0) except (TypeError, ValueError): continue if end <= 0 and duration > 0: end = duration start = max(0.0, min(start, duration if duration > 0 else start)) end = max(start + 0.05, min(end, duration if duration > 0 else end)) # Keep the timeline monotonic. Real ASR can overlap slightly, but the UI table should not jump back. if start < cursor - 0.25: start = cursor end = max(end, start + 0.05) cursor = max(cursor, end) clean.append({"start": round(start, 2), "end": round(end, 2), "text": text}) return clean def _segment_text_key(text: str) -> str: return re.sub(r"[^\w]+", " ", text.casefold(), flags=re.UNICODE).strip() def _validate_asr_segments(segments: list[dict], duration: float, source: str) -> list[dict]: clean = _clean_asr_segments(segments, duration) if not clean: raise TranscriptionUnavailable(f"{source} did not return transcript segments") keyed = [_segment_text_key(str(s.get("text") or "")) for s in clean if _segment_text_key(str(s.get("text") or ""))] unique_ratio = len(set(keyed)) / max(1, len(keyed)) one_secondish = [ s for s in clean if 0.75 <= (float(s["end"]) - float(s["start"])) <= 1.25 ] if len(clean) >= 12 and unique_ratio < 0.35: raise TranscriptionUnavailable(f"{source} returned repetitive transcript segments") if len(clean) >= 20 and len(one_secondish) / len(clean) > 0.75 and unique_ratio < 0.65: raise TranscriptionUnavailable(f"{source} returned synthetic one-second timeline") if duration > 0: last_end = max(float(s["end"]) for s in clean) words = sum(len(str(s.get("text") or "").split()) for s in clean) if len(clean) > 1 and last_end > duration + 3: raise TranscriptionUnavailable(f"{source} returned timestamps outside audio duration") if duration > 10 and last_end < duration * 0.45 and words < 20: raise TranscriptionUnavailable(f"{source} returned too little transcript coverage") for item in clean: item["_source"] = source return clean def _local_asr_binary() -> str: candidates = [ LOCAL_ASR_BIN, shutil.which("mlx_whisper") or "", "/opt/homebrew/bin/mlx_whisper", ] for candidate in candidates: if candidate and Path(candidate).exists() and os.access(candidate, os.X_OK): return candidate raise TranscriptionUnavailable("本机未找到可用 mlx_whisper") def _transcribe_mlx_sync(wav: Path) -> list[dict]: wav = wav.resolve() duration = media_duration(wav) binary = _local_asr_binary() output_name = "asr-local" output_path = wav.parent / f"{output_name}.json" if output_path.exists(): output_path.unlink() env = os.environ.copy() try: ffmpeg_path = Path(media_binary("ffmpeg")) env["PATH"] = f"{ffmpeg_path.parent}{os.pathsep}{env.get('PATH', '')}" except Exception: pass cmd = [ binary, str(wav), "--model", LOCAL_ASR_MODEL, "--output-dir", str(wav.parent), "--output-name", output_name, "--output-format", "json", "--verbose", "False", "--condition-on-previous-text", "False", "--word-timestamps", "True", ] try: result = subprocess.run( cmd, cwd=str(wav.parent), env=env, capture_output=True, text=True, timeout=LOCAL_ASR_TIMEOUT_SECONDS, ) except subprocess.TimeoutExpired as e: raise TranscriptionUnavailable(f"本机 ASR 超时:{LOCAL_ASR_TIMEOUT_SECONDS}s") from e if result.returncode != 0: detail = (result.stderr or result.stdout or "").strip().splitlines()[-1:] or ["本机 ASR 执行失败"] raise TranscriptionUnavailable(detail[0][:500]) if not output_path.exists(): raise TranscriptionUnavailable("本机 ASR 未生成 json 结果") data = json.loads(output_path.read_text(encoding="utf-8")) segments = data.get("segments") or [] return _validate_asr_segments(segments, duration, "mlx_whisper") def _transcribe_faster_whisper_sync(wav: Path) -> list[dict]: try: from faster_whisper import WhisperModel except Exception as e: raise TranscriptionUnavailable(f"faster-whisper 不可用:{e}") from e duration = media_duration(wav) model = WhisperModel( FASTER_WHISPER_MODEL, device=FASTER_WHISPER_DEVICE, compute_type=FASTER_WHISPER_COMPUTE_TYPE, ) language_hint = _asr_language_hint() transcribe_options = { "beam_size": 1, "vad_filter": True, "condition_on_previous_text": False, } if language_hint: transcribe_options["language"] = language_hint raw_segments, _info = model.transcribe(str(wav.resolve()), **transcribe_options) detected_language = str(getattr(_info, "language", "") or language_hint or "auto") segments = [ {"start": float(seg.start), "end": float(seg.end), "text": str(seg.text or "").strip()} for seg in raw_segments if str(seg.text or "").strip() ] return _validate_asr_segments(segments, duration, f"faster-whisper:{FASTER_WHISPER_MODEL}:{detected_language}") def _transcribe_gemini_sync(wav: Path) -> list[dict]: duration = media_duration(wav) audio_b64 = base64.b64encode(wav.read_bytes()).decode("ascii") prompt = ( "Transcribe the attached audio. Return strict JSON only, no markdown. " "If you cannot truly hear the audio, return {\"can_hear\": false}. Do not guess. " "If you can hear it, return {\"can_hear\": true, \"segments\": " "[{\"start\": 0.0, \"end\": 1.2, \"text\": \"original-language transcript\"}]}. " "Keep the transcript in the spoken source language; do not translate it here. " "Only include timestamps you can infer from the audio." ) last_error: Exception | None = None for attempt in range(3): try: resp = llm().with_options(timeout=ASR_TIMEOUT_SECONDS).chat.completions.create( model=ASR_FALLBACK_MODEL, messages=[{"role": "user", "content": [ {"type": "text", "text": prompt}, {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}}, ]}], temperature=0, ) content = (resp.choices[0].message.content or "").strip() return _validate_asr_segments(_parse_asr_segments(content, duration), duration, "gemini audio fallback") except Exception as e: last_error = e if attempt < 2: time.sleep(1.0) raise last_error or RuntimeError("Gemini audio transcription failed") def _transcribe_sync(wav: Path) -> list[dict]: """Remote ASR first; local/multimodal fallbacks are explicit runtime switches.""" errors: list[str] = [] duration = media_duration(wav) if ASR_REMOTE_ENABLED: try: with wav.open("rb") as f: language_hint = _asr_language_hint() resp = asr_llm().with_options(timeout=ASR_TIMEOUT_SECONDS).audio.transcriptions.create( file=(wav.name, f, "audio/wav"), model=ASR_MODEL, response_format="verbose_json", timestamp_granularities=["segment"], **({"language": language_hint} if language_hint else {}), ) raw = resp.model_dump() if hasattr(resp, "model_dump") else resp segments = raw.get("segments") or [] # 兜底:网关如果不返回 segments,把全文当一段 if not segments and raw.get("text"): segments = [{"start": 0.0, "end": float(raw.get("duration", 0) or 0), "text": raw["text"]}] detected_language = str(raw.get("language") or language_hint or "auto") return _validate_asr_segments(segments, duration, f"{ASR_MODEL}:{detected_language}") except Exception as e: errors.append(f"{ASR_MODEL}: {e}") else: errors.append(f"{ASR_MODEL}: remote disabled") if ASR_LOCAL_FALLBACK_ENABLED: try: return _transcribe_faster_whisper_sync(wav) except Exception as e: errors.append(f"faster-whisper: {e}") try: return _transcribe_mlx_sync(wav) except Exception as e: errors.append(f"mlx_whisper: {e}") else: errors.append("local ASR fallback disabled") if ASR_AUDIO_FALLBACK_ENABLED: try: return _transcribe_gemini_sync(wav) except Exception as e: errors.append(f"{ASR_FALLBACK_MODEL}: {e}") else: errors.append("multimodal audio fallback disabled") raise TranscriptionUnavailable(";".join(errors)) def _translate_sync(segments: list[dict]) -> list[str]: """批量翻译为中文,按段返回""" payload = [{"i": i, "text": s.get("text", "").strip()} for i, s in enumerate(segments)] prompt = ( "你是多语言字幕翻译。把下列原语言字幕段翻译为简体中文;" "如果原文已经是中文,只做简体中文规范化和口语化整理,不要改写意思。" "保持原意、口语化、自然流畅。" "严格返回 JSON object,不要任何 markdown 或多余文字,schema: " '{"translations":[{"i": 0, "zh": "..."}]}\n\n输入:\n' + json.dumps(payload, ensure_ascii=False) ) try: resp = llm().with_options(timeout=ASR_TIMEOUT_SECONDS).chat.completions.create( model=TRANSLATE_MODEL, messages=[{"role": "user", "content": prompt}], response_format={"type": "json_object"}, temperature=0.2, ) content = resp.choices[0].message.content or "[]" except Exception: return ["" for _ in segments] try: data = json.loads(content) if isinstance(data, dict): for k in ("data", "items", "result", "translations"): if k in data and isinstance(data[k], list): data = data[k] break if not isinstance(data, list): data = [] except json.JSONDecodeError: data = [] zh_by_idx: dict[int, str] = {} for it in data: if isinstance(it, dict) and "i" in it: zh_by_idx[int(it["i"])] = str(it.get("zh", "")) return [zh_by_idx.get(i, "") for i in range(len(segments))] def _transcript_join(segments: list[TranscriptSegment], field: Literal["en", "zh"]) -> str: lines: list[str] = [] for s in segments: text = (s.zh if field == "zh" else s.en).strip() if text: lines.append(f"[{s.start:.1f}-{s.end:.1f}s] {text}") return "\n".join(lines) def _voiceover_target_words(target_seconds: float) -> tuple[int, int]: seconds = max(4.0, min(float(target_seconds or 0) or 12.0, 45.0)) center = int(round(seconds * 2.35)) return max(10, int(center * 0.86)), min(110, max(14, int(center * 1.12))) def _segment_duration(segments: list[TranscriptSegment]) -> float: if not segments: return 0.0 start = min((s.start for s in segments), default=0.0) end = max((s.end for s in segments), default=0.0) return max(0.0, end - start) def _fallback_audio_script(segments: list[TranscriptSegment], target_seconds: float = 12.0) -> str: seconds = max(target_seconds, _segment_duration(segments), 4.0) if seconds <= 7: return "Meet SKG: warm massage, easy comfort, and a tiny reset for busy bodies." if seconds <= 13: return ( "Meet SKG, your shortcut to a calmer body break. A little warmth, a steady massage rhythm, " "and suddenly your day feels less tight and more yours." ) if seconds <= 22: return ( "This is SKG: smart massage for the moments your body asks for a pause. Warmth, rhythm, " "and a clean wearable feel turn neck, back, or everyday tension into a softer reset." ) return ( "Say hello to SKG, the small reset button your day keeps asking for. From neck and shoulder breaks " "to back, eye, knee, or foot comfort, SKG brings warm, rhythmic massage into everyday routines, " "so winding down feels simple, smart, and a little more fun." ) def _audio_delivery_profile(segments: list[TranscriptSegment], target_seconds: float, voice_id: str) -> tuple[str, str]: duration = max(float(target_seconds or 0), _segment_duration(segments), 0.0) words = sum(len([w for w in s.en.replace("\n", " ").split(" ") if w.strip()]) for s in segments) sentence_count = len([s for s in segments if (s.en or s.zh).strip()]) wpm = int(round(words / max(duration, 1.0) * 60)) if words else 0 avg_sentence = duration / sentence_count if sentence_count else 0.0 speaker = ( f"按原素材的短视频单人旁白处理;当前近似音色为 {voice_id},用于保持商业口播的亲近感和节奏。" if voice_id else "按原素材的短视频单人旁白处理;等待选择 TTS 音色。" ) rhythm = ( f"源音频约 {duration:.1f}s,{sentence_count} 个语义段,语速约 {wpm} wpm,平均每段 {avg_sentence:.1f}s;" "新配音按相同时长、短句停顿和信息密度改写。" if duration > 0 and sentence_count else "源音频节奏信息不足;新配音按 8-12 秒信息流广告口播节奏生成。" ) return speaker, rhythm def _fallback_audio_profile(segments: list[TranscriptSegment], target_seconds: float = 0.0) -> tuple[str, str, str]: duration = max(float(target_seconds or 0), _segment_duration(segments), 0.0) words = sum(len([w for w in s.en.replace("\n", " ").split(" ") if w.strip()]) for s in segments) sentence_count = len([s for s in segments if (s.en or s.zh).strip()]) wpm = int(round(words / max(duration, 1.0) * 60)) if words else 0 avg_sentence = duration / sentence_count if sentence_count else 0.0 speaker = "检测到短视频口播人声;当前仅能根据转写段落估算,未做声纹克隆。" rhythm = ( f"音频约 {duration:.1f}s,{sentence_count} 个文案段,语速约 {wpm} wpm,平均每段 {avg_sentence:.1f}s。" if duration > 0 and sentence_count else "音频节奏信息不足;等待模型返回更完整的语速和停顿分析。" ) background = "背景音待模型细分;当前已保留原音频文件,可继续用于音乐、人声和环境声判断。" return speaker, rhythm, background def _audio_profile_model_sync(wav: Path, segments: list[TranscriptSegment], target_seconds: float = 0.0) -> tuple[str, str, str]: fallback = _fallback_audio_profile(segments, target_seconds) if not LLM_API_KEY or not wav.exists(): return fallback transcript = _ensure_english(_transcript_join(segments, "en") or _transcript_join(segments, "zh") or "No reliable transcript.") try: audio_b64 = base64.b64encode(wav.read_bytes()).decode("ascii") except Exception: return fallback prompt = ( "Analyze this short-video audio for an ad recreation workflow. Return strict JSON only, no markdown.\n" "Fields:\n" "- speaker_profile: describe speaker count, likely gender/age range if audible, tone, energy, accent/language, confidence.\n" "- rhythm_profile: describe pacing, pauses, speech density, segment rhythm, and timing pattern.\n" "- background_audio_profile: describe music, background sound, ambience, SFX, loudness relationship to voice, and whether it should be recreated or replaced.\n" "Do not invent an exact identity. If uncertain, state uncertainty.\n\n" f"Known transcript/timestamps:\n{transcript[:5000]}" ) last_error: Exception | None = None for attempt in range(2): try: resp = llm().with_options(timeout=ASR_TIMEOUT_SECONDS).chat.completions.create( model=ASR_FALLBACK_MODEL, messages=[{"role": "user", "content": [ {"type": "text", "text": prompt}, {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}}, ]}], response_format={"type": "json_object"}, temperature=0.1, max_tokens=900, ) content = (resp.choices[0].message.content or "").strip() data = json.loads(content) speaker = str(data.get("speaker_profile") or "").strip() rhythm = str(data.get("rhythm_profile") or "").strip() background = str(data.get("background_audio_profile") or "").strip() if speaker or rhythm or background: return ( speaker or fallback[0], rhythm or fallback[1], background or fallback[2], ) except Exception as e: last_error = e if attempt == 0: time.sleep(1.0) if last_error: print(f"[audio profile fallback] {last_error}", flush=True) return fallback def _build_audio_intake_sync(job_id: str, wav: Path, segments: list[TranscriptSegment], target_seconds: float = 0.0) -> AudioScript: source_text = _transcript_join(segments, "en") source_zh = _transcript_join(segments, "zh") duration = max(float(target_seconds or 0), _segment_duration(segments), 0.0) speaker_profile, rhythm_profile, background_audio_profile = _audio_profile_model_sync(wav, segments, duration) return AudioScript( status="completed", source_text=source_text, source_zh=source_zh, speaker_profile=speaker_profile, rhythm_profile=rhythm_profile, background_audio_profile=background_audio_profile, product_brief=AUDIO_PRODUCT_BRIEF, rewrite_model=ASR_FALLBACK_MODEL, created_at=time.time(), ) def _rewrite_audio_script_sync(segments: list[TranscriptSegment], target_seconds: float = 12.0) -> tuple[str, str, str]: fallback = _fallback_audio_script(segments, target_seconds) try: fallback_zh = _translate_text_sync(fallback, "zh", max_tokens=300) if LLM_API_KEY else "" except Exception: fallback_zh = "" if not LLM_API_KEY: return fallback, fallback_zh, "LLM_API_KEY 未配置,使用本地 SKG 模板" source_text = _transcript_join(segments, "en") min_words, max_words = _voiceover_target_words(target_seconds) prompt = ( "You are an English short-video voice-over writer for SKG wellness massagers. " "Write a fresh product-introduction VO for SKG. Use the source transcript only as timing and pacing reference; " "do not summarize it unless it helps the rhythm.\n" "Rules:\n" f"1. Target audio length is about {target_seconds:.1f} seconds. Output {min_words}-{max_words} English words.\n" "2. Make it natural, warm, premium, and a little playful. It should sound like a real creator, not a stiff ad.\n" "3. Do not claim medical treatment, cure, pain elimination, or clinical effects.\n" "4. Do not copy the original brand, creator, price, platform language, or exact claims.\n" "5. Introduce SKG products directly: smart massage, warmth, rhythm, daily neck/back/eye/knee/foot relaxation.\n" "6. Keep it easy for TTS: short sentences, spoken phrasing, no hashtags, no stage directions, no quotation marks.\n" "7. If the source transcript is thin, ignore it and write a general SKG product intro.\n" 'Return strict JSON only: {"rewritten_text":"English VO","rewritten_text_zh":"Simplified Chinese mirror for team review"}.\n\n' f"SKG product context: {_ensure_english(AUDIO_PRODUCT_BRIEF)}\n\n" f"English transcript:\n{source_text or 'None'}" ) try: resp = llm().chat.completions.create( model=AUDIO_REWRITE_MODEL, messages=[ {"role": "system", "content": "Return valid JSON only. No explanation. No markdown."}, {"role": "user", "content": prompt}, ], response_format={"type": "json_object"}, temperature=0.72, max_tokens=600, ) raw = (resp.choices[0].message.content or "").strip() if raw.startswith("```"): import re as _re match = _re.search(r"\{[\s\S]*\}", raw) raw = match.group(0) if match else raw data = json.loads(raw) text = str(data.get("rewritten_text", "")).strip() text_zh = str(data.get("rewritten_text_zh", "")).strip() if text and not text_zh: text_zh = _translate_text_sync(text, "zh", max_tokens=300) return (text or fallback), (text_zh or fallback_zh), "" except Exception as e: return fallback, fallback_zh, f"改写失败,使用本地模板:{e}" def _choose_azure_voice_id() -> str: if AZURE_TTS_VOICE_POOL: return random.choice(AZURE_TTS_VOICE_POOL) return AZURE_TTS_VOICE_ID def _choose_tts_voice_id() -> str: return _choose_azure_voice_id() def _voice_speed_for(voice_id: str, target_seconds: float, text: str) -> float: words = len([w for w in text.replace("\n", " ").split(" ") if w.strip()]) estimated_seconds = words / 2.35 if words else target_seconds if target_seconds > 0 and estimated_seconds > target_seconds * 1.12: return 1.06 if target_seconds > 0 and estimated_seconds < target_seconds * 0.82: return 0.94 if voice_id == "English_MaturePartner": return 0.96 if voice_id == "English_Upbeat_Woman": return 1.02 return 0.99 def _azure_tts_url_for(path_value: str) -> str: path = path_value if path_value.startswith("/") else f"/{path_value}" if AZURE_OPENAI_BASE_URL.endswith(path): return AZURE_OPENAI_BASE_URL return f"{AZURE_OPENAI_BASE_URL}{path}" def _azure_tts_urls() -> list[str]: urls: list[str] = [] for path in AZURE_TTS_PATHS or [AZURE_TTS_PATH]: url = _azure_tts_url_for(path) if url not in urls: urls.append(url) return urls def _azure_openai_tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> str: if not AZURE_OPENAI_API_KEY: raise RuntimeError("AZURE_OPENAI_API_KEY 或 LLM_API_KEY 未配置,未生成配音") if not text.strip(): raise RuntimeError("改写文案为空,未生成配音") payload = { "model": AZURE_TTS_MODEL, "voice": voice_id, "input": text.strip()[:9500], "response_format": "mp3", "speed": _voice_speed_for(voice_id, target_seconds, text), } headers = { "Authorization": f"Bearer {AZURE_OPENAI_API_KEY}", "api-key": AZURE_OPENAI_API_KEY, "Content-Type": "application/json", } resp: httpx.Response | None = None errors: list[str] = [] with ai_http_client(timeout=120) as client: for url in _azure_tts_urls(): try: current = client.post(url, headers=headers, json=payload) except Exception as e: errors.append(f"{url}: {type(e).__name__}: {e}") continue if current.status_code < 400: resp = current break errors.append(f"{url}: HTTP {current.status_code}: {current.text[:180]}") if current.status_code not in {404, 405}: resp = current break if resp is None: raise RuntimeError("Azure OpenAI TTS 不可用;已尝试 " + " | ".join(errors)) if resp.status_code >= 400: detail = " | ".join(errors) or resp.text[:300] raise RuntimeError(f"Azure OpenAI TTS HTTP {resp.status_code}: {detail[:600]}") audio_bytes = resp.content if not audio_bytes: raise RuntimeError("Azure OpenAI TTS 未返回音频内容") content_type = resp.headers.get("content-type", "") if "application/json" in content_type.lower(): try: data = resp.json() except Exception: data = {"error": resp.text[:300]} raise RuntimeError(f"Azure OpenAI TTS 返回 JSON 而不是音频:{str(data)[:300]}") out = job_dir(job_id) / "audio_script.mp3" out.write_bytes(audio_bytes) return f"/jobs/{job_id}/audio-script.mp3" def _tts_sync(job_id: str, text: str, voice_id: str, target_seconds: float = 12.0) -> tuple[str, str, str]: return _azure_openai_tts_sync(job_id, text, voice_id, target_seconds), "azure_openai", AZURE_TTS_MODEL def _build_audio_script_sync(job_id: str, segments: list[TranscriptSegment], target_seconds: float = 12.0) -> AudioScript: source_text = _transcript_join(segments, "en") source_zh = _transcript_join(segments, "zh") duration = max(float(target_seconds or 0), _segment_duration(segments), 4.0) rewritten, rewritten_zh, rewrite_error = _rewrite_audio_script_sync(segments, duration) selected_voice_id = _choose_tts_voice_id() speaker_profile, rhythm_profile = _audio_delivery_profile(segments, duration, selected_voice_id) voice_url = "" voice_error = "" voice_provider = "azure_openai" voice_model = AZURE_TTS_MODEL try: voice_url, voice_provider, voice_model = _tts_sync(job_id, rewritten, selected_voice_id, duration) except Exception as e: voice_error = str(e) # 改写失败时已有本地 SKG 模板兜底,不把它标成用户可见错误;配音失败才需要提示。 errors = voice_error return AudioScript( status="completed", source_text=source_text, source_zh=source_zh, rewritten_text=rewritten, rewritten_text_zh=rewritten_zh, speaker_profile=speaker_profile, rhythm_profile=rhythm_profile, product_brief=AUDIO_PRODUCT_BRIEF, rewrite_model=AUDIO_REWRITE_MODEL, voice_provider=voice_provider, voice_model=voice_model, voice_id=selected_voice_id, voice_url=voice_url, error=errors, created_at=time.time(), ) def pipeline_transcribe(job_id: str, manage_job_status: bool = True) -> None: job = JOBS[job_id] d = job_dir(job_id) wav = d / "audio.wav" def progress(message: str, value: int) -> None: if manage_job_status: update(job, status="transcribing", message=message, progress=value, error="") try: if not wav.exists(): mp4 = d / "source.mp4" if not mp4.exists(): raise RuntimeError("source.mp4 不存在,视频导入完成后再提取音频") progress("ffmpeg 提取音频轨…", max(45, min(job.progress, 70))) run([ "ffmpeg", "-y", "-i", str(mp4), "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", str(wav), ]) if not wav.exists(): raise RuntimeError("音频提取完成但找不到 audio.wav") update(job, source_audio_url=f"/jobs/{job_id}/audio.wav") target_duration = max(media_duration(wav), float(job.duration or 0), 4.0) if not LLM_API_KEY: # 无 key 模式:mock 数据 progress("ASR (mock) …", 75) time.sleep(1.0) mock = [ TranscriptSegment(index=0, start=0.0, end=3.5, en="Welcome back, today we're testing something new.", zh="欢迎回来,今天我们要测试一些新东西。"), TranscriptSegment(index=1, start=3.5, end=7.2, en="This device looks really sleek and minimal.", zh="这个设备看起来非常时尚和简约。"), ] update_kwargs = { "transcript": mock, "audio_script": AudioScript( status="rewriting", source_text=_transcript_join(mock, "en"), source_zh=_transcript_join(mock, "zh"), speaker_profile="正在分析原音频讲话人和口播节奏…", rhythm_profile="正在按原音频时长、语速和停顿分析口播节奏…", background_audio_profile="正在分析背景音乐、环境声和音效…", product_brief=AUDIO_PRODUCT_BRIEF, rewrite_model=ASR_FALLBACK_MODEL, ), } if manage_job_status: update_kwargs.update(message="ASR mock 完成,分析声音和背景音…", progress=92) update(job, **update_kwargs) audio_script = _build_audio_intake_sync(job_id, wav, mock, target_duration) if manage_job_status: update(job, transcript=mock, status="transcribed", progress=100, audio_script=audio_script, message="音频解析完成(MOCK · 未设 LLM_API_KEY)") else: update(job, transcript=mock, audio_script=audio_script) return # 1) whisper ASR progress(f"{ASR_MODEL} {_asr_language_label()} 语种转录中…", 78) segments = _transcribe_sync(wav) if not segments: raise TranscriptionUnavailable("ASR 未返回可用字幕段") asr_source = str(segments[0].get("_source") or ASR_MODEL) # 先把英文段落落到 job 上(让 UI 提前看到,翻译再补 zh) en_only = [ TranscriptSegment( index=i, start=float(s.get("start", 0)), end=float(s.get("end", 0)), en=str(s.get("text", "")).strip(), zh="", ) for i, s in enumerate(segments) ] if manage_job_status: update(job, transcript=en_only, message=f"ASR 完成 · {len(en_only)} 段,开始翻译…", progress=88) else: update(job, transcript=en_only) # 2) Gemini 翻译 zh_list = _translate_sync(segments) full = [ TranscriptSegment( index=seg.index, start=seg.start, end=seg.end, en=seg.en, zh=zh_list[i] if i < len(zh_list) else "", ) for i, seg in enumerate(en_only) ] update_kwargs = { "transcript": full, "audio_script": AudioScript( status="rewriting", source_text=_transcript_join(full, "en"), source_zh=_transcript_join(full, "zh"), speaker_profile="正在分析原音频讲话人和口播节奏…", rhythm_profile="正在按原音频时长、语速和停顿分析口播节奏…", background_audio_profile="正在分析背景音乐、环境声和音效…", product_brief=AUDIO_PRODUCT_BRIEF, rewrite_model=ASR_FALLBACK_MODEL, ), } if manage_job_status: update_kwargs.update(message="翻译完成,分析讲话人、节奏和背景音…", progress=94) update(job, **update_kwargs) audio_script = _build_audio_intake_sync(job_id, wav, full, target_duration) if manage_job_status: update(job, transcript=full, status="transcribed", progress=100, audio_script=audio_script, message=f"音频解析完成 · {len(full)} 段({asr_source} + {TRANSLATE_MODEL} + {ASR_FALLBACK_MODEL} 音频分析)") else: update(job, transcript=full, audio_script=audio_script) except Exception as e: if manage_job_status: update( job, status="failed", audio_script=AudioScript(status="failed", error=str(e), created_at=time.time()), error=str(e), message="转录失败", ) else: update(job, audio_script=AudioScript(status="failed", error=str(e), created_at=time.time())) def _audio_processing_worker(job_id: str, manage_job_status: bool) -> None: try: pipeline_transcribe(job_id, manage_job_status=manage_job_status) finally: with AUDIO_WORKERS_LOCK: AUDIO_WORKERS_RUNNING.discard(job_id) def start_audio_processing(job_id: str, manage_job_status: bool = True) -> bool: job = JOBS.get(job_id) if not job: return False if not manage_job_status: has_audio_output = bool(job.transcript) or bool(job.audio_script.rewritten_text) if has_audio_output or job.audio_script.status == "rewriting": return False with AUDIO_WORKERS_LOCK: if job_id in AUDIO_WORKERS_RUNNING: return False AUDIO_WORKERS_RUNNING.add(job_id) threading.Thread( target=_audio_processing_worker, args=(job_id, manage_job_status), daemon=True, name=f"audio-{job_id}", ).start() return True def _image_is_capacity_error(status_code: int, body: str) -> bool: lower = body.lower() return ( status_code == 429 or ( status_code in (500, 502, 503, 504) and any(token in lower for token in ("saturated", "rate", "quota", "capacity", "overload", "timeout", "繁忙", "饱和", "过载")) ) ) def _image_retry_delay(attempt: int, status_code: int = 0, body: str = "", retry_after: str | None = None) -> float: if retry_after: try: return max(1.0, min(60.0, float(retry_after))) except ValueError: pass if _image_is_capacity_error(status_code, body): return [6.0, 14.0, 30.0, 45.0][min(attempt, 3)] return [1.0, 2.0, 4.0, 8.0][min(attempt, 3)] def _image_is_transport_error(message: str) -> bool: lower = message.lower() return any( token in lower for token in ( "connecterror", "connecttimeout", "readtimeout", "timeout", "nodename nor servname", "name or service not known", "temporary failure in name resolution", "operation not permitted", "connection refused", "network is unreachable", ) ) def _image_fallback_models() -> list[str]: if not IMAGE_FALLBACK_ENABLED or not IMAGE_FALLBACK_MODEL or IMAGE_FALLBACK_MODEL == GPT_IMAGE_MODEL: return [] return [IMAGE_FALLBACK_MODEL] def _image_circuit_snapshot() -> dict: now = time.time() with _IMAGE_CIRCUIT_LOCK: open_until = _IMAGE_PRIMARY_OPEN_UNTIL return { "primary": GPT_IMAGE_MODEL, "fallbacks": _image_fallback_models(), "failure_threshold": IMAGE_CIRCUIT_FAILURE_THRESHOLD, "cooldown_seconds": IMAGE_CIRCUIT_COOLDOWN_SECONDS, "primary_failures": _IMAGE_PRIMARY_FAILURES, "primary_open": open_until > now, "primary_open_until": open_until if open_until > now else 0, "primary_open_remaining_seconds": max(0, int(open_until - now)), } def _image_primary_circuit_open() -> bool: return _image_circuit_snapshot()["primary_open"] def _normalize_image_model_preference(value: str | None) -> str: raw = (value or "auto").strip().lower() if raw in {"", "auto", "default"}: return "auto" if raw in {"gpt", "gpt-image", GPT_IMAGE_MODEL.lower()}: return GPT_IMAGE_MODEL if IMAGE_FALLBACK_MODEL and raw in {"gemini", IMAGE_FALLBACK_MODEL.lower()}: return IMAGE_FALLBACK_MODEL return "auto" def _image_model_candidates(force_fallback: bool = False, preference: str | None = "auto") -> list[str]: normalized = _normalize_image_model_preference(preference) fallbacks = _image_fallback_models() if normalized == GPT_IMAGE_MODEL: return [GPT_IMAGE_MODEL] if normalized == IMAGE_FALLBACK_MODEL and fallbacks: return [IMAGE_FALLBACK_MODEL] if not fallbacks: return [GPT_IMAGE_MODEL] if force_fallback or _image_primary_circuit_open(): return fallbacks return [GPT_IMAGE_MODEL, *fallbacks] def image_model_options() -> list[dict]: options = [ { "id": "auto", "label": "自动", "model": GPT_IMAGE_MODEL, "description": "优先 GPT Image 2,必要时按后端熔断和兜底策略切到备用图片模型", "available": bool(IMAGE_API_KEY), }, { "id": GPT_IMAGE_MODEL, "label": "GPT Image 2", "model": GPT_IMAGE_MODEL, "description": "主生图模型,适合营销图和参考图重绘", "available": bool(IMAGE_API_KEY), }, ] if IMAGE_FALLBACK_ENABLED and IMAGE_FALLBACK_MODEL and IMAGE_FALLBACK_MODEL != GPT_IMAGE_MODEL: options.append({ "id": IMAGE_FALLBACK_MODEL, "label": "Gemini 图片", "model": IMAGE_FALLBACK_MODEL, "description": "备用图片模型,适合主模型慢或失败时手动选择", "available": bool(IMAGE_API_KEY), }) return options def image_size_options() -> list[dict]: return IMAGE_SIZE_CHOICES def _normalize_image_size(raw: str | None) -> str: value = (raw or "auto").strip().lower() aliases = { "vertical": "1024x1536", "portrait": "1024x1536", "竖图": "1024x1536", "square": "1024x1024", "方图": "1024x1024", "horizontal": "1536x1024", "landscape": "1536x1024", "横图": "1536x1024", } value = aliases.get(value, value) allowed = {str(item["value"]) for item in IMAGE_SIZE_CHOICES} if value not in allowed: raise HTTPException(400, f"unsupported image size: {raw}") return value def _image_size_payload(raw: str | None) -> dict: size = _normalize_image_size(raw) return {} if size == "auto" else {"size": size} def video_duration_options() -> list[int]: if video_uses_ark(): return [5, 8, 10, 12, 15] return [4, 8, 12] def video_size_options() -> list[dict]: return VIDEO_SIZE_CHOICES def _normalize_video_size(raw: str | None) -> str: value = (raw or "720x1280").strip().lower().replace(" ", "") aliases = { "vertical": "720x1280", "portrait": "720x1280", "9:16": "720x1280", "竖屏": "720x1280", "horizontal": "1280x720", "landscape": "1280x720", "16:9": "1280x720", "横屏": "1280x720", "square": "1024x1024", "1:1": "1024x1024", "方形": "1024x1024", "3:4": "960x1280", } value = aliases.get(value, value) allowed = {str(item["value"]) for item in VIDEO_SIZE_CHOICES} if value not in allowed: raise HTTPException(400, f"unsupported video size: {raw}") return value def video_model_options() -> list[dict]: label_map = { "seedance": "Seedance 2.0 Fast", "kling": "Kling", "veo3": "Veo 3", "veo": "Veo", "voe": "Veo", } concrete_label_map = { "doubao-seedance-2-0-fast-260128": "Seedance 2.0 Fast", } seen_models: set[str] = set() options: list[dict] = [] for key in ["seedance", "kling", "veo3", "veo"]: if key not in VIDEO_MODEL_ALIASES: continue model = VIDEO_MODEL_ALIASES[key] if model in seen_models: continue seen_models.add(model) options.append({ "id": key, "label": concrete_label_map.get(model, label_map.get(key, key)), "model": model, "description": f"当前视频网关可选模型;单次时长最高 {max(video_duration_options())} 秒", "duration_options": video_duration_options(), "size_options": video_size_options(), "max_duration_seconds": max(video_duration_options()), "available": bool(video_api_key()), }) default_model = resolve_video_model(VIDEO_MODEL) if not any(item["id"] == VIDEO_MODEL or item["model"] == default_model for item in options): options.insert(0, { "id": VIDEO_MODEL, "label": label_map.get(VIDEO_MODEL, VIDEO_MODEL), "model": default_model, "description": "默认视频模型", "duration_options": video_duration_options(), "size_options": video_size_options(), "max_duration_seconds": max(video_duration_options()), "available": bool(video_api_key()), }) return options def _image_failure_can_fallback(status_code: int, body: str, last_err: str) -> bool: if status_code in (400, 401, 403, 404): return False return ( status_code == 429 or status_code >= 500 or _image_is_capacity_error(status_code, body) or _image_is_transport_error(last_err) or "timeout" in (body or "").lower() ) def _image_record_primary_success() -> None: global _IMAGE_PRIMARY_FAILURES, _IMAGE_PRIMARY_OPEN_UNTIL with _IMAGE_CIRCUIT_LOCK: if _IMAGE_PRIMARY_FAILURES or _IMAGE_PRIMARY_OPEN_UNTIL: print(f"[image circuit] primary {GPT_IMAGE_MODEL} recovered", flush=True) _IMAGE_PRIMARY_FAILURES = 0 _IMAGE_PRIMARY_OPEN_UNTIL = 0.0 def _image_record_primary_failure(reason: str) -> None: global _IMAGE_PRIMARY_FAILURES, _IMAGE_PRIMARY_OPEN_UNTIL if not _image_fallback_models(): return with _IMAGE_CIRCUIT_LOCK: _IMAGE_PRIMARY_FAILURES += 1 if _IMAGE_PRIMARY_FAILURES >= IMAGE_CIRCUIT_FAILURE_THRESHOLD: _IMAGE_PRIMARY_OPEN_UNTIL = time.time() + IMAGE_CIRCUIT_COOLDOWN_SECONDS print( f"[image circuit] primary {GPT_IMAGE_MODEL} opened for {IMAGE_CIRCUIT_COOLDOWN_SECONDS}s " f"after {_IMAGE_PRIMARY_FAILURES} failures; fallback={IMAGE_FALLBACK_MODEL}; reason={reason[:220]}", flush=True, ) else: print( f"[image circuit] primary {GPT_IMAGE_MODEL} failure {_IMAGE_PRIMARY_FAILURES}/{IMAGE_CIRCUIT_FAILURE_THRESHOLD}; " f"fallback={IMAGE_FALLBACK_MODEL}; reason={reason[:220]}", flush=True, ) def _image_failure_message(kind: str, attempts: int, last_err: str, capacity_seen: bool) -> str: if capacity_seen: return ( f"{kind} failed after {attempts} attempts: gpt-image-2 上游负载饱和," f"已自动退避重试仍失败,请稍后点重试。最后错误:{last_err}" ) if "timeout" in last_err.lower(): return ( f"{kind} failed after {attempts} attempts: gpt-image-2 图片网关响应超时" f"(单次 {IMAGE_REQUEST_TIMEOUT_SECONDS}s),模型未更改。" f"请检查 {IMAGE_BASE_URL or LLM_BASE_URL or 'image gateway'} 的 gpt-image-2 上游渠道或稍后重试。" f"最后错误:{last_err}" ) if _image_is_transport_error(last_err): return ( f"{kind} failed after {attempts} attempts: 图片网关网络/DNS 连接失败," "请确认本机网络或在 api/.env 配置 AI_HTTP_PROXY / IMAGE_HTTP_PROXY 后重启后端。" f"最后错误:{last_err}" ) return f"{kind} failed after {attempts} attempts: {last_err}" def _image_error_status(error: Exception) -> int: msg = str(error) return 503 if ( "上游负载饱和" in msg or "HTTP 429" in msg or "saturated" in msg.lower() or _image_is_transport_error(msg) ) else 500 def _image_endpoint(path: str) -> str: base = (IMAGE_BASE_URL or "").strip().rstrip("/") if not base: raise RuntimeError("IMAGE_BASE_URL 或 LLM_BASE_URL 未配置") return f"{base}/{path.lstrip('/')}" def _image_generation_response(prompt: str, model: str, size: str | None = "auto") -> dict: with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: r = client.post( _image_endpoint("/images/generations"), headers={"Authorization": f"Bearer {IMAGE_API_KEY}"}, json={"model": model, "prompt": prompt, "n": 1, **_image_size_payload(size)}, ) r.raise_for_status() return r.json() def _image_should_retry( attempt: int, total_attempts: int, status_code: int, body: str, last_err: str, next_mode_changed: bool = False, ) -> bool: if attempt >= total_attempts - 1: return False if next_mode_changed and status_code not in (401, 403): if status_code == 0 and _image_is_transport_error(last_err): return False return True if status_code in (400, 401, 403, 404): return False if status_code == 0 and _image_is_transport_error(last_err): return False return True def _prepare_image_edit_bytes(image_path: Path, max_side: int) -> bytes: import io as _io from PIL import Image as _PILImage try: im = _PILImage.open(image_path) if max(im.size) > max_side: im.thumbnail((max_side, max_side), _PILImage.LANCZOS) buf = _io.BytesIO() im.convert("RGB").save(buf, format="JPEG", quality=88) return buf.getvalue() except Exception: return image_path.read_bytes() def _image_edit_call( image_path: Path | list[Path], prompt: str, model: str | None = None, models: list[str] | None = None, fallback_text: bool = False, max_attempts: int = 3, max_side: int = 1024, force_fallback_model: bool = False, image_model_preference: str | None = "auto", ) -> tuple[bytes, str]: """通用 image edit 调用 · 失败重试 + 可选 text fallback。 返回 (image_bytes, effective_mode) where effective_mode in {"edit","text"}。 失败 raise RuntimeError。 输入图自动 resize 到 max_side(默认 1024)边长后再用 multipart 上传;多参考图使用 image[]。 生图模型主路径使用 gpt-image-2;Gemini 只在主模型上游异常时兜底。model/models 参数只保留兼容旧调用。""" import base64 as b64lib import time as _time import httpx if not IMAGE_API_KEY: raise RuntimeError("IMAGE_API_KEY 或 LLM_API_KEY 未配置") model = GPT_IMAGE_MODEL image_paths = image_path if isinstance(image_path, list) else [image_path] image_paths = [path for path in image_paths if path and path.exists()][:10] if not image_paths: raise RuntimeError("image edit reference image missing") img_bytes_list = [_prepare_image_edit_bytes(path, max_side) for path in image_paths] model_candidates = _image_model_candidates(force_fallback=force_fallback_model, preference=image_model_preference) mode_plan: list[str] = ["edit"] if model_candidates != [GPT_IMAGE_MODEL] else ["edit"] * max_attempts if fallback_text: mode_plan.append("text") attempt_steps = [(current_mode, current_model) for current_mode in mode_plan for current_model in model_candidates] last_err = "" resp_data: dict = {} effective_mode = "edit" capacity_seen = False attempts_done = 0 for attempt, (current_mode, current_model) in enumerate(attempt_steps): attempts_done = attempt + 1 status_code = 0 body = "" retry_after: str | None = None try: if current_mode == "edit": with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: r = client.post( _image_endpoint("/images/edits"), headers={ "Authorization": f"Bearer {IMAGE_API_KEY}", }, data={"model": current_model, "prompt": prompt, "n": "1"}, files=( {"image": ("reference.jpg", img_bytes_list[0], "image/jpeg")} if len(img_bytes_list) == 1 else [ ("image[]", (f"reference_{idx + 1}.jpg", img_bytes, "image/jpeg")) for idx, img_bytes in enumerate(img_bytes_list) ] ), ) r.raise_for_status() resp_data = r.json() else: resp_data = _image_generation_response(prompt, current_model) if resp_data.get("data"): effective_mode = f"{current_mode}:{current_model}" model = current_model # 记录实际成功的 model if current_model == GPT_IMAGE_MODEL: _image_record_primary_success() break err_obj = resp_data.get("error") or {} last_err = f"empty data · {err_obj.get('code', '')} · {str(err_obj.get('message', ''))[:200]} · model={current_model}" except httpx.HTTPStatusError as e: body = e.response.text status_code = e.response.status_code retry_after = e.response.headers.get("retry-after") capacity_seen = capacity_seen or _image_is_capacity_error(status_code, body) fatal = status_code in (401, 403) last_err = f"HTTP {status_code}: {body[:200]} · model={current_model}" if fatal: raise RuntimeError(f"image edit HTTP {status_code}: {body[:300]}") except Exception as e: last_err = f"{type(e).__name__}: {e} · model={current_model}" fallbackable = current_model == GPT_IMAGE_MODEL and _image_failure_can_fallback(status_code, body, last_err) if fallbackable: _image_record_primary_failure(last_err) if any(next_model != GPT_IMAGE_MODEL for _next_mode, next_model in attempt_steps[attempt + 1:]): print(f"[image edit fallback → {IMAGE_FALLBACK_MODEL}] {last_err}", flush=True) continue next_mode_changed = attempt < len(attempt_steps) - 1 and attempt_steps[attempt + 1][0] != current_mode if _image_should_retry(attempt, len(attempt_steps), status_code, body, last_err, next_mode_changed): tag = f"retry {attempt + 1}/{len(attempt_steps)} → {current_model}" delay = _image_retry_delay(attempt, status_code, body, retry_after) print(f"[image edit {tag}, sleep {delay:.0f}s] {last_err}", flush=True) _time.sleep(delay) else: break data_arr = resp_data.get("data", []) if not data_arr: raise RuntimeError(_image_failure_message("image edit", attempts_done, last_err, capacity_seen)) item = data_arr[0] b64 = item.get("b64_json") if not b64 and item.get("url"): with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: image_resp = client.get(item["url"]) image_resp.raise_for_status() return image_resp.content, effective_mode if not b64: raise RuntimeError("image edit returned no b64_json") return b64lib.b64decode(b64), effective_mode def _image_text_call( prompt: str, model: str | None = None, models: list[str] | None = None, max_attempts: int = 3, force_fallback_model: bool = False, image_model_preference: str | None = "auto", ) -> tuple[bytes, str]: """Text-only image generation. gpt-image-2 primary, Gemini only as outage fallback.""" import base64 as b64lib import time as _time import httpx if not IMAGE_API_KEY: raise RuntimeError("IMAGE_API_KEY 或 LLM_API_KEY 未配置") candidates = _image_model_candidates(force_fallback=force_fallback_model, preference=image_model_preference) attempt_models = candidates if candidates != [GPT_IMAGE_MODEL] else [GPT_IMAGE_MODEL] * max_attempts last_err = "" capacity_seen = False attempts_done = 0 for attempt, current_model in enumerate(attempt_models): attempts_done = attempt + 1 status_code = 0 body = "" retry_after: str | None = None try: resp_data = _image_generation_response(prompt, current_model) if resp_data.get("data"): item = resp_data["data"][0] b64 = item.get("b64_json") if b64: if current_model == GPT_IMAGE_MODEL: _image_record_primary_success() return b64lib.b64decode(b64), f"text:{current_model}" if item.get("url"): with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: image_resp = client.get(item["url"]) image_resp.raise_for_status() if current_model == GPT_IMAGE_MODEL: _image_record_primary_success() return image_resp.content, f"text:{current_model}" err_obj = resp_data.get("error") or {} last_err = f"empty data · {err_obj.get('code', '')} · {str(err_obj.get('message', ''))[:200]} · model={current_model}" except httpx.HTTPStatusError as e: body = e.response.text status_code = e.response.status_code retry_after = e.response.headers.get("retry-after") capacity_seen = capacity_seen or _image_is_capacity_error(status_code, body) last_err = f"HTTP {status_code}: {body[:200]} · model={current_model}" except Exception as e: last_err = f"{type(e).__name__}: {e} · model={current_model}" body = str(e) status_code = 429 if "429" in body or "saturated" in body.lower() or "饱和" in body else 0 capacity_seen = capacity_seen or _image_is_capacity_error(status_code, body) fallbackable = current_model == GPT_IMAGE_MODEL and _image_failure_can_fallback(status_code, body, last_err) if fallbackable: _image_record_primary_failure(last_err) if any(next_model != GPT_IMAGE_MODEL for next_model in attempt_models[attempt + 1:]): print(f"[image text fallback → {IMAGE_FALLBACK_MODEL}] {last_err}", flush=True) continue if _image_should_retry(attempt, len(attempt_models), status_code, body, last_err): delay = _image_retry_delay(attempt, status_code, body, retry_after) print(f"[image text retry {attempt + 1}/{len(attempt_models)} → {current_model}, sleep {delay:.0f}s] {last_err}", flush=True) _time.sleep(delay) else: break raise RuntimeError(_image_failure_message("image text", attempts_done, last_err, capacity_seen)) def _image_path_to_data_url(path: Path) -> str: media_type = "image/png" if path.suffix.lower() == ".png" else "image/jpeg" return f"data:{media_type};base64,{base64.b64encode(path.read_bytes()).decode('ascii')}" def _vision_brief_from_images(image_paths: list[Path], prompt: str, max_images: int = 8, model: str | None = None) -> str: paths = [path for path in image_paths if path.exists()][:max_images] if not paths: return "" if not LLM_API_KEY: return "" content: list[dict] = [{"type": "text", "text": prompt}] for path in paths: content.append({"type": "image_url", "image_url": {"url": _image_path_to_data_url(path)}}) try: resp = llm().chat.completions.create( model=model or VISION_MODEL, messages=[{"role": "user", "content": content}], response_format={"type": "json_object"}, temperature=0.1, max_tokens=1400, ) raw = (resp.choices[0].message.content or "").strip() if not raw: raw = (getattr(resp.choices[0].message, "reasoning_content", "") or "").strip() match = re.search(r"\{[\s\S]*\}", raw) raw = match.group(0) if match else raw data = json.loads(raw) except Exception as e: print(f"[vision brief failed] {e}", flush=True) return "" if isinstance(data, dict): if isinstance(data.get("brief"), str) and data["brief"].strip(): return data["brief"].strip()[:1800] parts: list[str] = [] for key in ( "gender_presentation", "age_range", "body_proportion", "hair", "skin_tone", "wardrobe_style", "pose_language", "camera_visibility", "commercial_mood", "neck_shoulder_readiness", "style_constraints", ): value = data.get(key) if isinstance(value, str) and value.strip(): parts.append(f"{key.replace('_', ' ')}: {value.strip()}") if parts: return "; ".join(parts)[:1800] return "" def _describe_source_subject(job_id: str, source_indices: list[int]) -> str: """Turn source keyframes into a non-identifying visual brief for similar-subject text generation.""" paths = [_source_frame_path(job_id, idx) for idx in source_indices] prompt = ( "You are preparing a non-identifying character brief for generating a NEW similar but non-identical ad subject. " "Look at these source video keyframes as evidence of one role and style, not as a person to identify. " "Do NOT identify the person, do NOT estimate exact age, do NOT describe biometric identity, and do NOT mention celebrity or real-person likeness. " "Output strict JSON only. Use broad style traits suitable for text-to-image generation.\n" "Required keys: gender_presentation, age_range, body_proportion, hair, skin_tone, wardrobe_style, " "pose_language, camera_visibility, commercial_mood, neck_shoulder_readiness, style_constraints, brief.\n" "The brief should be 80-140 words and should preserve category, role, energy, camera readability, and commercial atmosphere while explicitly allowing a new non-identical subject." ) return _vision_brief_from_images(paths, prompt, max_images=8) def _describe_subject_template_from_images(name: str, subject_style: str, image_paths: list[Path], note: str = "") -> str: prompt = ( f"You are summarizing a saved SKG subject template named '{name}' for future text-to-image generation. " f"Subject style: {subject_style}. User note: {note[:500]}. " "Look at the subject views and describe the reusable creative direction without copying identity or pixels. " "Do NOT identify a person and do NOT describe exact facial identity. " "Output strict JSON only with keys: gender_presentation, age_range, body_proportion, material_or_skin, " "wardrobe_or_surface_style, pose_language, camera_readability, neck_shoulder_readiness, commercial_mood, brief. " "The brief should be 80-140 words and must be useful as a reference character brief for creating a new innovative variation." ) return _vision_brief_from_images(image_paths, prompt, max_images=10) def _describe_subject_consensus_from_images(name: str, subject_style: str, image_paths: list[Path], note: str = "") -> str: prompt = ( f"You are extracting the stable character bible from a generated SKG subject view pack named '{name}'. " f"Subject style: {subject_style}. User/profile note: {note[:700]}. " "These images are multiple views of ONE generated subject. Summarize the reusable identity as text for future first/last-frame generation. " "Do NOT identify a real person and do NOT mention exact facial identity. " "Output strict JSON only with keys: gender_presentation, age_range, body_proportion, hair, skin_tone, " "wardrobe_or_material_style, pose_language, camera_readability, neck_shoulder_readiness, commercial_mood, brief. " "The brief should be 90-160 words, describe one consistent subject, and explicitly allow new poses, new framing, new expressions, and new environments while preserving identity, proportions, material/style, and ad role." ) return _vision_brief_from_images(image_paths, prompt, max_images=10) def _subject_agent_model(bundle: SubjectModelBundle) -> str: return SUBJECT_AGENT_GEMINI_MODEL if bundle == "gemini" else SUBJECT_AGENT_GPT_MODEL def _subject_agent_image_model(bundle: SubjectModelBundle) -> str: return IMAGE_FALLBACK_MODEL if bundle == "gemini" and IMAGE_FALLBACK_MODEL else GPT_IMAGE_MODEL def _list_of_strings(value, limit: int = 18) -> list[str]: if isinstance(value, list): return [str(item).strip()[:80] for item in value if str(item).strip()][:limit] if isinstance(value, str) and value.strip(): return [part.strip()[:80] for part in re.split(r"[,,;;\n]", value) if part.strip()][:limit] return [] def _subject_agent_json_from_images(job_id: str, source_indices: list[int], bundle: SubjectModelBundle) -> dict: paths = [_source_frame_path(job_id, idx) for idx in source_indices] paths = [path for path in paths if path.exists()][:8] if not paths or not LLM_API_KEY: return {} prompt = ( "You are the image-generation requirements agent for an SKG ad-subject reconstruction workspace. " "Only analyze the attached reference images for future subject pack generation. Do not discuss video, audio, copywriting, download, or unrelated tasks. " "The user may later choose whether to preserve the visible subject, preserve only the creative concept with a new person, mix selected elements, or create from a new description. " "Output strict JSON only with these keys: summary_zh, summary_en, generation_brief_en, trait_chips, mode_options, questions, warnings. " "summary_zh: 2-4 concise Chinese sentences describing visible subject, concept, outfit/material, camera usefulness. " "summary_en and generation_brief_en: English only. generation_brief_en is a direct image-generation brief that preserves useful traits while avoiding copyrighted/identifying replication unless user explicitly selects source-locked mode. " "trait_chips: 8-18 short Chinese selectable traits. Include identity category, anatomy/material, clothing, color, style, framing, and useful negative constraints. " "mode_options: short Chinese labels for likely choices. questions: 2-4 Chinese questions to clarify generation. warnings: Chinese notes about identity/copyright/consistency risk." ) content: list[dict] = [{"type": "text", "text": prompt}] for path in paths: content.append({"type": "image_url", "image_url": {"url": _image_path_to_data_url(path)}}) try: resp = llm().chat.completions.create( model=_subject_agent_model(bundle), messages=[{"role": "user", "content": content}], response_format={"type": "json_object"}, temperature=0.15, max_tokens=1600, ) raw = (resp.choices[0].message.content or "").strip() if not raw: raw = (getattr(resp.choices[0].message, "reasoning_content", "") or "").strip() match = re.search(r"\{[\s\S]*\}", raw) raw = match.group(0) if match else raw data = json.loads(raw) return data if isinstance(data, dict) else {} except Exception as e: print(f"[subject agent analyze failed] bundle={bundle} error={e}", flush=True) return {} def _subject_agent_analysis(job_id: str, source_indices: list[int], bundle: SubjectModelBundle) -> SubjectAgentAnalysis: clean_indices = list(dict.fromkeys(int(idx) for idx in source_indices if isinstance(idx, int) or str(idx).isdigit()))[:8] model = _subject_agent_model(bundle) data = _subject_agent_json_from_images(job_id, clean_indices, bundle) brief_en = _ensure_english(str(data.get("generation_brief_en") or data.get("summary_en") or "").strip()) if data else "" if not data: data = { "summary_zh": "已接收参考帧,但模型没有返回可用结构化分析。你仍可以在下方描述要保留或改变的主体元素。", "summary_en": "Reference frames were received, but no structured analysis was returned.", "generation_brief_en": "Use the selected reference frames as visual evidence for a new consistent SKG ad subject pack. Keep neck and shoulder readability clear.", "trait_chips": ["同一主体", "服装统一", "肩颈清晰", "白底", "六视图"], "mode_options": ["形象锁定", "创意复刻", "元素混合", "自主描述"], "questions": ["你要保留原主体外形,还是只保留创意模式?", "是否需要改变人物年龄、性别、服装或风格?"], "warnings": ["模型分析失败时请用文字补充关键要求。"], } brief_en = str(data["generation_brief_en"]) return SubjectAgentAnalysis( model_bundle=bundle, model=model, source_frame_indices=clean_indices, summary_zh=str(data.get("summary_zh") or "").strip()[:1800], summary_en=str(data.get("summary_en") or "").strip()[:1800], generation_brief_en=brief_en[:2200], trait_chips=_list_of_strings(data.get("trait_chips"), 24), mode_options=_list_of_strings(data.get("mode_options"), 8), questions=_list_of_strings(data.get("questions"), 8), warnings=_list_of_strings(data.get("warnings"), 8), created_at=time.time(), ) _SUBJECT_AGENT_MODES: set[str] = {"realistic", "cartoon", "elements", "custom"} def _subject_agent_quantity_from_text(text: str, fallback: int) -> int: quantity = max(1, min(10, int(fallback or 6))) text = text or "" if re.fullmatch(r"\s*\d{1,2}\s*", text): return max(1, min(10, int(text.strip()))) digit_match = re.search(r"(\d{1,2})\s*(?:张|个|视图|张图|图|views?)", text, flags=re.I) if digit_match: return max(1, min(10, int(digit_match.group(1)))) cn_numbers = { "一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9, "十": 10, } cn_match = re.search(r"([一二两三四五六七八九十])\s*(?:张|个|视图|张图|图)", text) if cn_match: return max(1, min(10, cn_numbers.get(cn_match.group(1), quantity))) return quantity def _subject_agent_mode_from_text(text: str, fallback: SubjectAgentMode = "custom") -> SubjectAgentMode: compact = re.sub(r"\s+", "", text or "").lower() if re.search(r"卡通|动画|插画|公仔|潮玩|二次元|cartoon|anime|illustration|toy|stylized", compact): return "cartoon" if re.search(r"创意复刻|创意模式|元素|参考创新|不像|换人|全新主体|全新人物|不同人|newperson|newactor|concept|element", compact): return "elements" if re.search(r"形象锁定|复刻这个人|复刻形象|同一主体|同一个人|保持这个人|保持原主体|完全复刻|source locked|same subject|sameperson", compact): return "realistic" if re.search(r"自主描述|只按文字|不依赖|不用参考|按描述|fromdescription|custom", compact): return "custom" return fallback def _subject_agent_mode_from_value(value: object, fallback: SubjectAgentMode) -> SubjectAgentMode: text = str(value or "").strip() return text if text in _SUBJECT_AGENT_MODES else fallback def _subject_agent_message_update(state: SubjectAgentState, user_message: str) -> tuple[str, str, str, int, list[str], SubjectAgentMode]: current_req = state.requirements_zh.strip() selected_traits = state.selected_traits[:20] quantity = _subject_agent_quantity_from_text(user_message, int(state.quantity or 6)) selected_mode = _subject_agent_mode_from_text(user_message, state.selected_mode) fallback_req = ";".join(part for part in [current_req, user_message.strip()] if part).strip(";") mode_label = { "realistic": "source-locked same visible subject reconstruction", "cartoon": "cartoon or stylized reconstruction", "elements": "creative element reconstruction with a different new subject", "custom": "custom description driven subject generation", }.get(selected_mode, "custom description driven subject generation") fallback_prompt = _ensure_english( "Subject image generation requirements: " + (fallback_req or "create a consistent SKG ad subject pack") + f". Direction mode: {mode_label}." + f" Generate exactly {quantity} separate views." + ". Keep one identity and one outfit bible across all generated views. " + (f"Selected traits: {', '.join(selected_traits)}." if selected_traits else "") ) if not LLM_API_KEY: return "已记录这条生图要求。继续补充要保留/删除的元素,确认后我会按当前要求生成。", fallback_req, fallback_prompt, quantity, selected_traits, selected_mode system = ( "You are an SKG subject image-generation requirements agent. Your scope is only image generation for a subject view pack. " "Do not answer unrelated video, audio, download, coding, copywriting, or general chat requests; redirect to subject image requirements. " "Normalize the user's fuzzy Chinese request into precise generation constraints. " "Infer selected_mode from the conversation. Allowed selected_mode values are realistic, cartoon, elements, custom. " "Use realistic when the user wants to lock or replicate the visible reference subject; cartoon for stylized/cartoon/toy/illustration; " "elements when the user wants the creative logic but a different new subject; custom when the user wants free text generation without relying on references. " "Infer quantity from Chinese or English requests such as 4张, 六视图, generate 8 views. " "Return strict JSON with keys: assistant_message_zh, updated_requirements_zh, generation_prompt_en, quantity, selected_traits, selected_mode. " "generation_prompt_en must be English and must enforce: one consistent identity, one consistent outfit bible, neck/shoulder readability, no text/watermarks/UI, and legal-safe reconstruction." ) user_payload = { "analysis": state.analysis.model_dump() if state.analysis else None, "current_requirements_zh": current_req, "current_generation_prompt_en": state.generation_prompt_en, "current_quantity": quantity, "selected_mode": state.selected_mode, "selected_traits": selected_traits, "recent_messages": [m.model_dump() for m in state.messages[-8:]], "user_message": user_message, } try: resp = llm().chat.completions.create( model=_subject_agent_model(state.model_bundle), messages=[ {"role": "system", "content": system}, {"role": "user", "content": json.dumps(user_payload, ensure_ascii=False)}, ], response_format={"type": "json_object"}, temperature=0.2, max_tokens=1200, ) raw = (resp.choices[0].message.content or "").strip() match = re.search(r"\{[\s\S]*\}", raw) data = json.loads(match.group(0) if match else raw) assistant = str(data.get("assistant_message_zh") or "已记录这条生图要求。").strip()[:1200] updated_req = str(data.get("updated_requirements_zh") or fallback_req).strip()[:2200] prompt_en = _ensure_english(str(data.get("generation_prompt_en") or fallback_prompt).strip())[:2600] out_quantity = _subject_agent_quantity_from_text(str(data.get("quantity") or ""), quantity) out_traits = _list_of_strings(data.get("selected_traits"), 24) or selected_traits out_mode = _subject_agent_mode_from_value(data.get("selected_mode"), selected_mode) return assistant, updated_req, prompt_en, out_quantity, out_traits, out_mode except Exception as e: print(f"[subject agent message failed] bundle={state.model_bundle} error={e}", flush=True) return "已先按本地规则记录这条要求;模型回复失败时仍可直接生成。", fallback_req, fallback_prompt, quantity, selected_traits, selected_mode # ---------- API 路由 ---------- class CreateJobReq(BaseModel): url: str class SubjectAgentAnalyzeReq(BaseModel): model_config = ConfigDict(protected_namespaces=()) model_bundle: SubjectModelBundle = "gpt" source_frame_indices: list[int] = Field(default_factory=list) class SubjectAgentMessageReq(BaseModel): model_config = ConfigDict(protected_namespaces=()) model_bundle: SubjectModelBundle = "gpt" source_frame_indices: list[int] = Field(default_factory=list) selected_mode: SubjectAgentMode = "custom" selected_traits: list[str] = Field(default_factory=list) requirements_zh: str = "" message: str = "" quantity: int = 6 class TranslateReq(BaseModel): text: str target: Literal["en", "zh"] = "en" class CreativeCopyReq(BaseModel): goal: str product: str = "" audience: str = "" platform: str = "TikTok / Reels" tone: str = "direct" seconds: int = 20 source_text: str = "" class CreativeCopyVariant(BaseModel): title: str = "" hook_zh: str = "" script_zh: str = "" script_en: str = "" image_prompt_en: str = "" video_prompt_en: str = "" caption_zh: str = "" hashtags: list[str] = Field(default_factory=list) class CreativeCopyResp(BaseModel): model: str variants: list[CreativeCopyVariant] class PromptPolishReq(BaseModel): text: str system_prompt: str = "" mode: Literal["image", "video", "general", "chat"] = "image" target_language: Literal["en", "zh", "keep"] = "en" class PromptPolishResp(BaseModel): model: str text: str class ScriptRewriteSegmentReq(BaseModel): index: int start: float = 0.0 end: float = 0.0 role: str = "" source: str = "" current_text: str = "" class RewriteStoryboardScriptReq(BaseModel): mode: Literal["segment", "all"] = "segment" author_intent: str = "" segments: list[ScriptRewriteSegmentReq] = Field(default_factory=list) _TRANSLATION_CACHE: dict[str, str] = {} def _contains_cjk(text: str) -> bool: return bool(re.search(r"[\u3400-\u9fff]", text or "")) def _translate_text_sync(text: str, target: Literal["en", "zh"] = "en", *, max_tokens: int = 700) -> str: text = (text or "").strip() if not text or not LLM_API_KEY: return text target_label = "English" if target == "en" else "Simplified Chinese" prompt = ( f"Translate the following creative generation planning text into concise natural {target_label}. " "Preserve concrete product, camera, subject, timing, and structure details. " "Do not add commentary, markdown, quotes, or explanations.\n\n" f"Input:\n{text}" ) resp = llm().chat.completions.create( model=TRANSLATE_MODEL, messages=[{"role": "user", "content": prompt}], temperature=0.15, max_tokens=max_tokens, ) out = (resp.choices[0].message.content or "").strip() if not out: rc = getattr(resp.choices[0].message, "reasoning_content", "") or "" if rc: out = rc.strip().splitlines()[-1].strip() return re.sub(r'^[\'"「『]+|[\'"」』]+$', "", out).strip() or text def _ensure_english(text: str) -> str: text = (text or "").strip() if not text or not _contains_cjk(text): return text key = hashlib.sha256(("en\0" + text).encode("utf-8")).hexdigest() cached = _TRANSLATION_CACHE.get(key) if cached: return cached try: translated = _translate_text_sync(text, "en", max_tokens=max(700, min(3500, len(text) // 2 + 900))) _TRANSLATION_CACHE[key] = translated return translated except Exception as e: print(f"[ensure english fallback] {e}", flush=True) return text def _creative_copy_fallback(req: CreativeCopyReq) -> CreativeCopyResp: goal = req.goal.strip() or "展示 SKG 产品的核心卖点" product = req.product.strip() or "SKG 健康科技产品" seconds = max(6, min(60, int(req.seconds or 20))) script_zh = ( f"开场 0-3 秒:直接展示{product}和使用场景,提出一个具体痛点。\n" f"中段 3-{max(4, seconds - 5)} 秒:用三个连续镜头说明{goal},画面保持产品清晰可见。\n" f"结尾 {max(4, seconds - 5)}-{seconds} 秒:给出一句明确行动口播,收在产品近景。" ) script_en = _ensure_english(script_zh) image_prompt = _ensure_english( f"{product}, premium health-tech product advertising image, clean lifestyle scene, clear product visibility, natural lighting, vertical composition" ) video_prompt = _ensure_english( f"{seconds}-second vertical short video ad for {product}. {goal}. Start with the product in use, show one clear benefit, keep camera motion smooth, realistic lifestyle lighting, no medical treatment claims." ) return CreativeCopyResp( model="fallback", variants=[ CreativeCopyVariant( title="快速成片版", hook_zh=f"{product},把一个日常痛点变成一个清楚的使用理由。", script_zh=script_zh, script_en=script_en, image_prompt_en=image_prompt, video_prompt_en=video_prompt, caption_zh=f"{product}|{goal}", hashtags=["#SKG", "#健康科技", "#短视频广告"], ) ], ) def _parse_creative_copy_response(raw: str, req: CreativeCopyReq) -> CreativeCopyResp: text = (raw or "").strip() text = re.sub(r"^```(?:json)?\s*", "", text, flags=re.I).strip() text = re.sub(r"\s*```$", "", text).strip() match = re.search(r"\{[\s\S]*\}", text) json_text = match.group(0) if match else text try: data = json.loads(json_text) except Exception: return _creative_copy_fallback(req) raw_items = data.get("variants") if isinstance(data, dict) else None if not isinstance(raw_items, list): return _creative_copy_fallback(req) variants: list[CreativeCopyVariant] = [] for item in raw_items[:3]: if not isinstance(item, dict): continue hashtags = item.get("hashtags") or [] if not isinstance(hashtags, list): hashtags = [] variants.append(CreativeCopyVariant( title=str(item.get("title") or "").strip()[:80], hook_zh=str(item.get("hook_zh") or "").strip()[:180], script_zh=str(item.get("script_zh") or "").strip()[:900], script_en=_ensure_english(str(item.get("script_en") or item.get("script_zh") or "").strip())[:1200], image_prompt_en=_ensure_english(str(item.get("image_prompt_en") or "").strip())[:1200], video_prompt_en=_ensure_english(str(item.get("video_prompt_en") or "").strip())[:1400], caption_zh=str(item.get("caption_zh") or "").strip()[:240], hashtags=[str(tag).strip()[:40] for tag in hashtags if str(tag).strip()][:8], )) if not variants: return _creative_copy_fallback(req) return CreativeCopyResp(model=REWRITE_MODEL if LLM_API_KEY else "fallback", variants=variants) _NO_PERSON_INTENT_RE = re.compile( r"(" r"无人|无人物|没有人|没有人物|不要人|不要人物|不出现人|不出现人物|无行人|无路人|空无一人|" r"no\s+(?:people|person|human|humans|faces?|characters?|crowd|bystanders)|" r"without\s+(?:people|person|human|humans|faces?|characters?|crowd|bystanders)|" r"empty\s+(?:scene|street|room|space)" r")", re.I, ) _PERSON_INTENT_RE = re.compile( r"(" r"人物|人像|真人|模特|男生|女生|男人|女人|男士|女士|女孩|男孩|少年|少女|青年|" r"主播|演员|艺人|角色|数字人|虚拟人|博主|网红|行人|路人|人群|背影|侧脸|正脸|" r"面部|脸部|脸|表情|肖像|全身|半身|" r"\b(?:person|people|human|humans|model|woman|man|girl|boy|male|female|actor|actress|" r"presenter|host|influencer|character|avatar|portrait|face|facial|full[- ]body|crowd|" r"pedestrian|bystander|passerby)\b" r")", re.I, ) def _prompt_has_person_intent(*parts: str) -> bool: text = "\n".join(part for part in parts if part).strip() if not text or _NO_PERSON_INTENT_RE.search(text): return False return bool(_PERSON_INTENT_RE.search(text)) def _prompt_person_guard(req: PromptPolishReq) -> str: if req.mode not in {"image", "video", "general"}: return "" if _prompt_has_person_intent(req.text, req.system_prompt): return ( "The user requested a person, portrait, model, or character subject. " "Describe any such subject as a fully fictional synthetic AI character or virtual avatar, " "not based on any real person, celebrity, public figure, or identifiable private individual. " "Avoid real-person likeness, biometric identity, endorsement, or impersonation.\n" ) return ( "The user did not request a person or character subject. Preserve the original object-only, " "scene-only, or product-only composition. Do not introduce people, faces, bodies, hands, " "avatars, characters, crowds, bystanders, or human silhouettes.\n" ) def _prompt_polish_fallback(req: PromptPolishReq) -> PromptPolishResp: text = req.text.strip() base = _ensure_english(text) if req.target_language == "en" else text base = re.sub(r"\s+", " ", base).strip() base = re.sub(r"[。.!!??]+$", "", base).strip() person_intent = _prompt_has_person_intent(req.text, req.system_prompt) person_guard = ( " Use a fully fictional synthetic AI character, not based on any real person, celebrity, public figure, or identifiable private individual." if person_intent else " Preserve the original object-only, scene-only, or product-only composition; do not introduce people, faces, bodies, hands, avatars, characters, crowds, bystanders, or human silhouettes." ) if req.mode == "video": polished = ( f"{base}. Smooth camera movement, clear subject continuity, stable composition, " f"natural motion, coherent lighting, no subtitles, no watermark.{person_guard}" ) elif req.mode in {"general", "chat"}: polished = base else: polished = ( f"{base}. Detailed visual prompt, clear main subject, coherent composition, " f"natural lighting, refined color palette, high-quality details.{person_guard}" ) return PromptPolishResp(model="fallback", text=polished[:1800]) @app.post("/prompt/polish", response_model=PromptPolishResp) def polish_prompt(req: PromptPolishReq) -> PromptPolishResp: text = req.text.strip() if not text: raise HTTPException(400, "text required") if not LLM_API_KEY: return _prompt_polish_fallback(req) target_label = { "en": "English", "zh": "Simplified Chinese", "keep": "the same language as the input", }.get(req.target_language, "English") mode_hint = { "image": "an image-generation prompt", "video": "a video-generation prompt", "general": "a concise generation prompt", "chat": "a professional response to the user's request", }.get(req.mode, "an image-generation prompt") user_system = req.system_prompt.strip() prompt = ( f"Rewrite the user's input into {mode_hint} in {target_label}.\n" "Preserve the user's actual subject, brand, product, place, style, and intent.\n" "Do not add SKG, health-tech, massage products, TikTok ad framing, product sales language, hashtags, captions, or any brand/product not explicitly present in the input or user-selected guidance.\n" "Do not add medical, wellness, or advertising claims unless the user asked for them.\n" "Improve concrete visual details, composition, lighting, camera language, materials, mood, and quality.\n" "Return only the rewritten prompt. No markdown, labels, JSON, quotes, explanation, or alternatives.\n" f"{_prompt_person_guard(req)}" ) if req.mode == "chat": prompt = ( f"Answer or rewrite the user's request professionally in {target_label}.\n" "Follow the user-selected guidance when provided.\n" "Do not add SKG, health-tech, massage products, TikTok ad framing, product sales language, hashtags, captions, or any brand/product not explicitly present in the input or user-selected guidance.\n" "Do not add medical, wellness, or advertising claims unless the user asked for them.\n" "Return only the final content in the format requested by the guidance. No markdown fences, labels, explanation, or alternatives unless explicitly requested.\n" ) if req.mode == "video": prompt += ( "For video, describe motion, timing, camera movement, continuity, and what changes over time. " "Do not add people for scale, atmosphere, lifestyle context, or background decoration unless the input explicitly asked for people.\n" ) if user_system: prompt += f"\nUser-selected polishing guidance:\n{user_system[:1000]}\n" prompt += f"\nInput:\n{text[:2500]}" try: resp = llm().chat.completions.create( model=REWRITE_MODEL, messages=[ {"role": "system", "content": "You are a neutral professional prompt editor. You preserve intent and never inject unrelated brands or products."}, {"role": "user", "content": prompt}, ], temperature=0.45, max_tokens=900, ) out = (resp.choices[0].message.content or "").strip() out = re.sub(r"^```(?:text)?\s*", "", out, flags=re.I).strip() out = re.sub(r"\s*```$", "", out).strip() out = re.sub(r'^[\'"「『]+|[\'"」』]+$', "", out).strip() return PromptPolishResp(model=REWRITE_MODEL, text=(out or _prompt_polish_fallback(req).text)[:1800]) except Exception as e: print(f"[prompt polish fallback] {e}", flush=True) return _prompt_polish_fallback(req) @app.post("/translate") def translate_text(req: TranslateReq) -> dict: """单条文本翻译(给生图自定义提取元素 zh→en 用)""" import re as _re text = req.text.strip() if not text: return {"text": ""} if not LLM_API_KEY: raise HTTPException(503, "LLM_API_KEY 未配置") target_label = "English" if req.target == "en" else "Simplified Chinese" prompt = ( f"Translate the following text into concise {target_label}, suitable as an element " "label in an image-generation prompt. Output only the translation itself — no quotes, " "no punctuation, no explanation, no markdown.\n\n" f"Input: {text}" ) try: resp = llm().chat.completions.create( model=TRANSLATE_MODEL, messages=[{"role": "user", "content": prompt}], temperature=0.2, max_tokens=200, ) out = (resp.choices[0].message.content or "").strip() if not out: rc = getattr(resp.choices[0].message, "reasoning_content", "") or "" if rc: out = rc.strip().splitlines()[-1].strip() out = _re.sub(r'^[\'"「『]+|[\'"」』]+$', "", out).strip() return {"text": out} except Exception as e: raise HTTPException(500, f"translate failed: {e}") @app.post("/creative/copy", response_model=CreativeCopyResp) def generate_creative_copy(req: CreativeCopyReq) -> CreativeCopyResp: goal = req.goal.strip() if not goal: raise HTTPException(400, "goal required") if not LLM_API_KEY: return _creative_copy_fallback(req) seconds = max(6, min(60, int(req.seconds or 20))) prompt = ( "You are creating practical short-form ad material for an SKG AI creative tool. " "Return strict JSON only. Create 3 distinct variants that can be pasted directly into image/video generation. " "Avoid medical treatment claims; describe comfort, relaxation, daily use, visual proof, and product clarity instead. " "Every variant must include title, hook_zh, script_zh, script_en, image_prompt_en, video_prompt_en, caption_zh, hashtags.\n\n" f"Goal: {goal}\n" f"Product: {req.product.strip() or 'SKG health-tech product'}\n" f"Audience: {req.audience.strip() or 'short-form shoppers'}\n" f"Platform: {req.platform.strip() or 'TikTok / Reels'}\n" f"Tone: {req.tone.strip() or 'direct'}\n" f"Length: {seconds}s\n" f"Source/reference text:\n{req.source_text.strip()[:1500]}" ) try: resp = llm().chat.completions.create( model=REWRITE_MODEL, messages=[ {"role": "system", "content": "Return valid JSON only. No markdown. No explanation."}, {"role": "user", "content": prompt}, ], response_format={"type": "json_object"}, temperature=0.72, max_tokens=2200, ) return _parse_creative_copy_response(resp.choices[0].message.content or "", req) except Exception as e: print(f"[creative copy fallback] {e}", flush=True) return _creative_copy_fallback(req) def _fallback_script_rewrite_item(segment: ScriptRewriteSegmentReq, author_intent: str = "") -> dict: source = (segment.source or "").strip() intent = _ensure_english(author_intent or "") role = segment.role or "" templates = { "hook": "Have you noticed that after hours of looking down, your neck and shoulders complain before you do?", "pain": "Phone scrolling, desk work, and commuting can keep your neck and shoulders tight all day.", "proof": "An SKG wearable massager sits around the neck and shoulders, bringing warm, rhythmic comfort to the spots that feel tense.", "solution": "This beat can simply show pick up, wear, fit, and relax, so the product enters a normal daily routine.", "cta": "If you want neck-and-shoulder relaxation to become a daily habit, start with this SKG massager.", "bridge": "Follow the source rhythm, but land this line in one specific neck-and-shoulder use moment.", } rewritten = templates.get(role, templates["bridge"]) if source and role not in {"hook", "cta"}: rewritten = f"{rewritten} Keep the source sentence rhythm, but replace the content with SKG wearing and relaxation experience." if intent: rewritten = f"{rewritten} Adjust the tone based on the creator note: {intent[:90]}." try: zh = _translate_text_sync(rewritten, "zh", max_tokens=260) if LLM_API_KEY else "" except Exception: zh = "" return {"index": segment.index, "text": rewritten[:260], "text_zh": zh} def _parse_script_rewrite_items(raw: str, requested: list[ScriptRewriteSegmentReq], author_intent: str = "") -> list[dict]: text = (raw or "").strip() text = re.sub(r"^```(?:json)?\s*", "", text, flags=re.I).strip() text = re.sub(r"\s*```$", "", text).strip() match = re.search(r"\{[\s\S]*\}", text) json_text = match.group(0) if match else text try: data = json.loads(json_text) except Exception: return [_fallback_script_rewrite_item(segment, author_intent) for segment in requested] raw_items = data.get("items") if isinstance(data, dict) else data if not isinstance(raw_items, list): raw_items = [] by_index: dict[int, tuple[str, str]] = {} for item in raw_items: if not isinstance(item, dict): continue try: idx = int(item.get("index")) except Exception: continue value = str(item.get("text") or item.get("rewritten_text") or "").strip() value_zh = str(item.get("text_zh") or item.get("rewritten_text_zh") or "").strip() if value: by_index[idx] = (re.sub(r"\s+", " ", value).strip()[:260], re.sub(r"\s+", " ", value_zh).strip()[:260]) items = [] for segment in requested: fallback = _fallback_script_rewrite_item(segment, author_intent) text, text_zh = by_index.get(segment.index, ("", "")) if text and not text_zh: try: text_zh = _translate_text_sync(text, "zh", max_tokens=260) if LLM_API_KEY else "" except Exception: text_zh = "" items.append({"index": segment.index, "text": text or fallback["text"], "text_zh": text_zh or fallback.get("text_zh", "")}) return items def _rewrite_storyboard_script_sync(req: RewriteStoryboardScriptReq) -> list[dict]: segments = [segment for segment in req.segments if (segment.source or segment.current_text).strip()] if not segments: return [] author_intent = _ensure_english(req.author_intent or "") if not LLM_API_KEY: return [_fallback_script_rewrite_item(segment, author_intent) for segment in segments] payload = [ { "index": segment.index, "time": f"{segment.start:.1f}-{segment.end:.1f}s", "role": segment.role, "source_reference": _ensure_english(segment.source), "current_voiceover": _ensure_english(segment.current_text), } for segment in segments ] prompt = ( "You are an information-feed ad voice-over rewrite specialist. Rewrite each segment into a new ENGLISH SKG neck-and-shoulder massager voice-over line while preserving the source rhythm and information structure.\n" "Hard rules:\n" "1. The main text field must be English short-video VO. No stage directions, no quotes.\n" "2. Do not translate word-for-word. Do not keep the original brand, price, discount code, platform CTA, or exact claims; only reuse rhythm, hook, pain-point, proof, and conversion structure.\n" "3. The product is a U-shaped neck-and-shoulder wearable massager worn around the neck. Express neck/shoulder tension, desk posture, looking down, warmth, kneading-like comfort, wearing, relaxation, and daily use.\n" "4. Avoid medical treatment, cure, pain elimination, clinical, or disease claims.\n" "5. Keep each segment short enough for its time range and natural for a creator voice.\n" "6. If mode=all, make the whole piece coherent; if mode=segment, rewrite only the given segment while matching the broader style.\n" "7. Also return a Simplified Chinese mirror for team review in text_zh; it is not for model prompts.\n" f"Creator note: {author_intent or 'No extra note; follow the source pacing and turn it into natural SKG product VO.'}\n" f"Rewrite mode: {req.mode}\n" f"SKG product context: {_ensure_english(AUDIO_PRODUCT_BRIEF)}\n\n" "Input segments JSON:\n" + json.dumps(payload, ensure_ascii=False) + '\n\nReturn strict JSON only: {"items":[{"index":0,"text":"rewritten English VO","text_zh":"中文镜像"}]}' ) models = [] for model in [AUDIO_REWRITE_MODEL, ASR_FALLBACK_MODEL, TRANSLATE_MODEL]: if model and model not in models: models.append(model) for model in models: try: resp = llm().chat.completions.create( model=model, messages=[ {"role": "system", "content": "Return valid JSON only. No markdown. No explanation."}, {"role": "user", "content": prompt}, ], response_format={"type": "json_object"}, temperature=0.68 if req.mode == "all" else 0.62, max_tokens=max(900, min(5000, 180 * len(segments) + 500)), ) message = resp.choices[0].message raw = (message.content or getattr(message, "reasoning_content", "") or "").strip() items = _parse_script_rewrite_items(raw, segments, author_intent) if any((item.get("text") or "").strip() for item in items): return items except Exception as e: print(f"[script rewrite fallback] {model}: {e}", flush=True) continue return [_fallback_script_rewrite_item(segment, author_intent) for segment in segments] @app.post("/jobs/{job_id}/script/rewrite") def rewrite_storyboard_script(job_id: str, req: RewriteStoryboardScriptReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") return {"items": _rewrite_storyboard_script_sync(req)} @app.get("/health") def health() -> dict: return { "ok": True, "llm_configured": bool(LLM_API_KEY), "auth_configured": WEB_AUTH_CONFIGURED, "auth_modes": { "password": PASSWORD_AUTH_CONFIGURED, "feishu": FEISHU_AUTH_CONFIGURED, "data_isolation": AUTH_DATA_ISOLATION_ENABLED, }, "database": db.health(), "base_url": LLM_BASE_URL or "openai-default", "asr_base_url": ASR_BASE_URL or LLM_BASE_URL or "openai-default", "image_base_url": IMAGE_BASE_URL or LLM_BASE_URL or "openai-default", "voice_base_url": AZURE_OPENAI_BASE_URL, "models": { "asr": ASR_MODEL, "asr_language": _asr_language_label(), "asr_base_url": ASR_BASE_URL or LLM_BASE_URL or "openai-default", "asr_remote_enabled": ASR_REMOTE_ENABLED, "asr_local_fallback_enabled": ASR_LOCAL_FALLBACK_ENABLED, "asr_audio_fallback_enabled": ASR_AUDIO_FALLBACK_ENABLED, "faster_whisper": FASTER_WHISPER_MODEL, "local_asr": LOCAL_ASR_MODEL, "asr_fallback": ASR_FALLBACK_MODEL, "translate": TRANSLATE_MODEL, "rewrite": REWRITE_MODEL, "audio_rewrite": AUDIO_REWRITE_MODEL, "vision": VISION_MODEL, "product_view": PRODUCT_VIEW_MODEL, "image": IMAGE_MODEL, "image_base_url": IMAGE_BASE_URL or LLM_BASE_URL or "openai-default", "image_request_timeout_seconds": IMAGE_REQUEST_TIMEOUT_SECONDS, "image_options": image_model_options(), "image_size_options": image_size_options(), "ai_proxy_configured": bool(AI_HTTP_PROXY), "image_fallbacks": _image_fallback_models(), "image_circuit": _image_circuit_snapshot(), "subject_image": SUBJECT_ASSET_IMAGE_MODEL, "subject_image_fallbacks": SUBJECT_ASSET_IMAGE_MODELS, "voice_provider": VOICE_PROVIDER, "voice_base_url": AZURE_OPENAI_BASE_URL, "voice_tts": AZURE_TTS_MODEL, "voice_tts_paths": AZURE_TTS_PATHS, "voice_id": AZURE_TTS_VOICE_ID, "voice_pool": AZURE_TTS_VOICE_POOL, "voice_configured": bool(AZURE_OPENAI_API_KEY), "video": VIDEO_MODEL, "video_aliases": VIDEO_MODEL_ALIASES, "video_options": video_model_options(), "video_duration_options": video_duration_options(), "video_max_duration_seconds": max(video_duration_options()), "video_size_options": video_size_options(), "video_provider": video_provider_name(), "video_base_url": video_api_base(), "video_configured": bool(video_api_key()), "video_create_paths": VIDEO_CREATE_PATHS, }, } class JobSummary(BaseModel): id: str url: str owner_name: str = "" owner_email: str = "" owner_provider: str = "" status: JobStatus progress: int = 0 message: str = "" duration: float = 0.0 width: int = 0 height: int = 0 video_url: str = "" frame_count: int = 0 video_count: int = 0 thumbnail: str = "" error: str = "" mtime: float = 0.0 @app.get("/jobs", response_model=list[JobSummary]) def list_jobs(request: Request, limit: int | None = None) -> list[JobSummary]: """当前用户可见 job 的精简列表,按磁盘 state.json mtime 倒序(最新优先)。""" user = data_user_from_request(request) items: list[JobSummary] = [] for job_id, job in JOBS.items(): if not user_can_access_job(job, user): continue state_path = JOBS_DIR / job_id / "state.json" mtime = state_path.stat().st_mtime if state_path.exists() else 0.0 thumb = f"/jobs/{job_id}/frames/{job.frames[0].index}.jpg" if job.frames else "" items.append(JobSummary( id=job.id, url=job.url, owner_name=job.owner_name, owner_email=job.owner_email, owner_provider=job.owner_provider, status=job.status, progress=job.progress, message=job.message, duration=job.duration, width=job.width, height=job.height, video_url=job.video_url, frame_count=len(job.frames), video_count=len(job.generated_videos), thumbnail=thumb, error=job.error, mtime=mtime, )) items.sort(key=lambda s: s.mtime, reverse=True) if limit is not None and limit > 0: items = items[:limit] return items @app.post("/jobs", response_model=Job) async def create_job(req: CreateJobReq, bg: BackgroundTasks, request: Request) -> Job: if not req.url.strip(): raise HTTPException(400, "url required") user = data_user_from_request(request) job_id = uuid.uuid4().hex[:12] job = Job(id=job_id, url=req.url.strip()) assign_owner(job, user) JOBS[job_id] = job save_state(job) db.audit(user, "job.create", "job", job_id, {"url": job.url}, request) bg.add_task(pipeline_download, job_id) return job @app.post("/jobs/{job_id}/download/retry", response_model=Job) async def retry_job_download(job_id: str, bg: BackgroundTasks) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") source_kind = getattr(job, "source_kind", "") if source_kind == "upload" or job.url.startswith("upload://"): raise HTTPException(409, "uploaded videos cannot be redownloaded; upload the file again") if job.status in {"downloading", "splitting", "transcribing"}: raise HTTPException(409, f"job is busy: {job.status}") mp4 = job_dir(job_id) / "source.mp4" if mp4.exists() and mp4.stat().st_size == 0: mp4.unlink() update( job, status="downloading", progress=1, error="", message="重新提交下载…", video_url="", ) bg.add_task(pipeline_download, job_id) return job @app.post("/jobs/upload", response_model=Job) async def create_job_from_upload(bg: BackgroundTasks, request: Request, file: UploadFile = File(...)) -> Job: if not file.filename: raise HTTPException(400, "file required") ext = Path(file.filename).suffix.lower() if ext not in {".mp4", ".mov", ".webm", ".mkv", ".m4v"}: raise HTTPException(400, f"unsupported video format: {ext}") user = data_user_from_request(request) job_id = uuid.uuid4().hex[:12] d = job_dir(job_id) mp4 = d / "source.mp4" with mp4.open("wb") as f: while chunk := await file.read(1024 * 1024): f.write(chunk) if not mp4.exists() or mp4.stat().st_size == 0: raise HTTPException(500, "upload failed") job = Job(id=job_id, url=f"upload://{file.filename}") assign_owner(job, user) JOBS[job_id] = job save_state(job) db.audit(user, "job.upload", "job", job_id, {"filename": file.filename}, request) bg.add_task(pipeline_download, job_id) return job def _write_creative_reference_frame(job_id: str, file_bytes: bytes | None = None) -> tuple[int, int]: frames_dir = job_dir(job_id) / "frames" frames_dir.mkdir(parents=True, exist_ok=True) out = frames_dir / "000.jpg" if file_bytes: try: with Image.open(io.BytesIO(file_bytes)) as raw: im = ImageOps.exif_transpose(raw).convert("RGB") im.thumbnail((1600, 1600), Image.LANCZOS) width, height = im.size im.save(out, "JPEG", quality=92) return width, height except Exception as e: raise HTTPException(400, f"invalid image file: {e}") im = Image.new("RGB", (1024, 1024), (246, 248, 246)) im.save(out, "JPEG", quality=92) return im.size @app.post("/creative/jobs/image", response_model=Job) async def create_creative_image_job(request: Request) -> Job: user = data_user_from_request(request) job_id = uuid.uuid4().hex[:12] file_bytes: bytes | None = None source_label = "blank" file: UploadFile | None = None content_type = request.headers.get("content-type", "").lower() if "multipart/form-data" in content_type: content_length = request.headers.get("content-length", "0") if "boundary=" not in content_type and content_length in {"", "0"}: file = None else: try: form = await request.form() except Exception as e: raise HTTPException(400, f"invalid multipart body: {e}") maybe_file = form.get("file") if getattr(maybe_file, "filename", "") and hasattr(maybe_file, "read"): file = maybe_file if file and file.filename: ext = Path(file.filename).suffix.lower() if ext not in {".jpg", ".jpeg", ".png", ".webp"}: raise HTTPException(400, f"unsupported image format: {ext}") file_bytes = await file.read() source_label = file.filename width, height = _write_creative_reference_frame(job_id, file_bytes) frame = KeyFrame(index=0, timestamp=0, url=f"/jobs/{job_id}/frames/0.jpg") job = Job( id=job_id, url=f"creative://{source_label}", status="frames_extracted", progress=100, message="创作任务已就绪", width=width, height=height, duration=0, frames=[frame], ) assign_owner(job, user) JOBS[job_id] = job save_state(job) db.audit(user, "creative_job.create", "job", job_id, {"source": source_label}, request) return job @app.post("/jobs/{job_id}/analyze", response_model=Job) async def trigger_analyze( job_id: str, bg: BackgroundTasks, frames: int = KEYFRAME_COUNT, target: FrameExtractTarget = "transparent_human", mode: FrameExtractMode = "replace", quality: FrameExtractQuality = "auto", ) -> Job: global ANALYZE_WORKER_RUNNING job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") if job.status not in {"downloaded", "frames_extracted", "transcribed", "transcribing", "failed"}: raise HTTPException(409, f"status must be downloaded/transcribing/failed, got {job.status}") ANALYZE_QUEUE.append((job_id, frames, target, mode, quality)) position = len(ANALYZE_QUEUE) update( job, status="splitting", progress=30, error="", message="排队等待抽帧" if ANALYZE_WORKER_RUNNING or position > 1 else "准备抽帧…", ) if not ANALYZE_WORKER_RUNNING: ANALYZE_WORKER_RUNNING = True bg.add_task(analyze_queue_worker) return job @app.post("/jobs/{job_id}/frames", response_model=Job) def add_manual_frame(job_id: str, t: float) -> Job: """从指定时间戳手动抽 1 帧追加到 job.frames""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") if not job.video_url: raise HTTPException(400, "video not ready") d = job_dir(job_id) mp4 = d / "source.mp4" if not mp4.exists(): raise HTTPException(400, "source.mp4 missing") frames_dir = d / "frames" frames_dir.mkdir(parents=True, exist_ok=True) # 新 index:max(existing)+1(即使列表已按 ts 排序,文件名用 index 保持稳定) next_idx = max((f.index for f in job.frames), default=-1) + 1 out = frames_dir / f"{next_idx:03d}.jpg" try: run([ "ffmpeg", "-y", "-ss", str(t), "-i", str(mp4), "-frames:v", "1", "-pix_fmt", "yuvj420p", "-q:v", "3", str(out), ]) except RuntimeError as e: raise HTTPException(500, f"ffmpeg failed: {e}") new_frame = KeyFrame( index=next_idx, timestamp=round(float(t), 2), url=f"/jobs/{job_id}/frames/{next_idx}.jpg", ) merged = sorted(list(job.frames) + [new_frame], key=lambda f: f.timestamp) update(job, frames=merged, message=f"已手动加帧({t:.1f}s),共 {len(merged)} 张") return job @app.post("/jobs/{job_id}/frames/upload", response_model=Job) async def upload_reference_frame(job_id: str, file: UploadFile = File(...)) -> Job: """把用户拖入的图片保存为一张参考帧,供转换层和主体生成复用。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") content_type = (file.content_type or "").lower() suffix = Path(file.filename or "").suffix.lower() if content_type and not content_type.startswith("image/"): raise HTTPException(400, "only image uploads are supported") if not content_type and suffix not in {".jpg", ".jpeg", ".png", ".webp", ".bmp"}: raise HTTPException(400, "only image uploads are supported") d = job_dir(job_id) frames_dir = d / "frames" frames_dir.mkdir(parents=True, exist_ok=True) next_idx = max((f.index for f in job.frames), default=-1) + 1 tmp = frames_dir / f"{next_idx:03d}.upload" out = frames_dir / f"{next_idx:03d}.jpg" try: await _save_upload_to_path(file, tmp) with Image.open(tmp) as raw: img = ImageOps.exif_transpose(raw).convert("RGB") img.thumbnail((2400, 2400), Image.LANCZOS) img.save(out, "JPEG", quality=92, optimize=True) except Exception as e: try: out.unlink() except OSError: pass raise HTTPException(400, f"reference image upload failed: {e}") finally: try: tmp.unlink() except OSError: pass next_timestamp = max((float(f.timestamp) for f in job.frames), default=float(job.duration or 0)) + 0.01 new_frame = KeyFrame( index=next_idx, timestamp=round(next_timestamp, 2), url=f"/jobs/{job_id}/frames/{next_idx}.jpg", description={ "scene": "用户拖入的转换层参考图", "objects": [], "style": "uploaded reference image", "suggested_prompt": "", }, ) merged = sorted(list(job.frames) + [new_frame], key=lambda f: f.timestamp) update(job, frames=merged, message=f"已加入上传参考图,共 {len(merged)} 张") return job @app.get("/jobs/{job_id}", response_model=Job) def get_job(job_id: str) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") return job_with_artifacts(job) @app.post("/jobs/{job_id}/subject-agent/analyze", response_model=Job) def analyze_subject_agent(job_id: str, req: SubjectAgentAnalyzeReq) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") source_indices = [idx for idx in req.source_frame_indices if any(frame.index == idx for frame in job.frames)][:8] if not source_indices: raise HTTPException(400, "source_frame_indices required") analysis = _subject_agent_analysis(job_id, source_indices, req.model_bundle) state = job.subject_agent.model_copy(deep=True) assistant_text = ( f"我已用 {req.model_bundle.upper()} 套件分析这些参考帧。" "接下来直接告诉我要复刻形象、卡通化、参考创意换新人,还是只按文字生成;数量、风格、服装和人物大小也都写在对话里。" ) messages = (state.messages + [SubjectAgentMessage(role="assistant", content=assistant_text, created_at=time.time())])[-30:] state = state.model_copy(update={ "model_bundle": req.model_bundle, "source_frame_indices": source_indices, "analysis": analysis, "messages": messages, "generation_prompt_en": analysis.generation_brief_en, "selected_traits": analysis.trait_chips[:6], "updated_at": time.time(), }) update(job, subject_agent=state, message="转换层分析完成") return job_with_artifacts(job) @app.post("/jobs/{job_id}/subject-agent/message", response_model=Job) def message_subject_agent(job_id: str, req: SubjectAgentMessageReq) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") state = job.subject_agent.model_copy(deep=True) source_indices = [idx for idx in req.source_frame_indices if any(frame.index == idx for frame in job.frames)][:8] fallback_mode = req.selected_mode or state.selected_mode state = state.model_copy(update={ "model_bundle": req.model_bundle, "source_frame_indices": source_indices or state.source_frame_indices, "selected_mode": fallback_mode, "selected_traits": [str(item).strip()[:80] for item in req.selected_traits if str(item).strip()][:24], "requirements_zh": req.requirements_zh.strip()[:2200] or state.requirements_zh, "quantity": max(1, min(10, int(req.quantity or state.quantity or 6))), }) user_message = req.message.strip() if not user_message: user_message = state.requirements_zh or "按当前设置准备主体套图生成要求" assistant_text, requirements_zh, prompt_en, quantity, selected_traits, selected_mode = _subject_agent_message_update(state, user_message) messages = ( state.messages + [SubjectAgentMessage(role="user", content=user_message, created_at=time.time())] + [SubjectAgentMessage(role="assistant", content=assistant_text, created_at=time.time())] )[-30:] state = state.model_copy(update={ "requirements_zh": requirements_zh, "generation_prompt_en": prompt_en, "selected_mode": selected_mode, "quantity": quantity, "selected_traits": selected_traits, "messages": messages, "updated_at": time.time(), }) update(job, subject_agent=state, message="转换层生图要求已更新") return job_with_artifacts(job) @app.delete("/jobs/{job_id}") def delete_job(job_id: str) -> dict[str, bool | str]: d = (JOBS_DIR / job_id).resolve() if JOBS_DIR not in d.parents: raise HTTPException(400, "invalid job id") job = JOBS.pop(job_id, None) if not job and not d.exists(): raise HTTPException(404, "job not found") if d.exists(): shutil.rmtree(d) return {"ok": True, "id": job_id} @app.post("/jobs/{job_id}/transcribe", response_model=Job) async def trigger_transcribe(job_id: str, bg: BackgroundTasks) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") mp4 = job_dir(job_id) / "source.mp4" if job.status in {"created", "downloading"} or not mp4.exists(): raise HTTPException(409, f"video not ready, got {job.status}") if job.audio_script.status == "rewriting" or job_id in AUDIO_WORKERS_RUNNING: raise HTTPException(409, f"job is busy, got {job.status}") manage_job_status = job.status != "splitting" audio_payload = AudioScript( status="rewriting", speaker_profile="正在分析原音频讲话人和口播节奏…", rhythm_profile="正在按原音频时长、语速和停顿分析口播节奏…", background_audio_profile="正在分析背景音乐、环境声和音效…", product_brief=AUDIO_PRODUCT_BRIEF, rewrite_model=ASR_FALLBACK_MODEL, ) if manage_job_status: update(job, status="transcribing", progress=max(45, min(job.progress, 70)), error="", message="准备提取音频…", audio_script=audio_payload) else: update(job, error="", audio_script=audio_payload) if not start_audio_processing(job_id, manage_job_status=manage_job_status): update(job, message="音频已在处理中") return job_with_artifacts(job) @app.get("/jobs/{job_id}/video.mp4") def get_video(job_id: str): p = job_dir(job_id) / "source.mp4" if not p.exists(): raise HTTPException(404, "video not found") return FileResponse(p, media_type="video/mp4") @app.get("/jobs/{job_id}/audio.wav") def get_source_audio(job_id: str): p = job_dir(job_id) / "audio.wav" if not p.exists(): raise HTTPException(404, "audio not found") return FileResponse(p, media_type="audio/wav") @app.get("/jobs/{job_id}/audio-script.mp3") def get_audio_script(job_id: str): p = job_dir(job_id) / "audio_script.mp3" if not p.exists(): raise HTTPException(404, "audio script not found") return FileResponse(p, media_type="audio/mpeg") @app.get("/jobs/{job_id}/frames/{idx}.jpg") def get_frame(job_id: str, idx: int): p = job_dir(job_id) / "frames" / f"{idx:03d}.jpg" if not p.exists(): raise HTTPException(404, "frame not found") return FileResponse(p, media_type="image/jpeg") class GenerateReq(BaseModel): prompt: str extra_prompt: str = "" # ✓ 需要的元素(正向) negative_prompt: str = "" # ✗ 不需要的元素(负向) model: str = "auto" # auto / gpt-image-2 / gemini-3-pro-image-preview size: str = "auto" # auto / 1024x1536 / 1024x1024 / 1536x1024 mode: str = "edit" # "edit" 带参考图,"text" 纯文字 from_selected: bool = False # True 时优先用 frame.selected 的生成图作 reference(迭代),否则原关键帧 @app.post("/jobs/{job_id}/frames/{idx}/generate", response_model=Job) def generate_image(job_id: str, idx: int, req: GenerateReq) -> Job: """根据关键帧 + prompt 生成新图(image-to-image 或 text-to-image)""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") frame_path = job_dir(job_id) / "frames" / f"{idx:03d}.jpg" if not frame_path.exists(): raise HTTPException(404, "frame file missing") # 决定 i2i 参考图:from_selected=True 且存在 selected 生成图 → 用它(迭代);否则原关键帧 reference_path = frame_path reference_source = "keyframe" if req.from_selected: sel = next((g for g in frame.generated_images if g.selected), None) if sel: sel_path = job_dir(job_id) / "gen" / f"{idx:03d}_{sel.id}.jpg" if sel_path.exists(): reference_path = sel_path reference_source = f"gen:{sel.id[:6]}" raw_prompt = req.prompt.strip() if req.extra_prompt.strip(): raw_prompt = f"{raw_prompt}. Include: {req.extra_prompt.strip()}" if req.negative_prompt.strip(): raw_prompt = f"{raw_prompt}. Avoid: {req.negative_prompt.strip()}" if not raw_prompt: raise HTTPException(400, "prompt required") full_prompt = _ensure_english(raw_prompt) image_size = _normalize_image_size(req.size) if not IMAGE_API_KEY: raise HTTPException(503, "IMAGE_API_KEY 或 LLM_API_KEY 未配置") model = GPT_IMAGE_MODEL gen_id = uuid.uuid4().hex[:12] import base64 as b64lib import time as _time import httpx img_bytes_in: bytes | None = None if req.mode == "edit": img_bytes_in = reference_path.read_bytes() # 尝试 i2i;auto 允许按熔断策略兜底,显式模型只走用户所选模型。 model_candidates = _image_model_candidates(preference=req.model) plan: list[str] = ([req.mode] if model_candidates != [GPT_IMAGE_MODEL] else [req.mode] * 3) if req.mode == "edit" else [req.mode] if req.mode == "edit": plan.append("text") # i2i 都失败时自动降级 attempt_steps = [(current_mode, current_model) for current_mode in plan for current_model in model_candidates] resp_data: dict = {} last_err = "" effective_mode = req.mode capacity_seen = False attempts_done = 0 for attempt, (current_mode, current_model) in enumerate(attempt_steps): attempts_done = attempt + 1 status_code = 0 body = "" retry_after: str | None = None try: if current_mode == "edit": if img_bytes_in is None: raise RuntimeError("edit mode reference image missing") with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: r = client.post( _image_endpoint("/images/edits"), headers={ "Authorization": f"Bearer {IMAGE_API_KEY}", }, data={"model": current_model, "prompt": full_prompt, "n": "1", **_image_size_payload(image_size)}, files={"image": ("reference.jpg", img_bytes_in, "image/jpeg")}, ) r.raise_for_status() resp_data = r.json() else: # text-only resp_data = _image_generation_response(full_prompt, current_model, image_size) if resp_data.get("data"): effective_mode = f"{current_mode}:{current_model}" model = current_model if current_model == GPT_IMAGE_MODEL: _image_record_primary_success() break err_obj = resp_data.get("error") or {} last_err = f"empty data · {err_obj.get('code', '')} · {str(err_obj.get('message', ''))[:200]} · model={current_model}" except httpx.HTTPStatusError as e: body = e.response.text status_code = e.response.status_code retry_after = e.response.headers.get("retry-after") capacity_seen = capacity_seen or _image_is_capacity_error(status_code, body) transient = ( status_code == 429 or status_code >= 500 or "incomplete_generation" in body or "rate_limit" in body or "timeout" in body.lower() or _image_is_capacity_error(status_code, body) ) last_err = f"HTTP {status_code}: {body[:200]} · model={current_model}" if not transient: raise HTTPException(500, f"image gen HTTP {status_code}: {body[:300]}") except Exception as e: last_err = f"{type(e).__name__}: {e} · model={current_model}" fallbackable = current_model == GPT_IMAGE_MODEL and _image_failure_can_fallback(status_code, body, last_err) if fallbackable: _image_record_primary_failure(last_err) if any(next_model != GPT_IMAGE_MODEL for _next_mode, next_model in attempt_steps[attempt + 1:]): print(f"[image gen fallback → {IMAGE_FALLBACK_MODEL}] {last_err}", flush=True) continue next_mode_changed = attempt < len(attempt_steps) - 1 and attempt_steps[attempt + 1][0] != current_mode if _image_should_retry(attempt, len(attempt_steps), status_code, body, last_err, next_mode_changed): next_mode = attempt_steps[attempt + 1][0] tag = f"fallback → {next_mode}" if next_mode != current_mode else f"retry {attempt + 1}/{len(attempt_steps)}" print(f"[image gen {tag}] {last_err}", flush=True) _time.sleep(_image_retry_delay(attempt, status_code, body, retry_after)) else: break data_arr = resp_data.get("data", []) if not data_arr: raise HTTPException(503 if capacity_seen else 500, _image_failure_message("image gen", attempts_done, last_err, capacity_seen)) item = data_arr[0] b64 = item.get("b64_json") if b64: out_bytes = b64lib.b64decode(b64) elif item.get("url"): with ai_http_client(timeout=IMAGE_REQUEST_TIMEOUT_SECONDS) as client: image_resp = client.get(item["url"]) image_resp.raise_for_status() out_bytes = image_resp.content else: raise HTTPException(500, "image gen returned no b64_json") # 保存到本地 jobs//gen/_.jpg gen_dir = job_dir(job_id) / "gen" gen_dir.mkdir(parents=True, exist_ok=True) out_path = gen_dir / f"{idx:03d}_{gen_id}.jpg" out_path.write_bytes(out_bytes) new_gen = GeneratedImage( id=gen_id, prompt=full_prompt, model=model, mode=effective_mode, url=f"/jobs/{job_id}/frames/{idx}/gen/{gen_id}.jpg", selected=False, created_at=_time.time(), ) # 写回 job.frames for f in job.frames: if f.index == idx: f.generated_images = f.generated_images + [new_gen] update(job, frames=job.frames, message=f"生图完成 · 分镜 {idx + 1}") return job @app.get("/jobs/{job_id}/frames/{idx}/gen/{gen_id}.jpg") def get_generated_image(job_id: str, idx: int, gen_id: str): p = job_dir(job_id) / "gen" / f"{idx:03d}_{gen_id}.jpg" if not p.exists(): raise HTTPException(404, "generated image not found") return FileResponse(p, media_type="image/jpeg") class SelectGenReq(BaseModel): selected: bool @app.post("/jobs/{job_id}/frames/{idx}/gen/{gen_id}/select", response_model=Job) def select_generated(job_id: str, idx: int, gen_id: str, req: SelectGenReq) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") for f in job.frames: if f.index != idx: continue for g in f.generated_images: # 单选:该帧只能选一张 if g.id == gen_id: g.selected = req.selected else: g.selected = False break update(job, frames=job.frames) return job @app.post("/jobs/{job_id}/frames/{idx}/describe", response_model=Job) def describe_frame(job_id: str, idx: int) -> Job: """调 vision 模型识别该关键帧,返回结构化描述。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") p = job_dir(job_id) / "frames" / f"{idx:03d}.jpg" if not p.exists(): raise HTTPException(404, "frame file not found") import base64 as b64lib import re as _re img_b64 = b64lib.b64encode(p.read_bytes()).decode("ascii") prompt = ( "请识别这张图,输出严格 JSON(不要 markdown 不要解释,不要思考):\n" '{\n' ' "scene": "一句话描述场景",\n' ' "objects": [{"name": "物体名(中文)", "position": "在画面哪里", "color": "颜色", "extract_prompt": "用于提取该元素的英文 prompt"}],\n' ' "style": "整体风格 / 打光 / 色调(一句话)",\n' ' "suggested_prompt": "适合用作下游生图的完整英文 prompt",\n' ' "transparent_human_assessment": {"transparent_body_score": 0, "skeleton_visible_score": 0, "human_prominence_score": 0, "clarity_score": 0, "commercial_style_score": 0, "product_usefulness_score": 0, "qualified": false, "reject_reason": "如果不合格说明原因"}\n' '}\n' "要求:objects 列出 3-8 个画面里**可独立提取**的主要元素,extract_prompt 用于后续 image edit 模型。" "transparent_human_assessment 按透明骨架人标准评分:" + TRANSPARENT_HUMAN_POSITIVE_PROMPT + " " + TRANSPARENT_HUMAN_NEGATIVE_PROMPT + " " + TRANSPARENT_HUMAN_QUALIFIED_STANDARD ) last_err = "" data = None for attempt in range(3): try: resp = llm().chat.completions.create( model=VISION_MODEL, messages=[{"role": "user", "content": [ {"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}, ]}], response_format={"type": "json_object"}, temperature=0.3, max_tokens=3000, ) content = (resp.choices[0].message.content or "").strip() if not content: # thinking 模型可能 content 空;尝试取 reasoning_content 里挖 JSON rc = getattr(resp.choices[0].message, "reasoning_content", "") or "" m = _re.search(r"\{[\s\S]*\}", rc) content = m.group(0) if m else "" # 剥掉 ```json ... ``` 包装 content = _re.sub(r"^```(?:json)?\s*|\s*```$", "", content).strip() if not content: last_err = f"empty content (attempt {attempt + 1})" continue data = json.loads(content) break except json.JSONDecodeError as e: last_err = f"json decode (attempt {attempt + 1}): {e} · raw[:200]={content[:200]}" print(f"[vision retry] {last_err}", flush=True) continue except Exception as e: last_err = f"vision call (attempt {attempt + 1}): {e}" print(f"[vision retry] {last_err}", flush=True) continue if data is None: raise HTTPException(500, last_err or "vision failed after 3 retries") # 写回 job new_frames = [] for f in job.frames: if f.index == idx: f.description = data new_frames.append(f) update(job, frames=new_frames, message=f"识别完成 · 分镜 {idx + 1}") return job # ---------- 清洗水印 / 元素提取(关键帧二阶段加工) ---------- class CleanupReq(BaseModel): # 多个相对坐标矩形 0-1,限制清洗范围;空 / None = 全图清洗 regions: list[dict] | None = None # [{"x","y","w","h"}, ...] def _region_to_phrase(r: dict) -> str: """把相对坐标矩形转成简短方位描述给 prompt 用(避免百分号 / 括号触发模型异常)""" x = max(0.0, min(1.0, float(r.get("x", 0)))) y = max(0.0, min(1.0, float(r.get("y", 0)))) w = max(0.0, min(1.0 - x, float(r.get("w", 0)))) h = max(0.0, min(1.0 - y, float(r.get("h", 0)))) if w <= 0 or h <= 0: return "" cx, cy = x + w / 2, y + h / 2 hpos = "left" if cx < 0.4 else "right" if cx > 0.6 else "middle" vpos = "top" if cy < 0.4 else "bottom" if cy > 0.6 else "middle" if hpos == "middle" and vpos == "middle": return "center" if hpos == "middle": return vpos if vpos == "middle": return hpos return f"{vpos} {hpos}" @app.post("/jobs/{job_id}/frames/{idx}/cleanup", response_model=Job) def cleanup_frame(job_id: str, idx: int, req: CleanupReq | None = None) -> Job: """调 gpt-image-2 image edit 清洗关键帧:去水印 / @用户名 / 字幕 / 平台 logo。 输出干净版到 jobs//cleaned/.jpg,写回 frame.cleaned_url。 可选 region: 限定只清洗框内区域。""" import time as _time job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") frame_path = job_dir(job_id) / "frames" / f"{idx:03d}.jpg" if not frame_path.exists(): raise HTTPException(404, "frame file missing") region_phrases: list[str] = [] if req and req.regions: for r in req.regions: p = _region_to_phrase(r) if p: region_phrases.append(p) region_phrases = list(dict.fromkeys(region_phrases)) # prompt 用"重画一张副本"语义而非"erase / remove only X" — 避免 Gemini 走 mask/inpainting # function call 路径(实测该路径在 SKG 网关上 100% 触发 incomplete_generation) if region_phrases: if len(region_phrases) == 1: zones = f"the {region_phrases[0]} area" else: zones = ", ".join(region_phrases) + " areas" prompt = ( f"Recreate this image as a clean version: remove the text and graphics in {zones}, " "keep the rest of the scene identical." ) else: prompt = ( "Recreate this image as a clean version without watermarks, captions, " "hashtags, usernames, or platform logos. Keep the composition and style." ) models = [GPT_IMAGE_MODEL] try: img_bytes, _mode = _image_edit_call( frame_path, prompt, models=models, fallback_text=False, max_attempts=3, ) except RuntimeError as e: raise HTTPException(500, f"cleanup failed: {e}") out_dir = job_dir(job_id) / "cleaned" out_dir.mkdir(parents=True, exist_ok=True) out_path = out_dir / f"{idx:03d}.jpg" out_path.write_bytes(img_bytes) new_frames = [] for f in job.frames: if f.index == idx: f.cleaned_url = f"/jobs/{job_id}/frames/{idx}/cleaned.jpg?t={int(_time.time())}" f.cleaned_applied = False # 重新清洗:重置"已应用"状态 new_frames.append(f) update(job, frames=new_frames, message=f"清洗完成 · 分镜 {idx + 1}") return job @app.get("/jobs/{job_id}/frames/{idx}/cleaned.jpg") def get_cleaned_frame(job_id: str, idx: int): p = job_dir(job_id) / "cleaned" / f"{idx:03d}.jpg" if not p.exists(): raise HTTPException(404, "cleaned frame not found") return FileResponse(p, media_type="image/jpeg") @app.delete("/jobs/{job_id}/frames/{idx}/cleanup", response_model=Job) def discard_cleaned(job_id: str, idx: int) -> Job: """丢弃待应用的清洗版(不影响已应用的)""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") p = job_dir(job_id) / "cleaned" / f"{idx:03d}.jpg" if p.exists(): try: p.unlink() except OSError: pass new_frames = [] for f in job.frames: if f.index == idx: f.cleaned_url = None new_frames.append(f) update(job, frames=new_frames, message=f"丢弃清洗版 · 分镜 {idx + 1}") return job @app.post("/jobs/{job_id}/frames/{idx}/cleanup/apply", response_model=Job) def apply_cleaned(job_id: str, idx: int) -> Job: """用清洗版替换原关键帧:物理覆盖 frames/{idx}.jpg ← cleaned/{idx}.jpg。 原图作备份 → orig/{idx}.jpg(首次替换时备份,后续替换跳过)。 替换后 frame.cleaned_url 清空(不再有"待应用"清洗版)""" import shutil as _shutil job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") cleaned_path = job_dir(job_id) / "cleaned" / f"{idx:03d}.jpg" if not cleaned_path.exists(): raise HTTPException(404, "no cleaned version to apply") frame_path = job_dir(job_id) / "frames" / f"{idx:03d}.jpg" # 首次替换:把原图备份到 orig/{idx}.jpg orig_dir = job_dir(job_id) / "orig" orig_dir.mkdir(parents=True, exist_ok=True) orig_backup = orig_dir / f"{idx:03d}.jpg" if not orig_backup.exists() and frame_path.exists(): _shutil.copy2(frame_path, orig_backup) # 用 cleaned 覆盖 frames/ _shutil.copy2(cleaned_path, frame_path) # 删 cleaned 文件(已经"应用",不再是单独的待选版本) try: cleaned_path.unlink() except OSError: pass new_frames = [] for f in job.frames: if f.index == idx: f.cleaned_url = None f.cleaned_applied = True new_frames.append(f) update(job, frames=new_frames, message=f"已替换分镜 {idx + 1} 为清洗版") return job class AddElementReq(BaseModel): name_zh: str name_en: str = "" position: str = "" source: Literal["auto", "manual", "region"] = "manual" region: dict | None = None class UpdateElementReq(BaseModel): name_zh: str | None = None name_en: str | None = None position: str | None = None subject_consensus_brief: str | None = None subject_consensus_brief_zh: str | None = None class GenerateSceneAssetReq(BaseModel): quality: AssetQuality = "hd" size: AssetSize = "source" scene_mode: SceneMode = "remove_subject" scene_style: SceneStyle = "source" asset_role: SceneAssetRole = "scene" prompt: str = "" subject_brief: str = "" source_frame_indices: list[int] | None = None subject_images: list[dict] = Field(default_factory=list) product_images: list[dict] = Field(default_factory=list) class SubjectProfilePreference(BaseModel): mode: Literal["random", "manual"] = "random" gender: str = "" age: str = "" wardrobe: str = "" region_ethnicity: str = "" skin_tone: str = "" body: str = "" hair: str = "" mood: str = "" resolved_summary: str = "" prompt_summary: str = "" class GenerateSubjectAssetsReq(BaseModel): subject_kind: SubjectKind = "object" background: AssetBackground = "white" quality: AssetQuality = "hd" size: AssetSize = "source" source_frame_indices: list[int] | None = None views: list[str] | None = None character_id: str = "" subject_template_id: str = "" subject_style: Literal["transparent_human", "source_actor", "cartoon_subject"] = "transparent_human" reconstruction_mode: Literal["same", "similar"] = "same" subject_profile: SubjectProfilePreference | None = None prompt: str = "" image_model_preference: str = "auto" replace_views: bool = False source_subject_brief: str = "" pack_id: str = "" pack_label: str = "" pack_mode: str = "" pack_created_at: float = 0.0 def _subject_profile_prompt_clause(profile: SubjectProfilePreference | None) -> str: if not profile: return "" prompt_summary = _ensure_english(profile.prompt_summary or "") resolved_summary = _ensure_english(profile.resolved_summary or "") if prompt_summary: body = prompt_summary[:1400] else: parts = [ ("gender presentation", profile.gender), ("age range", profile.age), ("wardrobe style", profile.wardrobe), ("regional/ethnic appearance cues", profile.region_ethnicity), ("skin tone", profile.skin_tone), ("body proportion", profile.body), ("hair style", profile.hair), ("commercial mood", profile.mood), ] body = "; ".join(f"{name}: {_ensure_english(value.strip())}" for name, value in parts if value and value.strip())[:1400] if not body and not resolved_summary: return "" mode = "random-composed" if profile.mode == "random" else "manually selected" resolved = f" UI summary: {resolved_summary[:700]}." if resolved_summary else "" return ( f"Structured subject casting profile ({mode}, locked for this request): {body}. " "This profile overrides ambiguous source/template traits for gender presentation, age range, wardrobe, regional/ethnic appearance cues, skin tone, body proportion, hair, and commercial mood. " "Apply the same profile uniformly to every requested view; do not mix different genders, ages, skin tones, wardrobes, or character identities inside the pack." + resolved + " " ) class UpdateProductRefsReq(BaseModel): items: list[dict] = Field(default_factory=list) @app.put("/jobs/{job_id}/product-refs", response_model=Job) def update_product_refs(job_id: str, req: UpdateProductRefsReq) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") items: list[dict] = [] for item in req.items[:300]: if isinstance(item, dict) and isinstance(item.get("ref"), dict): items.append(item) update(job, product_refs=items) return job @app.post("/jobs/{job_id}/frames/{idx}/elements", response_model=Job) def add_element(job_id: str, idx: int, req: AddElementReq) -> Job: """加一条元素 · 若 name_en 缺则自动 zh→en 翻译""" import time as _time import re as _re job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") name_zh = req.name_zh.strip() if not name_zh: raise HTTPException(400, "name_zh required") name_en = req.name_en.strip() if not name_en and LLM_API_KEY: try: prompt = ( "Translate the following text into concise English, suitable as an element label " "in an image-generation prompt. Output only the translation — no quotes, no punctuation, " f"no explanation.\n\nInput: {name_zh}" ) resp = llm().chat.completions.create( model=TRANSLATE_MODEL, messages=[{"role": "user", "content": prompt}], temperature=0.2, max_tokens=200, ) out = (resp.choices[0].message.content or "").strip() if not out: rc = getattr(resp.choices[0].message, "reasoning_content", "") or "" if rc: out = rc.strip().splitlines()[-1].strip() name_en = _re.sub(r'^[\'"「『]+|[\'"」』]+$', "", out).strip() except Exception as e: print(f"[add_element translate failed] {e}", flush=True) name_en = "" el = KeyElement( id=uuid.uuid4().hex[:8], name_zh=name_zh, name_en=name_en, position=req.position.strip(), source=req.source, region=req.region, created_at=_time.time(), ) new_frames = [] for f in job.frames: if f.index == idx: f.elements = f.elements + [el] new_frames.append(f) update(job, frames=new_frames, message=f"加入元素 · 分镜 {idx + 1} · {name_zh}") return job @app.patch("/jobs/{job_id}/frames/{idx}/elements/{element_id}", response_model=Job) def update_element(job_id: str, idx: int, element_id: str, req: UpdateElementReq) -> Job: """更新元素标签 / 英文提示。提取不准时允许用户修正,不强制重建元素。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") changed_name = "" found = False new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: found = True if req.name_zh is not None: name_zh = req.name_zh.strip() if not name_zh: raise HTTPException(400, "name_zh required") e.name_zh = name_zh changed_name = name_zh if req.name_en is not None: e.name_en = req.name_en.strip() if req.position is not None: e.position = req.position.strip() if req.subject_consensus_brief is not None: e.subject_consensus_brief = _ensure_english(req.subject_consensus_brief.strip())[:2200] if req.subject_consensus_brief_zh is not None: e.subject_consensus_brief_zh = req.subject_consensus_brief_zh.strip()[:2200] new_frames.append(f) if not found: raise HTTPException(404, "element not found") update(job, frames=new_frames, message=f"更新元素 · 分镜 {idx + 1} · {changed_name or element_id}") return job @app.delete("/jobs/{job_id}/frames/{idx}/elements/{element_id}", response_model=Job) def delete_element(job_id: str, idx: int, element_id: str) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") new_frames = [] removed = False for f in job.frames: if f.index == idx: before = len(f.elements) f.elements = [e for e in f.elements if e.id != element_id] removed = len(f.elements) < before # 若有提取图也删(含多版本) if removed: elements_dir = job_dir(job_id) / "elements" if elements_dir.exists(): for pat in (f"{idx:03d}_{element_id}.jpg", f"{idx:03d}_{element_id}.png", f"{idx:03d}_{element_id}_*.jpg"): for p in elements_dir.glob(pat): try: p.unlink() except OSError: pass new_frames.append(f) if not removed: raise HTTPException(404, "element not found") update(job, frames=new_frames, message=f"删除元素 · 分镜 {idx + 1}") return job @app.post("/jobs/{job_id}/frames/{idx}/scene-asset", response_model=Job) def generate_scene_asset(job_id: str, idx: int, req: GenerateSceneAssetReq) -> Job: """为关键帧生成一张资产图。 scene: 去主体背景板;first_frame/last_frame: 纯文字生成视频首尾帧,参考帧只用于理解统一人物形象。""" import time as _time job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = _find_frame(job, idx) src = _source_frame_path(job_id, idx) if not src.exists(): raise HTTPException(404, "source frame file missing") source_indices = [int(x) for x in (req.source_frame_indices or [idx]) if isinstance(x, int) or str(x).isdigit()] if not source_indices: source_indices = [idx] source_indices = list(dict.fromkeys(source_indices))[:8] model_src = src sheet_tmp: Path | None = None if req.asset_role == "scene" and len(source_indices) > 1: sheet_tmp = job_dir(job_id) / "tmp" / f"scene_refs_{idx:03d}_{uuid.uuid4().hex[:6]}.jpg" sheet = _make_reference_contact_sheet(job_id, source_indices, sheet_tmp) if sheet: model_src = sheet # Endpoint frames deliberately ignore subject image references. Character identity comes # from subject_brief text, while only 1-2 product images remain hard visual truth. product_ref_paths = [p for p in (storyboard_ref_path(job_id, r) for r in req.product_images[:2]) if p and p.exists()] confirmed_subjects = [ (e.name_en or e.name_zh).strip() for ref_frame in job.frames for e in (ref_frame.elements or []) if (e.subject_assets or []) ] if not confirmed_subjects: confirmed_subjects = [ (e.name_en or e.name_zh).strip() for ref_frame in job.frames for e in (ref_frame.elements or []) if (e.name_en or e.name_zh).strip() ][:3] confirmed_subjects = list(dict.fromkeys([x for x in confirmed_subjects if x]))[:3] subject_clause = ( "Confirmed foreground subject(s) to remove: " + ", ".join(confirmed_subjects) + ". " if confirmed_subjects else "Remove the main foreground subject from the frame if present. " ) subject_brief = _ensure_english(req.subject_brief.strip()) subject_brief_clause = ( f"Subject identity (text only, no image reference): {subject_brief[:1800]}. " "Maintain this identity across this and other endpoint frames in the same storyboard. " "Vary pose, framing, expression, gesture, camera distance, and environment freely according to the user prompt; do not fall back to any specific reference photo or ID-card pose. " if subject_brief else "No subject identity brief was provided. Do not add a main character unless the user scene direction explicitly asks for one. " ) mode_clause = { "remove_subject": ( "Keep the original environment, camera angle, perspective, composition, lighting direction, color mood, and spatial layout. " "The result should be an empty clean scene/background plate with the subject removed and the occluded background reconstructed." ), "similar": ( "Create a similar but not identical scene/background plate: keep the same camera angle, rough spatial layout, lighting direction, and usage context, " "but vary props, surface details, textures, and small environmental details so it is not a duplicate of the source." ), "style": ( "Create a scene/background plate with the same camera angle and spatial layout, but reinterpret the environment in the selected visual style. " "Keep it believable and useful for image-to-video generation." ), }[req.scene_mode] style_clause = { "source": "Follow the original source style.", "premium_product": "Use a premium product-advertising style: polished, high-end, clean commercial lighting, refined materials.", "clean_studio": "Use a clean studio style: simple surfaces, controlled lighting, minimal distractions.", "warm_lifestyle": "Use a warm lifestyle style: realistic lived-in details, soft natural light, approachable atmosphere.", "cinematic": "Use a cinematic style: dramatic but natural lighting, richer depth, filmic contrast, not fantasy.", }[req.scene_style] user_prompt = _ensure_english(req.prompt.strip()) user_prompt_clause = ( "User scene direction: " + user_prompt[:1200] + " " if user_prompt else "" ) if req.asset_role != "scene" and product_ref_paths: reference_clause = ( f"Use the provided {len(product_ref_paths)} SKG product image(s) only as rigid product reference. " "Do not use the original keyframe as the first/last-frame truth; it is only a storage anchor for this row. No subject image reference is attached. " ) elif req.asset_role != "scene": reference_clause = ( "No image reference is attached for this endpoint frame. Generate from text only. " "Do not use the original keyframe as the first/last-frame truth; it is only a storage anchor for this row. " ) else: reference_clause = ( f"Use the selected reference frame contact sheet as visual evidence for location, composition, lighting, materials, and atmosphere. Reference frame indices: {', '.join(str(i + 1) for i in source_indices)}. " if len(source_indices) > 1 else "Use the provided frame as the primary visual reference. " ) product_asset_clause = ( "The provided product image(s) are the only product truth. The product is a white U-shaped neck-and-shoulder wearable massage device worn around the neck/shoulders, not headphones, a collar pillow, skincare, food, or a medical prop. Do not vary left/right asymmetry, button placement, contact pad position, side thickness, opening direction, inner/outer shell relationship, or wearable scale relative to the human neck. Preserve all structural details exactly while integrating it into the new scene. " if product_ref_paths else "Do not invent a random product. Only include an SKG product if the user prompt explicitly asks for it. " ) subject_asset_clause = ( (TRANSPARENT_HUMAN_POSITIVE_PROMPT + " " + TRANSPARENT_HUMAN_NEGATIVE_PROMPT + " ") if subject_brief and ("透明" in subject_brief or "transparent" in subject_brief.lower() or "skeleton" in subject_brief.lower()) else "" ) if req.asset_role == "scene": prompt = ( "Create one clean high-definition scene/background reference image from this frame. " + subject_clause + "Do not include the removed subject, duplicate people, animals, products, text, watermark, platform UI, captions, usernames, hashtags, logos, or overlay graphics. " + reference_clause + user_prompt_clause + mode_clause + " " + style_clause + " " + "Enhance clarity and texture while avoiding over-smoothing, warped geometry, or changing important perspective details. " + "Do not create multiple views. Do not isolate objects." ) else: role_clause = ( "This is the FIRST frame for an image-to-video clip: create a clear beginning pose and composition. " if req.asset_role == "first_frame" else "This is the LAST frame for an image-to-video clip: create a clear ending pose that can naturally follow the first frame, not a duplicate. " ) prompt = ( "Create one premium 9:16 high-definition video endpoint frame from text direction. " + role_clause + subject_brief_clause + reference_clause + user_prompt_clause + style_clause + " " + product_asset_clause + subject_asset_clause + "Do not create a plain background plate. Do not include SKG product unless the user prompt explicitly asks for it. " + "The output should be ready as a first/last frame for Seedance video generation, with stable composition, believable perspective, clear subject, no text, no watermark, no gore, no medical surgery imagery." ) models = [GPT_IMAGE_MODEL] try: if req.asset_role == "scene": img_bytes, _mode = _image_edit_call(model_src, prompt, models=models, fallback_text=False, max_attempts=3, max_side=1280) elif product_ref_paths: print( f"[scene asset] role={req.asset_role} endpoint=/images/edits product_refs={len(product_ref_paths)} subject_refs=0 contact_sheet=0 model={GPT_IMAGE_MODEL}", flush=True, ) img_bytes, _mode = _image_edit_call(product_ref_paths, prompt, models=models, fallback_text=False, max_attempts=3, max_side=1600) else: print( f"[scene asset] role={req.asset_role} endpoint=/images/generations product_refs=0 subject_refs=0 contact_sheet=0 model={GPT_IMAGE_MODEL}", flush=True, ) img_bytes, _mode = _image_text_call(prompt, models=models, max_attempts=3) except RuntimeError as e: raise HTTPException(500, f"{req.asset_role} asset failed: {e}") finally: if sheet_tmp and sheet_tmp.exists(): try: sheet_tmp.unlink() except OSError: pass asset_id = f"scene_{idx:03d}_{uuid.uuid4().hex[:8]}" out_path = job_dir(job_id) / "assets" / f"{asset_id}.jpg" width, height = _normalize_asset_image(img_bytes, out_path, src, req.size, "white", square=False) report = _image_quality_report(out_path) scene = SceneAsset( id=asset_id, label=( f"分镜 {idx + 1} 场景图" if req.asset_role == "scene" else f"分镜 {idx + 1} {'首帧' if req.asset_role == 'first_frame' else '尾帧'}" ), url=_asset_url(job_id, asset_id), width=width, height=height, quality=req.quality, size=req.size, scene_mode=req.scene_mode, scene_style=req.scene_style, asset_role=req.asset_role, quality_report=report, created_at=_time.time(), ) new_frames = [] for f in job.frames: if f.index == idx: f.quality_report = _image_quality_report(src) f.scene_assets = (f.scene_assets or []) + [scene] new_frames.append(f) asset_label = "场景图" if req.asset_role == "scene" else ("首帧" if req.asset_role == "first_frame" else "尾帧") update(job, frames=new_frames, message=f"{asset_label}生成完成 · 分镜 {idx + 1}") return job @app.post("/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutout", response_model=Job) def cutout_element(job_id: str, idx: int, element_id: str) -> Job: """AI 提取元素 · 每次累积一张新图: 调 gpt-image-2 生成**完整、清晰**的元素图(即使原图只露出部分也补全)。 region 元素:先把 region + 30% padding 区域裁出作为 focus,再发给模型聚焦补全。""" from PIL import Image as _PILImage import io as _io import tempfile as _tempfile job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") el = next((e for e in frame.elements if e.id == element_id), None) if not el: raise HTTPException(404, "element not found") cleaned_path = job_dir(job_id) / "cleaned" / f"{idx:03d}.jpg" src = cleaned_path if cleaned_path.exists() else job_dir(job_id) / "frames" / f"{idx:03d}.jpg" if not src.exists(): raise HTTPException(404, "source frame file missing") out_dir = job_dir(job_id) / "elements" out_dir.mkdir(parents=True, exist_ok=True) new_cutout_id = uuid.uuid4().hex[:8] out_path = out_dir / f"{idx:03d}_{element_id}_{new_cutout_id}.jpg" # region 元素:先 PIL 裁出 region + 30% padding 作为 focus 给模型(让它聚焦在该元素) tmp_focus: Path | None = None model_src = src if el.region: try: im = _PILImage.open(src).convert("RGB") W, H = im.size r = el.region x = max(0.0, min(1.0, float(r.get("x", 0)))) y = max(0.0, min(1.0, float(r.get("y", 0)))) w = max(0.0, min(1.0 - x, float(r.get("w", 0)))) h = max(0.0, min(1.0 - y, float(r.get("h", 0)))) cx, cy = x + w / 2, y + h / 2 # 扩大 30% 给上下文(避免裁到正好边界丢失补全 hint) ew, eh = w * 1.6, h * 1.6 x0 = max(0.0, cx - ew / 2); y0 = max(0.0, cy - eh / 2) x1 = min(1.0, cx + ew / 2); y1 = min(1.0, cy + eh / 2) left, top, right, bottom = int(x0 * W), int(y0 * H), int(x1 * W), int(y1 * H) if right - left > 8 and bottom - top > 8: cropped = im.crop((left, top, right, bottom)) tmp = _tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) cropped.save(tmp.name, format="JPEG", quality=92) tmp.close() tmp_focus = Path(tmp.name) model_src = tmp_focus except Exception as e: print(f"[cutout region crop failed, fallback to full frame] {e}", flush=True) target = (el.name_en or el.name_zh).strip() prompt = ( f"Identify the {target} in this image. " f"Generate a complete, high-resolution, sharply detailed image of the entire {target} as a standalone asset. " f"If the {target} is only partially visible in the source (cropped at edges, occluded by other objects, or out of frame), " "intelligently reconstruct the missing parts based on visual context so the result shows the FULL element. " "Place the complete element on a pure white background, isolated, with no other objects, no scene fragments, no shadows from the original scene. " "Preserve the element's original color palette, style, lighting character, and proportions. " "Output must be a clean, high-quality asset image suitable for downstream composition." ) models = [GPT_IMAGE_MODEL] img_bytes: bytes try: try: img_bytes, _mode = _image_edit_call( model_src, prompt, models=models, fallback_text=False, max_attempts=3, ) except RuntimeError as e: raise HTTPException(500, f"extract failed: {e}") finally: if tmp_focus and tmp_focus.exists(): try: tmp_focus.unlink() except OSError: pass out_path.write_bytes(img_bytes) new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: e.cutouts = (e.cutouts or []) + [new_cutout_id] if not e.cutout_id: e.cutout_id = new_cutout_id new_frames.append(f) update(job, frames=new_frames, message=f"提取完成 · {el.name_zh}") return job def _subject_source_indices(req: GenerateSubjectAssetsReq, idx: int) -> list[int]: source_indices = [int(x) for x in (req.source_frame_indices or [idx]) if isinstance(x, int) or str(x).isdigit()] if idx not in source_indices: source_indices = [idx] + source_indices return list(dict.fromkeys(source_indices))[:12] def _normalize_subject_pack_id(value: str, idx: int, element_id: str) -> str: cleaned = "".join(ch for ch in (value or "").strip() if ch.isalnum() or ch in {"_", "-"}) return cleaned[:96] or f"subject_pack_{idx:03d}_{element_id}_{uuid.uuid4().hex[:8]}" def _update_subject_asset_status( job_id: str, idx: int, element_id: str, asset_id: str, *, status: SubjectAssetStatus, progress: int, error: str = "", message: str = "", ) -> None: job = JOBS.get(job_id) if not job: return new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: updated_assets = [] for asset in e.subject_assets or []: if asset.id == asset_id: updated_assets.append(asset.model_copy(update={ "status": status, "progress": max(0, min(100, int(progress))), "error": error, "ai_completed": status == "completed", })) else: updated_assets.append(asset) e.subject_assets = updated_assets new_frames.append(f) update(job, frames=new_frames, message=message or job.message, error=error if status == "failed" else job.error) def _subject_assets_background_worker( job_id: str, idx: int, element_id: str, req: GenerateSubjectAssetsReq, queued: list[tuple[SubjectView, str, str]], ) -> None: if not req.source_subject_brief.strip() and _subject_source_indices(req, idx): try: req.source_subject_brief = _describe_source_subject(job_id, _subject_source_indices(req, idx)) except Exception as e: print(f"[subject assets] source brief failed job={job_id} error={e}", flush=True) for position, (view, view_label, placeholder_id) in enumerate(queued, start=1): _update_subject_asset_status( job_id, idx, element_id, placeholder_id, status="in_progress", progress=10, message=f"主体资产生成中 · {view_label} · {position}/{len(queued)}", ) one_req = req.model_copy(deep=True) one_req.views = [view] one_req.replace_views = True try: _generate_subject_assets_sync(job_id, idx, element_id, one_req) except HTTPException as e: detail = str(e.detail) _update_subject_asset_status( job_id, idx, element_id, placeholder_id, status="failed", progress=100, error=detail, message=f"主体资产生成失败 · {view_label}", ) except Exception as e: detail = str(e) _update_subject_asset_status( job_id, idx, element_id, placeholder_id, status="failed", progress=100, error=detail, message=f"主体资产生成失败 · {view_label}", ) @app.post("/jobs/{job_id}/frames/{idx}/elements/{element_id}/subject-assets", response_model=Job) def generate_subject_assets(job_id: str, idx: int, element_id: str, req: GenerateSubjectAssetsReq) -> Job: """提交主体多视角生成任务,立即返回占位卡;后台逐张生成并逐张写回。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = _find_frame(job, idx) el = next((e for e in frame.elements if e.id == element_id), None) if not el: raise HTTPException(404, "element not found") views = _subject_view_labels(req.subject_kind, req.views) source_indices = _subject_source_indices(req, idx) target_views = {view for view, _label in views} now = time.time() explicit_pack_id = bool((req.pack_id or "").strip()) pack_id = _normalize_subject_pack_id(req.pack_id, idx, element_id) pack_label = (req.pack_label or "").strip()[:120] or f"{el.name_zh} · 主体套图" pack_mode = (req.pack_mode or "").strip()[:40] or req.subject_style pack_created_at = req.pack_created_at or now placeholders: list[SubjectAsset] = [] queued: list[tuple[SubjectView, str, str]] = [] for view, view_label in views: asset_id = f"subject_{idx:03d}_{element_id}_{view}_{uuid.uuid4().hex[:8]}" placeholders.append(SubjectAsset( id=asset_id, view=view, label=f"{el.name_zh} · {view_label}", url="", width=0, height=0, background=req.background, quality=req.quality, size=req.size, source_frame_indices=source_indices, ai_completed=False, status="queued", progress=0, error="", pack_id=pack_id, pack_label=pack_label, pack_mode=pack_mode, pack_created_at=pack_created_at, created_at=now, )) queued.append((view, view_label, asset_id)) new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: e.subject_kind = req.subject_kind e.cutout_background = req.background current_assets = e.subject_assets or [] if req.replace_views: for old_asset in current_assets: should_replace = old_asset.view in target_views and ( old_asset.pack_id == pack_id if explicit_pack_id else True ) if should_replace and old_asset.url: _delete_subject_asset_file(job_id, old_asset.id) current_assets = [ asset for asset in current_assets if not ( asset.view in target_views and ( asset.pack_id == pack_id if explicit_pack_id else True ) ) ] e.subject_assets = current_assets + placeholders new_frames.append(f) update(job, frames=new_frames, message=f"主体资产已提交 · {el.name_zh} · {len(placeholders)} 张逐张生成中", error="") worker_req = req.model_copy(deep=True) worker_req.views = [view for view, _label in views] worker_req.pack_id = pack_id worker_req.pack_label = pack_label worker_req.pack_mode = pack_mode worker_req.pack_created_at = pack_created_at threading.Thread( target=_subject_assets_background_worker, args=(job_id, idx, element_id, worker_req, queued), daemon=True, ).start() return job def _generate_subject_assets_sync(job_id: str, idx: int, element_id: str, req: GenerateSubjectAssetsReq) -> Job: """为一个主体生成多视角资产包。 如果传入 source_frame_indices 或内置 character_id,则把多张参考图作为独立 image[] 证据提交。""" import time as _time job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = _find_frame(job, idx) el = next((e for e in frame.elements if e.id == element_id), None) if not el: raise HTTPException(404, "element not found") source_indices = _subject_source_indices(req, idx) similar_mode = req.reconstruction_mode == "similar" character_reference_paths: list[Path] = [] template_brief_clause = "" selected_template_brief = "" character_label = "" subject_template_id = (req.subject_template_id or "").strip() character_id = (req.character_id or "").strip() if subject_template_id: template = find_subject_template_item(subject_template_id) character_label = template.name template_paths = [subject_template_image_file(image.filename) for image in template.images[:10]] character_reference_paths.extend(template_paths) brief = template.prompt_brief.strip() or template.note.strip() or template.description.strip() if similar_mode and not brief: brief = _describe_subject_template_from_images(template.name, template.subject_style, template_paths, template.note) brief = _ensure_english(brief) selected_template_brief = brief.strip() template_brief_clause = ( f"Reference character brief from saved database template '{template.name}': {brief}. " "Use this as a high-quality creative direction and identity bible only; do not copy a face, exact pose, pixels, file artifacts, labels, or accidental defects. " "Create a new innovative variation that keeps the same broad subject type, transparent wellness character language, camera readability, shoulder/neck product compatibility, and commercial role. " if brief else f"Selected reusable subject template from database: {template.name}. Create a new innovative variation, not a duplicate. " ) elif character_id: character = find_character_library_item(character_id) character_label = character.name character_reference_paths.extend(character_library_file(image.filename) for image in character.images[:7]) brief = character.prompt_brief.strip() or character.description.strip() brief = _ensure_english(brief) selected_template_brief = brief.strip() template_brief_clause = ( f"Reference character brief from built-in creative character '{character.name}': {brief}. " "Use this planned character brief as a high-quality creative direction and anatomy/style bible only; " "do not copy the exact face, exact pose, exact silhouette, pixels, or make a duplicate. " "Create a new innovative variation that keeps the same broad role, transparent wellness character language, camera readability, and shoulder/neck product compatibility. " ) tmp_focus: Path | None = None model_src: Path | list[Path] | None = None frame_reference_paths = [p for p in (_source_frame_path(job_id, i) for i in source_indices) if p.exists()] source_subject_brief = ( _ensure_english(req.source_subject_brief.strip()) if req.source_subject_brief.strip() else (_describe_source_subject(job_id, source_indices) if source_indices else "") ) source_subject_clause = ( f"Source video role brief from selected keyframes: {source_subject_brief}. " + ( "Use this brief as secondary text evidence while preserving the same visible source subject from the attached reference image(s). " if req.reconstruction_mode == "same" else "Use this brief to preserve role category, creator-ad energy, camera readability, and broad styling, while creating a new non-identical subject. " ) if source_subject_brief else ( "Source video role brief unavailable; use the attached source reference image(s) as primary evidence for the same visible subject. " if req.reconstruction_mode == "same" else "Source video role brief unavailable; create a new non-identical ad subject guided by the user direction, template brief, and requested view. " ) ) if similar_mode: if character_reference_paths: remaining = max(0, 10 - len(character_reference_paths)) model_src = character_reference_paths + frame_reference_paths[:remaining] elif frame_reference_paths: model_src = frame_reference_paths[:10] else: model_src, tmp_focus = _focus_source_for_element(job_id, idx, el) if character_reference_paths: remaining = max(0, 10 - len(character_reference_paths)) model_src = character_reference_paths + frame_reference_paths[:remaining] elif frame_reference_paths: model_src = frame_reference_paths[:10] try: with Image.open(_source_frame_path(job_id, idx)) as src_im: source_is_portrait = src_im.height > src_im.width except Exception: source_is_portrait = False canvas_clause = ( "Canvas and aspect ratio: the reference video frame is vertical, so output a vertical portrait 9:16-style image, not a square canvas and not a horizontal layout. " if source_is_portrait else "Canvas and aspect ratio: keep a single clean reference-image canvas with the same broad orientation as the source evidence. " ) target = (el.name_en or el.name_zh).strip() bg_phrase = "pure white" if req.background == "white" else "pure black" similar_actor = req.subject_kind == "living" and req.subject_style == "source_actor" and req.reconstruction_mode == "similar" cartoon_subject = req.subject_kind == "living" and req.subject_style == "cartoon_subject" kind_phrase = ( "original stylized cartoon or illustrative living character" if cartoon_subject else "human actor or living character" if req.subject_kind == "living" else "object or product-like subject" ) transparent_character_clause = ( TRANSPARENT_HUMAN_POSITIVE_PROMPT + " The generated living character must be a friendly transparent humanoid with transparent or translucent outer body and clean white skeleton visible inside the same body. " + TRANSPARENT_HUMAN_NEGATIVE_PROMPT + " Do not render a normal human, ordinary skeleton-only character, horror skeleton, medical anatomy, organs, veins, blood, corpse, zombie, hospital, surgery, or autopsy visual. " if req.subject_kind == "living" and req.subject_style == "transparent_human" else "" ) actor_style_clause = ( "Generate a believable normal commercial video actor, not a transparent or skeleton character. " "Use the text briefs to understand the source video's casting direction, age range, gender presentation, body proportion, wardrobe category, gesture vocabulary, framing, energy, lighting, and creator-ad style. " "Do not recreate the exact person's face, biometric identity, unique likeness, tattoos, scars, logos, watermarks, captions, or platform UI. " "The output must be a newly designed similar actor that could play the same role in a new ad, with consistent identity across all views. " if similar_actor else "" ) cartoon_style_clause = ( "Generate an original stylized cartoon or illustrated advertising character, not a photoreal person and not a copied likeness. " "Use the source brief only for broad role, pose logic, mood, body proportion category, neck-and-shoulder readability, and commercial energy. " "Change the face, exact silhouette, clothing details, marks, logos, watermarks, captions, and any identifiable source-video features. " "Keep one consistent cartoon design system, proportions, materials, color language, and character identity across all requested views. " if cartoon_subject else "" ) identity_clause = ( "Create a similar but non-identical original subject: match the performance role, silhouette category, styling direction, camera-readability, and commercial mood, while changing exact identity and unique personal features. " if req.reconstruction_mode == "similar" else "Preserve identity, proportions, silhouette, material, colors, styling, and distinctive details across all generated views. " ) prompt_extra = _ensure_english(req.prompt.strip()) prompt_extra_clause = f"User direction: {prompt_extra[:1200]} " if prompt_extra else "" subject_profile_clause = _subject_profile_prompt_clause(req.subject_profile) identity_lock_clause = ( "Identity lock: these API calls generate one high-definition multi-view pack for ONE single subject, but each individual output file must show only its one requested view. " "Before rendering, infer one consistent character bible from the supplied text brief and generation instructions: gender presentation, age range, body proportions, head shape, face direction cues, material, silhouette, wardrobe/material style, and commercial mood. " "Keep that same character bible unchanged across every generated view in separate files. " "By default, inherit the reference frames' broad gender presentation, regional/ethnic appearance category, skin-tone family, body-proportion category, and ad-role energy unless the user explicitly overrides them. " "The pack must depict the same newly designed person or character in every view: same face design, same hair design, same body proportions, same skin tone, same age range, and same commercial styling. " "If user direction requests a gender, age, or style change, apply that one change uniformly to all views; never mix male/female, young/old, or multiple style identities inside the same pack. " "For transparent humanoids, keep the same transparent skin shell, skeleton proportions, visible spine/rib cage/pelvis/limb bones, and non-horror wellness character style in every view. " ) wardrobe_lock_clause = ( "Wardrobe lock: choose one outfit bible before rendering and keep it identical across all views. " "The same garment type, color palette, neckline, sleeve shape, straps, fabric/material, fit, seam logic, and visible accessories must remain consistent from front, side, three-quarter, and back views. " "Do not change clothing between views; do not switch from sportswear to casualwear, dress, coat, hoodie, uniform, or underwear unless the user explicitly requests that single outfit for the whole pack. " "If the reference outfit is useful, inherit its broad wardrobe category and color family, but redraw it as a new non-identical clean commercial outfit. " ) pack_bible_clause = ( ( "PACK BIBLE - source-locked mode. " "Subject bible: use the attached source frame(s) as the primary identity and wardrobe reference for one same visible subject. " "Preserve the visible gender presentation, regional/ethnic appearance category, skin-tone family, age range impression, body-proportion category, hair length/color/silhouette, face-structure impression, posture energy, neck/shoulder readability, outfit category, garment colors, material finish, and accessory logic across every generated view. " "Do not replace the source subject with a different actor, different body type, different ethnicity, different gender, different hairstyle, different outfit, or generic wellness model. " "Remove only source-video artifacts such as background, captions, watermarks, platform UI, compression noise, and accidental occlusion; redraw missing angles as the same subject. " "Lock the exact top color, bottom color, shoe color, neckline shape, sleeve/strap structure, seams, trim, fabric finish, fit, and accessories before rendering the first view, then repeat those same clothing decisions in every other view. " ) if req.reconstruction_mode == "same" else ( "PACK BIBLE - this exact bible applies to every view in this generated set. " "Subject bible: one newly designed commercial wellness-ad subject; inherit only broad non-identifying casting traits from the source such as gender presentation, regional/ethnic appearance category, skin-tone family, age range, body-proportion category, hair-length family, posture energy, and neck/shoulder readability. " "Do not copy the source person's biometric identity, exact face, exact hairstyle, marks, tattoos, captions, logos, or watermarks. " "Keep the same new face design, same head shape, same hair color and hair silhouette, same skin tone, same body proportions, same height impression, and same character age across front, side, three-quarter, and back views. " "Wardrobe bible: if the user direction names a specific outfit, use that one outfit uniformly across every view. Otherwise use one clean SKG wellness-ad activewear outfit for the entire pack: fitted short-sleeve performance top with a visible neck/collarbone area, slim athletic pants, and low-profile sneakers. " "Lock the exact top color, bottom color, shoe color, neckline shape, sleeve/strap structure, seams, trim, fabric finish, fit, and accessories before rendering the first view, then repeat those same clothing decisions in every other view. " "Never add or remove a jacket, blazer, hoodie, coat, dress, skirt, scarf, hat, bag, jewelry, logo, stripe pattern, or extra layer in only one view. " "Back and side views must show the same garment wrapping around the same body, not a redesigned outfit. " ) ) neck_product_clause = ( "This subject pack is for SKG neck-and-shoulder wearable massage device videos. " "Make the neck, collarbone, shoulder line, upper back, side neck, and shoulder slope clear and product-ready. " "Avoid bulky collars, scarves, hair, hoods, props, or poses that hide the neck/shoulder placement area. " "For back and close-up views, prioritize the cervical spine, shoulder blades, upper trapezius, and clean wearable-device contact area. " ) models = SUBJECT_ASSET_IMAGE_MODELS model_preference = _normalize_image_model_preference(req.image_model_preference) reference_image_count = len(model_src) if isinstance(model_src, list) else (1 if model_src else 0) generated: list[SubjectAsset] = [] generation_errors: list[str] = [] first_generation_error: RuntimeError | None = None pack_force_fallback_model = model_preference == "auto" and _image_primary_circuit_open() try: for view, view_label in _subject_view_labels(req.subject_kind, req.views): closeup_view = view in {"bust", "back_detail", "bust_front", "bust_left_45", "bust_right_45", "back_neck_detail"} or "detail" in view if req.subject_kind == "living": if closeup_view: view_prompt = f"upper-body shoulder-and-neck close-up character reference, {view_label}" elif view.startswith("expression_"): emotion = view_label.replace("表情", "") view_prompt = f"full-body upright standing character reference with a clear {emotion} facial expression" elif view.startswith("action_") or view == "side_walk": view_prompt = f"full-body upright standing character reference, {view_label}, consistent actor proportions" else: view_prompt = f"full-body upright standing character reference, {view_label}" else: view_prompt = f"complete object/product reference, {view_label} view" view_name = view.replace("_", " ") projection_clause = _subject_view_projection_clause(view) single_view_clause = ( f"Single-image output rule: this output file is ONLY for the {view_label} view ({view_name}). " "Render exactly one subject, one time, in one pose and one camera angle. " "Do not create a multi-view sheet, contact sheet, grid, storyboard, lineup, comparison layout, before/after layout, mirrored pair, duplicate subjects, thumbnails, labels, captions, arrows, view names, panel borders, or multiple versions in the same image. " "Do not include any other views in this image. " + projection_clause ) framing_clause = ( "For this close-up view, intentionally crop as an upper-body asset from head/neck to chest or upper back; the neck, shoulders, collarbone or upper spine area must be large, clear, and useful for placing a neck-and-shoulder massage device. " "Do not force full-body framing for close-ups. " if closeup_view and req.subject_kind == "living" else "The subject must be complete, centered, full body or full object, head-to-feet visible when applicable, not cropped by the canvas. Make the subject large and readable: it should occupy about 88-94% of the image height, with the head close to the top margin and feet close to the bottom margin. No tiny character, no miniature person, no distant full-body figure, no large empty white margins. " ) if similar_mode and reference_image_count: reference_strategy_clause = ( f"Image-conditioned reference reconstruction mode: {reference_image_count} selected source reference image(s) are attached to this request. " "First read the attached frames and the written source brief, then generate a new similar but non-identical subject. " "Use the images as visual evidence for broad role, gender presentation, regional/ethnic appearance category, skin-tone family, body proportion, hair family, outfit category/color family, pose language, and creator-ad energy. " "Do not copy exact face, biometric identity, unique marks, source pixels, captions, watermarks, or background. " + source_subject_clause + template_brief_clause ) elif similar_mode: reference_strategy_clause = ( "Text-only generation mode: no source image is attached to this image request. Use only the written source/video/template briefs below as creative constraints. " "This is intentionally NOT image editing and NOT identity replication. " + source_subject_clause + template_brief_clause ) else: reference_strategy_clause = ( f"Source-locked image reference mode: {reference_image_count} selected source reference image(s) are attached and are the primary visual evidence. " "Preserve the visible source subject's identity impression, proportions, silhouette, material, colors, wardrobe, styling, and distinctive non-artifact details across all generated views. " "Do not crop, cut out, paste, trace, or extract pixels from the source; redraw a clean production-ready asset of the same visible subject. " + source_subject_clause + template_brief_clause ) prompt = ( reference_strategy_clause + f"Generate one newly rendered {view_prompt} for {target}. " f"The subject is a {kind_phrase}. Treat all source evidence as one role and one consistent subject bible, not multiple subjects. " + single_view_clause + identity_clause + identity_lock_clause + wardrobe_lock_clause + pack_bible_clause + neck_product_clause + canvas_clause + prompt_extra_clause + subject_profile_clause + actor_style_clause + cartoon_style_clause + framing_clause + f"Create a high-definition standalone asset on a solid {bg_phrase} background. " "No extra objects, no props, no additional products, no background elements, no original scene fragments, no shadows from the original scene, no text, no watermark, no UI. " "If the source is incomplete, partially visible, occluded, or low resolution, reconstruct the missing parts by redrawing a clean complete subject while staying consistent with the reference. " "For living standard full-body views, keep a normal upright standing pose; do not create sitting, walking, medical, horror, or distorted anatomy unless explicitly requested by the view label. " + transparent_character_clause ) try: if similar_mode and model_src is not None: print( f"[subject assets] reconstruction_mode=similar endpoint=/images/edits view={view} image_refs={reference_image_count} model_preference={model_preference}", flush=True, ) img_bytes, _mode = _image_edit_call(model_src, prompt, models=models, fallback_text=False, max_attempts=3, max_side=1280, force_fallback_model=pack_force_fallback_model, image_model_preference=model_preference) if model_preference == "auto" and _mode.endswith(f":{IMAGE_FALLBACK_MODEL}"): pack_force_fallback_model = True elif similar_mode: print( f"[subject assets] reconstruction_mode=similar endpoint=/images/generations view={view} image_refs=0 model_preference={model_preference}", flush=True, ) img_bytes, _mode = _image_text_call(prompt, models=models, max_attempts=3, force_fallback_model=pack_force_fallback_model, image_model_preference=model_preference) if model_preference == "auto" and _mode.endswith(f":{IMAGE_FALLBACK_MODEL}"): pack_force_fallback_model = True else: if model_src is None: raise RuntimeError("subject asset edit reference image missing") img_bytes, _mode = _image_edit_call(model_src, prompt, models=models, fallback_text=False, max_attempts=3, max_side=1280, force_fallback_model=pack_force_fallback_model, image_model_preference=model_preference) if model_preference == "auto" and _mode.endswith(f":{IMAGE_FALLBACK_MODEL}"): pack_force_fallback_model = True except RuntimeError as e: if first_generation_error is None: first_generation_error = e generation_errors.append(f"{view_label}: {e}") print(f"[subject assets] view failed job={job_id} view={view} error={e}", flush=True) continue asset_id = f"subject_{idx:03d}_{element_id}_{view}_{uuid.uuid4().hex[:8]}" out_path = job_dir(job_id) / "assets" / f"{asset_id}.jpg" width, height = _normalize_asset_image(img_bytes, out_path, _source_frame_path(job_id, idx), req.size, req.background, square=False, fill_subject=True) generated.append(SubjectAsset( id=asset_id, view=view, label=f"{el.name_zh} · {view_label}" + (f" · {character_label}" if character_label else ""), url=_asset_url(job_id, asset_id), width=width, height=height, background=req.background, quality=req.quality, size=req.size, source_frame_indices=source_indices, status="completed", progress=100, error="", pack_id=req.pack_id, pack_label=req.pack_label, pack_mode=req.pack_mode, pack_created_at=req.pack_created_at or _time.time(), created_at=_time.time(), )) finally: for p in (tmp_focus,): if p and p.exists(): try: p.unlink() except OSError: pass if not generated: if first_generation_error: raise HTTPException(_image_error_status(first_generation_error), f"subject assets failed: {'; '.join(generation_errors[:3])}") raise HTTPException(500, "subject assets failed: no images generated") src = _source_frame_path(job_id, idx) new_frames = [] for f in job.frames: if f.index == idx: f.quality_report = _image_quality_report(src, el.region) for e in f.elements: if e.id == element_id: e.subject_kind = req.subject_kind e.cutout_background = req.background current_assets = e.subject_assets or [] if req.replace_views: replaced_views = {asset.view for asset in generated} replace_pack_id = (req.pack_id or "").strip() for old_asset in current_assets: should_replace = old_asset.view in replaced_views and ( old_asset.pack_id == replace_pack_id if replace_pack_id else True ) if should_replace: _delete_subject_asset_file(job_id, old_asset.id) current_assets = [ asset for asset in current_assets if not ( asset.view in replaced_views and ( asset.pack_id == replace_pack_id if replace_pack_id else True ) ) ] final_assets = current_assets + generated e.subject_assets = final_assets if req.subject_kind == "living": current_brief = (e.subject_consensus_brief or "").strip() should_refresh_brief = bool(selected_template_brief) or not current_brief or len(generated) >= 3 if should_refresh_brief: fallback_parts = [ selected_template_brief, (req.subject_profile.resolved_summary if req.subject_profile else ""), source_subject_brief, prompt_extra, ] fallback_brief = " ".join(part.strip() for part in fallback_parts if part and part.strip())[:1800] if selected_template_brief: e.subject_consensus_brief = _ensure_english(selected_template_brief)[:1800] else: asset_paths = [ job_dir(job_id) / "assets" / f"{asset.id}.jpg" for asset in final_assets[:10] if asset.id ] brief = _describe_subject_consensus_from_images( e.name_zh or e.name_en or "generated subject", req.subject_style, asset_paths, fallback_brief, ) e.subject_consensus_brief = _ensure_english(brief or current_brief or fallback_brief or ( "Generated SKG ad subject; identity brief unavailable. Keep one consistent commercial subject with clear neck and shoulder placement area." ))[:1800] if e.subject_consensus_brief and not e.subject_consensus_brief_zh: try: e.subject_consensus_brief_zh = _translate_text_sync(e.subject_consensus_brief, "zh", max_tokens=500)[:1800] except Exception: e.subject_consensus_brief_zh = "" new_frames.append(f) if generation_errors: msg = f"主体资产包部分生成完成 · {el.name_zh} · {len(generated)} 张,失败 {len(generation_errors)} 张" error_msg = ";".join(generation_errors[:3]) else: msg = f"主体资产包生成完成 · {el.name_zh} · {len(generated)} 张" error_msg = "" update(job, frames=new_frames, message=msg, error=error_msg) return job @app.delete("/jobs/{job_id}/frames/{idx}/elements/{element_id}/subject-assets/{asset_id}", response_model=Job) def delete_subject_asset(job_id: str, idx: int, element_id: str, asset_id: str) -> Job: """删除某张主体白底视图。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = _find_frame(job, idx) el = next((e for e in frame.elements if e.id == element_id), None) if not el: raise HTTPException(404, "element not found") assets = el.subject_assets or [] if not any(asset.id == asset_id for asset in assets): raise HTTPException(404, "subject asset not found") _delete_subject_asset_file(job_id, asset_id) new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: e.subject_assets = [asset for asset in (e.subject_assets or []) if asset.id != asset_id] new_frames.append(f) update(job, frames=new_frames, message=f"主体视图已删除 · {el.name_zh}") return job @app.delete("/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutouts/{cutout_id}", response_model=Job) def delete_cutout(job_id: str, idx: int, element_id: str, cutout_id: str) -> Job: """删除该元素的某张提取图""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") p = job_dir(job_id) / "elements" / f"{idx:03d}_{element_id}_{cutout_id}.jpg" if p.exists(): try: p.unlink() except OSError: pass removed = False new_frames = [] for f in job.frames: if f.index == idx: for e in f.elements: if e.id == element_id: if cutout_id in (e.cutouts or []): e.cutouts = [c for c in e.cutouts if c != cutout_id] removed = True # cutout_id 兼容字段:若指向被删的就清空 / 移到 cutouts 第一个 if e.cutout_id == cutout_id: e.cutout_id = e.cutouts[0] if e.cutouts else None new_frames.append(f) if not removed: raise HTTPException(404, "cutout not found in element") update(job, frames=new_frames, message=f"删除提取图") return job class UpdateStoryboardReq(BaseModel): duration: float = 0 first_image: dict | None = None last_image: dict | None = None product_images: list[dict] = Field(default_factory=list) subject_images: list[dict] = Field(default_factory=list) product_fusion_shots: list[dict] = Field(default_factory=list) visual_mode: Literal["person_only", "person_product", "product_only", "environment"] = "person_product" needs_product: bool = True needs_subject: bool = True storyboard_row_idx: int | None = None subject_brief: str = "" skg_copy_en: str = "" skg_copy_zh: str = "" scene_one_line_en: str = "" scene_one_line_zh: str = "" action_one_line_en: str = "" action_one_line_zh: str = "" selected_video_id: str = "" first_frame_plan: str = "" last_frame_plan: str = "" product_placement: str = "" subject_image: dict | None = None scene_image: dict | None = None product_image: dict | None = None action_image: dict | None = None # v1 字段(前端可不传) subject: str = "" product: str = "" scene: str = "" action: str = "" reference_ids: list[str] = [] class GenerateStoryboardVideoReq(BaseModel): prompt: str = "" duration: float = 4 count: int = 1 seed: int | None = None storyboard_row_idx: int | None = None first_image: dict | None = None last_image: dict | None = None product_images: list[dict] = Field(default_factory=list) subject_image: dict | None = None subject_images: list[dict] = Field(default_factory=list) scene_image: dict | None = None product_image: dict | None = None action_image: dict | None = None source_ref: VideoSourceRef | None = None model: str = "" size: str = "720x1280" class QuickStoryboardPlanReq(BaseModel): skg_copy_en: str = "" skg_copy_zh: str = "" scene_one_line_en: str = "" scene_one_line_zh: str = "" action_one_line_en: str = "" action_one_line_zh: str = "" subject_brief: str = "" duration: float = 4 visual_mode: Literal["person_only", "person_product", "product_only", "environment"] = "person_product" needs_product: bool = True needs_subject: bool = True class RefineStoryboardReq(BaseModel): current_plan: QuickStoryboardPlanReq = Field(default_factory=QuickStoryboardPlanReq) user_feedback: str = "" class BatchGenerateStoryboardReq(BaseModel): count_per_row: int = 4 concurrency: int = 1 model: str = "" size: str = "720x1280" def _quick_field_en(en: str, zh: str) -> str: text = (en or "").strip() if text: return _ensure_english(text) return _ensure_english((zh or "").strip()) def _subject_brief_for_frame(frame: KeyFrame | None) -> str: if not frame: return "" briefs = [ (element.subject_consensus_brief or element.subject_consensus_brief_zh or "").strip() for element in (frame.elements or []) if (element.subject_consensus_brief or element.subject_consensus_brief_zh or "").strip() ] return "\n".join(briefs[:3]) def _fallback_quick_storyboard_plan(req: QuickStoryboardPlanReq, frame: KeyFrame | None = None) -> StoryboardScene: copy_en = _quick_field_en(req.skg_copy_en, req.skg_copy_zh) or "Show the SKG massage product as a natural upgrade in this short-video beat." scene_en = _quick_field_en(req.scene_one_line_en, req.scene_one_line_zh) or "Clean vertical short-video scene with premium wellness lighting." action_en = _quick_field_en(req.action_one_line_en, req.action_one_line_zh) or "A natural creator-style subject introduces and uses the SKG neck-and-shoulder massager." subject_brief = (req.subject_brief or _subject_brief_for_frame(frame)).strip() product_placement = ( "Show the SKG white U-shaped neck-and-shoulder massager worn externally around the neck and shoulders; " "preserve realistic scale, contact pads, button placement, side thickness, and left-right asymmetry." if req.needs_product else "Do not show the SKG product in this beat unless it is only a subtle background context." ) return StoryboardScene( duration=max(3.2, min(8.0, float(req.duration or 4))), visual_mode=req.visual_mode, needs_product=bool(req.needs_product), needs_subject=bool(req.needs_subject), subject_brief=subject_brief, skg_copy_en=copy_en, skg_copy_zh=(req.skg_copy_zh or "").strip(), scene_one_line_en=scene_en, scene_one_line_zh=(req.scene_one_line_zh or "").strip(), action_one_line_en=action_en, action_one_line_zh=(req.action_one_line_zh or "").strip(), first_frame_plan=f"First frame: {scene_en}. Establish the subject state and visual problem clearly. {action_en}", last_frame_plan=f"Last frame: continue from the first frame and land on a clearer SKG product benefit moment. {action_en}", product_placement=product_placement, subject=subject_brief or ("Use a consistent similar commercial subject with clear neck and shoulder area." if req.needs_subject else "No main character required."), scene=f"{scene_en}\nVoice-over reference: {copy_en}", product=product_placement, action=f"{action_en}\nEnglish voice-over: {copy_en}", reference_ids=[], ) def _quick_storyboard_plan_sync(req: QuickStoryboardPlanReq, frame: KeyFrame | None = None) -> StoryboardScene: fallback = _fallback_quick_storyboard_plan(req, frame) if not LLM_API_KEY: return fallback subject_brief = (req.subject_brief or _subject_brief_for_frame(frame)).strip() payload = { "skg_copy_en": _quick_field_en(req.skg_copy_en, req.skg_copy_zh), "skg_copy_zh": req.skg_copy_zh, "scene_one_line_en": _quick_field_en(req.scene_one_line_en, req.scene_one_line_zh), "scene_one_line_zh": req.scene_one_line_zh, "action_one_line_en": _quick_field_en(req.action_one_line_en, req.action_one_line_zh), "action_one_line_zh": req.action_one_line_zh, "subject_brief": subject_brief, "duration": req.duration, "visual_mode": req.visual_mode, "needs_product": req.needs_product, "needs_subject": req.needs_subject, } prompt = ( "Expand this compact SKG TikTok recreation row into a complete video generation storyboard plan. " "Return strict JSON only. All English fields must be English. Chinese mirror fields may be Simplified Chinese.\n" "Schema: {\"visual_mode\":\"person_only|person_product|product_only|environment\"," "\"needs_product\":true,\"needs_subject\":true," "\"skg_copy_en\":\"...\",\"skg_copy_zh\":\"...\"," "\"scene_one_line_en\":\"...\",\"scene_one_line_zh\":\"...\"," "\"action_one_line_en\":\"...\",\"action_one_line_zh\":\"...\"," "\"subject_brief\":\"...\",\"first_frame_plan\":\"...\",\"last_frame_plan\":\"...\"," "\"product_placement\":\"...\",\"subject\":\"...\",\"scene\":\"...\",\"product\":\"...\",\"action\":\"...\"}.\n" "Rules: keep the row compact semantics; do not add medical treatment claims; product is an SKG white U-shaped neck-and-shoulder wearable massager; " "the final video prompt must be usable without user-visible first/last frame steps.\n\n" f"Input:\n{json.dumps(payload, ensure_ascii=False)}" ) try: resp = llm().chat.completions.create( model=REWRITE_MODEL, messages=[ {"role": "system", "content": "Return valid JSON only. No markdown. No commentary."}, {"role": "user", "content": prompt}, ], response_format={"type": "json_object"}, temperature=0.35, max_tokens=1400, ) raw = (resp.choices[0].message.content or "").strip() if raw.startswith("```"): match = re.search(r"\{[\s\S]*\}", raw) raw = match.group(0) if match else raw data = json.loads(raw) return StoryboardScene( duration=max(3.2, min(8.0, float(req.duration or 4))), visual_mode=data.get("visual_mode") if data.get("visual_mode") in {"person_only", "person_product", "product_only", "environment"} else fallback.visual_mode, needs_product=bool(data.get("needs_product", fallback.needs_product)), needs_subject=bool(data.get("needs_subject", fallback.needs_subject)), subject_brief=str(data.get("subject_brief") or fallback.subject_brief).strip(), skg_copy_en=_ensure_english(str(data.get("skg_copy_en") or fallback.skg_copy_en).strip()), skg_copy_zh=str(data.get("skg_copy_zh") or fallback.skg_copy_zh).strip(), scene_one_line_en=_ensure_english(str(data.get("scene_one_line_en") or fallback.scene_one_line_en).strip()), scene_one_line_zh=str(data.get("scene_one_line_zh") or fallback.scene_one_line_zh).strip(), action_one_line_en=_ensure_english(str(data.get("action_one_line_en") or fallback.action_one_line_en).strip()), action_one_line_zh=str(data.get("action_one_line_zh") or fallback.action_one_line_zh).strip(), first_frame_plan=_ensure_english(str(data.get("first_frame_plan") or fallback.first_frame_plan).strip()), last_frame_plan=_ensure_english(str(data.get("last_frame_plan") or fallback.last_frame_plan).strip()), product_placement=_ensure_english(str(data.get("product_placement") or fallback.product_placement).strip()), subject=_ensure_english(str(data.get("subject") or fallback.subject).strip()), scene=_ensure_english(str(data.get("scene") or fallback.scene).strip()), product=_ensure_english(str(data.get("product") or fallback.product).strip()), action=_ensure_english(str(data.get("action") or fallback.action).strip()), reference_ids=[], ) except Exception as e: print(f"[quick storyboard fallback] {e}", flush=True) return fallback def _storyboard_video_prompt(scene: StoryboardScene, seed: int | None = None) -> str: parts = [ "Create one vertical 9:16 short-form ad video clip for SKG.", f"English voice-over line: {_ensure_english(scene.skg_copy_en or scene.action or '')}", f"Scene: {_ensure_english(scene.scene_one_line_en or scene.scene or '')}", f"Subject + product + action: {_ensure_english(scene.action_one_line_en or scene.action or '')}", f"First frame intent: {_ensure_english(scene.first_frame_plan or '')}", f"Last frame intent: {_ensure_english(scene.last_frame_plan or '')}", f"Product placement: {_ensure_english(scene.product_placement or scene.product or '')}", f"Subject brief: {_ensure_english(scene.subject_brief or scene.subject or '')}", "Keep motion natural, creator-ad style, premium clean wellness lighting, no subtitles, no platform UI, no watermark, no medical treatment claims.", ] if seed is not None: parts.append(f"Creative variation seed: {seed}.") return "\n".join([p for p in parts if p.strip()]) class ProductFusionDescriptionReq(BaseModel): shots: list[ProductFusionShot] = Field(default_factory=list) def video_seconds(duration: float) -> str: if video_uses_ark(): if duration <= 0: return "5" return str(max(4, min(15, round(duration)))) if duration <= 6: return "4" if duration <= 10: return "8" return "12" def resolve_video_model(raw: str | None) -> str: requested = (raw or VIDEO_MODEL or "seedance").strip() lowered = requested.lower() if lowered in {"sora", "sora-2", "sora_2"}: raise HTTPException(400, "Sora 已停用,请选择当前已接入的 Seedance") return VIDEO_MODEL_ALIASES.get(lowered, requested) def normalize_video_status(status: str | None) -> Literal["queued", "in_progress", "completed", "failed"]: s = (status or "queued").lower() if s in {"completed", "complete", "succeeded", "success", "done"}: return "completed" if s in {"failed", "failure", "error", "cancelled", "canceled", "expired"}: return "failed" if s in {"running", "processing", "in_progress", "generating", "started"}: return "in_progress" return "queued" def video_progress(data: dict, fallback: int) -> int: raw = data.get("progress", data.get("percentage", data.get("percent", fallback))) try: value = int(float(raw)) except Exception: value = fallback return max(0, min(100, value)) def video_url_from_response(data: dict) -> str: for key in ("url", "video_url", "output_url", "download_url"): v = data.get(key) if isinstance(v, str) and v: return v arr = data.get("data") if isinstance(arr, list) and arr: first = arr[0] if isinstance(first, dict): for key in ("url", "video_url", "output_url", "download_url"): v = first.get(key) if isinstance(v, str) and v: return v output = data.get("output") if isinstance(output, dict): for key in ("url", "video_url", "download_url"): v = output.get(key) if isinstance(v, str) and v: return v content = data.get("content") if isinstance(content, dict): for key in ("video_url", "url", "download_url", "file_url"): v = content.get(key) if isinstance(v, str) and v: return v return "" def _video_public_error(raw: object) -> str: text = str(raw or "").strip() lower = text.lower() if any(token.lower() in lower for token in ( "InputImageSensitiveContentDetected.PrivacyInformation".lower(), "privacyinformation", "privacy information", "real person", "input image may contain real person", "human face", "face detected", "肖像", "隐私", "真人", "人脸", )): return ( "视频生成失败:参考图里有清晰人物或疑似真实人脸,视频模型出于肖像/隐私风控拒绝生成。" "请换成无可识别人脸的首帧,或先裁掉/模糊人物脸,再重新生成视频。" ) if any(token in lower for token in ( "sensitivecontent", "sensitive content", "content policy", "violate", "violation", "not allowed", "risk control", "moderation", "敏感", "安全审核", "风控", "违规", )): return ( "视频生成失败:参考图或提示词触发了视频模型的内容安全审核。" "请换一张更中性的参考图,避免真实人物、暴露、医疗夸大、危险动作或敏感文字后重试。" ) if any(token in lower for token in ("unauthorized", "invalid api key", "permission denied", "forbidden", "http 401", "http 403")): return "视频生成失败:视频通道认证或权限异常,请联系管理员检查服务器上的视频 API Key 和模型权限。" if any(token in lower for token in ("http 429", "rate limit", "too many requests", "quota", "insufficient", "balance", "限流", "额度", "余额")): return "视频生成失败:视频模型当前限流或额度不足,请稍后重试;如果持续出现,请联系管理员检查视频通道额度。" if any(token in lower for token in ("timeout", "timed out", "readtimeout", "connecttimeout", "超时")): return "视频生成失败:视频模型响应超时,可能是上游繁忙或网络不稳定。请稍后重试,或缩短时长后再生成。" if any(token in lower for token in ( "name or service not known", "temporary failure in name resolution", "nodename nor servname", "connection refused", "network is unreachable", "connecterror", "ssl:", "网络", "dns", )): return "视频生成失败:服务器连接视频模型网关异常,请稍后重试;如果连续失败,请联系管理员检查视频网关网络。" if any(token in lower for token in ("http 404", "http 405", "unsupported", "not found", "method not allowed")): return "视频生成失败:当前视频模型接口路径不可用,请联系管理员检查视频网关配置。" if lower.startswith("video status: failed") or "video status: failed" in lower: return "视频生成失败:视频模型返回生成失败。请换一张更清晰、主体更稳定的参考图,或简化提示词后重试。" if text.startswith("视频生成失败:"): return text[:500] if text: return f"视频生成失败:{text[:460]}" return "视频生成失败:未知错误,请换一张参考图或稍后重试。" def _video_create_failure_message(create_errors: list[str]) -> str: raw = " | ".join(create_errors) public = _video_public_error(raw) if public.startswith("视频生成失败:当前视频模型接口路径不可用"): return public if public.startswith("视频生成失败:") and public != f"视频生成失败:{raw[:460]}": return public return "视频生成失败:视频模型没有接受本次请求。请换一张参考图或简化提示词后重试;如果持续失败,请联系管理员。" def download_generated_video(client, base: str, headers: dict, provider_id: str, direct_url: str, out_mp4: Path) -> None: if direct_url: url = direct_url if direct_url.startswith("http") else f"{base}{direct_url if direct_url.startswith('/') else '/' + direct_url}" r = client.get(url, headers=headers if url.startswith(base) else None) else: r = client.get(f"{base}{video_path(VIDEO_CONTENT_PATH, id=provider_id)}", headers=headers) r.raise_for_status() out_mp4.write_bytes(r.content) def size_to_video_ratio(size: str) -> str: try: w, h = [int(x) for x in size.lower().replace(" ", "").split("x", 1)] except Exception: return "9:16" if w <= 0 or h <= 0: return "9:16" ratio = w / h known = { "16:9": 16 / 9, "9:16": 9 / 16, "1:1": 1, "4:3": 4 / 3, "3:4": 3 / 4, "21:9": 21 / 9, } return min(known, key=lambda key: abs(known[key] - ratio)) def ark_reference_data_url(ref_img: Path) -> str: mime = "image/png" if ref_img.suffix.lower() == ".png" else "image/jpeg" return f"data:{mime};base64,{base64.b64encode(ref_img.read_bytes()).decode('ascii')}" def submit_video_create( client, url: str, headers: dict, ref_img: Path, payload: dict, source_ref: VideoSourceRef | None = None, last_img: Path | None = None, product_imgs: list[Path] | None = None, primary_role: str = "first_frame", ): if video_uses_ark(): content = [{"type": "text", "text": payload["prompt"]}] if source_ref and source_ref.kind == "source_video" and source_ref.url: content.append( { "type": "video_url", "video_url": {"url": source_ref.url}, "role": "reference_video", } ) content.append( { "type": "image_url", "image_url": {"url": ark_reference_data_url(ref_img)}, "role": primary_role, } ) if last_img and last_img.exists(): content.append( { "type": "image_url", "image_url": {"url": ark_reference_data_url(last_img)}, "role": "last_frame", } ) for product_img in (product_imgs or [])[:6]: if product_img.exists(): content.append( { "type": "image_url", "image_url": {"url": ark_reference_data_url(product_img)}, "role": "reference_image", } ) data = { "model": payload["model"], "content": content, "ratio": size_to_video_ratio(str(payload.get("size", ""))), "duration": int(float(str(payload.get(VIDEO_DURATION_FIELD, 5)))), "watermark": False, "resolution": "720p", } return client.post(url, headers={**headers, "Content-Type": "application/json"}, json=data) if video_uses_poe(): data = dict(payload) data[VIDEO_DURATION_FIELD] = int(float(str(data.get(VIDEO_DURATION_FIELD, 4)))) data["input_image"] = base64.b64encode(ref_img.read_bytes()).decode("ascii") return client.post(url, headers=headers, json=data) with ref_img.open("rb") as fh: return client.post( url, headers=headers, data=payload, files={"input_reference": ("reference.jpg", fh, "image/jpeg")}, ) def render_storyboard_video( job_id: str, local_id: str, provider_id: str, ref_path: Path, prompt: str, model: str, seconds: str, size: str, source_ref: VideoSourceRef | None = None, last_ref_path: Path | None = None, product_ref_paths: list[Path] | None = None, primary_role: str = "first_frame", ) -> None: import httpx out_dir = job_dir(job_id) / "storyboard_videos" / local_id ref_img = out_dir / "reference.jpg" last_img = out_dir / "last_reference.jpg" out_mp4 = out_dir / "video.mp4" base = video_api_base() headers = {"Authorization": f"Bearer {video_api_key()}"} try: prepare_video_reference(ref_path, ref_img) prepared_last_img: Path | None = None if last_ref_path and last_ref_path.exists(): prepare_video_reference(last_ref_path, last_img) prepared_last_img = last_img prepared_product_imgs: list[Path] = [] for i, product_ref_path in enumerate((product_ref_paths or [])[:6], start=1): if product_ref_path.exists(): product_img = out_dir / f"product_reference_{i}.jpg" prepare_video_reference(product_ref_path, product_img) prepared_product_imgs.append(product_img) update_generated_video(job_id, local_id, status="in_progress", progress=5, queue_message="准备素材…") with httpx.Client(timeout=120) as client: payload = {"model": model, "prompt": prompt, "size": size} payload[VIDEO_DURATION_FIELD] = seconds create = None create_errors: list[str] = [] for create_path in VIDEO_CREATE_PATHS: resp = submit_video_create(client, f"{base}{video_path(create_path)}", headers, ref_img, payload, source_ref, prepared_last_img, prepared_product_imgs, primary_role) if video_uses_ark() and source_ref and resp.status_code in {400, 422}: create_errors.append(f"{video_path(create_path)} + reference_video -> HTTP {resp.status_code}: {resp.text[:700]}") resp = submit_video_create(client, f"{base}{video_path(create_path)}", headers, ref_img, payload, None, prepared_last_img, prepared_product_imgs, primary_role) if video_uses_ark() and prepared_last_img and resp.status_code in {400, 422}: create_errors.append(f"{video_path(create_path)} + last_frame -> HTTP {resp.status_code}: {resp.text[:700]}") resp = submit_video_create(client, f"{base}{video_path(create_path)}", headers, ref_img, payload, None, None, prepared_product_imgs, primary_role) if video_uses_ark() and prepared_product_imgs and resp.status_code in {400, 422}: create_errors.append(f"{video_path(create_path)} + product_reference -> HTTP {resp.status_code}: {resp.text[:700]}") resp = submit_video_create(client, f"{base}{video_path(create_path)}", headers, ref_img, payload, None, prepared_last_img, None, primary_role) if resp.status_code < 400: create = resp break create_errors.append(f"{video_path(create_path)} -> HTTP {resp.status_code}: {resp.text[:700]}") if resp.status_code not in {400, 404, 405}: resp.raise_for_status() if create is None: print(f"[video create failed] job={job_id} video={local_id} errors={' | '.join(create_errors)[:1800]}", flush=True) raise RuntimeError(_video_create_failure_message(create_errors)) data = create.json() video_api_id = data.get("id") or provider_id or local_id status = normalize_video_status(data.get("status")) progress = video_progress(data, 5) direct_url = video_url_from_response(data) status_payload = data update_generated_video( job_id, local_id, provider_id=video_api_id, status=status, progress=progress, queue_message="生成中…" if status in {"queued", "in_progress"} else "", ) deadline = time.time() + VIDEO_POLL_TIMEOUT_SECONDS while status in {"queued", "in_progress"} and time.time() < deadline: time.sleep(8) poll = client.get(f"{base}{video_path(VIDEO_STATUS_PATH, id=video_api_id)}", headers=headers) poll.raise_for_status() pdata = poll.json() status = normalize_video_status(pdata.get("status")) progress = video_progress(pdata, progress) direct_url = video_url_from_response(pdata) or direct_url status_payload = pdata update_generated_video( job_id, local_id, status=status, progress=progress, queue_message="生成中…" if status in {"queued", "in_progress"} else "", ) if status != "completed": raw_error = "" if isinstance(status_payload, dict): raw_error = str( status_payload.get("error") or status_payload.get("message") or status_payload.get("reason") or status_payload.get("fail_reason") or status_payload ) print(f"[video status failed] job={job_id} video={local_id} status={status} error={raw_error[:1200]}", flush=True) update_generated_video(job_id, local_id, status="failed", error=_video_public_error(raw_error or f"video status: {status}"), progress=progress, queue_message="") return download_generated_video(client, base, headers, video_api_id, direct_url, out_mp4) update_generated_video( job_id, local_id, status="completed", progress=100, url=f"/jobs/{job_id}/storyboard-videos/{local_id}.mp4", error="", queue_position=0, queue_size=0, queue_message="", ) except Exception as e: print(f"[video task failed] job={job_id} video={local_id} error={str(e)[:1200]}", flush=True) update_generated_video(job_id, local_id, status="failed", error=_video_public_error(e), queue_message="") @app.post("/jobs/{job_id}/frames/{idx}/storyboard/quick-plan", response_model=StoryboardScene) def quick_plan_storyboard(job_id: str, idx: int, req: QuickStoryboardPlanReq) -> StoryboardScene: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") return _quick_storyboard_plan_sync(req, frame) @app.post("/jobs/{job_id}/frames/{idx}/storyboard/refine") def refine_storyboard(job_id: str, idx: int, req: RefineStoryboardReq) -> dict: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") current = req.current_plan feedback = req.user_feedback.strip() if not feedback: raise HTTPException(400, "user_feedback required") fallback = { "skg_copy_en": _quick_field_en(current.skg_copy_en, current.skg_copy_zh), "skg_copy_zh": current.skg_copy_zh.strip(), "scene_one_line_en": _quick_field_en(current.scene_one_line_en, current.scene_one_line_zh), "scene_one_line_zh": current.scene_one_line_zh.strip(), "action_one_line_en": _quick_field_en(current.action_one_line_en, current.action_one_line_zh), "action_one_line_zh": current.action_one_line_zh.strip(), } if not LLM_API_KEY: return {"items": fallback, "model": "fallback"} prompt = ( "Rewrite this compact SKG storyboard row according to user feedback. " "Keep meaning and timing, improve clarity and video-generation usefulness. " "Return strict JSON only with exactly these fields: " "skg_copy_en, skg_copy_zh, scene_one_line_en, scene_one_line_zh, action_one_line_en, action_one_line_zh. " "English fields must be English; Chinese fields must be Simplified Chinese. " "No medical treatment claims.\n\n" f"Current:\n{json.dumps(fallback, ensure_ascii=False)}\n\n" f"User feedback:\n{feedback}" ) try: resp = llm().chat.completions.create( model=REWRITE_MODEL, messages=[ {"role": "system", "content": "Return valid JSON only. No markdown. No explanation."}, {"role": "user", "content": prompt}, ], response_format={"type": "json_object"}, temperature=0.55, max_tokens=900, ) data = json.loads((resp.choices[0].message.content or "{}").strip()) out = { "skg_copy_en": _ensure_english(str(data.get("skg_copy_en") or fallback["skg_copy_en"]).strip()), "skg_copy_zh": str(data.get("skg_copy_zh") or fallback["skg_copy_zh"]).strip(), "scene_one_line_en": _ensure_english(str(data.get("scene_one_line_en") or fallback["scene_one_line_en"]).strip()), "scene_one_line_zh": str(data.get("scene_one_line_zh") or fallback["scene_one_line_zh"]).strip(), "action_one_line_en": _ensure_english(str(data.get("action_one_line_en") or fallback["action_one_line_en"]).strip()), "action_one_line_zh": str(data.get("action_one_line_zh") or fallback["action_one_line_zh"]).strip(), } return {"items": out, "model": REWRITE_MODEL} except Exception as e: return {"items": fallback, "model": "fallback", "error": str(e)[:300]} def _enqueue_storyboard_videos(job: Job, frame: KeyFrame, req: GenerateStoryboardVideoReq, bg: BackgroundTasks | None = None) -> list[str]: ensure_video_api_configured() prompt = _ensure_english(req.prompt.strip()) if not prompt and frame.storyboard: prompt = _storyboard_video_prompt(frame.storyboard, req.seed) if not prompt: raise HTTPException(400, "prompt required") count = max(1, min(12, int(req.count or 1))) ref = req.first_image or req.subject_image or req.product_image or req.scene_image or req.action_image primary_role = "first_frame" if req.first_image else "reference_image" ref_path = storyboard_ref_path(job.id, ref) or (job_dir(job.id) / "frames" / f"{frame.index:03d}.jpg") if not ref_path.exists(): raise HTTPException(404, "reference image missing") poster = storyboard_ref_url(job.id, ref) or f"/jobs/{job.id}/frames/{frame.index}.jpg" last_ref_path = storyboard_ref_path(job.id, req.last_image) raw_product_refs = req.product_images[:6] if req.product_images else ([req.product_image] if req.product_image else []) product_ref_paths = [p for p in (storyboard_ref_path(job.id, r) for r in raw_product_refs) if p] subject_ref_paths = [p for p in (storyboard_ref_path(job.id, r) for r in req.subject_images[:8]) if p] reference_ref_paths = [] seen_ref_paths: set[str] = {str(ref_path)} # Product fusion is sensitive to object drift. Send product references before # extra character references so the rigid SKG device keeps its real shape. for p in [*product_ref_paths, *subject_ref_paths]: key = str(p) if key not in seen_ref_paths: reference_ref_paths.append(p) seen_ref_paths.add(key) model = resolve_video_model(req.model) seconds = video_seconds(float(req.duration or 4)) video_size = _normalize_video_size(req.size) source_ref = req.source_ref if source_ref and source_ref.kind == "source_video" and not source_ref.url: source_ref = None items: list[GeneratedVideo] = [] ids: list[str] = [] queued_tasks: list[tuple[str, tuple]] = [] for i in range(count): local_id = uuid.uuid4().hex[:12] ids.append(local_id) variant_seed = (req.seed + i) if req.seed is not None else random.randint(100000, 999999) variant_prompt = _ensure_english(f"{prompt}\n\nCreate variation {i + 1} of {count}. Variation seed: {variant_seed}. Keep the same compact row meaning but vary camera motion, gesture timing, and composition.") items.append(GeneratedVideo( id=local_id, provider_id="", frame_idx=frame.index, storyboard_row_idx=req.storyboard_row_idx, prompt=variant_prompt, model=model, status="queued", url="", poster_url=poster, duration=float(seconds), progress=0, created_at=time.time(), queue_message="排队中…", )) task_args = (job.id, local_id, "", ref_path, variant_prompt, model, seconds, video_size, source_ref, last_ref_path, reference_ref_paths, primary_role) queued_tasks.append((local_id, task_args)) update(job, generated_videos=items + job.generated_videos, message=f"视频候选已提交 · 分镜 {frame.index + 1} · {count} 条") for local_id, task_args in queued_tasks: enqueue_video_task(job, local_id, task_args) return ids @app.post("/jobs/{job_id}/frames/{idx}/storyboard/video", response_model=Job) def generate_storyboard_video(job_id: str, idx: int, req: GenerateStoryboardVideoReq, bg: BackgroundTasks) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") _enqueue_storyboard_videos(job, frame, req, bg) return job def _batch_generate_worker(job_id: str, req: BatchGenerateStoryboardReq) -> None: from concurrent.futures import ThreadPoolExecutor, wait job = JOBS.get(job_id) if not job: return count = max(1, min(12, int(req.count_per_row or 4))) concurrency = max(1, min(8, int(req.concurrency or 4))) frames = list(job.frames) update(job, message=f"整片视频候选排队生成已启动 · 0/{len(frames)} 条", error="") done = 0 def submit_one(frame: KeyFrame) -> None: nonlocal done try: scene = frame.storyboard if scene is None: quick_req = QuickStoryboardPlanReq( scene_one_line_en=(frame.description or {}).get("scene", "") if isinstance(frame.description, dict) else "", action_one_line_en="Use the source beat as a compact SKG product ad action with a clear subject, product, and motion.", subject_brief=_subject_brief_for_frame(frame), duration=5, ) scene = _quick_storyboard_plan_sync(quick_req, frame) frame.storyboard = scene update(job, frames=job.frames) prompt = _storyboard_video_prompt(scene) video_req = GenerateStoryboardVideoReq( prompt=prompt, duration=scene.duration or 4, count=count, first_image=scene.first_image, last_image=scene.last_image, product_images=scene.product_images, subject_images=scene.subject_images, subject_image=scene.subject_image, scene_image=scene.scene_image, product_image=scene.product_image, action_image=scene.action_image, model=req.model, size=req.size, ) _enqueue_storyboard_videos(job, frame, video_req, None) except Exception as e: update(job, error=f"分镜 {frame.index + 1} 候选生成失败:{str(e)[:220]}") finally: done += 1 update(job, message=f"整片视频候选生成中 · {done}/{len(frames)} 条") with ThreadPoolExecutor(max_workers=concurrency) as executor: futures = [executor.submit(submit_one, frame) for frame in frames] wait(futures) update(job, message=f"整片视频候选已提交 · {len(frames)}/{len(frames)} 条分镜 · 每条 {count} 个候选") @app.post("/jobs/{job_id}/storyboard/batch-generate-all", response_model=Job) def batch_generate_all_storyboard(job_id: str, req: BatchGenerateStoryboardReq) -> Job: job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") ensure_video_api_configured() if not job.frames: raise HTTPException(400, "no frames to generate") threading.Thread(target=_batch_generate_worker, args=(job_id, req), daemon=True).start() update(job, message=f"整片视频候选排队生成已启动 · {len(job.frames)} 条分镜 · 每条 {max(1, min(12, int(req.count_per_row or 4)))} 个候选") return job @app.get("/jobs/{job_id}/storyboard-videos/{video_id}.mp4") def get_storyboard_video(job_id: str, video_id: str): p = job_dir(job_id) / "storyboard_videos" / video_id / "video.mp4" if not p.exists(): raise HTTPException(404, "storyboard video not found") return FileResponse(p, media_type="video/mp4") class CopyProductLibraryAssetReq(BaseModel): product_id: str class CopyCharacterLibraryAssetReq(BaseModel): character_id: str class GenerateProductAngleAssetReq(BaseModel): source_ref: dict source_refs: list[dict] = Field(default_factory=list) source_notes: list[str] = Field(default_factory=list) target_view: str note: str = "" class AnalyzeProductViewsReq(BaseModel): refs: list[dict] = Field(default_factory=list) class SaveSubjectTemplateReq(BaseModel): name: str note: str = "" frame_idx: int element_id: str asset_ids: list[str] = Field(default_factory=list) subject_style: Literal["transparent_human", "source_actor", "cartoon_subject"] = "transparent_human" @app.get("/product-library/skg", response_model=list[ProductLibraryItem]) def list_skg_product_library() -> list[ProductLibraryItem]: """内置 SKG 白底产品图库。来源是本地筛选后的产品图 manifest。""" return load_product_library_items() @app.get("/product-library/skg/images/{filename}") def get_skg_product_library_image(filename: str): items = load_product_library_items() item = next((x for x in items if Path(x.filename).name == filename), None) if not item: raise HTTPException(404, "product library image not found") return FileResponse(product_library_file(item), media_type="image/jpeg") @app.get("/character-library/skg", response_model=list[CharacterLibraryItem]) def list_skg_character_library() -> list[CharacterLibraryItem]: """内置透明骨架人角色库。来源是桌面生成的 5 个角色参考组。""" return load_character_library_items() @app.get("/character-library/skg/images/{filename:path}") def get_skg_character_library_image(filename: str): p = character_library_file(filename) media_type = "image/png" if p.suffix.lower() == ".png" else "image/jpeg" return FileResponse(p, media_type=media_type) @app.get("/subject-templates", response_model=list[SubjectTemplateItem]) def list_subject_templates() -> list[SubjectTemplateItem]: """数据库化主体模板库。保存后的相似主体可被后续任务复用为创意参考。""" return load_subject_template_items() @app.get("/subject-templates/images/{filename:path}") def get_subject_template_image(filename: str): p = subject_template_image_file(filename) return FileResponse(p, media_type="image/jpeg") @app.post("/jobs/{job_id}/subject-templates", response_model=SubjectTemplateItem) def save_subject_template(job_id: str, req: SaveSubjectTemplateReq) -> SubjectTemplateItem: """把当前 job 里已确认的相似主体视图复制到主体模板库。""" import time as _time job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") name = req.name.strip() if not name: raise HTTPException(400, "template name required") frame = _find_frame(job, req.frame_idx) element = next((e for e in frame.elements if e.id == req.element_id), None) if not element: raise HTTPException(404, "element not found") requested_ids = [x.strip() for x in req.asset_ids if x.strip()] selected_assets = [asset for asset in (element.subject_assets or []) if not requested_ids or asset.id in requested_ids] if requested_ids: selected_assets.sort(key=lambda asset: requested_ids.index(asset.id) if asset.id in requested_ids else 999) else: selected_assets.sort(key=lambda asset: asset.created_at, reverse=True) if not selected_assets: raise HTTPException(400, "no subject assets to save") template_id = f"subject-template-{uuid.uuid4().hex[:10]}" template_dir = SUBJECT_TEMPLATE_IMAGE_DIR / template_id template_dir.mkdir(parents=True, exist_ok=True) now = _time.time() images: list[SubjectTemplateImage] = [] saved_image_paths: list[Path] = [] for asset in selected_assets: src = job_dir(job_id) / "assets" / f"{asset.id}.jpg" if not src.exists(): continue image_id = f"{asset.view}_{uuid.uuid4().hex[:8]}" filename = f"{template_id}/{image_id}.jpg" dst = SUBJECT_TEMPLATE_IMAGE_DIR / filename shutil.copy2(src, dst) saved_image_paths.append(dst) images.append(SubjectTemplateImage( id=image_id, view=asset.view, label=asset.label or asset.view, filename=filename, url=f"/subject-templates/images/{filename}", width=asset.width, height=asset.height, background=asset.background, quality=asset.quality, size=asset.size, source_asset_id=asset.id, source_frame_indices=asset.source_frame_indices, created_at=asset.created_at or now, )) if not images: raise HTTPException(404, "subject asset files missing") primary = next((image.id for image in images if image.view == "front"), images[0].id) prompt_brief = _ensure_english(_describe_subject_template_from_images( name, req.subject_style, saved_image_paths, req.note.strip(), ) or req.note.strip()) try: prompt_brief_zh = _translate_text_sync(prompt_brief, "zh", max_tokens=500) if prompt_brief else "" except Exception: prompt_brief_zh = "" item = SubjectTemplateItem( id=template_id, name=name, description=req.note.strip(), note=req.note.strip(), prompt_brief=prompt_brief, prompt_brief_zh=prompt_brief_zh, source_job_id=job_id, source_frame_idx=frame.index, source_element_id=element.id, subject_style=req.subject_style, primary_image=primary, images=images, created_at=now, updated_at=now, ) items = [item] + [existing for existing in load_subject_template_items() if existing.id != item.id] save_subject_template_items(items) try: library_id = f"lib_subjects_{uuid.uuid4().hex[:12]}" library_dir = _asset_library_item_dir("subjects", library_id) library_images: list[AssetLibraryImage] = [] for image in images: src = SUBJECT_TEMPLATE_IMAGE_DIR / image.filename if not src.exists(): continue view = re.sub(r"[^a-zA-Z0-9_-]+", "_", image.view or image.id).strip("_") or image.id dst = library_dir / "images" / f"{view}.jpg" dst.parent.mkdir(parents=True, exist_ok=True) shutil.copy2(src, dst) width, height = _library_media_size(dst) library_images.append(AssetLibraryImage( id=view, view=image.view, label=image.label or image.view, filename=f"images/{view}.jpg", width=width or image.width, height=height or image.height, created_at=image.created_at or now, )) if library_images: library_item = AssetLibraryItem( id=library_id, kind="subjects", name=name, name_zh=name, note=req.note.strip(), tags=["主体模板"], source_job_id=job_id, prompt_brief=prompt_brief, prompt_brief_zh=prompt_brief_zh, subject_style=req.subject_style, images=library_images, views=library_images, created_at=now, updated_at=now, ) _hydrate_asset_library_urls(library_item) _write_asset_item(library_item) except Exception as e: print(f"[asset library] subject template mirror failed: {e}", flush=True) return item def normalize_product_asset_image(src: Path, out: Path) -> dict: original_bytes = src.stat().st_size if src.exists() else 0 actions: list[str] = [] warnings: list[str] = [] with Image.open(src) as opened: img = ImageOps.exif_transpose(opened) original_width, original_height = img.size if img.mode in {"RGBA", "LA"} or ("transparency" in img.info): rgba = img.convert("RGBA") base = Image.new("RGB", img.size, (255, 255, 255)) base.paste(rgba, mask=rgba.getchannel("A")) img = base actions.append("透明背景已铺白") elif img.mode != "RGB": img = img.convert("RGB") actions.append("已转 RGB/JPEG") max_side = max(img.size) if max_side > PRODUCT_ASSET_MAX_SIDE: ratio = PRODUCT_ASSET_MAX_SIDE / max_side next_size = (max(1, round(img.width * ratio)), max(1, round(img.height * ratio))) img = img.resize(next_size, Image.Resampling.LANCZOS) actions.append(f"最长边压缩到 {PRODUCT_ASSET_MAX_SIDE}px") if max(original_width, original_height) >= 2400: warnings.append("原图过大已自动压缩;超高清不会提升识别稳定性") elif max_side < PRODUCT_ASSET_MIN_LONG_SIDE: ratio = PRODUCT_ASSET_MIN_LONG_SIDE / max_side next_size = (max(1, round(img.width * ratio)), max(1, round(img.height * ratio))) img = img.resize(next_size, Image.Resampling.LANCZOS) actions.append(f"低分辨率图已放大到最长边 {PRODUCT_ASSET_MIN_LONG_SIDE}px") warnings.append("原始分辨率偏低,已放大为工作图,但真实细节不会增加") if min(img.size) < PRODUCT_ASSET_MIN_SHORT_SIDE: warnings.append(f"短边低于 {PRODUCT_ASSET_MIN_SHORT_SIDE}px,细节/比例识别可能不稳") if original_bytes >= 5 * 1024 * 1024: warnings.append("原文件较大,已生成轻量 AI 工作副本") out.parent.mkdir(parents=True, exist_ok=True) img.save(out, "JPEG", quality=PRODUCT_ASSET_JPEG_QUALITY, optimize=True, progressive=True, subsampling=0) work_width, work_height = img.size return { "standard": f"AI工作副本:最长边≤{PRODUCT_ASSET_MAX_SIDE}px,建议长边≥{PRODUCT_ASSET_MIN_LONG_SIDE}px,短边≥{PRODUCT_ASSET_MIN_SHORT_SIDE}px,JPEG q{PRODUCT_ASSET_JPEG_QUALITY}", "original_width": original_width, "original_height": original_height, "width": work_width, "height": work_height, "original_bytes": original_bytes, "work_bytes": out.stat().st_size if out.exists() else 0, "max_side": PRODUCT_ASSET_MAX_SIDE, "min_long_side": PRODUCT_ASSET_MIN_LONG_SIDE, "min_short_side": PRODUCT_ASSET_MIN_SHORT_SIDE, "quality": PRODUCT_ASSET_JPEG_QUALITY, "actions": actions, "warnings": warnings, "normalized": bool(actions or warnings), } @app.post("/jobs/{job_id}/assets") async def upload_storyboard_asset(job_id: str, file: UploadFile = File(...)) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") asset_id = uuid.uuid4().hex[:12] out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) tmp = out_dir / f"{asset_id}.upload" out = out_dir / f"{asset_id}.jpg" try: tmp.write_bytes(await file.read()) asset_meta = normalize_product_asset_image(tmp, out) except Exception as e: raise HTTPException(400, f"product image upload failed: {e}") finally: try: tmp.unlink() except Exception: pass return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": file.filename or "SKG 产品图", "asset_meta": asset_meta, } PRODUCT_VIEW_VALUES = ["front", "left_45", "right_45", "side_thickness", "inner_contacts", "back_bottom"] PRODUCT_VIEW_BATCH_SIZE = max(1, min(12, int(os.getenv("PRODUCT_VIEW_BATCH_SIZE", "8")))) PRODUCT_VIEW_LABELS = { "front": "正面/外侧主外观", "left_45": "佩戴者左 45", "right_45": "佩戴者右 45", "side_thickness": "侧面厚度", "inner_contacts": "贴颈内侧/触点", "back_bottom": "背面/底部", } PRODUCT_BACKGROUND_VALUES = ["white", "black", "simple", "complex", "unknown"] PRODUCT_USE_TAG_VALUES = [ "hero_packshot", "wearing_scale", "inner_contact", "side_thickness", "asymmetry", "button_detail", "back_bottom", "material_texture", ] def default_product_use_tags(view: str) -> list[str]: defaults = { "front": ["hero_packshot", "asymmetry"], "left_45": ["hero_packshot", "asymmetry", "button_detail"], "right_45": ["hero_packshot", "asymmetry", "button_detail"], "side_thickness": ["side_thickness", "wearing_scale"], "inner_contacts": ["inner_contact", "wearing_scale"], "back_bottom": ["back_bottom", "material_texture"], } return defaults.get(view, ["hero_packshot"]) def normalize_product_use_tags(tags: object, view: str) -> list[str]: if isinstance(tags, str): raw_tags = re.split(r"[,,/、\s]+", tags) elif isinstance(tags, list): raw_tags = [str(x) for x in tags] else: raw_tags = [] result = [] for tag in raw_tags + default_product_use_tags(view): tag = str(tag).strip() if tag in PRODUCT_USE_TAG_VALUES and tag not in result: result.append(tag) return result[:4] def fallback_product_view(index: int) -> dict: view = PRODUCT_VIEW_VALUES[min(index, len(PRODUCT_VIEW_VALUES) - 1)] return { "view": view, "background": "unknown", "use_tags": default_product_use_tags(view), "orientation": default_product_orientation(view), "landmarks": default_product_landmarks(view), "note": f"{PRODUCT_VIEW_LABELS.get(view, view)}参考;模型识别不可用时按上传顺序自动标注,请重点复核佩戴者左/右、上/下和贴颈内侧。", "risk": "模型识别不可用,按上传顺序兜底", "confidence": 0.25, } PRODUCT_ORIENTATION_KEYS = [ "product_left", "product_right", "top", "bottom", "inner_side", "outer_side", "opening_direction", ] def default_product_orientation(view: str) -> dict: base = { "product_left": "佩戴者左侧;需人工复核图中位置", "product_right": "佩戴者右侧;需人工复核图中位置", "top": "靠近下巴/脸/颈部上沿", "bottom": "靠近锁骨/肩部下沿", "inner_side": "贴近脖子皮肤的一侧,通常可见按摩触点", "outer_side": "外壳展示面,通常可见按键/Logo/材质", "opening_direction": "U 形开口方向需结合图片复核", } if view == "inner_contacts": base["inner_side"] = "本图重点:贴颈内侧/按摩触点" elif view == "side_thickness": base["outer_side"] = "本图重点:侧厚、边缘和机身厚度" elif view in {"left_45", "right_45"}: base["opening_direction"] = "注意不要把图片左右直接当成产品佩戴者左右" return base def default_product_landmarks(view: str) -> list[str]: defaults = { "front": ["U形开口", "外壳主轮廓", "左右臂"], "left_45": ["佩戴者左侧臂", "侧边弧度", "按键/结构差异"], "right_45": ["佩戴者右侧臂", "侧边弧度", "按键/结构差异"], "side_thickness": ["机身厚度", "侧边轮廓", "佩戴比例"], "inner_contacts": ["贴颈内侧", "按摩触点", "皮肤接触面"], "back_bottom": ["背面/底部", "接口/底面", "材质细节"], } return defaults.get(view, ["U形挂脖轮廓"]) def normalize_product_orientation(value: object, view: str) -> dict: base = default_product_orientation(view) if isinstance(value, dict): for key in PRODUCT_ORIENTATION_KEYS: raw = value.get(key) if raw is None: continue text = re.sub(r"\s+", " ", str(raw)).strip().strip('"\' ,,。') if text: base[key] = text[:80] return base def normalize_product_landmarks(value: object, view: str) -> list[str]: if isinstance(value, str): raw_items = re.split(r"[,,/、\n]+", value) elif isinstance(value, list): raw_items = [str(item) for item in value] else: raw_items = [] result = [] for item in raw_items + default_product_landmarks(view): text = re.sub(r"\s+", " ", str(item)).strip().strip('"\' ,,。') if text and text not in result: result.append(text[:24]) return result[:8] def normalize_product_view_data(data: dict, index: int) -> dict: view = str(data.get("view") or "").strip().strip('"\' ,。') if view not in PRODUCT_VIEW_VALUES: return fallback_product_view(index) background = str(data.get("background") or "unknown").strip().strip('"\' ,。') if background not in PRODUCT_BACKGROUND_VALUES: background = "unknown" use_tags = normalize_product_use_tags(data.get("use_tags"), view) orientation = normalize_product_orientation(data.get("orientation"), view) landmarks = normalize_product_landmarks(data.get("landmarks"), view) note = str(data.get("note") or "").strip().strip('"\' ,,。') note = re.sub(r"\s+", " ", note)[:320] or f"{PRODUCT_VIEW_LABELS.get(view, view)}参考" risk = str(data.get("risk") or "").strip().strip('"\' ,,。') risk = re.sub(r"\s+", " ", risk)[:160] try: confidence = max(0.0, min(1.0, float(data.get("confidence", 0.5)))) except Exception: confidence = 0.5 if confidence <= 0 and not risk and landmarks: confidence = 0.65 return { "view": view, "background": background, "use_tags": use_tags, "orientation": orientation, "landmarks": landmarks, "note": note, "risk": risk, "confidence": confidence, } def parse_product_view_response(raw: str, index: int) -> dict: text = (raw or "").strip() text = re.sub(r"^```(?:json)?\s*", "", text, flags=re.I).strip() text = re.sub(r"\s*```$", "", text).strip() match = re.search(r"\{[\s\S]*\}", text) json_text = match.group(0) if match else text try: data = json.loads(json_text) except Exception: view_match = re.search(r'["\']?view["\']?\s*[::]\s*["\']?([a-z0-9_]+)', text, flags=re.I) note_match = re.search( r'["\']?note["\']?\s*[::]\s*["\']?([\s\S]*?)(?:["\']?\s*,\s*["\']?confidence|["\']?\s*[,}]\s*$)', text, flags=re.I, ) confidence_match = re.search(r'["\']?confidence["\']?\s*[::]\s*["\']?([0-9.]+)', text, flags=re.I) background_match = re.search(r'["\']?background["\']?\s*[::]\s*["\']?([a-z0-9_]+)', text, flags=re.I) tags_match = re.search(r'["\']?use_tags["\']?\s*[::]\s*\[([\s\S]*?)\]', text, flags=re.I) landmarks_match = re.search(r'["\']?landmarks["\']?\s*[::]\s*\[([\s\S]*?)(?:\]|\}\s*$)', text, flags=re.I) risk_match = re.search( r'["\']?risk["\']?\s*[::]\s*["\']?([\s\S]*?)(?:["\']?\s*[,}]\s*$)', text, flags=re.I, ) orientation = {} for key in PRODUCT_ORIENTATION_KEYS: orientation_match = re.search( rf'["\']?{key}["\']?\s*[::]\s*["\']?([^"\',,}}\]]+)', text, flags=re.I, ) if orientation_match: orientation[key] = orientation_match.group(1) data = { "view": view_match.group(1) if view_match else "", "background": background_match.group(1) if background_match else "unknown", "use_tags": re.findall(r"[a-z_]+", tags_match.group(1)) if tags_match else [], "orientation": orientation, "landmarks": re.findall(r"[\u4e00-\u9fffA-Za-z0-9/_-]+", landmarks_match.group(1)) if landmarks_match else [], "note": note_match.group(1) if note_match else "", "risk": risk_match.group(1) if risk_match else "", "confidence": confidence_match.group(1) if confidence_match else 0.45, } return normalize_product_view_data(data, index) def parse_product_view_batch_response(raw: str, indices: list[int]) -> dict[int, dict]: text = (raw or "").strip() text = re.sub(r"^```(?:json)?\s*", "", text, flags=re.I).strip() text = re.sub(r"\s*```$", "", text).strip() match = re.search(r"\{[\s\S]*\}", text) json_text = match.group(0) if match else text try: data = json.loads(json_text) except Exception: starts: list[tuple[int, int]] = [] for index in indices: found = re.search(rf'["\']?index["\']?\s*[::]\s*["\']?{index}["\']?', text) if found: starts.append((index, found.start())) if not starts and len(indices) == 1: return {indices[0]: parse_product_view_response(text, indices[0])} starts.sort(key=lambda item: item[1]) tolerant: dict[int, dict] = {} for offset, (index, start_pos) in enumerate(starts): end_pos = starts[offset + 1][1] if offset + 1 < len(starts) else len(text) tolerant[index] = parse_product_view_response(text[start_pos:end_pos], index) return tolerant raw_items = data.get("items") if isinstance(data, dict) else data if not isinstance(raw_items, list): raise ValueError("product view batch response missing items[]") allowed = set(indices) results: dict[int, dict] = {} for offset, item in enumerate(raw_items): if not isinstance(item, dict): continue try: item_index = int(item.get("index", indices[offset] if offset < len(indices) else -1)) except Exception: item_index = indices[offset] if offset < len(indices) else -1 if item_index not in allowed: continue results[item_index] = normalize_product_view_data(item, item_index) return results def product_view_batch_prompt(indices: list[int]) -> str: count = len(indices) return ( "你在识别同一款 SKG 挂脖肩颈按摩仪的产品参考图。所有图片都是同一产品,不要判断是不是不同产品,也不要把它当耳机、头戴设备或护颈枕;它是套在脖子上、外置佩戴在肩颈位置的 U 形/围脖式按摩仪,可能有内侧按摩触点、外壳按键、厚度、底部接口和左右不对称结构。\n" "先建立产品坐标系,再逐图识别:product_left=产品戴在真人脖子上时佩戴者左肩那一侧;product_right=佩戴者右肩那一侧;top=靠近下巴/脸/颈部上沿;bottom=靠近锁骨/肩部下沿;inner_side=贴近脖子皮肤/按摩触点的一侧;outer_side=外壳/按键/Logo/材质展示面。不要把图片左侧直接等同于产品左侧,必须在 orientation 里说明产品左/右/上/下分别对应图中的哪一边;不确定就写不确定并在 risk 里提醒。\n" "每张图的 view 必须从 enum 选一个:front(正面/外侧主外观), left_45(佩戴者左侧45度), right_45(佩戴者右侧45度), side_thickness(侧面厚度), inner_contacts(贴颈内侧/按摩触点), back_bottom(背面/底部/接口)。left_45/right_45 指佩戴者身体左右,不是画面左右。\n" "background enum:white, black, simple, complex, unknown。use_tags 只能从 enum 选:hero_packshot, wearing_scale, inner_contact, side_thickness, asymmetry, button_detail, back_bottom, material_texture。\n" "landmarks 用中文短词列出可见结构,例如:佩戴者左侧臂、佩戴者右侧臂、U形开口、贴颈内侧、按摩触点、侧边厚度、按键、充电口、底部、外壳材质、局部细节。note 必须用中文写给生视频模型,重点说明这张图适合约束什么,尤其要写清楚左/右/上/下、内/外侧、触点或局部细节。risk 只在可能误导生视频时写中文,如局部裁切、无法判断产品左右、上下颠倒风险、反光、遮挡、分辨率低、背景干扰;否则为空。\n" f"本次共有 {count} 张图片,图片前的 Image index 就是输出 index。必须输出同样数量的 items,且 index 不要改。只输出一行严格 JSON,不要 markdown,不要换行。\n" "{\"items\":[{\"index\":0,\"view\":\"front|left_45|right_45|side_thickness|inner_contacts|back_bottom\",\"background\":\"white|black|simple|complex|unknown\",\"use_tags\":[\"hero_packshot\"],\"orientation\":{\"product_left\":\"图中哪一侧/不可见/不确定\",\"product_right\":\"图中哪一侧/不可见/不确定\",\"top\":\"图中哪一侧/不可见/不确定\",\"bottom\":\"图中哪一侧/不可见/不确定\",\"inner_side\":\"图中哪一侧/是否可见\",\"outer_side\":\"图中哪一侧/是否可见\",\"opening_direction\":\"U形开口朝图中哪一侧/不可见/不确定\"},\"landmarks\":[\"U形开口\"],\"note\":\"中文备注\",\"risk\":\"\",\"confidence\":0.86}]}" ) def analyze_product_view(ref_path: Path, index: int) -> dict: if not (IMAGE_API_KEY if PRODUCT_VIEW_MODEL == GPT_IMAGE_MODEL else LLM_API_KEY): return fallback_product_view(index) img_b64 = base64.b64encode(ref_path.read_bytes()).decode("ascii") prompt = ( "你在识别同一款 SKG 挂脖肩颈按摩仪的一张产品参考图。它是套在脖子上的 U 形/围脖式按摩仪,不是耳机、头戴设备或护颈枕;所有上传图都属于同一产品,不要判断不同产品身份。 " "必须使用产品坐标系:product_left=戴在真人脖子上时佩戴者左肩一侧,product_right=佩戴者右肩一侧,top=靠近下巴/脸/颈部上沿,bottom=靠近锁骨/肩部下沿,inner_side=贴颈皮肤/按摩触点,outer_side=外壳/按键/Logo。不要把图片左侧直接当产品左侧;在 orientation 里写清楚产品左/右/上/下对应图中哪边,不确定就说明不确定并写 risk。 " "view 从 enum 选一个:front, left_45, right_45, side_thickness, inner_contacts, back_bottom。left_45/right_45 指佩戴者身体左右,不是画面左右。 " "background 从 enum 选:white, black, simple, complex, unknown。use_tags 只能从 enum 选:hero_packshot, wearing_scale, inner_contact, side_thickness, asymmetry, button_detail, back_bottom, material_texture。 " "landmarks 用中文短词列出可见结构,例如佩戴者左侧臂、佩戴者右侧臂、U形开口、贴颈内侧、按摩触点、侧边厚度、按键、充电口、底部、外壳材质、局部细节。note 用中文写给生视频模型,重点说明左/右/上/下、内/外侧、触点或局部细节。risk 只在可能误导生视频时写中文,否则为空。 " "Output one-line strict JSON only. Do not use markdown or line breaks. " "{\"view\":\"front|left_45|right_45|side_thickness|inner_contacts|back_bottom\",\"background\":\"white|black|simple|complex|unknown\",\"use_tags\":[\"hero_packshot\"],\"orientation\":{\"product_left\":\"图中哪一侧/不可见/不确定\",\"product_right\":\"图中哪一侧/不可见/不确定\",\"top\":\"图中哪一侧/不可见/不确定\",\"bottom\":\"图中哪一侧/不可见/不确定\",\"inner_side\":\"图中哪一侧/是否可见\",\"outer_side\":\"图中哪一侧/是否可见\",\"opening_direction\":\"U形开口朝图中哪一侧/不可见/不确定\"},\"landmarks\":[\"U形开口\"],\"note\":\"中文备注\",\"risk\":\"\",\"confidence\":0.86}." ) try: resp = product_view_llm().chat.completions.create( model=PRODUCT_VIEW_MODEL, messages=[{"role": "user", "content": [ {"type": "text", "text": prompt}, {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}, ]}], response_format={"type": "json_object"}, temperature=0.1, max_tokens=1600, ) raw = (resp.choices[0].message.content or "").strip() if not raw: raw = (getattr(resp.choices[0].message, "reasoning_content", "") or "").strip() return parse_product_view_response(raw, index) except Exception as e: fallback = fallback_product_view(index) fallback["note"] = f"{fallback['note']} 识别失败:{str(e)[:80]}" return fallback def analyze_product_views_batch(paths_by_index: list[tuple[int, Path]]) -> dict[int, dict]: if not (IMAGE_API_KEY if PRODUCT_VIEW_MODEL == GPT_IMAGE_MODEL else LLM_API_KEY): return {index: fallback_product_view(index) for index, _path in paths_by_index} results: dict[int, dict] = {} for start in range(0, len(paths_by_index), PRODUCT_VIEW_BATCH_SIZE): chunk = paths_by_index[start:start + PRODUCT_VIEW_BATCH_SIZE] indices = [index for index, _path in chunk] content: list[dict] = [{"type": "text", "text": product_view_batch_prompt(indices)}] for index, path in chunk: img_b64 = base64.b64encode(path.read_bytes()).decode("ascii") content.append({"type": "text", "text": f"Image index {index}"}) content.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}) try: resp = product_view_llm().chat.completions.create( model=PRODUCT_VIEW_MODEL, messages=[{"role": "user", "content": content}], response_format={"type": "json_object"}, temperature=0.05, max_tokens=max(2400, min(7000, 1200 * len(chunk))), ) raw = (resp.choices[0].message.content or "").strip() if not raw: raw = (getattr(resp.choices[0].message, "reasoning_content", "") or "").strip() parsed = parse_product_view_batch_response(raw, indices) for index in indices: results[index] = parsed.get(index) or analyze_product_view(chunk[indices.index(index)][1], index) except Exception as e: for index, path in chunk: try: result = analyze_product_view(path, index) except Exception: result = fallback_product_view(index) if result.get("risk"): result["risk"] = f"{result['risk']};批量识别失败后单图兜底" else: result["risk"] = f"批量识别失败后单图兜底:{str(e)[:60]}" results[index] = result return results @app.post("/jobs/{job_id}/assets/product-views/analyze") def analyze_product_views(job_id: str, req: AnalyzeProductViewsReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") path_items: list[tuple[int, Path]] = [] missing_results: dict[int, dict] = {} for index, ref in enumerate(req.refs): ref_path = storyboard_ref_path(job_id, ref) if not ref_path or not ref_path.exists(): missing_results[index] = fallback_product_view(index) else: path_items.append((index, ref_path)) batch_results = analyze_product_views_batch(path_items) if path_items else {} items = [] for index, _ref in enumerate(req.refs): result = batch_results.get(index) or missing_results.get(index) or fallback_product_view(index) items.append({ "index": index, "view": result["view"], "background": result.get("background", "unknown"), "use_tags": result.get("use_tags", default_product_use_tags(result["view"])), "orientation": result.get("orientation", default_product_orientation(result["view"])), "landmarks": result.get("landmarks", default_product_landmarks(result["view"])), "note": result["note"], "risk": result.get("risk", ""), "confidence": result["confidence"], }) used = {item["view"] for item in items} missing = [view for view in PRODUCT_VIEW_VALUES if view not in used] return {"items": items, "missing_views": missing} @app.post("/jobs/{job_id}/assets/product-angle") def generate_product_angle_asset(job_id: str, req: GenerateProductAngleAssetReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") raw_refs = [req.source_ref] + list(req.source_refs or []) source_paths: list[Path] = [] seen_paths: set[str] = set() for ref in raw_refs: ref_path = storyboard_ref_path(job_id, ref) if ref_path and ref_path.exists(): key = str(ref_path) if key not in seen_paths: seen_paths.add(key) source_paths.append(ref_path) if len(source_paths) >= 6: break if not source_paths: raise HTTPException(404, "source product image not found") source_path = source_paths[0] target_view = (req.target_view or "目标视角").strip() note = (req.note or "").strip() source_notes = [re.sub(r"\s+", " ", str(item)).strip()[:180] for item in (req.source_notes or []) if str(item).strip()] source_note_clause = ( "Uploaded reference notes from the operator/view recognizer: " + " | ".join(source_notes[:6]) + ". " if source_notes else "" ) prompt = ( "Use all provided reference images as evidence for the same SKG neck-and-shoulder wearable massage product. " "Each input image is one uploaded view of the same product; do not output a board, collage, or multiple products. " f"Generate a clean product-only white-background reference image in this missing view: {target_view}. " + source_note_clause + "Preserve the exact product identity: white U-shaped wearable neck and shoulder massager that sits around the neck, asymmetric wearer-left and wearer-right details, side buttons, inner metal massage contacts, opening width, material, thickness, curvature, and real shoulder-neck wearing scale. " "Use product coordinates: wearer-left/right are the user's body left/right when worn, top is near chin/upper neck, bottom is near collarbone/shoulders, inner side touches skin, outer side is the shell/buttons. " "Do not mirror both sides into identical shapes; keep visible left/right asymmetry and believable shoulder-neck wearable proportions. " "The product should be complete, centered, isolated on pure white, large enough to inspect, with no hands, people, packaging, text, UI, watermark, extra accessories, or scene background. " "If the target view is not fully visible in the source, infer the missing surfaces conservatively from the same product design without inventing a new model. " + (f"Additional operator note: {note}. " if note else "") ) models = [GPT_IMAGE_MODEL] try: img_bytes, _mode = _image_edit_call(source_paths, prompt, models=models, fallback_text=False, max_attempts=5, max_side=1600) except RuntimeError as e: raise HTTPException(_image_error_status(e), f"product angle generation failed: {e}") asset_id = f"product_angle_{uuid.uuid4().hex[:10]}" out_path = job_dir(job_id) / "assets" / f"{asset_id}.jpg" _normalize_asset_image(img_bytes, out_path, source_path, "1024", "white", square=True, fill_subject=True) return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": f"AI 补角度 · {target_view}", } @app.post("/jobs/{job_id}/assets/product-library") def copy_product_library_asset(job_id: str, req: CopyProductLibraryAssetReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") item = find_product_library_item(req.product_id) src = product_library_file(item) asset_id = uuid.uuid4().hex[:12] out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) out = out_dir / f"{asset_id}.jpg" try: asset_meta = normalize_product_asset_image(src, out) except Exception as e: raise HTTPException(400, f"product library copy failed: {e}") label = f"产品融合 · {item.title} #{item.image_index}" return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": label, "asset_meta": asset_meta, } @app.post("/jobs/{job_id}/assets/character-library") def copy_character_library_assets(job_id: str, req: CopyCharacterLibraryAssetReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") character = find_character_library_item(req.character_id) out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) refs = [] for image in character.images: src = character_library_file(image.filename) asset_id = uuid.uuid4().hex[:12] out = out_dir / f"{asset_id}.jpg" try: img = Image.open(src).convert("RGB") img.thumbnail((1600, 1600), Image.Resampling.LANCZOS) img.save(out, "JPEG", quality=94) except Exception as e: raise HTTPException(400, f"character library copy failed: {e}") refs.append({ "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": f"角色 · {character.name} · {image.label}", }) return { "character_id": character.id, "character_name": character.name, "images": refs, } class AgentRunLog(BaseModel): ts: float level: Literal["info", "warn", "error"] = "info" message: str class AgentRun(BaseModel): id: str job_id: str owner_id: str = "" owner_name: str = "" owner_email: str = "" owner_provider: str = "" tenant_key: str = "" status: Literal["draft", "queued", "executing", "reviewing", "completed", "failed"] = "queued" stage: str = "queued" progress: int = 0 logs: list[AgentRunLog] = Field(default_factory=list) video_ids: list[str] = Field(default_factory=list) final_video_url: str = "" contact_sheet_url: str = "" error: str = "" created_at: float = Field(default_factory=time.time) updated_at: float = Field(default_factory=time.time) AGENT_RUNS: dict[str, AgentRun] = {} AGENT_DEFAULT_PRODUCT_IDS = [ "desktop-skg-product-angle-01", "desktop-skg-product-angle-02", "desktop-skg-product-angle-03", "desktop-skg-product-angle-04", ] AGENT_DEFAULT_CHARACTER_ID = os.getenv("AGENT_DEFAULT_CHARACTER_ID", "character-02").strip() or "character-02" AGENT_SHOT_COUNT = max(8, min(12, int(os.getenv("AGENT_SHOT_COUNT", "12")))) AGENT_SHOT_DURATION_SECONDS = max(4.0, min(8.0, float(os.getenv("AGENT_SHOT_DURATION_SECONDS", "5")))) AGENT_VIDEO_TIMEOUT_SECONDS = max(300, int(os.getenv("AGENT_VIDEO_TIMEOUT_SECONDS", "1500"))) def agent_run_dir(run_id: str) -> Path: return AGENT_RUNS_DIR / run_id def agent_run_path(run_id: str) -> Path: return agent_run_dir(run_id) / "state.json" def save_agent_run(run: AgentRun) -> None: run.updated_at = time.time() d = agent_run_dir(run.id) d.mkdir(parents=True, exist_ok=True) agent_run_path(run.id).write_text(run.model_dump_json(indent=2), encoding="utf-8") AGENT_RUNS[run.id] = run db.index_agent_run(run.model_dump()) def agent_log( run: AgentRun, message: str, *, stage: str | None = None, progress: int | None = None, status: Literal["draft", "queued", "executing", "reviewing", "completed", "failed"] | None = None, level: Literal["info", "warn", "error"] = "info", ) -> None: if stage is not None: run.stage = stage if progress is not None: run.progress = max(0, min(100, int(progress))) if status is not None: run.status = status run.logs = (run.logs + [AgentRunLog(ts=time.time(), level=level, message=message)])[-240:] save_agent_run(run) async def save_agent_product_upload(job_id: str, upload: UploadFile, index: int) -> dict: if not upload.filename: raise HTTPException(400, "product image filename required") content_type = (upload.content_type or "").lower() suffix = Path(upload.filename).suffix.lower() if content_type and not content_type.startswith("image/"): raise HTTPException(400, f"product image must be image/*, got {content_type}") if not content_type and suffix not in {".jpg", ".jpeg", ".png", ".webp", ".bmp"}: raise HTTPException(400, f"unsupported product image: {suffix}") out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) asset_id = uuid.uuid4().hex[:12] tmp = out_dir / f"{asset_id}.upload" out = out_dir / f"{asset_id}.jpg" try: await _save_upload_to_path(upload, tmp) meta = normalize_product_asset_image(tmp, out) except Exception as e: try: out.unlink() except OSError: pass raise HTTPException(400, f"product upload failed: {e}") finally: try: tmp.unlink() except OSError: pass return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": f"用户产品图 {index} · {upload.filename}", "asset_meta": meta, } def agent_fallback_product_refs(job_id: str) -> list[dict]: refs: list[dict] = [] for product_id in AGENT_DEFAULT_PRODUCT_IDS: try: refs.append(copy_product_library_asset(job_id, CopyProductLibraryAssetReq(product_id=product_id))) except Exception: continue return refs def agent_subject_refs(job_id: str) -> list[dict]: try: payload = copy_character_library_assets(job_id, CopyCharacterLibraryAssetReq(character_id=AGENT_DEFAULT_CHARACTER_ID)) except Exception: return [] images = payload.get("images") or [] preferred = [] for ref in images: label = str(ref.get("label") or "") if any(key in label for key in ("正面", "左45", "半身近景", "侧面")): preferred.append(ref) return (preferred or images)[:4] def agent_base_prompt() -> str: return ( "Vertical 9:16 original SKG short-form ad. Do not copy the real person from the source video. " "Use the provided transparent anatomy subject as the recurring character when a person is needed. " "Use the provided SKG white U-shaped neck-and-shoulder massager product references as rigid product truth: " "one clean U-shaped wearable device, silver contact pads, red heat/light accents, premium white shell, correct scale around the neck and shoulders. " "No captions, no platform UI, no watermark, no medical treatment claims. Natural creator-demo pacing, clean premium lighting." ) def agent_shot_plan() -> list[dict]: base = agent_base_prompt() shots = [ ("hook", "Hook close-up: transparent anatomy character faces camera and raises the SKG neck-and-shoulder massager into the foreground, fast creator-ad opening energy, clean blue-white studio background."), ("pain", "Pain-point scene: the character sits at a desk after long screen work, shoulders tense, then notices the SKG massager beside the laptop; show neck and shoulder area clearly."), ("product_macro", "Macro product detail: slow moving close-up across the SKG U-shaped device, buttons, inner massage nodes, silver pads, premium white plastic and red heat accents."), ("wear", "Wear demo: the character places the SKG U-shaped massager externally around the back of the neck and upper shoulders, hands guiding both arms into position."), ("contact", "Heat/contact moment: close-up of silver massage pads aligned with side neck and upper trapezius, subtle red warmth glow, product outside the transparent body, no clipping."), ("office_use", "Office use beat: the character works calmly at a desk while wearing the SKG massager, small relief gesture, device stable and visible around neck and shoulders."), ("living_room", "Comfort beat: relaxed home setting, character leans back slightly, SKG device running, premium wellness mood, smooth gentle camera drift."), ("angle_proof", "Product angle proof: clean tabletop shot with the SKG U-shaped massager rotating or being lifted by hand, show thickness, contact pads, seams, and control button."), ("mobility", "Daily mobility scene: character walks from desk to sofa wearing the SKG massager, lightweight lifestyle demonstration, product silhouette remains accurate."), ("benefit", "Benefit visualization: transparent anatomy view emphasizes neck and shoulder contact zones with tasteful red warmth accents while the device stays opaque and external."), ("packaging", "Brand proof shot: SKG product and packaging on a clean surface, hand picks up the device, premium white product photography look, no extra text overlays."), ("cta", "Ending CTA: character faces camera wearing the SKG massager, then the final frame lands on a clean product hero angle with confident premium ad finish."), ] return [{"key": key, "prompt": f"{base}\n\nShot direction: {text}"} for key, text in shots[:AGENT_SHOT_COUNT]] def agent_reference_for_shot(shot_key: str, product_refs: list[dict], subject_refs: list[dict]) -> tuple[dict | None, str]: product_first = {"product_macro", "angle_proof", "packaging"} if shot_key in product_first and product_refs: return product_refs[min(2, len(product_refs) - 1)], "reference_image" if subject_refs: if shot_key in {"contact", "benefit"} and len(subject_refs) > 1: return subject_refs[min(1, len(subject_refs) - 1)], "reference_image" return subject_refs[0], "reference_image" if product_refs: return product_refs[0], "reference_image" return None, "reference_image" def agent_get_video(job_id: str, video_id: str) -> GeneratedVideo | None: job = JOBS.get(job_id) if not job: return None return next((item for item in job.generated_videos if item.id == video_id), None) def agent_wait_videos(run: AgentRun, ids: list[str], *, target_completed: int) -> list[str]: deadline = time.time() + AGENT_VIDEO_TIMEOUT_SECONDS last_summary = "" while time.time() < deadline: completed: list[str] = [] active = 0 failed = 0 for video_id in ids: item = agent_get_video(run.job_id, video_id) if not item: active += 1 continue if item.status == "completed" and item.url: completed.append(video_id) elif item.status == "failed": failed += 1 else: active += 1 summary = f"视频生成中 · 完成 {len(completed)}/{target_completed} · 运行 {active} · 失败 {failed}" if summary != last_summary: agent_log(run, summary, stage="execute", progress=58 + min(24, len(completed) * 2)) last_summary = summary if len(completed) >= target_completed or active == 0: return completed time.sleep(6) return [video_id for video_id in ids if (agent_get_video(run.job_id, video_id) and agent_get_video(run.job_id, video_id).status == "completed")] def agent_submit_shot( run: AgentRun, frame: KeyFrame, shot: dict, product_refs: list[dict], subject_refs: list[dict], retry: int = 0, ) -> str: first_ref, primary_role = agent_reference_for_shot(str(shot["key"]), product_refs, subject_refs) if not first_ref: raise RuntimeError("no reference image available for video generation") job = JOBS[run.job_id] prompt = str(shot["prompt"]) if retry: prompt += f"\n\nRetry pass {retry}: keep the same idea but simplify motion, keep the product shape stable, avoid strange anatomy or deformed product." req = GenerateStoryboardVideoReq( prompt=prompt, duration=AGENT_SHOT_DURATION_SECONDS, count=1, storyboard_row_idx=len(run.video_ids), first_image=first_ref, product_images=product_refs[:6], subject_images=subject_refs[:4], model="seedance", size="720x1280", ) # _enqueue_storyboard_videos derives the primary role from first_image. Keep the # local variable above for future provider-specific tuning without changing API. _ = primary_role ids = _enqueue_storyboard_videos(job, frame, req, None) return ids[0] def agent_compose_final(agent: AgentRun, ordered_ids: list[str]) -> None: d = agent_run_dir(agent.id) d.mkdir(parents=True, exist_ok=True) final_dir = job_dir(agent.job_id) / "final" final_dir.mkdir(parents=True, exist_ok=True) final = final_dir / f"agent-{agent.id}.mp4" concat_file = d / "concat.txt" paths: list[Path] = [] for video_id in ordered_ids: p = job_dir(agent.job_id) / "storyboard_videos" / video_id / "video.mp4" if p.exists() and p.stat().st_size > 0: paths.append(p.resolve()) if not paths: raise RuntimeError("no completed video files to compose") concat_file.write_text("".join(f"file '{str(p).replace(chr(39), chr(39) + chr(92) + chr(39) + chr(39))}'\n" for p in paths), encoding="utf-8") try: run_cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat_file), "-c", "copy", "-movflags", "+faststart", str(final)] run(run_cmd) except Exception: run_cmd = [ "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(concat_file), "-vf", "scale=720:1280,setsar=1", "-r", "24", "-c:v", "mpeg4", "-q:v", "4", "-c:a", "aac", "-b:a", "160k", "-movflags", "+faststart", str(final), ] run(run_cmd) contact = d / "contact.jpg" try: run([ "ffmpeg", "-y", "-i", str(final), "-vf", "select='not(mod(n,120))',scale=180:320,tile=12x1", "-frames:v", "1", str(contact), ]) agent.contact_sheet_url = f"/agent-runs/{agent.id}/contact.jpg" except Exception as e: agent_log(agent, f"抽帧审片图生成失败:{str(e)[:180]}", level="warn") agent.final_video_url = f"/agent-runs/{agent.id}/final.mp4" save_agent_run(agent) def agent_run_worker(run_id: str, product_refs: list[dict]) -> None: run = AGENT_RUNS[run_id] try: agent_log(run, "接管任务:创建 1 分钟二创出片流程", status="executing", stage="download", progress=4) pipeline_download(run.job_id) job = JOBS[run.job_id] if job.status == "failed": raise RuntimeError(job.error or job.message or "source video download failed") agent_log(run, f"源视频就绪 · {job.duration:.1f}s · {job.width}x{job.height}", stage="download", progress=14) refs = product_refs[:6] or agent_fallback_product_refs(run.job_id) if not refs: raise RuntimeError("需要至少 1 张产品图") update(job, product_refs=refs, message=f"Agent 已接入产品图 · {len(refs)} 张") agent_log(run, f"产品素材就绪 · {len(refs)} 张", stage="assets", progress=20) subject_refs = agent_subject_refs(run.job_id) if subject_refs: agent_log(run, f"主体参考就绪 · {len(subject_refs)} 张透明骨架角色", stage="assets", progress=24) else: agent_log(run, "未找到主体角色库,改用产品图和文本约束生成", stage="assets", progress=24, level="warn") agent_log(run, "抽取源视频节奏帧 · 12 张", stage="analyze", progress=28) pipeline_analyze(run.job_id, frame_count=12, target="transparent_human", mode="replace", quality="auto") job = JOBS[run.job_id] if not job.frames: raise RuntimeError(job.error or "keyframe extraction failed") agent_log(run, f"节奏帧完成 · {len(job.frames)} 张", stage="plan", progress=40) shots = agent_shot_plan() agent_log(run, f"生成二创镜头计划 · {len(shots)} 段 × {AGENT_SHOT_DURATION_SECONDS:g}s", stage="plan", progress=46) submitted: list[str] = [] for idx, shot in enumerate(shots): frame = job.frames[idx % len(job.frames)] video_id = agent_submit_shot(run, frame, shot, refs, subject_refs) submitted.append(video_id) run.video_ids = submitted save_agent_run(run) agent_log(run, f"提交镜头 {idx + 1:02d}/{len(shots)} · {shot['key']} · {video_id}", stage="execute", progress=48 + idx) completed = agent_wait_videos(run, submitted, target_completed=len(shots)) failed_positions = [i for i, video_id in enumerate(submitted) if video_id not in completed] if failed_positions: agent_log(run, f"有 {len(failed_positions)} 段未完成,自动重跑一次", stage="execute", progress=82, level="warn") for pos in failed_positions: frame = job.frames[pos % len(job.frames)] retry_id = agent_submit_shot(run, frame, shots[pos], refs, subject_refs, retry=1) submitted[pos] = retry_id run.video_ids = submitted save_agent_run(run) agent_log(run, f"重跑镜头 {pos + 1:02d} · {retry_id}", stage="execute", progress=83) completed = agent_wait_videos(run, submitted, target_completed=len(shots)) ordered_completed = [video_id for video_id in submitted if video_id in completed] if len(ordered_completed) < max(8, len(shots) - 2): raise RuntimeError(f"可用镜头不足:{len(ordered_completed)}/{len(shots)}") agent_log(run, f"自动审片通过 · 可用 {len(ordered_completed)}/{len(shots)} 段", status="reviewing", stage="review", progress=88) agent_log(run, "合成最终成片", stage="compose", progress=92) agent_compose_final(run, ordered_completed) agent_log(run, f"成片完成 · {len(ordered_completed)} 段", status="completed", stage="final", progress=100) except Exception as e: run.error = str(e)[:600] agent_log(run, f"任务失败:{run.error}", status="failed", stage="failed", progress=100, level="error") @app.post("/agent-runs", response_model=AgentRun) async def create_agent_run( request: Request, tk_url: str = Form(...), product_files: list[UploadFile] | None = File(None), ) -> AgentRun: if not tk_url.strip(): raise HTTPException(400, "tk_url required") user = data_user_from_request(request) job_id = uuid.uuid4().hex[:12] run_id = uuid.uuid4().hex[:12] job = Job(id=job_id, url=tk_url.strip()) assign_owner(job, user) JOBS[job_id] = job save_state(job) refs: list[dict] = [] for index, upload in enumerate((product_files or [])[:6], start=1): refs.append(await save_agent_product_upload(job_id, upload, index)) run = AgentRun(id=run_id, job_id=job_id, status="queued", stage="queued", progress=1) assign_owner(run, user) save_agent_run(run) agent_log(run, f"任务已入队 · job={job_id} · 产品图 {len(refs)} 张", status="queued", stage="queued", progress=1) db.audit(user, "agent_run.create", "agent_run", run_id, {"job_id": job_id, "product_refs": len(refs)}, request) threading.Thread(target=agent_run_worker, args=(run_id, refs), daemon=True).start() return run @app.get("/agent-runs", response_model=list[AgentRun]) def list_agent_runs(request: Request, limit: int = 20) -> list[AgentRun]: user = data_user_from_request(request) for p in AGENT_RUNS_DIR.iterdir(): if p.is_dir() and (p / "state.json").exists() and p.name not in AGENT_RUNS: try: AGENT_RUNS[p.name] = AgentRun.model_validate_json((p / "state.json").read_text(encoding="utf-8")) except Exception: pass items = [item for item in AGENT_RUNS.values() if user_can_access_agent_run(item.id, user)] items.sort(key=lambda item: item.updated_at, reverse=True) return items[:max(1, min(100, limit))] @app.get("/agent-runs/{run_id}", response_model=AgentRun) def get_agent_run(run_id: str) -> AgentRun: run = AGENT_RUNS.get(run_id) if not run and agent_run_path(run_id).exists(): run = AgentRun.model_validate_json(agent_run_path(run_id).read_text(encoding="utf-8")) AGENT_RUNS[run_id] = run if not run: raise HTTPException(404, "agent run not found") return run @app.get("/agent-runs/{run_id}/final.mp4") def get_agent_run_final(run_id: str): run = get_agent_run(run_id) p = job_dir(run.job_id) / "final" / f"agent-{run.id}.mp4" if not p.exists(): raise HTTPException(404, "final video not found") return FileResponse(p, media_type="video/mp4") @app.get("/agent-runs/{run_id}/contact.jpg") def get_agent_run_contact(run_id: str): p = agent_run_dir(run_id) / "contact.jpg" if not p.exists(): raise HTTPException(404, "contact sheet not found") return FileResponse(p, media_type="image/jpeg") def product_image_alpha(img: Image.Image) -> Image.Image: rgba = img.convert("RGBA") rgb = rgba.convert("RGB") diff = ImageChops.difference(rgb, Image.new("RGB", rgb.size, (255, 255, 255))) mask = diff.convert("L").point(lambda p: 0 if p < 18 else min(255, int(p * 2.4))) mask = mask.filter(ImageFilter.GaussianBlur(0.7)) rgba.putalpha(mask) return rgba @app.post("/jobs/{job_id}/product-fusion/guide") def create_product_fusion_guide(job_id: str, req: ProductFusionShot) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") person_path = storyboard_ref_path(job_id, req.person_image) product_path = storyboard_ref_path(job_id, req.product_image) if not person_path or not person_path.exists(): raise HTTPException(400, "person image required") if not product_path or not product_path.exists(): raise HTTPException(400, "product image required") if not req.product_region or req.product_region.w <= 0 or req.product_region.h <= 0: raise HTTPException(400, "product region required") region = req.product_region x = max(0.0, min(1.0, float(region.x))) y = max(0.0, min(1.0, float(region.y))) w = max(0.02, min(1.0 - x, float(region.w))) h = max(0.02, min(1.0 - y, float(region.h))) try: base = Image.open(person_path).convert("RGB") base.thumbnail((1600, 1600), Image.Resampling.LANCZOS) product = product_image_alpha(Image.open(product_path)) bw, bh = base.size box = ( int(round(x * bw)), int(round(y * bh)), max(1, int(round(w * bw))), max(1, int(round(h * bh))), ) product.thumbnail((box[2], box[3]), Image.Resampling.LANCZOS) px = box[0] + max(0, (box[2] - product.width) // 2) py = box[1] + max(0, (box[3] - product.height) // 2) guide = base.convert("RGBA") guide.alpha_composite(product, (px, py)) out = guide.convert("RGB") asset_id = uuid.uuid4().hex[:12] out_dir = job_dir(job_id) / "assets" out_dir.mkdir(parents=True, exist_ok=True) out_path = out_dir / f"{asset_id}.jpg" out.save(out_path, "JPEG", quality=94) except Exception as e: raise HTTPException(400, f"product fusion guide failed: {e}") return { "kind": "asset", "frame_idx": -1, "element_id": asset_id, "cutout_id": asset_id, "label": f"产品融合引导图 · {req.image_model or 'gpt-image-2'}", } def fallback_product_fusion_descriptions() -> list[str]: return [ "清晨卧室柔光里,透明骨架人把白色 SKG 颈部按摩仪轻戴到后颈,微微闭眼露出放松微笑。", "现代客厅沙发旁,透明骨架人双手扶住 SKG 机身两侧,肩线慢慢放低,表情从紧绷变舒适。", "居家办公桌前,透明骨架人轻按 SKG 侧边控制键,颈部骨架区域清晰可见,神情安静享受。", "暖色卧室床边,透明骨架人佩戴 SKG 后轻轻仰头,白色骨架与透明外壳干净明亮,画面高级。", "落地窗自然光下,透明骨架人坐姿端正,SKG 产品贴合后颈,嘴角微扬呈现轻松舒缓状态。", "简洁浴室镜前,透明骨架人用双手调整 SKG 贴合角度,眼神柔和,产品白色机身清楚可辨。", "午后阳台休息区,透明骨架人戴着 SKG 慢慢侧头伸展,肩颈线条舒展,表情舒适而不夸张。", "高端影棚白色背景中,透明骨架人平稳转身展示 SKG 佩戴效果,产品比例真实,轮廓清晰。", "健身后休息长椅上,透明骨架人把 SKG 放上肩颈,呼吸放慢,脸上出现明显放松感。", "办公会议间隙,透明骨架人靠在椅背上佩戴 SKG,轻轻闭眼,画面传达短暂恢复和舒适休息。", "夜晚卧室暖灯下,透明骨架人坐在床沿使用 SKG,肩颈骨架被柔和光线照亮,神情安稳享受。", "城市公寓客厅里,透明骨架人一边看向窗外一边使用 SKG,动作自然,产品贴合不漂移。", "极简桌面场景中,透明骨架人拿起 SKG 靠近颈部,镜头轻推展示产品材质和佩戴准备动作。", "木质休闲椅上,透明骨架人佩戴 SKG 后轻轻呼气,肩部下沉,脸部呈现舒缓满足的微笑。", "白色商业摄影场景里,透明骨架人用指尖轻触 SKG 按键,产品细节清晰,人物状态轻松专业。", "温暖客厅地毯旁,透明骨架人坐姿放松,SKG 稳定贴合后颈,闭眼感受舒适放松的瞬间。", "窗边阅读角落中,透明骨架人戴着 SKG 翻开书页,动作慢而自然,表情平和享受。", "办公室午休场景里,透明骨架人把 SKG 戴稳后靠回椅背,眼睛半闭,颈肩明显放松。", "干净产品广告场景中,透明骨架人轻扶 SKG 两端展示佩戴贴合度,微笑自然,产品不变形。", "收尾特写镜头里,透明骨架人佩戴 SKG 后缓慢抬头微笑,白色骨架清楚,整体干净高级。", ] @app.post("/jobs/{job_id}/product-fusion/descriptions") def generate_product_fusion_descriptions(job_id: str, req: ProductFusionDescriptionReq) -> dict: if job_id not in JOBS: raise HTTPException(404, "job not found") fallback = fallback_product_fusion_descriptions() shots = (req.shots or [])[:6] if not LLM_API_KEY: return {"descriptions": fallback, "mode": "fallback"} shot_lines = [] for i, shot in enumerate(shots, start=1): first = (shot.first_image or {}).get("label") or "首帧未填" last = (shot.last_image or {}).get("label") or "尾帧未填" products = [ (ref or {}).get("label") or f"产品角度{idx + 1}未填" for idx, ref in enumerate((shot.product_images or [])[:4]) ] while len(products) < 4: products.append(f"产品角度{len(products) + 1}未填") shot_lines.append(f"{i}. 首帧={first};尾帧={last};产品角度={products[0]} / {products[1]} / {products[2]} / {products[3]};已有描述={shot.action_text or '空'}") prompt = ( "你是 SKG 产品短视频分镜导演。请写 20 条中文产品融合动作描述," "每条 35-70 字,必须说明透明骨架人在什么场景下使用产品、产品如何佩戴/展示、脸部如何舒适享受。" "产品是 SKG 白色 U 形颈部/肩颈按摩仪,四张产品角度图是同一产品的身份真源;不要写医疗治疗承诺,不要出现竞品。" "输出 JSON:{\"descriptions\":[\"...\", \"...\"]}。\n\n" + "\n".join(shot_lines) ) try: resp = llm().chat.completions.create( model=REWRITE_MODEL, messages=[ {"role": "system", "content": "只输出合法 JSON,不要解释。"}, {"role": "user", "content": prompt}, ], temperature=0.5, ) text = resp.choices[0].message.content or "" data = json.loads(text) descriptions = [str(x).strip() for x in data.get("descriptions", []) if str(x).strip()] if len(descriptions) < 20: descriptions = (descriptions + fallback)[:20] return {"descriptions": descriptions[:20], "mode": "llm"} except Exception: return {"descriptions": fallback, "mode": "fallback"} @app.get("/jobs/{job_id}/assets/{asset_id}.jpg") def get_storyboard_asset(job_id: str, asset_id: str): p = job_dir(job_id) / "assets" / f"{asset_id}.jpg" if not p.exists(): raise HTTPException(404, "asset not found") return FileResponse(p, media_type="image/jpeg") @app.delete("/jobs/{job_id}/storyboard-videos/{video_id}", response_model=Job) def delete_storyboard_video(job_id: str, video_id: str) -> Job: """删除 Video Gen 节点里的一个视频任务(成功/失败/排队都可删)。""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") before = len(job.generated_videos) removed = next((v for v in job.generated_videos if v.id == video_id), None) kept = [v for v in job.generated_videos if v.id != video_id] if len(kept) == before: raise HTTPException(404, "generated video not found") cancel_queued_video_task(job_id, video_id) out_dir = job_dir(job_id) / "storyboard_videos" / video_id if out_dir.exists(): try: shutil.rmtree(out_dir) except OSError: pass if removed: for frame in job.frames: if frame.index == removed.frame_idx and frame.storyboard and frame.storyboard.selected_video_id == video_id: frame.storyboard.selected_video_id = "" msg = f"删除视频任务 · 分镜 {removed.frame_idx + 1}" if removed else "删除视频任务" update(job, generated_videos=kept, frames=job.frames, message=msg) return job @app.put("/jobs/{job_id}/frames/{idx}/storyboard", response_model=Job) def update_storyboard(job_id: str, idx: int, req: UpdateStoryboardReq) -> Job: """更新分镜的编排字段(subject / product / scene / action / duration / reference_ids)""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") new_frames = [] for f in job.frames: if f.index == idx: f.storyboard = StoryboardScene( duration=max(0.0, float(req.duration)), first_image=req.first_image, last_image=req.last_image, product_images=list(req.product_images), subject_images=list(req.subject_images), product_fusion_shots=list(req.product_fusion_shots), visual_mode=req.visual_mode, needs_product=bool(req.needs_product), needs_subject=bool(req.needs_subject), storyboard_row_idx=req.storyboard_row_idx, subject_brief=req.subject_brief.strip(), skg_copy_en=req.skg_copy_en.strip(), skg_copy_zh=req.skg_copy_zh.strip(), scene_one_line_en=req.scene_one_line_en.strip(), scene_one_line_zh=req.scene_one_line_zh.strip(), action_one_line_en=req.action_one_line_en.strip(), action_one_line_zh=req.action_one_line_zh.strip(), selected_video_id=req.selected_video_id.strip(), first_frame_plan=req.first_frame_plan.strip(), last_frame_plan=req.last_frame_plan.strip(), product_placement=req.product_placement.strip(), subject_image=req.subject_image, scene_image=req.scene_image, product_image=req.product_image, action_image=req.action_image, subject=req.subject.strip(), product=req.product.strip(), scene=req.scene.strip(), action=req.action.strip(), reference_ids=list(req.reference_ids), ) new_frames.append(f) update(job, frames=new_frames, message=f"分镜 {idx + 1} 编排已更新") return job class PushStoryboardImageReq(BaseModel): kind: Literal["keyframe", "cutout", "asset"] frame_idx: int element_id: str | None = None cutout_id: str | None = None label: str = "" @app.post("/jobs/{job_id}/storyboard-images", response_model=Job) def push_storyboard_image(job_id: str, req: PushStoryboardImageReq) -> Job: """把一张图(关键帧本身或元素提取图)推送到分镜头编排区""" import time as _time job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") # 防重复推送:相同 frame_idx + element_id + cutout_id 已存在就跳过 for existing in job.storyboard_images: if (existing.kind == req.kind and existing.frame_idx == req.frame_idx and existing.element_id == req.element_id and existing.cutout_id == req.cutout_id): return job img = StoryboardImage( ref_id=uuid.uuid4().hex[:8], kind=req.kind, frame_idx=req.frame_idx, element_id=req.element_id, cutout_id=req.cutout_id, label=req.label.strip(), created_at=_time.time(), ) update(job, storyboard_images=job.storyboard_images + [img], message=f"上推到分镜头编排 · {req.label or req.kind}") return job @app.delete("/jobs/{job_id}/storyboard-images/{ref_id}", response_model=Job) def remove_storyboard_image(job_id: str, ref_id: str) -> Job: """从分镜头编排区移除一张图""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") before = len(job.storyboard_images) new_list = [x for x in job.storyboard_images if x.ref_id != ref_id] if len(new_list) == before: raise HTTPException(404, "storyboard image not found") update(job, storyboard_images=new_list, message="从分镜头编排移除一张图") return job @app.get("/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutouts/{cutout_id}.jpg") def get_cutout_versioned(job_id: str, idx: int, element_id: str, cutout_id: str): p = job_dir(job_id) / "elements" / f"{idx:03d}_{element_id}_{cutout_id}.jpg" if not p.exists(): raise HTTPException(404, "cutout not found") return FileResponse(p, media_type="image/jpeg") @app.get("/jobs/{job_id}/frames/{idx}/elements/{element_id}/cutout.jpg") def get_cutout(job_id: str, idx: int, element_id: str): """旧路径兼容(v1 单图)→ 找 elements/{idx}_{element_id}.jpg 或 .png""" p = job_dir(job_id) / "elements" / f"{idx:03d}_{element_id}.jpg" if not p.exists(): legacy = job_dir(job_id) / "elements" / f"{idx:03d}_{element_id}.png" if legacy.exists(): return FileResponse(legacy, media_type="image/jpeg") raise HTTPException(404, "cutout not found") return FileResponse(p, media_type="image/jpeg") # ---------- 删除:关键帧 / 单张生成图 ---------- @app.delete("/jobs/{job_id}/frames/{idx}", response_model=Job) def delete_frame(job_id: str, idx: int) -> Job: """删除整张关键帧,清理所有附属文件(原图 / 干净版 / 元素抠图 / 生成图)""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") target = next((f for f in job.frames if f.index == idx), None) if not target: raise HTTPException(404, "frame not found") d = job_dir(job_id) # 删文件 — 静默错误,文件可能不存在 paths = [ d / "frames" / f"{idx:03d}.jpg", d / "cleaned" / f"{idx:03d}.jpg", ] for p in paths: if p.exists(): try: p.unlink() except OSError: pass # 该帧的所有元素抠图(命名前缀 {idx:03d}_) elements_dir = d / "elements" if elements_dir.exists(): for ext in ("png", "jpg"): for p in elements_dir.glob(f"{idx:03d}_*.{ext}"): try: p.unlink() except OSError: pass # 该帧的所有生成图 gen_dir = d / "gen" if gen_dir.exists(): for p in gen_dir.glob(f"{idx:03d}_*.jpg"): try: p.unlink() except OSError: pass new_frames = [f for f in job.frames if f.index != idx] update(job, frames=new_frames, message=f"已删除参考帧 {idx + 1}") return job @app.delete("/jobs/{job_id}/frames/{idx}/gen/{gen_id}", response_model=Job) def delete_generated(job_id: str, idx: int, gen_id: str) -> Job: """删除该 frame 的某张生成图(文件 + 列表)""" job = JOBS.get(job_id) if not job: raise HTTPException(404, "job not found") frame = next((f for f in job.frames if f.index == idx), None) if not frame: raise HTTPException(404, "frame not found") p = job_dir(job_id) / "gen" / f"{idx:03d}_{gen_id}.jpg" if p.exists(): try: p.unlink() except OSError: pass new_frames = [] found = False for f in job.frames: if f.index == idx: before = len(f.generated_images) f.generated_images = [g for g in f.generated_images if g.id != gen_id] found = len(f.generated_images) < before new_frames.append(f) if not found: raise HTTPException(404, "generated image not found") update(job, frames=new_frames, message=f"删除生成图 · 分镜 {idx + 1}") return job