HIVE-Inspired Architecture for Investment Videos
Shipped pipeline and HIVE-inspired backlog on one page.
Current Working Flow (What Runs Today)
Five boxes abbreviate the path. Real order: after prune, keyframes + layout vision, then encode. Full logic: internals below.
🧾 Core Artifacts
On disk: `transcript.json`, `clips.json`, `hooks.json`, `prune.json`, `layout_vision.json`; output: `short_*.mp4` (local, gitignored).
⚙️ Design Principle
Pydantic schemas, JSON artifacts, hash-keyed caches; ffmpeg compile is deterministic.
🎯 Domain Fit
Finance long-form → ranked 50–90s vertical clips.
Pipeline logic (humeo.pipeline.run_pipeline)
Abbreviated diagram above. Call order below: prune → keyframes + layout vision → render.
1 · Ingest (deterministic; cache skips re-download)
work_dir/source.mp4, transcript.json; optional source.info.json for manifest.
- `ingest_complete(work_dir)` true → reuse `source.mp4` + `transcript.json` (per-video dir under `HUMEO_CACHE_ROOT`).
- Else: yt-dlp → ffmpeg audio → ASR (OpenAI Whisper API or WhisperX; `HUMEO_TRANSCRIBE_PROVIDER`).
- Canonical JSON → `transcript_sha256`; downstream caches key on it.
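The canonical-JSON hashing step can be sketched as follows. This is a minimal illustration, not the repo's code: the function names and the exact canonicalization (sorted keys, compact separators, UTF-8) are assumptions about what "canonical JSON" means here.

```python
import hashlib
import json


def canonical_json_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so that
    logically equal documents produce identical bytes (assumed scheme)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def transcript_hash(transcript: dict) -> str:
    """Hex SHA-256 of the canonical form; a stand-in for transcript_sha256."""
    return hashlib.sha256(canonical_json_bytes(transcript)).hexdigest()


# Same content, different key order: the hash must not change.
a = {"language": "en", "segments": [{"start": 0.0, "end": 1.5, "text": "hello"}]}
b = {"segments": [{"text": "hello", "end": 1.5, "start": 0.0}], "language": "en"}
assert transcript_hash(a) == transcript_hash(b)
```

Because the hash depends only on content, any downstream cache keyed on it is invalidated exactly when the transcript changes.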
2 · Clip selection (Gemini, text-only)
clips.json, clips.meta.json, clip_selection_raw.json.
- Prompts: `clip_selection_system.jinja2`, `clip_selection_user.jinja2`; override: `HUMEO_PROMPTS_DIR`.
- Pool size default 12; each candidate parses as `Clip`, `virality_score` ∈ [0,1].
- Sort by score; `needs_review=True` loses tiebreak vs same score.
- Keep: score ≥ threshold (0.70 default), not reviewed-out; if count < min (5), fill from top until max (8).
- Pydantic: `humeo_core.schemas.Clip`. Cache skip: `clips.meta.json` matches transcript hash + model.
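The ranking and keep rules above could look roughly like this. It is a sketch under stated assumptions: `Candidate` is a hypothetical stdlib stand-in for the Pydantic `Clip`, "not reviewed-out" is read as `needs_review` being false, and "fill from top until max" is taken literally (backfill runs up to the maximum once the first pass falls short of the minimum). The real stop condition may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:  # hypothetical stand-in for humeo_core.schemas.Clip
    clip_id: str
    virality_score: float  # expected in [0, 1]
    needs_review: bool = False


def select_clips(candidates: List[Candidate], threshold: float = 0.70,
                 min_clips: int = 5, max_clips: int = 8) -> List[Candidate]:
    # Highest score first; on equal scores, needs_review=True sorts last.
    ranked = sorted(candidates, key=lambda c: (-c.virality_score, c.needs_review))
    kept_ids = {c.clip_id for c in ranked
                if c.virality_score >= threshold and not c.needs_review}
    if len(kept_ids) < min_clips:
        # Backfill from the top of the ranking (literal reading: up to max_clips).
        for c in ranked:
            if len(kept_ids) >= max_clips:
                break
            kept_ids.add(c.clip_id)
    # Return in rank order, never more than max_clips.
    return [c for c in ranked if c.clip_id in kept_ids][:max_clips]


pool = [Candidate("a", 0.9), Candidate("b", 0.8, needs_review=True),
        Candidate("c", 0.8), Candidate("d", 0.75), Candidate("e", 0.6),
        Candidate("f", 0.5), Candidate("g", 0.4), Candidate("h", 0.3),
        Candidate("i", 0.2), Candidate("j", 0.1)]
selected_ids = [c.clip_id for c in select_clips(pool)]
```

Only three candidates pass the first filter in this pool, so the backfill path runs and the result is capped at eight clips.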
2.25 · Hook detection (Gemini)
hooks.json, hooks.meta.json, hooks_raw.json.
Selector copies [0, 3]s hook example; that blocks start-trims in prune. Stage 2.25 overwrites hook_start_sec / hook_end_sec per clip after validation.
- Bounds: `0 ≤ hook_start < hook_end ≤ clip.duration`; min/max hook length per prompt.
- Exact `(0, 3)` match → `_looks_like_default_hook` → clamps treat as no hook if detection skipped.
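A minimal sketch of the hook checks described above. The helper names are illustrative (`looks_like_default_hook` mirrors, but is not, the repo's `_looks_like_default_hook`), and the `min_len` / `max_len` defaults are placeholder values, since the real limits live in the prompt.

```python
from typing import Optional, Tuple

Hook = Tuple[float, float]


def looks_like_default_hook(start: float, end: float, tol: float = 1e-6) -> bool:
    """True when the window is exactly the (0, 3) example from the selector prompt."""
    return abs(start - 0.0) <= tol and abs(end - 3.0) <= tol


def validate_hook(start: float, end: float, clip_duration: float,
                  min_len: float = 1.0, max_len: float = 8.0) -> Optional[Hook]:
    """Enforce 0 <= start < end <= duration plus length limits (assumed values)."""
    if not (0.0 <= start < end <= clip_duration):
        return None
    if not (min_len <= end - start <= max_len):
        return None
    return (start, end)


def effective_hook(start: float, end: float, clip_duration: float,
                   detection_ran: bool) -> Optional[Hook]:
    """Hook that later clamps should honor, or None for 'no real hook'."""
    hook = validate_hook(start, end, clip_duration)
    if hook is None:
        return None
    # A copied (0, 3) example only counts if Stage 2.25 actually produced it.
    if not detection_ran and looks_like_default_hook(*hook):
        return None
    return hook
```

For example, `effective_hook(0.0, 3.0, 60.0, detection_ran=False)` yields `None`, so prune is free to trim the start, while a detected `(4.0, 9.5)` hook is returned unchanged.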
2.5 · Content pruning (coarse trim, Gemini)
prune.json, prune_raw.json, prune.meta.json.
Coarse = head/tail only: the in-point moves later, the out-point moves earlier. No internal splices. Writes `trim_start_sec`, `trim_end_sec` on `Clip`.
- `prune_level` caps total trim as % of clip length (conservative / balanced / aggressive).
- Post-trim duration ≥ `MIN_CLIP_DURATION_SEC` (50).
- Real hook: start trim ≤ `hook_start − 0.25s`; end trim capped so hook tail stays in window (symmetric ±0.25s).
- Segment snap: trims snapped to transcript phrase boundaries when snap logic applies (`content_pruning.py`).
- LLM/API fail: trims `0, 0` for that clip; run continues.
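The clamps above can be combined into one function, sketched below. Several details are assumptions made for illustration: the per-level fractions in `MAX_TRIM_FRACTION` are invented placeholders, and scaling both trims proportionally when they exceed the budget is one possible policy, not necessarily what `content_pruning.py` does. Segment snapping is omitted.

```python
from typing import Optional, Tuple

MIN_CLIP_DURATION_SEC = 50.0
HOOK_PAD_SEC = 0.25
# Hypothetical caps; the document only says each level caps trim as a % of length.
MAX_TRIM_FRACTION = {"conservative": 0.10, "balanced": 0.20, "aggressive": 0.35}


def clamp_trims(duration: float, trim_start: float, trim_end: float,
                prune_level: str = "balanced",
                hook: Optional[Tuple[float, float]] = None) -> Tuple[float, float]:
    trim_start = max(0.0, trim_start)
    trim_end = max(0.0, trim_end)
    if hook is not None:
        hook_start, hook_end = hook
        # Keep the whole hook, padded by 0.25 s on each side, inside the window.
        trim_start = min(trim_start, max(0.0, hook_start - HOOK_PAD_SEC))
        trim_end = min(trim_end, max(0.0, duration - (hook_end + HOOK_PAD_SEC)))
    # Total budget: level cap, and never drop below the minimum clip duration.
    budget = min(MAX_TRIM_FRACTION[prune_level] * duration,
                 max(0.0, duration - MIN_CLIP_DURATION_SEC))
    total = trim_start + trim_end
    if total > budget:
        scale = budget / total  # total > 0 here because budget >= 0
        trim_start *= scale
        trim_end *= scale
    return trim_start, trim_end


def safe_trims(duration: float, propose) -> Tuple[float, float]:
    """On any proposal failure, fall back to (0, 0) so the run continues."""
    try:
        start, end = propose()
    except Exception:
        return (0.0, 0.0)
    return clamp_trims(duration, start, end)
```

With a 55 s clip at the `aggressive` level, requested trims of 5 s and 5 s are halved to 2.5 s each, because only 5 s can be removed before the clip falls under 50 s.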
3 · Keyframes + layout vision (Gemini multimodal)
work_dir/keyframes/*.jpg, layout_vision.json, layout_vision.meta.json.
- `clip_for_render(clip)` → trimmed timeline; one `Scene` per clip for keyframe time range.
- `extract_keyframes`: ffmpeg snapshot, one image per clip.
- Gemini vision: JSON → `LayoutInstruction` (layout enum; normalized bboxes for splits). Prompt string: `GEMINI_LAYOUT_VISION_PROMPT` in `layout_vision.py`.
- Cache key: transcript hash + clips hash + vision model id.
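The cache check in the last bullet might be implemented along these lines. This is an illustrative sketch: the `cache_key` field name inside `layout_vision.meta.json` and both function names are assumptions, not the repo's actual schema.

```python
import hashlib
import json
from pathlib import Path


def layout_cache_key(transcript_sha256: str, clips_sha256: str, vision_model: str) -> str:
    """Fold the three inputs into one stable key (assumed scheme)."""
    payload = json.dumps(
        {"transcript": transcript_sha256, "clips": clips_sha256, "model": vision_model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_is_fresh(meta_path: Path, expected_key: str) -> bool:
    """True only if the meta file exists, parses, and records the same key."""
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    return isinstance(meta, dict) and meta.get("cache_key") == expected_key


def write_meta(meta_path: Path, key: str) -> None:
    """Record the key next to the cached layout_vision.json."""
    meta_path.write_text(json.dumps({"cache_key": key}), encoding="utf-8")
```

Changing any one input (a re-transcribed source, a different clip set, or a new vision model id) produces a different key, so the vision call reruns only when its inputs change.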
4 · Render (deterministic ffmpeg)
output/short_<id>.mp4 + per-clip ASS under work_dir/subtitles/.
- `effective_export_bounds`: source window = clip bounds minus `trim_*`; hooks do not widen/narrow export (metadata for prune only); `clip_for_render` clears hook fields.
- `humeo_core.primitives.compile`: filtergraph from `LayoutInstruction`; five layouts; ≤2 on-screen items; ASS + drawtext title.
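As a rough sketch of the export-window arithmetic, assuming a simplified clip record: `ClipWindow` and its `start_time_sec` / `end_time_sec` field names are hypothetical, while `trim_start_sec` and `trim_end_sec` follow the fields named in the pruning stage. The `-ss` and `-t` options are standard ffmpeg seek and duration flags.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClipWindow:  # hypothetical, simplified stand-in for Clip
    start_time_sec: float
    end_time_sec: float
    trim_start_sec: float = 0.0
    trim_end_sec: float = 0.0


def export_bounds(clip: ClipWindow) -> Tuple[float, float]:
    """Source window = clip bounds minus trims; hook fields play no part."""
    start = clip.start_time_sec + clip.trim_start_sec
    end = clip.end_time_sec - clip.trim_end_sec
    if end <= start:
        raise ValueError("trims consume the whole clip")
    return start, end


def ffmpeg_seek_args(clip: ClipWindow) -> List[str]:
    """One contiguous window expressed as ffmpeg -ss (start) and -t (duration)."""
    start, end = export_bounds(clip)
    return ["-ss", f"{start:.3f}", "-t", f"{end - start:.3f}"]


clip = ClipWindow(start_time_sec=120.0, end_time_sec=200.0,
                  trim_start_sec=2.5, trim_end_sec=4.0)
window = export_bounds(clip)
seek_args = ffmpeg_seek_args(clip)
```

Formatting the times to a fixed precision keeps the generated command line identical across runs, which fits the deterministic-render goal.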
Full Planned Architecture (HIVE Alternative You Are Building)
North-star modules. Backlogged nodes use reduced opacity.
Backlog blocks: paper vs this repo
A2 · Narrative context (Backlogged)
HIVE §3.1.5 analogue: sparse frames + transcript → `narrative_context.json` (summary, scene captions, value thread). Clip select consumes it so charts/OCR influence windows before Stage 2.
A3 · Character + dialogue fusion (Backlogged)
Paper: diarization, face clusters, LLM merge; OCR as anchor vs ASR. Product today: ASR text only at Stage 2; no speaker-ID or OCR-merge.
A4 · Cross-episode memory (Backlogged)
Paper: cross-episode memory. Here: one video per work_dir; no shared character DB.
B2 · Hook + outro boundaries (In progress)
Hook: Stage 2.25, schema fields. Outro: no `outro_*`; endings live in prompts (selector + prune), not HIVE-style O×E boundary search over scenes.
B4 · Micro-pruning (Backlogged)
Cut internal silence/filler (VAD, silencedetect, verbatim ASR); compiler would splice segments. Today: one contiguous -ss/-t per short.
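Since micro-pruning is not built yet, the following is only a sketch of one way the splice list could be derived: invert detected silence intervals into keep-segments. The function name, the `min_keep` threshold, and the interval format are all assumptions; the silence intervals themselves would come from a detector such as VAD or ffmpeg's `silencedetect`.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]


def keep_segments(start: float, end: float, silences: Iterable[Interval],
                  min_keep: float = 0.2) -> List[Interval]:
    """Return the non-silent spans of [start, end], dropping slivers
    shorter than min_keep seconds (hypothetical threshold)."""
    segments: List[Interval] = []
    cursor = start
    for s0, s1 in sorted(silences):
        # Clip each silence to the clip window; skip ones fully outside it.
        s0, s1 = max(s0, start), min(s1, end)
        if s1 <= s0:
            continue
        if s0 - cursor >= min_keep:
            segments.append((cursor, s0))
        cursor = max(cursor, s1)
    if end - cursor >= min_keep:
        segments.append((cursor, end))
    return segments


spans = keep_segments(0.0, 10.0, [(2.0, 3.0), (5.0, 5.5), (9.9, 12.0)])
```

A compiler that supports splicing would render each returned span and concatenate them, instead of the single contiguous window used today.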
Still backlogged: `narrative_context`, OCR fusion, cross-episode memory, micro-prune.