HIVE-Inspired Architecture for Investment Videos

Shipped pipeline and HIVE-inspired backlog on one page.

✅ Done · 🟧 In Progress · 🔹 Backlogged

Current Working Flow (What Runs Today)

Five boxes abbreviate the path. Real order: after prune, keyframes + layout vision, then encode. Full internals below.

🎬
Input + Ingest
yt-dlp, audio extract, ASR; cache hit skips redo.
Done
🤖
Clip Selection
Gemini: pool, score, rank, keep set.
Done
🪝
Hook Detection
Real hook window; replaces the 0–3s placeholder.
Done
✂️
Content Pruning
Coarse trim: head/tail only; clamped.
Done
📱
Render + Overlay
ffmpeg 1080×1920, ASS subtitles, title, compile.
Done

🧾 Core Artifacts

On disk: `transcript.json`, `clips.json`, `hooks.json`, `prune.json`, `layout_vision.json`; output: `short_*.mp4` (local, gitignored).

โš™๏ธ Design Principle

Pydantic schemas, JSON artifacts, hash-keyed caches; ffmpeg compile is deterministic.

🎯 Domain Fit

Finance long-form → ranked 50–90s vertical clips.

Pipeline logic (humeo.pipeline.run_pipeline)

Abbreviated diagram above. Call order below: prune → keyframes + layout vision → render.

1 · Ingest (deterministic; cache skips re-download)
Outputs

work_dir/source.mp4, transcript.json; optional source.info.json for manifest.

Logic
  • ingest_complete(work_dir) true → reuse source.mp4 + transcript.json (per-video dir under HUMEO_CACHE_ROOT).
  • Else: yt-dlp → ffmpeg audio extract → ASR (OpenAI Whisper API or WhisperX; HUMEO_TRANSCRIBE_PROVIDER).
  • Canonical JSON → transcript_sha256; downstream caches key on it.
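The hash step can be sketched as follows. This is a minimal sketch: the real transcript_sha256 lives in the ingest code, and the exact canonicalization rules (sorted keys, compact separators) are an assumption here.

```python
import hashlib
import json

def transcript_sha256(transcript: dict) -> str:
    """Hash a canonical JSON serialization so the same transcript always
    yields the same cache key, regardless of dict insertion order."""
    # sort_keys + compact separators make the serialization deterministic
    canonical = json.dumps(transcript, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because downstream caches key on this digest, any change to the transcript invalidates every later artifact automatically.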
2 · Clip selection (Gemini, text-only)
Artifacts

clips.json, clips.meta.json, clip_selection_raw.json.

Algorithm
  • Prompts: clip_selection_system.jinja2, clip_selection_user.jinja2; override: HUMEO_PROMPTS_DIR.
  • Pool size defaults to 12; each candidate parses as Clip, with virality_score ∈ [0, 1].
  • Sort descending by score; at equal scores, candidates with needs_review=True lose the tiebreak.
  • Keep: score ≥ threshold (default 0.70) and not reviewed out; if the count < min (5), fill from the top until the max (8).
  • Pydantic: humeo_core.schemas.Clip. Cache skip when clips.meta.json matches the transcript hash + model.
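The rank-and-keep rules above can be sketched like this. keep_clips and the Clip stand-in are illustrative, not the real humeo_core.schemas.Clip:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    virality_score: float  # in [0, 1]
    needs_review: bool = False

def keep_clips(pool, threshold=0.70, min_keep=5, max_keep=8):
    # Sort descending by score; at equal scores, reviewed clips lose the tiebreak
    # (False sorts before True).
    ranked = sorted(pool, key=lambda c: (-c.virality_score, c.needs_review))
    # Keep: above threshold and not flagged for review.
    kept = [c for c in ranked if c.virality_score >= threshold and not c.needs_review]
    # Backfill from the top of the ranking if we fell below the minimum.
    for c in ranked:
        if len(kept) >= min_keep:
            break
        if c not in kept:
            kept.append(c)
    return kept[:max_keep]
```

The backfill deliberately reuses the ranked order, so below-threshold clips enter in score order when the pool is thin.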
2.25 · Hook detection (Gemini)
Artifacts

hooks.json, hooks.meta.json, hooks_raw.json.

Why

The selector tends to copy the [0, 3]s hook example from the prompt, which blocks start-trims during pruning. Stage 2.25 overwrites hook_start_sec / hook_end_sec per clip after validation.

Logic
  • Bounds: 0 ≤ hook_start < hook_end ≤ clip.duration; min/max hook length set in the prompt.
  • An exact (0, 3) match triggers _looks_like_default_hook; when detection was skipped, the clamps treat it as no hook.
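A minimal sketch of the validation, assuming illustrative min/max hook lengths (the real limits come from the prompt) and simplifying the default-hook rule to always reject an exact (0, 3):

```python
DEFAULT_HOOK = (0.0, 3.0)  # the placeholder the selector tends to copy

def validate_hook(hook_start, hook_end, clip_duration,
                  min_len=1.0, max_len=8.0):
    """Return (start, end) if the hook passes validation, else None."""
    # Bounds check: 0 <= start < end <= duration.
    if not (0.0 <= hook_start < hook_end <= clip_duration):
        return None
    # Length check: within the prompt-specified min/max window.
    if not (min_len <= hook_end - hook_start <= max_len):
        return None
    # Exact placeholder match: treat as "no real hook detected".
    if (hook_start, hook_end) == DEFAULT_HOOK:
        return None
    return (hook_start, hook_end)
```

Returning None rather than raising lets the prune clamps fall back to the no-hook path without aborting the run.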
2.5 · Content pruning (coarse trim, Gemini)
Artifacts

prune.json, prune_raw.json, prune.meta.json.

Definition

Coarse = head/tail only: the in-point and out-point shift inward. No internal splices. Writes trim_start_sec and trim_end_sec on Clip.

Clamp (post-LLM)
  • prune_level caps total trim as a percentage of clip length (conservative / balanced / aggressive).
  • Post-trim duration ≥ MIN_CLIP_DURATION_SEC (50).
  • Real hook: start trim ≤ hook_start − 0.25s; end trim capped so the hook tail stays in the window (symmetric ±0.25s).
  • Segment snap: trims snap to transcript phrase boundaries when the snap logic applies (content_pruning.py).
  • On LLM/API failure: trims set to 0, 0 for that clip; the run continues.
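The clamp order can be sketched as below. The per-level percentages, and applying the hook rule before the minimum-duration backoff, are assumptions; segment snapping and the hook-tail cap are omitted for brevity.

```python
MIN_CLIP_DURATION_SEC = 50.0
# Illustrative caps on total trim as a fraction of clip length.
PRUNE_CAPS = {"conservative": 0.10, "balanced": 0.20, "aggressive": 0.35}

def clamp_trims(trim_start, trim_end, clip_duration,
                prune_level="balanced", hook_start=None):
    """Clamp LLM-proposed head/tail trims to the stage 2.5 safety rules."""
    # Cap total trim at the level's percentage of clip length, scaling
    # both ends proportionally.
    cap = PRUNE_CAPS[prune_level] * clip_duration
    total = trim_start + trim_end
    if total > cap:
        scale = cap / total
        trim_start *= scale
        trim_end *= scale
    # A real hook blocks start trims past hook_start - 0.25s.
    if hook_start is not None:
        trim_start = min(trim_start, max(0.0, hook_start - 0.25))
    # Post-trim duration must stay >= the minimum; give back from the
    # end trim first, then the start trim.
    shortfall = MIN_CLIP_DURATION_SEC - (clip_duration - trim_start - trim_end)
    if shortfall > 0:
        give_back = min(trim_end, shortfall)
        trim_end -= give_back
        trim_start = max(0.0, trim_start - (shortfall - give_back))
    return trim_start, trim_end
```

On the LLM-failure path the inputs are simply (0, 0), which passes every clamp unchanged.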
3 · Keyframes + layout vision (Gemini multimodal)
Artifacts

work_dir/keyframes/*.jpg, layout_vision.json, layout_vision.meta.json.

Logic
  • clip_for_render(clip) → trimmed timeline; one Scene per clip defines the keyframe time range.
  • extract_keyframes: ffmpeg snapshot, one image per clip.
  • Gemini vision: JSON → LayoutInstruction (layout enum; normalized bboxes for splits). Prompt string: GEMINI_LAYOUT_VISION_PROMPT in layout_vision.py.
  • Cache key: transcript hash + clips hash + vision model id.
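The cache key composition might look like this. layout_vision_cache_key is a hypothetical name, and the separator byte is a defensive detail assumed here, not confirmed from the source:

```python
import hashlib

def layout_vision_cache_key(transcript_sha: str, clips_sha: str,
                            model_id: str) -> str:
    """Any change to transcript, clip set, or vision model invalidates
    the layout_vision.json cache."""
    h = hashlib.sha256()
    for part in (transcript_sha, clips_sha, model_id):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab","c") != ("a","bc")
    return h.hexdigest()
```

The stored clips.meta.json / layout_vision.meta.json comparison then reduces to a single string equality check.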
4 · Render (deterministic ffmpeg)
Outputs

output/short_<id>.mp4 + per-clip ASS under work_dir/subtitles/.

Logic
  • effective_export_bounds: source window = clip bounds minus trim_*; hooks never widen or narrow the export (they are prune metadata only); clip_for_render clears the hook fields.
  • humeo_core.primitives.compile: filtergraph from LayoutInstruction; five layouts; ≤2 on-screen items; ASS subtitles + drawtext title.
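A sketch of the bounds arithmetic, assuming trims are expressed as seconds shaved off each end (the error case is illustrative; the real function's failure behavior is not confirmed):

```python
def effective_export_bounds(clip_start, clip_end, trim_start=0.0, trim_end=0.0):
    """Source window = clip bounds minus trims.

    Hook fields are metadata for pruning only: they never widen or
    narrow the export, so they do not appear here at all.
    """
    start = clip_start + trim_start  # in-point moves later
    end = clip_end - trim_end        # out-point moves earlier
    if end <= start:
        raise ValueError("trims consumed the whole clip")
    return start, end
```

Keeping the hook out of this function is what lets clip_for_render clear the hook fields without changing the rendered output.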

Full Planned Architecture (HIVE Alternative You Are Building)

North-star modules; backlogged nodes are labeled below.

🎞️
A1. Media Ingest
ASR in product; scene/keyframe primitives in core.
Done
🧠
A2. Narrative Context
Pre-selection multimodal context artifact.
Backlogged
👤
A3. Character + Dialogue Fusion
Diarization + OCR anchor + merge (paper path).
Backlogged
🗂️
A4. Cross-Episode Memory
Cross-upload character + plot store.
Backlogged
🏗️
B. Structured Editing Core
Decomposed edit stages vs one-shot.
In Progress
📈
B1. Highlight Ranking
Rubric + score + rank in the selector.
Done
🪝
B2. Hook + Outro Boundaries
Hook shipped; timed outro schema not yet.
In Progress
🧹
B3. Coarse Content Pruning
Stage 2.5: LLM trim + clamp.
Done
🔬
B4. Micro-Pruning
VAD/splice; not in compiler v1.
Backlogged
🎥
C. Compile + Publish
humeo-core compile → MP4.
Done

Backlog blocks: paper vs this repo

A2 · Narrative context (Backlogged)

HIVE §3.1.5 analogue: sparse frames + transcript → narrative_context.json (summary, scene captions, value thread). Clip selection consumes it so chart/OCR content can influence window choice before Stage 2.

A3 · Character + dialogue fusion (Backlogged)

Paper: diarization, face clusters, LLM merge; OCR as an anchor against ASR. The product today uses ASR text only at Stage 2; no speaker ID or OCR merge.

A4 · Cross-episode memory (Backlogged)

Paper: cross-episode memory. Here: one video per work_dir; no shared character DB.

B2 · Hook + outro boundaries (In Progress)

Hook: Stage 2.25, schema fields shipped. Outro: no outro_* fields; endings live in the prompts (selector + prune), not in a HIVE-style O×E boundary search over scenes.

B4 · Micro-pruning (Backlogged)

Cut internal silence and filler (VAD, silencedetect, verbatim ASR); the compiler would splice segments. Today: one contiguous -ss/-t window per short.
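If this lands, the detection half could parse ffmpeg's silencedetect output like this. parse_silences is a hypothetical helper (the splicing compiler is the unbuilt part); silencedetect really does log silence_start/silence_end lines to stderr when run as e.g. `ffmpeg -i in.mp4 -af silencedetect=n=-35dB:d=0.4 -f null -`.

```python
import re

# silencedetect logs lines like:
#   [silencedetect @ 0x...] silence_start: 12.3
#   [silencedetect @ 0x...] silence_end: 14.0 | silence_duration: 1.7
SILENCE_RE = re.compile(r"silence_(start|end): ([0-9.]+)")

def parse_silences(stderr_text):
    """Pair silence_start/silence_end log lines into (start, end)
    intervals that a splicing compiler could cut out."""
    silences, start = [], None
    for kind, t in SILENCE_RE.findall(stderr_text):
        if kind == "start":
            start = float(t)
        elif kind == "end" and start is not None:
            silences.append((start, float(t)))
            start = None
    return silences
```

The intervals would then be subtracted from the contiguous -ss/-t window to produce the segment list the compiler splices.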

Shipped: Staged pipeline + JSON caches + ffmpeg compile for finance long-form.
In progress: Outro as first-class data; Module A (narrative, fusion, memory).
Backlogged: narrative_context, OCR fusion, cross-episode memory, micro-prune.