HIVE-Inspired Architecture for Investment Videos

Shipped pipeline and HIVE-inspired backlog on one page.

✅ Done · 🟧 In Progress · 🔹 Backlogged

Current Working Flow (What Runs Today)

Five boxes abbreviate the path. Real order: after prune, keyframes + layout vision, then encode. Full internals below.

🎬
Input + Ingest
yt-dlp, audio extract, ASR; cache hit skips redo.
Done
🤖
Clip Selection
Gemini: pool, score, rank, keep set.
Done
🪝
Hook Detection
Real hook window; replaces the 0–3s placeholder.
Done
✂️
Content Pruning
Coarse trim: head/tail only; clamped.
Done
📱
Render + Overlay
ffmpeg 1080×1920, ASS subtitles, title, compile.
Done

🧾 Core Artifacts

On disk: `transcript.json`, `clips.json`, `hooks.json`, `prune.json`, `layout_vision.json`; output: `short_*.mp4` (local, gitignored).

โš™๏ธ Design Principle

Pydantic schemas, JSON artifacts, hash-keyed caches; ffmpeg compile is deterministic.

🎯 Domain Fit

Finance long-form → ranked 50–90s vertical clips.

Pipeline logic (humeo.pipeline.run_pipeline)

Abbreviated diagram above. Call order below: prune → keyframes + layout vision → render.

1 · Ingest (deterministic; cache skips re-download)
Outputs

work_dir/source.mp4, transcript.json; optional source.info.json for manifest.

Logic
  • ingest_complete(work_dir) true → reuse source.mp4 + transcript.json (per-video dir under HUMEO_CACHE_ROOT).
  • Else: yt-dlp → ffmpeg audio extract → ASR (OpenAI Whisper API or WhisperX; HUMEO_TRANSCRIBE_PROVIDER).
  • Canonical JSON → transcript_sha256; downstream caches key on it.
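The hash step can be sketched as follows. This is a minimal sketch: the real transcript_sha256 lives in the ingest code, and the exact canonicalization rules (sorted keys, compact separators) are an assumption here.

```python
import hashlib
import json

def transcript_sha256(transcript: dict) -> str:
    """Hash a canonical JSON serialization so the same transcript always
    yields the same cache key, regardless of dict insertion order."""
    # sort_keys + compact separators make the serialization deterministic
    canonical = json.dumps(transcript, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because downstream caches key on this digest, any change to the transcript invalidates every later artifact automatically.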
2 · Clip selection (Gemini, text-only)
Artifacts

clips.json, clips.meta.json, clip_selection_raw.json.

Algorithm
  • Prompts: clip_selection_system.jinja2, clip_selection_user.jinja2; override: HUMEO_PROMPTS_DIR.
  • Pool size defaults to 12; each candidate parses as Clip, with virality_score ∈ [0, 1].
  • Sort descending by score; at equal scores, candidates with needs_review=True lose the tiebreak.
  • Keep: score ≥ threshold (default 0.70) and not reviewed out; if the count < min (5), fill from the top until the max (8).
  • Pydantic: humeo_core.schemas.Clip. Cache skip when clips.meta.json matches the transcript hash + model.
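The rank-and-keep rules above can be sketched like this. keep_clips and the Clip stand-in are illustrative, not the real humeo_core.schemas.Clip:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    virality_score: float  # in [0, 1]
    needs_review: bool = False

def keep_clips(pool, threshold=0.70, min_keep=5, max_keep=8):
    # Sort descending by score; at equal scores, reviewed clips lose the tiebreak
    # (False sorts before True).
    ranked = sorted(pool, key=lambda c: (-c.virality_score, c.needs_review))
    # Keep: above threshold and not flagged for review.
    kept = [c for c in ranked if c.virality_score >= threshold and not c.needs_review]
    # Backfill from the top of the ranking if we fell below the minimum.
    for c in ranked:
        if len(kept) >= min_keep:
            break
        if c not in kept:
            kept.append(c)
    return kept[:max_keep]
```

The backfill deliberately reuses the ranked order, so below-threshold clips enter in score order when the pool is thin.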
2.25 · Hook detection (Gemini)
Artifacts

hooks.json, hooks.meta.json, hooks_raw.json.

Why

The selector tends to copy the [0, 3]s hook example from the prompt, which blocks start-trims during pruning. Stage 2.25 overwrites hook_start_sec / hook_end_sec per clip after validation.

Logic
  • Bounds: 0 ≤ hook_start < hook_end ≤ clip.duration; min/max hook length set in the prompt.
  • An exact (0, 3) match triggers _looks_like_default_hook; when detection was skipped, the clamps treat it as no hook.
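A minimal sketch of the validation, assuming illustrative min/max hook lengths (the real limits come from the prompt) and simplifying the default-hook rule to always reject an exact (0, 3):

```python
DEFAULT_HOOK = (0.0, 3.0)  # the placeholder the selector tends to copy

def validate_hook(hook_start, hook_end, clip_duration,
                  min_len=1.0, max_len=8.0):
    """Return (start, end) if the hook passes validation, else None."""
    # Bounds check: 0 <= start < end <= duration.
    if not (0.0 <= hook_start < hook_end <= clip_duration):
        return None
    # Length check: within the prompt-specified min/max window.
    if not (min_len <= hook_end - hook_start <= max_len):
        return None
    # Exact placeholder match: treat as "no real hook detected".
    if (hook_start, hook_end) == DEFAULT_HOOK:
        return None
    return (hook_start, hook_end)
```

Returning None rather than raising lets the prune clamps fall back to the no-hook path without aborting the run.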
2.5 · Content pruning (coarse trim, Gemini)
Artifacts

prune.json, prune_raw.json, prune.meta.json.

Definition

Coarse = head/tail only: the in-point and out-point shift inward. No internal splices. Writes trim_start_sec and trim_end_sec on Clip.

Clamp (post-LLM)
  • prune_level caps total trim as a percentage of clip length (conservative / balanced / aggressive).
  • Post-trim duration ≥ MIN_CLIP_DURATION_SEC (50).
  • Real hook: start trim ≤ hook_start − 0.25s; end trim capped so the hook tail stays in the window (symmetric ±0.25s).
  • Segment snap: trims snap to transcript phrase boundaries when the snap logic applies (content_pruning.py).
  • On LLM/API failure: trims set to 0, 0 for that clip; the run continues.
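The clamp order can be sketched as below. The per-level percentages, and applying the hook rule before the minimum-duration backoff, are assumptions; segment snapping and the hook-tail cap are omitted for brevity.

```python
MIN_CLIP_DURATION_SEC = 50.0
# Illustrative caps on total trim as a fraction of clip length.
PRUNE_CAPS = {"conservative": 0.10, "balanced": 0.20, "aggressive": 0.35}

def clamp_trims(trim_start, trim_end, clip_duration,
                prune_level="balanced", hook_start=None):
    """Clamp LLM-proposed head/tail trims to the stage 2.5 safety rules."""
    # Cap total trim at the level's percentage of clip length, scaling
    # both ends proportionally.
    cap = PRUNE_CAPS[prune_level] * clip_duration
    total = trim_start + trim_end
    if total > cap:
        scale = cap / total
        trim_start *= scale
        trim_end *= scale
    # A real hook blocks start trims past hook_start - 0.25s.
    if hook_start is not None:
        trim_start = min(trim_start, max(0.0, hook_start - 0.25))
    # Post-trim duration must stay >= the minimum; give back from the
    # end trim first, then the start trim.
    shortfall = MIN_CLIP_DURATION_SEC - (clip_duration - trim_start - trim_end)
    if shortfall > 0:
        give_back = min(trim_end, shortfall)
        trim_end -= give_back
        trim_start = max(0.0, trim_start - (shortfall - give_back))
    return trim_start, trim_end
```

On the LLM-failure path the inputs are simply (0, 0), which passes every clamp unchanged.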
3 · Keyframes + layout vision (Gemini multimodal)
Artifacts

work_dir/keyframes/*.jpg, layout_vision.json, layout_vision.meta.json.

Logic
  • clip_for_render(clip) → trimmed timeline; one Scene per clip defines the keyframe time range.
  • extract_keyframes: ffmpeg snapshot, one image per clip.
  • Gemini vision: JSON → LayoutInstruction (layout enum; normalized bboxes for splits). Prompt string: GEMINI_LAYOUT_VISION_PROMPT in layout_vision.py.
  • Cache key: transcript hash + clips hash + vision model id.
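The cache key composition might look like this. layout_vision_cache_key is a hypothetical name, and the separator byte is a defensive detail assumed here, not confirmed from the source:

```python
import hashlib

def layout_vision_cache_key(transcript_sha: str, clips_sha: str,
                            model_id: str) -> str:
    """Any change to transcript, clip set, or vision model invalidates
    the layout_vision.json cache."""
    h = hashlib.sha256()
    for part in (transcript_sha, clips_sha, model_id):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab","c") != ("a","bc")
    return h.hexdigest()
```

The stored clips.meta.json / layout_vision.meta.json comparison then reduces to a single string equality check.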
4 · Render (deterministic ffmpeg)
Outputs

output/short_<id>.mp4 + per-clip ASS under work_dir/subtitles/.

Logic
  • effective_export_bounds: source window = clip bounds minus trim_*; hooks never widen or narrow the export (they are prune metadata only); clip_for_render clears the hook fields.
  • humeo_core.primitives.compile: filtergraph from LayoutInstruction; five layouts; ≤2 on-screen items; ASS subtitles + drawtext title.
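A sketch of the bounds arithmetic, assuming trims are expressed as seconds shaved off each end (the error case is illustrative; the real function's failure behavior is not confirmed):

```python
def effective_export_bounds(clip_start, clip_end, trim_start=0.0, trim_end=0.0):
    """Source window = clip bounds minus trims.

    Hook fields are metadata for pruning only: they never widen or
    narrow the export, so they do not appear here at all.
    """
    start = clip_start + trim_start  # in-point moves later
    end = clip_end - trim_end        # out-point moves earlier
    if end <= start:
        raise ValueError("trims consumed the whole clip")
    return start, end
```

Keeping the hook out of this function is what lets clip_for_render clear the hook fields without changing the rendered output.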

Full Planned Architecture (HIVE Alternative You Are Building)

North-star modules; backlogged nodes are labeled below.

🎞️
A1. Media Ingest
ASR in product; scene/keyframe primitives in core.
Done
🧠
A2. Narrative Context
Pre-selection multimodal context artifact.
Backlogged
👤
A3. Character + Dialogue Fusion
Diarization + OCR anchor + merge (paper path).
Backlogged
🗂️
A4. Cross-Episode Memory
Cross-upload character + plot store.
Backlogged
🏗️
B. Structured Editing Core
Decomposed edit stages vs one-shot.
In Progress
📈
B1. Highlight Ranking
Rubric + score + rank in the selector.
Done
🪝
B2. Hook + Outro Boundaries
Hook shipped; timed outro schema not yet.
In Progress
🧹
B3. Coarse Content Pruning
Stage 2.5: LLM trim + clamp.
Done
🔬
B4. Micro-Pruning
VAD/splice; not in compiler v1.
Backlogged
🎥
C. Compile + Publish
humeo-core compile → MP4.
Done

Backlog blocks: paper vs this repo

A2 · Narrative context (Backlogged)

HIVE §3.1.5 analogue: sparse frames + transcript → narrative_context.json (summary, scene captions, value thread). Clip selection consumes it so chart/OCR content can influence window choice before Stage 2.

A3 · Character + dialogue fusion (Backlogged)

Paper: diarization, face clusters, LLM merge; OCR as an anchor against ASR. The product today uses ASR text only at Stage 2; no speaker ID or OCR merge.

A4 · Cross-episode memory (Backlogged)

Paper: cross-episode memory. Here: one video per work_dir; no shared character DB.

B2 · Hook + outro boundaries (In Progress)

Hook: Stage 2.25, schema fields shipped. Outro: no outro_* fields; endings live in the prompts (selector + prune), not in a HIVE-style O×E boundary search over scenes.

B4 · Micro-pruning (Backlogged)

Cut internal silence and filler (VAD, silencedetect, verbatim ASR); the compiler would splice segments. Today: one contiguous -ss/-t window per short.
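If this lands, the detection half could parse ffmpeg's silencedetect output like this. parse_silences is a hypothetical helper (the splicing compiler is the unbuilt part); silencedetect really does log silence_start/silence_end lines to stderr when run as e.g. `ffmpeg -i in.mp4 -af silencedetect=n=-35dB:d=0.4 -f null -`.

```python
import re

# silencedetect logs lines like:
#   [silencedetect @ 0x...] silence_start: 12.3
#   [silencedetect @ 0x...] silence_end: 14.0 | silence_duration: 1.7
SILENCE_RE = re.compile(r"silence_(start|end): ([0-9.]+)")

def parse_silences(stderr_text):
    """Pair silence_start/silence_end log lines into (start, end)
    intervals that a splicing compiler could cut out."""
    silences, start = [], None
    for kind, t in SILENCE_RE.findall(stderr_text):
        if kind == "start":
            start = float(t)
        elif kind == "end" and start is not None:
            silences.append((start, float(t)))
            start = None
    return silences
```

The intervals would then be subtracted from the contiguous -ss/-t window to produce the segment list the compiler splices.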

Shipped: Staged pipeline + JSON caches + ffmpeg compile for finance long-form.
In progress: Outro as first-class data; Module A (narrative, fusion, memory).
Backlogged: narrative_context, OCR fusion, cross-episode memory, micro-prune.