Analysis and Corpus
OpenMontage builds and queries a local, CLIP-indexed corpus of video and image assets for reference-driven productions. The corpus draws from free and open archives such as Archive.org, NASA, and Wikimedia Commons, plus optional stock sources when API keys are present. This enables documentary-style workflows that rely on real footage instead of generated video.
Corpus Layout
Each production that uses analysis maintains its own corpus under the project workspace:
projects/<kebab-name>/corpus/
├── clips/ # downloaded video and image files
├── thumbnails/
│   └── <clip_id>/
│       └── frame_00.jpg # evenly spaced frames for preview
├── embeddings.npy # (N, 512) L2-normalised visual vectors
├── tag_embeddings.npy # (N, 512) L2-normalised text/tag vectors
└── index.jsonl # one metadata record per clip
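index.jsonl and the two NumPy files stay aligned row by row: line i of index.jsonl describes the clip whose vectors sit in row i of embeddings.npy and tag_embeddings.npy. The entire projects/ tree is gitignored.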
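The row alignment described above can be checked when a corpus is loaded. The following is a minimal sketch, not OpenMontage's own loader: the function name load_corpus is hypothetical, and only the three file names come from the layout shown earlier.

```python
import json
from pathlib import Path

import numpy as np


def load_corpus(corpus_dir):
    """Load index.jsonl and both embedding matrices, checking row alignment.

    Hypothetical helper for illustration; not part of OpenMontage.
    """
    corpus_dir = Path(corpus_dir)
    with open(corpus_dir / "index.jsonl", encoding="utf-8") as fh:
        # One JSON record per non-empty line.
        records = [json.loads(line) for line in fh if line.strip()]
    visual = np.load(corpus_dir / "embeddings.npy")
    tags = np.load(corpus_dir / "tag_embeddings.npy")

    # Row i of each matrix must describe record i of the index.
    if not (len(records) == visual.shape[0] == tags.shape[0]):
        raise ValueError(
            f"corpus out of sync: {len(records)} records, "
            f"{visual.shape[0]} visual rows, {tags.shape[0]} tag rows"
        )
    return records, visual, tags
```

Failing loudly on a count mismatch is a deliberate choice in this sketch: a silently misaligned index would return metadata for the wrong clip.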
Core Analysis Tools
Four tools perform the heavy lifting. All run locally and require no paid API keys for basic operation:
- video_analyzer accepts a URL or local file and returns a video_analysis_brief. Depths are transcript_only, standard, and deep. It chains metadata fetch, transcript extraction, scene detection, keyframe sampling, motion classification, and style profiling.
- scene_detect wraps PySceneDetect (content and threshold methods) plus FFmpeg to emit start/end timestamps for each scene.
- frame_sampler extracts JPEG frames by count, explicit timestamps, or scene boundaries.
- transcriber produces word-level segments using WhisperX. Speaker diarization is available when an HF_TOKEN is configured.
Supporting tools (video_downloader, transcript_fetcher, audio_probe) are invoked automatically by the analyzer.
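To make the frame_sampler selection modes concrete, here is a small sketch of how timestamps might be chosen by count or from scene boundaries. Both function names are hypothetical, and the choice of slice centres and scene midpoints is an assumption; the real tool may pick different positions.

```python
def evenly_spaced_timestamps(duration, count):
    """Return `count` timestamps (seconds), one at the centre of each equal slice."""
    if duration <= 0 or count <= 0:
        return []
    step = duration / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


def scene_midpoints(scenes):
    """Return one timestamp per (start, end) scene, taken at its midpoint."""
    return [round((start + end) / 2, 3) for start, end in scenes]


# Example: four frames from a 10-second clip, then one frame per scene.
by_count = evenly_spaced_timestamps(10.0, 4)
by_scene = scene_midpoints([(0.0, 4.0), (4.0, 10.0)])
```

Sampling at slice centres rather than slice starts avoids grabbing the very first frame, which is often black or a fade-in.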
Reference-Driven Workflows
Pipelines that declare reference_input.supported: true (such as cinematic) activate analysis when a reference video is supplied at the start of a session. The agent:
- Runs the analysis tools listed in the pipeline manifest.
- Produces a video_analysis_brief containing source metadata, content analysis, structure (scenes and pacing_profile), style profile, narration transcript, keyframes, and replication guidance.
- Embeds the reference and user query through CLIP.
- Retrieves matching clips from the corpus using fused visual + tag similarity.
- Diversifies the selected set and builds edit_decisions that respect the reference's motion and timing characteristics.
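The fused visual + tag retrieval step can be sketched as a weighted sum of two cosine similarities. This is an illustration under assumptions, not the project's actual scoring: the function names and the 0.6 visual weight are hypothetical, and the query vector is assumed to be a CLIP embedding with the same dimensionality as the corpus rows.

```python
import numpy as np


def fused_scores(query_vec, visual, tags, visual_weight=0.6):
    """Blend visual and tag cosine similarity for every corpus row.

    visual_weight is an assumed blend factor, not a documented default.
    """
    q = np.asarray(query_vec, dtype=np.float64)
    q = q / np.linalg.norm(q)
    # Corpus rows are already L2-normalised, so a dot product is cosine similarity.
    visual_sim = visual @ q
    tag_sim = tags @ q
    return visual_weight * visual_sim + (1.0 - visual_weight) * tag_sim


def top_k(scores, k):
    """Return the indices of the k highest scores, best first."""
    order = np.argsort(-np.asarray(scores))
    return order[:k].tolist()
```

Because both matrices share row order with index.jsonl, the indices returned by top_k can be used directly to look up clip metadata.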
Sub-stages (for example, a short sample preview) become active once the video_analysis_brief exists.
Request real-footage behavior explicitly: "use real footage only." The system then avoids paid video generation and falls back to stock or archive clips.
Retrieval and Diversity
The corpus supports:
- Ranking by text query (visual and tag channels blended).
- Nearest-neighbour search from a seed clip.
- Maximal Marginal Relevance diversification to avoid repetitive cuts.
Motion score, duration, and shot-type filters can be applied during retrieval. Results feed directly into the edit stage.
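Maximal Marginal Relevance trades relevance against similarity to clips already chosen. Below is a minimal greedy sketch of the standard technique, assuming L2-normalised vectors and precomputed relevance scores. The function name and the lambda_ default of 0.7 are assumptions, not OpenMontage settings.

```python
import numpy as np


def mmr_select(scores, vectors, k, lambda_=0.7):
    """Greedy Maximal Marginal Relevance over L2-normalised vectors.

    lambda_ = 1.0 ranks purely by relevance; lower values favour diversity.
    """
    candidates = list(range(len(scores)))
    selected = []
    while candidates and len(selected) < k:
        best, best_value = None, -np.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (float(vectors[i] @ vectors[j]) for j in selected), default=0.0
            )
            value = lambda_ * float(scores[i]) - (1.0 - lambda_) * redundancy
            if value > best_value:
                best, best_value = i, value
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-identical high-scoring clips and one distinct lower-scoring clip, this selection keeps one of the duplicates and the distinct clip, which is the behaviour that prevents repetitive cuts.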
Verification Before Use
Run preflight to confirm the local analysis stack and any configured stock providers are ready:
make preflight
Or inspect the live registry:
python -c "
from tools.tool_registry import registry
import json
registry.discover()
print(json.dumps(registry.provider_menu_summary(), indent=2))
"
See guides/project-workspace for the full directory contract and guides/running-pipelines for how to start a reference-aware session. For the list of pipelines that support reference input, see reference/available-pipelines.