Analysis and Corpus

OpenMontage builds and queries a local, CLIP-indexed corpus of video and image assets for reference-driven productions. The corpus draws from free and open archives such as Archive.org, NASA, and Wikimedia Commons, plus optional stock sources when API keys are present. This enables documentary-style workflows that rely on real footage instead of generated video.

Corpus Layout

Each production that uses analysis maintains its own corpus under the project workspace:

projects/<kebab-name>/corpus/
├── clips/                  # downloaded video and image files
├── thumbnails/
│   └── <clip_id>/
│       └── frame_00.jpg    # evenly spaced frames for preview
├── embeddings.npy          # (N, 512) L2-normalised visual vectors
├── tag_embeddings.npy      # (N, 512) L2-normalised text/tag vectors
└── index.jsonl             # one metadata record per clip

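index.jsonl and the two NumPy files stay aligned row by row: record i in index.jsonl describes the clip whose vectors are row i of embeddings.npy and tag_embeddings.npy. The entire projects/ tree is gitignored.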
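The row alignment described above can be checked with a small sanity test. The following is a minimal sketch, not part of OpenMontage: check_alignment is a hypothetical helper, and the matrix shapes are passed in as plain tuples (in practice they would come from loading the two .npy files) so that the example depends only on the standard library.

```python
import json
from pathlib import Path


def check_alignment(index_path, visual_shape, tag_shape, dim=512):
    """Return the clip count if index.jsonl and both matrices agree, else raise.

    Hypothetical helper: visual_shape and tag_shape are the (rows, cols) shapes
    of embeddings.npy and tag_embeddings.npy, supplied by the caller.
    """
    with Path(index_path).open(encoding="utf-8") as fh:
        # One JSON record per non-empty line; a malformed line raises here.
        records = [json.loads(line) for line in fh if line.strip()]
    n = len(records)
    for name, shape in (("embeddings", visual_shape), ("tag_embeddings", tag_shape)):
        if tuple(shape) != (n, dim):
            raise ValueError(f"{name} has shape {tuple(shape)}, expected {(n, dim)}")
    return n
```

Running a check like this after every corpus update catches the case where a download was indexed but its vectors were never appended, or the reverse.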

Core Analysis Tools

Four tools perform the heavy lifting. All run locally and require no paid API keys for basic operation:

video_analyzer accepts a URL or local file and returns a video_analysis_brief. Depths are transcript_only, standard, and deep. It chains metadata fetch, transcript extraction, scene detection, keyframe sampling, motion classification, and style profiling.
scene_detect wraps PySceneDetect (content and threshold methods) plus FFmpeg to emit start/end timestamps for each scene.
frame_sampler extracts JPEG frames by count, explicit timestamps, or scene boundaries.
transcriber produces word-level segments using WhisperX. Speaker diarization is available when an HF_TOKEN is configured.
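To illustrate two of the frame_sampler modes, the sketch below computes the timestamps at which frames would be taken, either by count or from scene boundaries. The function names are hypothetical, and sampling at the midpoint of each segment is an assumption made for illustration; the actual tool may choose positions differently.

```python
def timestamps_by_count(duration_s, count):
    """Evenly spaced timestamps: the midpoint of each of `count` equal segments."""
    if count <= 0 or duration_s <= 0:
        return []
    step = duration_s / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


def timestamps_by_scenes(scenes):
    """One timestamp per scene, at the midpoint of its (start, end) span in seconds."""
    return [round((start + end) / 2, 3) for start, end in scenes if end > start]


# Example: four frames from a 10-second clip, then one frame per detected scene.
print(timestamps_by_count(10.0, 4))
print(timestamps_by_scenes([(0.0, 4.0), (4.0, 5.0)]))
```

Midpoint sampling avoids the first and last frames of a segment, which are often fades or transition frames.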

Supporting tools (video_downloader, transcript_fetcher, audio_probe) are invoked automatically by the analyzer.

Reference-Driven Workflows

Pipelines that declare reference_input.supported: true (such as cinematic) activate analysis when a reference video is supplied at the start of a session. The agent:

Runs the analysis tools listed in the pipeline manifest.
Produces a video_analysis_brief containing source metadata, content analysis, structure (scenes and pacing_profile), style profile, narration transcript, keyframes, and replication guidance.
Embeds the reference and user query through CLIP.
Retrieves matching clips from the corpus using fused visual + tag similarity.
Diversifies the selected set and builds edit_decisions that respect the reference's motion and timing characteristics.
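The fused visual + tag retrieval step can be sketched as a weighted blend of two cosine similarities. This is an illustrative sketch only: fused_scores, top_k, and the visual_weight value are assumptions, not the project's actual scoring code. Because the stored vectors are L2-normalised, a plain dot product against a normalised query equals cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fused_scores(query_vec, visual_rows, tag_rows, visual_weight=0.6):
    """Blend per-clip similarities from the visual and tag channels.

    visual_weight is a hypothetical tuning knob: 1.0 uses only the visual
    channel, 0.0 only the tag channel.
    """
    if len(visual_rows) != len(tag_rows):
        raise ValueError("visual and tag matrices must have the same row count")
    w = visual_weight
    return [
        w * dot(query_vec, v) + (1.0 - w) * dot(query_vec, t)
        for v, t in zip(visual_rows, tag_rows)
    ]


def top_k(scores, k):
    """Row indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

The returned row indices map straight back to records in index.jsonl because the files are row-aligned.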

Sub-stages (for example, a short sample preview) become active once the video_analysis_brief exists.

To keep a production on real footage, say so explicitly in the request, for example "use real footage only." The system then avoids paid video generation and falls back to stock or archive clips.

Retrieval and Diversity

The corpus supports:

Ranking by text query (visual and tag channels blended).
Nearest-neighbour search from a seed clip.
Maximal Marginal Relevance diversification to avoid repetitive cuts.
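Maximal Marginal Relevance picks clips greedily, trading each candidate's relevance against its similarity to clips already chosen. The sketch below is a generic MMR implementation over L2-normalised vectors, written for illustration; mmr_select and the default lambda_ are assumptions and may not match the project's parameters.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mmr_select(relevance, vectors, k, lambda_=0.7):
    """Greedy MMR: return up to k row indices, most valuable first.

    relevance[i] is the query score of clip i; vectors[i] is its normalised
    embedding. lambda_=1.0 ignores diversity, lower values penalise
    near-duplicates more strongly.
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to any clip already picked.
            redundancy = max((dot(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lambda_ * relevance[i] - (1.0 - lambda_) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-identical shots and one distinct shot, a moderate lambda_ selects one of the duplicates and then the distinct shot, which is the behaviour that prevents repetitive cuts.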

Motion score, duration, and shot-type filters can be applied during retrieval. Results feed directly into the edit stage.
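Metadata filters of this kind can be applied to the index records before or after similarity ranking. The sketch below is illustrative only: filter_clips and the field names motion_score, duration_s, and shot_type are hypothetical and may not match the actual index.jsonl schema.

```python
def filter_clips(records, min_motion=None, max_duration_s=None, shot_types=None):
    """Return row indices of records that pass every filter that is set.

    Field names are assumed for illustration. Row indices are returned rather
    than records so they can also index the row-aligned embedding matrices.
    """
    kept = []
    for row, rec in enumerate(records):
        if min_motion is not None and rec.get("motion_score", 0.0) < min_motion:
            continue
        if max_duration_s is not None and rec.get("duration_s", 0.0) > max_duration_s:
            continue
        if shot_types is not None and rec.get("shot_type") not in shot_types:
            continue
        kept.append(row)
    return kept


# Example: keep short, high-motion wide or aerial shots.
clips = [
    {"motion_score": 0.8, "duration_s": 6.0, "shot_type": "wide"},
    {"motion_score": 0.2, "duration_s": 4.0, "shot_type": "wide"},
    {"motion_score": 0.9, "duration_s": 30.0, "shot_type": "aerial"},
]
print(filter_clips(clips, min_motion=0.5, max_duration_s=10.0, shot_types={"wide", "aerial"}))
```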

Verification Before Use

Run preflight to confirm the local analysis stack and any configured stock providers are ready:

make preflight

Or inspect the live registry:

python -c "
from tools.tool_registry import registry
import json
registry.discover()
print(json.dumps(registry.provider_menu_summary(), indent=2))
"

See guides/project-workspace for the full directory contract and guides/running-pipelines for how to start a reference-aware session. For the list of pipelines that support reference input, see reference/available-pipelines.