Daily single — testing guide

This page explains every kind of test used in the daily single video pipeline: what it checks, when to run it, and where results are saved. It is written for operators and contributors; you do not need to read the Python source first.

Three layers

┌──────────────────────────────────────────────────────────────┐
│  Layer 1 — Unit tests (pytest, no video project required)    │
├──────────────────────────────────────────────────────────────┤
│  Layer 2 — Modular QA (validate-qa / video_qa stages)        │
├──────────────────────────────────────────────────────────────┤
│  Layer 3 — Legacy publish gates (validate-sync, validate-all)│
└──────────────────────────────────────────────────────────────┘

Run Layer 1 after code changes. Run Layer 2 after each pipeline phase. Run Layer 3 before upload — or rely on stage s10-final-composite, which runs the same checks.


Layer 1 — Unit tests (pytest)

Fast, offline tests. No API keys required for the core QA module tests.

conda activate test
cd /path/to/praisonaippt

# Minimal — video_qa module only
pytest tests/test_video_qa.py -q

# Full daily_single suite
pytest tests/test_daily_single_display_sync_unit.py \
       tests/test_daily_single_sync_validation.py \
       tests/test_daily_single_hook_montage.py \
       tests/test_daily_single_media_sync.py \
       tests/test_daily_single_visual_audit.py \
       tests/test_daily_single_youtube_quality.py \
       tests/test_daily_single_captions.py \
       tests/test_video_qa.py -q

| Test file | What it verifies |
| --- | --- |
| test_video_qa.py | Stage registry, skip rules, s04/s05/s06 behaviour, VLM cache round-trip |
| test_daily_single_sync_validation.py | Caption script lock, hook structure, sync suite idempotency (mocked) |
| test_daily_single_display_sync_unit.py | Cue → asset keyword scoring, SRT parsing |
| test_daily_single_hook_montage.py | Phrase → hero montage plan, montage validators |
| test_daily_single_visual_audit.py | Pixel similarity thresholds, generic B-roll patterns |
| test_daily_single_youtube_quality.py | Hook stakes, plain language, outro CTA rules |
| test_daily_single_media_sync.py | Handoff inventory, HD video rules |
| test_daily_single_captions.py | Sentence splitting, proportional caption fallback |

!!! tip "When to run"
    Run the full pytest suite before merging changes to praisonaippt/daily_single/ or praisonaippt/video_qa/.


Layer 2 — Modular QA (validate-qa)

Module: praisonaippt/video_qa/
CLI: daily-single -p $PROJECT validate-qa or python -m praisonaippt.video_qa --project $PROJECT run

Each stage runs independently and writes a JSON report under merge/qa/. A rollup lives in merge/qa/summary.json.

When to run

| Phase flag | Run after | Stages included |
| --- | --- | --- |
| pre_build | Scripts + handoff ready; before or after sync-assets | s04, s06, s01, s02 (optional VLM) |
| post_vo | synthesise-vo | s05 (narration present per segment) |
| pre_assemble | bookend-media | s00 (hook/outro HeyGen gate) |
| post_build | assemble-beats + build-captions | s05 captions, s03, s08, s07, s09, s10 |
| all | Full rebuild audit | Every configured stage |

daily-single -p $PROJECT validate-qa --when pre_build
daily-single -p $PROJECT validate-qa --when post_build

# Single stage debug
daily-single -p $PROJECT validate-qa s08-av-sync
python -m praisonaippt.video_qa --project $PROJECT list
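
For quick reference, the phase table above can be written down as a lookup. This is only a sketch transcribed from the table: the names PHASE_STAGES and stages_for are hypothetical, and the real stage registry in praisonaippt/video_qa/ may be organised differently. Treating "all" as the union of the listed phases is also an assumption.

```python
# Phase -> stage prefixes, transcribed from the "When to run" table.
# Hypothetical helper; the real registry in praisonaippt/video_qa/ may differ.
PHASE_STAGES = {
    "pre_build": ["s04", "s06", "s01", "s02"],  # s02 is the optional VLM stage
    "post_vo": ["s05"],
    "pre_assemble": ["s00"],
    "post_build": ["s05", "s03", "s08", "s07", "s09", "s10"],
}


def stages_for(phase: str) -> list[str]:
    """Return the stage prefixes expected for a --when value."""
    if phase == "all":
        # Assumption: "every configured stage" is the union of the phases above.
        seen: list[str] = []
        for stages in PHASE_STAGES.values():
            for stage in stages:
                if stage not in seen:
                    seen.append(stage)
        return seen
    return list(PHASE_STAGES[phase])
```

Calling stages_for("post_vo") returns only the s05 stage, which matches the narration check that runs after synthesise-vo.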

Stage reference (what each test does)

| Stage | Plain English | Pass means |
| --- | --- | --- |
| s04-knowledge | “Do we have the research inputs?” | manifest, video-script, handoff, beat-map, segment scripts exist |
| s06-coverage | “Does each beat have enough visuals for its script?” | No critical asset gaps; hook montage plan valid |
| s01-assets | “Are handoff files on disk and readable?” | Images/videos resolve; beat-map paths exist |
| s02-source-vlm | “Do source B-roll clips look on-topic?” (optional) | VLM samples every 5s; flags generic/stock footage |
| s00-bookends | “Are hook and outro ready to merge?” | script + narration + heygen.mp4 for 00-hook and 99-outro |
| s05-transcript | “Does audio match the locked script?” | post_vo: MP3 exists; post_captions: SRT matches script + overlap checks |
| s03-image-speech | “Does each spoken line show the right image?” | Display sync: ≥35% keyword alignment per cue |
| s08-av-sync | “Is the timeline coherent?” | Hook structure, word-level match (hook/outro), section durations vs timeline.json |
| s07-framing | “Are HeyGen clips the expected resolution?” | Hook/outro dimensions (warn-only) |
| s09-on-screen-text | “Any long cues with weak visual match?” | Flags cues with ≥6 words and low alignment |
| s10-final-composite | “Production gate” | Visual audit 5s samples + sync×3 + validate-all |

Full stage config: Video QA.

Degradation (warn, not fail)

Some environments cannot run every check. The suite records flags in summary.json:

| Flag | Cause | Behaviour |
| --- | --- | --- |
| whisper: missing_timestamps | Whisper/transcribe failed for beat segments | Proportional captions used; s05 passes with warnings |
| vlm: offline | No OPENAI_API_KEY | s02 skipped |
| final_mp4: missing | No merge/final.mp4 | post_build visual stages skipped |

Set PRAISONAIPPT_QA_OFFLINE=1 in CI to skip API-dependent stages.
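
When triaging a run, it can help to turn summary.json into a few readable lines. The sketch below assumes the rollup exposes a failed_required list and a degradation mapping of flag to state (for example "vlm" to "offline"); those shapes are guesses based on the names this page uses, not a documented schema, and describe_summary is a hypothetical helper.

```python
import json


def describe_summary(summary: dict) -> list[str]:
    """Turn a summary.json-style dict into short human-readable lines."""
    lines = []
    # Assumption: failed_required is a list of stage names.
    failed = summary.get("failed_required", [])
    if failed:
        lines.append("FAIL: " + ", ".join(failed))
    else:
        lines.append("PASS: no required stage failed")
    # Assumption: degradation maps a flag name to its state.
    for flag, state in summary.get("degradation", {}).items():
        lines.append(f"WARN: {flag}: {state}")
    return lines


# Example input written by hand for illustration, not real pipeline output.
sample = json.loads('{"failed_required": [], "degradation": {"vlm": "offline"}}')
for line in describe_summary(sample):
    print(line)
```

A degraded run still reports PASS here, mirroring the warn-not-fail behaviour described above.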


Layer 3 — Legacy publish gates

These pre-date the modular video_qa package but remain the authoritative publish bar. Stage s10 runs them automatically; you can also run them standalone.

validate-display

Maps every SRT cue to the visual shown at the cue midpoint.

| Check | Threshold |
| --- | --- |
| Keyword alignment | 0.35 per cue |
| Borderline band | 0.35–0.45 (passes but worth spot-check) |

Output: merge/display_sync_report.json

daily-single -p $PROJECT validate-display
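
The two thresholds above amount to a three-way classification of each cue's alignment score. A minimal sketch follows, assuming the borderline band is the half-open range from 0.35 up to but not including 0.45; the exact boundary handling in the real validator is not stated here, and classify_alignment is a hypothetical name.

```python
FAIL_BELOW = 0.35        # keyword alignment threshold per cue
BORDERLINE_BELOW = 0.45  # upper edge of the spot-check band


def classify_alignment(score: float) -> str:
    """Classify one cue's keyword alignment score."""
    if score < FAIL_BELOW:
        return "fail"
    if score < BORDERLINE_BELOW:
        # Passes the gate, but worth a manual spot-check.
        return "borderline"
    return "pass"
```

A cue scoring 0.40 passes the gate but is a candidate for a manual look at the frame shown at its midpoint.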

validate-spoken-visual

Stricter full-video gate: montage fragments, slide windows (worst overlapping cue), chart and plain-language checks, transition samples at every image change, and coverage.

Output: merge/spoken_visual_sync_report.json — require "ok": true before publish.

daily-single -p $PROJECT validate-spoken-visual

Cue-aligned rebuild (beat-06, beat-01 views): build-captions → assemble-beats → validate-display → validate-spoken-visual. Skill: .cursor/skills/daily-single-video-pipeline/spoken-visual-sync.md
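
The "ok": true requirement on merge/spoken_visual_sync_report.json is easy to script before publishing. The helper below is a sketch, not part of the pipeline: spoken_visual_ok is a hypothetical name, and it relies only on the report path and the top-level "ok" key mentioned above.

```python
import json
from pathlib import Path


def spoken_visual_ok(project: str) -> bool:
    """Return True only if the spoken-visual report exists and has "ok": true."""
    report_path = Path(project) / "merge" / "spoken_visual_sync_report.json"
    if not report_path.is_file():
        # A missing report is treated as not ready to publish.
        return False
    report = json.loads(report_path.read_text(encoding="utf-8"))
    # Strict identity check: a truthy string such as "yes" does not count.
    return report.get("ok") is True
```

Treating a missing report as a failure is a deliberate choice in this sketch: it keeps a skipped validation step from being read as a pass.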

validate-slide-quality / validate-engagement-assets / validate-viral-readiness

Professional and viral publish gates (trust-audit uses stricter thresholds via variant: trust-audit in beat-map).

| Command | Output | What it checks |
| --- | --- | --- |
| validate-slide-quality | merge/slide_design_report.json | Body PNG tier mix — rejects text_slide-heavy decks |
| validate-engagement-assets | merge/engagement_report.json | Motion ratio, clip beats, social captures, demo beats |
| validate-viral-readiness | merge/viral_readiness_report.json | Composite: slide + engagement + hook motion + proof density |

Full matrix: .cursor/skills/daily-single-video-pipeline/scripts/run-publish-gate.sh

Unit tests:

pytest tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py \
       tests/test_slide_design_audit.py tests/test_engagement_audit.py \
       tests/test_viral_readiness.py tests/test_video_qa.py -q

validate-sync --runs 3

Runs the full spoken↔visual suite three times and requires identical results (idempotency).

| Sub-check | What it does |
| --- | --- |
| caption_script_lock | SRT text equals locked script.md — not raw Whisper text |
| hook_structure | Cues 1–3 = attention → overview → “Let’s get started.” |
| hook_montage | Overview cue uses ≥ 5 distinct hero slides; alignment ≥ 0.45 |
| image_mapping | Same as display sync pass rate |
| youtube_quality | Hook stakes, plain language, pacing, outro CTA |
| spoken_visual | Requires passing spoken_visual_sync_report.json |
| visual_audit | Requires passing visual_audit_report.json |

Output: merge/sync_validation_report.json

daily-single -p $PROJECT validate-sync --runs 3
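
The idempotency requirement can be pictured as "run the suite N times and compare the results". The sketch below illustrates that idea only; is_idempotent is a hypothetical name, run_suite stands in for whatever produces one run's result dict, and the real validate-sync comparison may differ.

```python
import json
from typing import Callable


def is_idempotent(run_suite: Callable[[], dict], runs: int = 3) -> bool:
    """Run the suite `runs` times and require identical results each time."""
    # Serialise with sorted keys so dict ordering cannot cause a false mismatch.
    snapshots = [json.dumps(run_suite(), sort_keys=True) for _ in range(runs)]
    return all(snapshot == snapshots[0] for snapshot in snapshots)
```

A suite whose output contains a timestamp or a random sample would fail this comparison, which is the kind of instability the three-run rule is meant to surface.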

audit-visual

Samples merge/final.mp4 every 5 seconds (plus cue midpoints). Compares frames to planned assets.

| Asset type | Min pixel similarity |
| --- | --- |
| PNG slides | 0.42 |
| Video clips | 0.28 |
| HeyGen / avatar | 0.15 |

Optional vision LLM (gpt-4o-mini) flags off-topic or generic B-roll.

Output: merge/visual_audit_report.json, frames in merge/visual_audit_frames/

daily-single -p $PROJECT audit-visual --interval 5
daily-single -p $PROJECT validate-visual-audit
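
To make the sampling and thresholds concrete, here is a small sketch of how sample timestamps and the per-asset-type minimums could be expressed. The asset type keys, the function names, and the choice to start sampling at t = 0 are all assumptions for illustration; the numbers come from the table above.

```python
# Minimum pixel similarity per asset type (values from the table above).
# The dictionary keys are hypothetical labels, not pipeline identifiers.
MIN_SIMILARITY = {"png_slide": 0.42, "video_clip": 0.28, "heygen": 0.15}


def sample_times(duration: float, interval: float = 5.0, cue_midpoints=()) -> list[float]:
    """Regular samples every `interval` seconds plus any cue midpoints in range."""
    times = set()
    step = 0
    # Multiply rather than accumulate to avoid floating-point drift.
    while step * interval < duration:
        times.add(round(step * interval, 3))
        step += 1
    times.update(round(m, 3) for m in cue_midpoints if 0 <= m < duration)
    return sorted(times)


def frame_passes(asset_type: str, similarity: float) -> bool:
    """True if a sampled frame is similar enough to its planned asset."""
    return similarity >= MIN_SIMILARITY[asset_type]
```

The lower bar for HeyGen clips reflects the table: avatar footage moves constantly, so a sampled frame is expected to match its reference less closely than a static slide.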

validate-all

Single publish gate combining tools, output specs, media inventory, and all reports above.

| Check | Rule |
| --- | --- |
| Output | 1920×1080, duration ~280–540s |
| Beat coverage | All beats assembled |
| Bookends | HeyGen hook + outro present |
| Media | Videos ≥720p from handoff |
| Reports | display, sync, slide design, engagement, viral readiness, visual audit all pass |

Output: validation_report.json (project root)

daily-single -p $PROJECT validate-all
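
The numeric rules in the table can be sketched as a simple check that returns a list of issues. This is an illustration only: output_issues is a hypothetical name, the approximate duration range is treated as hard inclusive bounds, and "≥720p" is read as a frame height of at least 720 pixels.

```python
def output_issues(width: int, height: int, duration_s: float, video_heights=()) -> list[str]:
    """Return human-readable issues for the numeric validate-all rules."""
    issues = []
    if (width, height) != (1920, 1080):
        issues.append(f"output is {width}x{height}, expected 1920x1080")
    # Assumption: the "~280-540s" guideline is applied as inclusive bounds.
    if not 280 <= duration_s <= 540:
        issues.append(f"duration {duration_s:.0f}s outside 280-540s")
    for clip_height in video_heights:
        # Assumption: 720p means a frame height of at least 720 pixels.
        if clip_height < 720:
            issues.append(f"handoff video is {clip_height}p, expected at least 720p")
    return issues
```

An empty list means the output spec and media rules pass; the report-based checks in the last table row still need their own JSON files.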

Use this checklist when building a video step by step:

PROJECT=examples/videos/<slug>

# 1 — After scripts + handoff
daily-single -p $PROJECT validate-qa --when pre_build

# 2 — After voice-over
daily-single -p $PROJECT validate-qa --when post_vo

# 3 — After HeyGen bookends
daily-single -p $PROJECT validate-qa --when pre_assemble

# 4 — After captions + assemble (cue-aligned order)
daily-single -p $PROJECT build-captions
daily-single -p $PROJECT assemble-beats
daily-single -p $PROJECT validate-display
daily-single -p $PROJECT validate-spoken-visual
pytest tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py -q

# 5 — Main modular gate
daily-single -p $PROJECT validate-qa --when post_build

# 6 — Confirm legacy gates (optional if s10 passed)
daily-single -p $PROJECT validate-all
daily-single -p $PROJECT validate-sync --runs 3

# 7 — After code changes only
pytest tests/test_video_qa.py tests/test_daily_single_sync_validation.py \
       tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py -q
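
Steps 5 and 6 of the checklist can be chained so that the first failing gate stops the run. The sketch below assumes each daily-single command exits with a non-zero status on failure, which this page does not state; final_gate_commands and first_failure are hypothetical helpers.

```python
import subprocess
from typing import Callable, Optional, Sequence


def final_gate_commands(project: str) -> list[list[str]]:
    """Commands for checklist steps 5 and 6, in order."""
    base = ["daily-single", "-p", project]
    return [
        base + ["validate-qa", "--when", "post_build"],
        base + ["validate-all"],
        base + ["validate-sync", "--runs", "3"],
    ]


def _run(cmd: Sequence[str]) -> int:
    # Assumption: a failing gate returns a non-zero exit code.
    return subprocess.run(list(cmd)).returncode


def first_failure(
    commands: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = _run,
) -> Optional[list[str]]:
    """Run commands in order; return the first one that fails, else None."""
    for cmd in commands:
        if runner(cmd) != 0:
            return list(cmd)
    return None
```

Passing a custom runner makes the ordering logic testable without the CLI installed; with the default runner, first_failure(final_gate_commands("examples/videos/<slug>")) would shell out to daily-single.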

Output files (where to look when something fails)

| File | Layer | Contains |
| --- | --- | --- |
| merge/qa/summary.json | Modular QA | Overall pass/fail, failed_required, degradation |
| merge/qa/s*_report.json | Modular QA | Per-stage checks and messages |
| merge/display_sync_report.json | Legacy | Per-cue alignment and asset file |
| merge/spoken_visual_sync_report.json | Legacy | Windows, charts, transitions, coverage (ok: true required) |
| merge/sync_validation_report.json | Legacy | 3-run results, hook_montage, youtube_quality, spoken_visual |
| merge/visual_audit_report.json | Legacy | Per-sample pixel/topic pass |
| validation_report.json | Legacy | Final publish gate issues list |
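
When something fails, a first step is often to see which of these reports exist at all, since a missing file usually means a gate never ran. The sketch below uses the report paths listed above; missing_reports is a hypothetical helper, and the per-stage merge/qa/s*_report.json files are left out because their exact names depend on which stages ran.

```python
from pathlib import Path

# Report paths relative to the project root, taken from the table above.
REPORTS = [
    "merge/qa/summary.json",
    "merge/display_sync_report.json",
    "merge/spoken_visual_sync_report.json",
    "merge/sync_validation_report.json",
    "merge/visual_audit_report.json",
    "validation_report.json",
]


def missing_reports(project: str) -> list[str]:
    """List expected report files that are not present under the project."""
    root = Path(project)
    return [rel for rel in REPORTS if not (root / rel).is_file()]
```

An empty result only shows that every gate has written a report; each file still needs to be opened to confirm that it passed.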