Daily single — testing guide

This page explains every kind of test used in the daily single video pipeline: what it checks, when to run it, and where results are saved. Written for operators and contributors — no need to read the Python source first.

Three layers

┌─────────────────────────────────────────────────────────────┐
│  Layer 1 — Unit tests (pytest, no video project required)   │
├─────────────────────────────────────────────────────────────┤
│  Layer 2 — Modular QA (validate-qa / video_qa stages)       │
├─────────────────────────────────────────────────────────────┤
│  Layer 3 — Legacy publish gates (validate-sync, validate-all)│
└─────────────────────────────────────────────────────────────┘

Run Layer 1 after code changes. Run Layer 2 after each pipeline phase. Run Layer 3 before upload — or rely on stage s10-final-composite, which runs the same checks.

Layer 1 — Unit tests (pytest)

Fast, offline tests. No API keys required for the core QA module tests.

conda activate test
cd /path/to/praisonaippt

# Minimal — video_qa module only
pytest tests/test_video_qa.py -q

# Full daily_single suite
pytest tests/test_daily_single_display_sync_unit.py \
       tests/test_daily_single_sync_validation.py \
       tests/test_daily_single_hook_montage.py \
       tests/test_daily_single_media_sync.py \
       tests/test_daily_single_visual_audit.py \
       tests/test_daily_single_youtube_quality.py \
       tests/test_daily_single_captions.py \
       tests/test_video_qa.py -q

Test file	What it verifies
`test_video_qa.py`	Stage registry, skip rules, s04/s05/s06 behaviour, VLM cache round-trip
`test_daily_single_sync_validation.py`	Caption script lock, hook structure, sync suite idempotency (mocked)
`test_daily_single_display_sync_unit.py`	Cue → asset keyword scoring, SRT parsing
`test_daily_single_hook_montage.py`	Phrase → hero montage plan, montage validators
`test_daily_single_visual_audit.py`	Pixel similarity thresholds, generic B-roll patterns
`test_daily_single_youtube_quality.py`	Hook stakes, plain language, outro CTA rules
`test_daily_single_media_sync.py`	Handoff inventory, HD video rules
`test_daily_single_captions.py`	Sentence splitting, proportional caption fallback

!!! tip “When to run” Run the full pytest suite before merging changes to praisonaippt/daily_single/ or praisonaippt/video_qa/.

Layer 2 — Modular QA (`validate-qa`)

Module: praisonaippt/video_qa/
CLI: daily-single -p $PROJECT validate-qa or python -m praisonaippt.video_qa --project $PROJECT run

Each stage runs independently and writes a JSON report under merge/qa/. A rollup lives in merge/qa/summary.json.

When to run

Phase flag	Run after	Stages included
`pre_build`	Scripts + handoff ready; before or after `sync-assets`	s04, s06, s01, s02 (optional VLM)
`post_vo`	`synthesise-vo`	s05 (narration present per segment)
`pre_assemble`	`bookend-media`	s00 (hook/outro HeyGen gate)
`post_build`	`assemble-beats` + `build-captions`	s05 captions, s03, s08, s07, s09, s10
`all`	Full rebuild audit	Every configured stage

daily-single -p $PROJECT validate-qa --when pre_build
daily-single -p $PROJECT validate-qa --when post_build

# Single stage debug
daily-single -p $PROJECT validate-qa s08-av-sync
python -m praisonaippt.video_qa --project $PROJECT list

Stage reference (what each test does)

Stage	Plain English	Pass means
s04-knowledge	“Do we have the research inputs?”	manifest, video-script, handoff, beat-map, segment scripts exist
s06-coverage	“Does each beat have enough visuals for its script?”	No critical asset gaps; hook montage plan valid
s01-assets	“Are handoff files on disk and readable?”	Images/videos resolve; beat-map paths exist
s02-source-vlm	“Do source B-roll clips look on-topic?” (optional)	VLM samples every 5s; flags generic/stock footage
s00-bookends	“Are hook and outro ready to merge?”	script + narration + `heygen.mp4` for 00-hook and 99-outro
s05-transcript	“Does audio match the locked script?”	post_vo: MP3 exists; post_captions: SRT matches script + overlap checks
s03-image-speech	“Does each spoken line show the right image?”	Display sync: ≥35% keyword alignment per cue
s08-av-sync	“Is the timeline coherent?”	Hook structure, word-level match (hook/outro), section durations vs `timeline.json`
s07-framing	“Are HeyGen clips the expected resolution?”	Hook/outro dimensions (warn-only)
s09-on-screen-text	“Any long cues with weak visual match?”	Flags cues with ≥6 words and low alignment
s10-final-composite	“Production gate”	Visual audit 5s samples + sync×3 + validate-all

Full stage config: Video QA.

Degradation (warn, not fail)

Some environments cannot run every check. The suite records flags in summary.json:

Flag	Cause	Behaviour
`whisper: missing_timestamps`	Whisper/transcribe failed for beat segments	Proportional captions used; s05 passes with warnings
`vlm: offline`	No `OPENAI_API_KEY`	s02 skipped
`final_mp4: missing`	No `merge/final.mp4`	post_build visual stages skipped

Set PRAISONAIPPT_QA_OFFLINE=1 in CI to skip API-dependent stages.

Layer 3 — Legacy publish gates

These pre-date the modular video_qa package but remain the authoritative publish bar. Stage s10 runs them automatically; you can also run them standalone.

`validate-display`

Maps every SRT cue to the visual shown at the cue midpoint.

Check	Threshold
Keyword alignment	≥ 0.35 per cue
Borderline band	0.35–0.45 (passes but worth spot-check)

Output: merge/display_sync_report.json

daily-single -p $PROJECT validate-display

`validate-spoken-visual`

Stricter full-video gate: montage fragments, slide windows (worst overlapping cue), chart/plain-language checks, transition samples at every image change, coverage, and plain-language rules.

Output: merge/spoken_visual_sync_report.json — require "ok": true before publish.

daily-single -p $PROJECT validate-spoken-visual

Cue-aligned rebuild (beat-06, beat-01 views): build-captions → assemble-beats → validate-display → validate-spoken-visual. Skill: .cursor/skills/daily-single-video-pipeline/spoken-visual-sync.md

`validate-slide-quality` / `validate-engagement-assets` / `validate-viral-readiness`

Professional and viral publish gates (trust-audit uses stricter thresholds via variant: trust-audit in beat-map).

Command	Output	What it checks
`validate-slide-quality`	`merge/slide_design_report.json`	Body PNG tier mix — rejects text_slide-heavy decks
`validate-engagement-assets`	`merge/engagement_report.json`	Motion ratio, clip beats, social captures, demo beats
`validate-viral-readiness`	`merge/viral_readiness_report.json`	Composite: slide + engagement + hook motion + proof density

Full matrix: .cursor/skills/daily-single-video-pipeline/scripts/run-publish-gate.sh

Unit tests:

pytest tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py \
       tests/test_slide_design_audit.py tests/test_engagement_audit.py \
       tests/test_viral_readiness.py tests/test_video_qa.py -q

`validate-sync --runs 3`

Runs the full spoken↔visual suite three times and requires identical results (idempotency).

Sub-check	What it does
`caption_script_lock`	SRT text equals locked `script.md` — not raw Whisper text
`hook_structure`	Cues 1–3 = attention → overview → “Let’s get started.”
`hook_montage`	Overview cue uses ≥ 5 distinct hero slides; alignment ≥ 0.45
`image_mapping`	Same as display sync pass rate
`youtube_quality`	Hook stakes, plain language, pacing, outro CTA
`spoken_visual`	Requires passing `spoken_visual_sync_report.json`
`visual_audit`	Requires passing `visual_audit_report.json`

Output: merge/sync_validation_report.json

daily-single -p $PROJECT validate-sync --runs 3

`audit-visual`

Samples merge/final.mp4 every 5 seconds (plus cue midpoints). Compares frames to planned assets.

Asset type	Min pixel similarity
PNG slides	0.42
Video clips	0.28
HeyGen / avatar	0.15

Optional vision LLM (gpt-4o-mini) flags off-topic or generic B-roll.

Output: merge/visual_audit_report.json, frames in merge/visual_audit_frames/

daily-single -p $PROJECT audit-visual --interval 5
daily-single -p $PROJECT validate-visual-audit

`validate-all`

Single publish gate combining tools, output specs, media inventory, and all reports above.

Check	Rule
Output	1920×1080, duration ~280–540s
Beat coverage	All beats assembled
Bookends	HeyGen hook + outro present
Media	Videos ≥720p from handoff
Reports	display, sync, slide design, engagement, viral readiness, visual audit all pass

Output: validation_report.json (project root)

daily-single -p $PROJECT validate-all

Recommended test workflow (full rebuild)

Use this checklist when building a video step by step:

PROJECT=examples/videos/<slug>

# 1 — After scripts + handoff
daily-single -p $PROJECT validate-qa --when pre_build

# 2 — After voice-over
daily-single -p $PROJECT validate-qa --when post_vo

# 3 — After HeyGen bookends
daily-single -p $PROJECT validate-qa --when pre_assemble

# 4 — After captions + assemble (cue-aligned order)
daily-single -p $PROJECT build-captions
daily-single -p $PROJECT assemble-beats
daily-single -p $PROJECT validate-display
daily-single -p $PROJECT validate-spoken-visual
pytest tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py -q

# 5 — Main modular gate
daily-single -p $PROJECT validate-qa --when post_build

# 6 — Confirm legacy gates (optional if s10 passed)
daily-single -p $PROJECT validate-all
daily-single -p $PROJECT validate-sync --runs 3

# 7 — After code changes only
pytest tests/test_video_qa.py tests/test_daily_single_sync_validation.py \
       tests/test_cue_slide_sync.py tests/test_spoken_visual_sync.py -q

Output files (where to look when something fails)

File	Layer	Contains
`merge/qa/summary.json`	Modular QA	Overall pass/fail, `failed_required`, degradation
`merge/qa/s*_report.json`	Modular QA	Per-stage checks and messages
`merge/display_sync_report.json`	Legacy	Per-cue alignment and asset file
`merge/spoken_visual_sync_report.json`	Legacy	Windows, charts, transitions, coverage (`ok: true` required)
`merge/sync_validation_report.json`	Legacy	3-run results, hook_montage, youtube_quality, spoken_visual
`merge/visual_audit_report.json`	Legacy	Per-sample pixel/topic pass
`validation_report.json`	Legacy	Final publish gate issues list

PraisonAI PPT Documentation

Daily single — testing guide

Daily single — testing guide

Three layers

Layer 1 — Unit tests (pytest)

Layer 2 — Modular QA (validate-qa)

When to run

Stage reference (what each test does)

Degradation (warn, not fail)

Layer 3 — Legacy publish gates

validate-display

validate-spoken-visual

validate-slide-quality / validate-engagement-assets / validate-viral-readiness

validate-sync --runs 3

audit-visual

validate-all

Recommended test workflow (full rebuild)

Output files (where to look when something fails)

Related

Layer 2 — Modular QA (`validate-qa`)

`validate-display`

`validate-spoken-visual`

`validate-slide-quality` / `validate-engagement-assets` / `validate-viral-readiness`

`validate-sync --runs 3`

`audit-visual`

`validate-all`