Video export (PPTX → MP4)

PraisonAIPPT can export presentations to MP4 on Mac and Linux using a compositor backend: LibreOffice rasterises slides to PNG, then FFmpeg overlays avatar and media regions using geometry from avatar_layouts and deck_slides.

PowerPoint CreateVideo (Windows, on-prem) is Phase 3 and not implemented in v1.

Requirements

Tool	Role	macOS install
FFmpeg + ffprobe	Encode and probe	`brew install ffmpeg`
poppler (`pdftoppm`)	PDF → PNG	`brew install poppler`
LibreOffice	PPTX → PDF	`brew install --cask libreoffice`

Check dependencies:

praisonaippt convert-video --check

On macOS the default H.264 encoder is h264_videotoolbox when available; otherwise libx264 is used.

CLI

# Build deck and export video (shares one LibreOffice PDF run when combined)
praisonaippt -i examples/avatar_layouts.yaml -o deck.pptx --convert-video

# Both PDF and video — single LO PDF internally
praisonaippt -i deck.yaml -o deck.pptx --convert-pdf --convert-video

# Standalone PPTX → MP4 (loads deck.yaml sidecar when present)
praisonaippt convert-video deck.pptx
# deck.yaml beside deck.pptx supplies avatar/media paths for PiP overlays --video-preset draft --slide-range 1-5

# Preflight
praisonaippt convert-video --check

CLI flags (video)

Flag	Values / type	Notes
`--convert-video`	flag	Build + export in one command
`--video-output`	path	Overrides `video_export.output_path`
`--video-backend`	`compositor`, `auto`, `powerpoint`	Overrides YAML `backend`
`--video-preset`	`draft`, `standard`, `high`, `4k`	Overrides YAML `preset`
`--narration-mode`	`fixed`, `audio_file`, `avatar`, `tts`, `auto`	Overrides YAML
`--video-options`	JSON string	Merged via `VideoOptions.from_dict`
`--slide-range`	`START-END` (1-based)	Export subset only
`--keep-temp`	flag	Retain temp files for debugging
`--check`	flag	Dependency check

Preset	Resolution	FPS	DPI
`draft`	1280×720	24	120
`standard`	1920×1080	30	192
`high`	1920×1080	30	240
`4k`	3840×2160	30	300

YAML configuration

Top-level video_export block:

video_export:
  backend: compositor
  narration_mode: fixed          # fixed | audio_file | avatar | tts | auto
  output_path: output/deck.mp4
  resolution: { width: 1920, height: 1080 }
  fps: 30
  dpi: 192
  preset: standard               # draft | standard | high
  slide_duration_sec: 5
  avatar_timeline: auto          # per_slide | continuous | auto
  avatar:
    fit: cover                   # cover | stretch (PPTX stretch uses stretch)
    shape: circle                # circle | square | rect
    crop_y_ratio: 0.06
    zoom_ratio: 1.45
    loop_if_shorter: true
  slide_cache: true              # PNG cache under ~/.praisonaippt/video_cache/
  tts:                           # requires pip install praisonaippt[video-tts]
    provider: edge
    voice: en-GB-RyanNeural
  captions:
    enabled: true                # writes .srt sidecar when notes/TTS used

Per-verse overrides:

- slide_type: avatar_media_1
  avatar_video_path: assets/speaker.mp4
  media_path: assets/diagram.png
  notes: Narration text.
  duration_sec: 12
  narration_mode: avatar

Schema keys: duration_sec, audio_start_sec, audio_path, narration_mode, sync_mode (verse level); video_export, slide_timestamps (deck level).

When duration_sec and audio_start_sec are set on a verse, they take precedence over ffprobe on shared HeyGen MP4 or MP3 files.

`avatar_timeline`

Value	Behaviour
`per_slide`	Avatar video restarts at each slide
`continuous`	One shared file; offset advances by each slide’s duration
`auto` (default)	`continuous` when all content slides share one `avatar_video_path`; otherwise `per_slide`

Use continuous (or auto with one HeyGen file) and per-slide audio_start_sec to slice one narration track across many slides without blink between slides.

slide_style.layouts.pip (crop_y_ratio, zoom_ratio, shape) merges into video options when not set under video_export.avatar.

Video overlay protocol (position, zoom, framing)

Compositor overlays use inch regions from layout engines, then apply the video overlay protocol (praisonaippt.video_protocol).

Precedence (later wins): video_export.avatar / media → slide_style.layouts.<slide_type> → layouts.pip → verse flat keys → verse.video_overlay.

Layer	Keys	Purpose
Deck	`video_export.avatar`, `video_export.media`	Defaults for all slides
Layout	`slide_style.layouts.pip` or `layouts.<slide_type>`	PiP anchor, size, crop, zoom
Verse (short)	`avatar_zoom_ratio`, `avatar_crop_y_ratio`, `avatar_fit`, `media_*`	One-off tweaks
Verse (full)	`video_overlay.avatar`, `video_overlay.media`	Per-slide override

Placement fields (avatar or media block):

Field	Type	Description
`anchor` / `pip_position`	enum	`bottom_right`, `bottom_left`, `top_right`, `top_left`, `center`
`width_ratio` / `pip_width_ratio`	0–1	PiP width vs slide width (anchor mode)
`margin_in` / `pip_margin_in`	inches	Inset from slide edge
`box`	mapping	Explicit `{left_in, top_in, width_in, height_in}` (overrides layout region)
`left_in`, `top_in`, `width_in`, `height_in`	inches	Shorthand explicit box
`offset_px`	`{x, y}`	Nudge overlay after pixel mapping (integers)
`crop_y_ratio`	0–0.45	Vertical crop bias when `fit: cover`
`zoom_ratio`	0.5–3.0	Cover scale (clamped ≥ 1.0 at render)
`fit`	enum	`cover`, `contain`, `stretch`
`shape`	enum	Avatar mask: `circle`, `rect`, `auto`, …

Example — per-slide PiP on a quote slide:

- slide_type: avatar_quote
  avatar_video_path: examples/heygen-article-50590.mp4
  video_overlay:
    avatar:
      anchor: bottom_right
      width_ratio: 0.20
      margin_in: 0.38
      zoom_ratio: 1.30
      crop_y_ratio: 0.08

Example — deck-wide defaults + explicit media framing:

video_export:
  avatar:
    fit: cover
    zoom_ratio: 1.35
    crop_y_ratio: 0.10
  media:
    fit: contain

Validation runs on load via yaml_validate.validate_video_export and validate_verse_options. slide_timestamps length is checked against the slide plan (warning if mismatched).

daily_single hook scroll: record-canonical-scroll crops side gutters (column + MSER + DOM), caps speed at 100 px/s, and writes merge/qa/canonical_capture/framing-diagram.png. Gate with validate-canonical-scroll before assemble.

Transcript-driven HeyGen decks

Generate YAML from Whisper JSON:

praisonaippt transcript-to-yaml \
  -i examples/short-script-50590_timestamps.json \
  -o examples/heygen-article-50590 \
  --transcript-mode both \
  --transcript-audio examples/short-script-50590.mp3 \
  --align silence,karaoke

Flag	Effect
`--transcript-mode`	`full`, `thematic`, or `both` deck variants
`--transcript-audio`	MP3 for silence/RMS alignment
`--align`	`silence`, `emphasis`, `karaoke` (comma-separated)
`--variants all`	Write media combination YAMLs

Example deck: examples/heygen-article-50590-short.yaml. Full matrix and build steps: HeyGen article examples.

Timing: use wall-clock merge (last_segment.end - first_segment.start) so pauses between Whisper segments are held on the correct slide. Sum of segment durations alone is shorter than total audio length.

HeyGen audio: default export uses HeyGen MP4 embedded audio (narration_mode: avatar or audio_source: heygen_video). Use audio_file / audio_source: external when video is visual-only and a separate MP3 drives narration.

With narration_mode: auto, if both audio_path and avatar_video_path are set, HeyGen video audio wins when the avatar file has an audio track; otherwise external audio_path is used. Use explicit avatar or audio_file to override.

Narration modes

Mode	Duration source	Primary audio
`fixed`	`slide_duration_sec` / `duration_sec`	none
`audio_file`	verse `duration_sec`, else `slide_timestamps`, else ffprobe	external file (trimmed with `audio_start_sec`)
`avatar`	verse `duration_sec`, else `slide_timestamps`, else ffprobe	avatar track
`tts`	ffprobe on generated MP3	TTS (avatar muted)
`auto`	precedence: avatar (if audio) → audio_path → notes→TTS → fixed	per rules

Optional alias in video_export: audio_source: heygen_video | external | tts (maps to narration_mode when narration_mode is omitted).

Avatar video audio is muted when TTS or audio_file is primary to avoid double narration.

sync_mode (per verse, optional)

When set explicitly on a verse, adjusts slide duration across sources:

Value	Behaviour
`avatar_lead`	Duration follows avatar video (skipped when verse has explicit `duration_sec`)
`notes_lead`	Duration follows TTS of notes
`longest`	Maximum of resolved sources (skipped when verse has explicit `duration_sec`)

Slide raster cache

PNG pages are cached under ~/.praisonaippt/video_cache/ keyed by PPTX mtime and DPI. Disable with slide_cache: false in video_export (via JSON --video-options).

Compositor behaviour

LibreOffice PNG is static chrome (text, borders, baked deck images). FFmpeg overlays:

avatar_video_path → regions["avatar"] when the region exists
media_path → regions["media"] when present and skip_media_overlay is false

All deck_* slides set skip_media_overlay: true (images are already in the PPTX). Avatar layout slides overlay both regions when paths are set.

Split layouts (avatar_media_1 vs avatar_media_2) use distinct width ratios from layout_tokens.py, visible in both PPTX and video.

Z-order: media → avatar → text (already in PNG).

Fidelity matrix (Phase 0 — LibreOffice vs PowerPoint)

Measured on Mac with LO headless PDF → pdftoppm vs PowerPoint slide view for avatar layouts. Use this when judging export quality.

Layout	LO static chrome	Embedded movies in LO PNG	FFmpeg overlay fix	Known delta
`avatar_only`	Good	Grey placeholder only	Avatar video in region	LO placeholder colour may differ slightly
`media_only`	Good	Image OK; video not played	Media file overlaid	Video must be overlaid, not embedded
`avatar_media_1` (50/50)	Good split geometry	Placeholders only	Both regions overlaid	Split ratio matches YAML (~50/50)
`avatar_media_2` (40/60)	Good	Placeholders only	Both regions overlaid	Wider media column vs `_1`
`avatar_media_3` (PiP)	Good	Placeholders only	PiP boxes overlaid	stacked: media band below panel; full_bleed: media fills slide; text panel in PNG; PiP from `_slide_regions` + verse `text_panel`
`avatar_name_card`	Good	Avatar placeholder	Avatar in region	Navy text panel may sit above avatar in PPTX; v1 square overlays
`avatar_headline`	Good	Same as name card	Same	Panel text in PNG only
`avatar_quote`	Moderate	Navy fill approximate	Avatar overlaid on quote area	LO may shift quote typography; use `raster_mode: native` (future) if drift matters
`avatar_border` / `media_border`	Good borders	Placeholders	Overlays in bordered rects	Rounded inner corners: square overlays in v1
`avatar_media_border_*`	Good	Placeholders	Overlays	60/40 vs 40/60 ratios preserved

Invariants enforced: len(slides) == pdf_pages == png_count — export fails fast on mismatch.

Slide transitions: default is none (hard cuts). Per-clip dip-to-black is segment_fade; true A→B blends use crossfade / wipes via the compositor xfade path. Full guide: Slide transitions.

slide_transitions:
  default: none
  edges:
    - after_slide: 2
      type: crossfade
      duration_sec: 0.35

Showcase: examples/slide-transitions-showcase.yaml → examples/slide-transitions-showcase.mp4.

Not in v1: rounded overlay masks, Windows CreateVideo animations.

Python API

from praisonaippt import (
    create_presentation,
    load_verses_from_file,
    VideoOptions,
    convert_deck_to_video,
)

data = load_verses_from_file("deck.yaml")
pptx = create_presentation(data, "deck.pptx")
convert_deck_to_video(data, pptx, video_options=VideoOptions(preset="draft"))

Optional extras

pip install praisonaippt[video-tts]      # edge-tts for narration_mode: tts
pip install praisonaippt[video-windows]  # Phase 3 stub only

Windows worker (deferred)

Phase 3 adds an on-prem FastAPI worker calling PowerPoint CreateVideo. It is not multi-tenant SaaS-ready without Microsoft Office licensing review. See praisonaippt/workers/ppt_com.py.

Legal note

SaaS redistributors using libx264 should review H.264 patent obligations. macOS VideoToolbox is preferred where available.

PraisonAI PPT Documentation

Video export (PPTX → MP4)

Video export (PPTX → MP4)

Requirements

CLI

CLI flags (video)

YAML configuration

avatar_timeline

Video overlay protocol (position, zoom, framing)

Transcript-driven HeyGen decks

Narration modes

sync_mode (per verse, optional)

Slide raster cache

Compositor behaviour

Fidelity matrix (Phase 0 — LibreOffice vs PowerPoint)

Python API

Optional extras

Windows worker (deferred)

Legal note

`avatar_timeline`