Video export (PPTX → MP4)
PraisonAIPPT can export presentations to MP4 on Mac and Linux using a compositor
backend: LibreOffice rasterises slides to PNG, then FFmpeg overlays avatar and media
regions using geometry from avatar_layouts and deck_slides.
Related docs: Recent features · Layouts overview · Avatar layouts & PiP · Deck layouts · YAML deck reference · HeyGen examples · Avatar calibration · Slide JPEGs · Commands
PowerPoint CreateVideo (Windows, on-prem) is Phase 3 and not implemented in v1.
Requirements
| Tool | Role | macOS install |
|---|---|---|
| FFmpeg + ffprobe | Encode and probe | brew install ffmpeg |
| poppler (pdftoppm) | PDF → PNG | brew install poppler |
| LibreOffice | PPTX → PDF | brew install --cask libreoffice |
Check dependencies:
praisonaippt convert-video --check
On macOS the default H.264 encoder is h264_videotoolbox when available; otherwise
libx264 is used.
CLI
# Build deck and export video (shares one LibreOffice PDF run when combined)
praisonaippt -i examples/avatar_layouts.yaml -o deck.pptx --convert-video
# Both PDF and video — single LO PDF internally
praisonaippt -i deck.yaml -o deck.pptx --convert-pdf --convert-video
# Standalone PPTX → MP4 (a deck.yaml beside deck.pptx is loaded when present
# and supplies avatar/media paths for PiP overlays)
praisonaippt convert-video deck.pptx --video-preset draft --slide-range 1-5
# Preflight
praisonaippt convert-video --check
CLI flags (video)
| Flag | Values / type | Notes |
|---|---|---|
| --convert-video | flag | Build + export in one command |
| --video-output | path | Overrides video_export.output_path |
| --video-backend | compositor, auto, powerpoint | Overrides YAML backend |
| --video-preset | draft, standard, high, 4k | Overrides YAML preset |
| --narration-mode | fixed, audio_file, avatar, tts, auto | Overrides YAML |
| --video-options | JSON string | Merged via VideoOptions.from_dict |
| --slide-range | START-END (1-based) | Export subset only |
| --keep-temp | flag | Retain temp files for debugging |
| --check | flag | Dependency check |
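To illustrate the 1-based START-END format accepted by --slide-range, here is a minimal sketch that turns such a string into zero-based slide indices. The function name parse_slide_range and its validation rules are assumptions made for this example, not PraisonAIPPT's actual parser.

```python
from typing import List


def parse_slide_range(spec: str, total_slides: int) -> List[int]:
    """Convert a 1-based 'START-END' string into zero-based slide indices.

    Hypothetical helper: the real CLI may validate differently.
    """
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    start, end = int(start_text), int(end_text)
    if not 1 <= start <= end <= total_slides:
        raise ValueError(f"range {spec!r} is outside 1-{total_slides}")
    # Both ends are inclusive, so END itself is exported.
    return list(range(start - 1, end))
```

With a ten-slide deck, `parse_slide_range("1-5", 10)` selects the first five slides.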
| Preset | Resolution | FPS | DPI |
|---|---|---|---|
| draft | 1280×720 | 24 | 120 |
| standard | 1920×1080 | 30 | 192 |
| high | 1920×1080 | 30 | 240 |
| 4k | 3840×2160 | 30 | 300 |
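The preset table can be read as a set of defaults that explicit options then override. The sketch below encodes the table as data and layers a JSON string on top, in the spirit of --video-options. The names PRESETS and resolve_settings, and the flat key names, are assumptions for illustration; the library's own merge goes through VideoOptions.from_dict, whose exact behaviour is not reproduced here.

```python
import json

# Values copied from the preset table above.
PRESETS = {
    "draft": {"width": 1280, "height": 720, "fps": 24, "dpi": 120},
    "standard": {"width": 1920, "height": 1080, "fps": 30, "dpi": 192},
    "high": {"width": 1920, "height": 1080, "fps": 30, "dpi": 240},
    "4k": {"width": 3840, "height": 2160, "fps": 30, "dpi": 300},
}


def resolve_settings(preset: str, options_json: str = "{}") -> dict:
    """Start from a preset and let explicit JSON options win (hypothetical)."""
    settings = dict(PRESETS[preset])  # copy so the table is never mutated
    settings.update(json.loads(options_json))
    return settings
```

For example, `resolve_settings("high", '{"fps": 60}')` keeps the high preset's resolution and DPI but raises the frame rate.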
YAML configuration
Top-level video_export block:
video_export:
  backend: compositor
  narration_mode: fixed # fixed | audio_file | avatar | tts | auto
  output_path: output/deck.mp4
  resolution: { width: 1920, height: 1080 }
  fps: 30
  dpi: 192
  preset: standard # draft | standard | high
  slide_duration_sec: 5
  avatar_timeline: auto # per_slide | continuous | auto
  avatar:
    fit: cover # cover | stretch (PPTX stretch uses stretch)
    shape: circle # circle | square | rect
    crop_y_ratio: 0.06
    zoom_ratio: 1.45
    loop_if_shorter: true
  slide_cache: true # PNG cache under ~/.praisonaippt/video_cache/
  tts: # requires pip install praisonaippt[video-tts]
    provider: edge
    voice: en-GB-RyanNeural
  captions:
    enabled: true # writes .srt sidecar when notes/TTS used
Per-verse overrides:
- slide_type: avatar_media_1
  avatar_video_path: assets/speaker.mp4
  media_path: assets/diagram.png
  notes: Narration text.
  duration_sec: 12
  narration_mode: avatar
Schema keys: duration_sec, audio_start_sec, audio_path, narration_mode, sync_mode (verse level);
video_export, slide_timestamps (deck level).
When duration_sec and audio_start_sec are set on a verse, they take precedence over
ffprobe on shared HeyGen MP4 or MP3 files.
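The precedence described here (explicit verse timing first, probing the media file last) can be sketched as a small resolver. The function name resolve_duration, the timestamp_duration argument, and the injected probe callable are assumptions for this example; the real exporter calls ffprobe itself.

```python
from typing import Callable, Optional


def resolve_duration(
    verse: dict,
    timestamp_duration: Optional[float],
    probe: Callable[[], float],
) -> float:
    """Pick a slide duration: verse value, then slide_timestamps, then probe.

    Hypothetical helper; `probe` stands in for an ffprobe call on the media file.
    """
    if verse.get("duration_sec") is not None:
        return float(verse["duration_sec"])
    if timestamp_duration is not None:
        return float(timestamp_duration)
    # Only reached when nothing explicit is available, so shared files
    # are not probed unnecessarily.
    return float(probe())
```

Passing the probe as a callable keeps the expensive ffprobe run lazy: it is skipped whenever a verse or timestamp already supplies the duration.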
avatar_timeline
| Value | Behaviour |
|---|---|
| per_slide | Avatar video restarts at each slide |
| continuous | One shared file; offset advances by each slide’s duration |
| auto (default) | continuous when all content slides share one avatar_video_path; otherwise per_slide |
Use continuous (or auto with one HeyGen file) together with per-slide audio_start_sec to slice one narration track across many slides without a visible blink between slides.
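As a rough sketch of how the auto rule and the continuous offsets could work, the code below resolves the timeline mode from the verses and computes each slide's start offset into the shared avatar file. Both function names are hypothetical, and the real implementation may treat non-content slides differently.

```python
from typing import List


def resolve_timeline(mode: str, verses: List[dict]) -> str:
    """Resolve 'auto' to 'continuous' only when every slide shares one avatar file."""
    if mode != "auto":
        return mode
    paths = {verse.get("avatar_video_path") for verse in verses}
    shared = len(paths) == 1 and None not in paths
    return "continuous" if shared else "per_slide"


def avatar_offsets(mode: str, durations: List[float]) -> List[float]:
    """Start offset (seconds) into the avatar file for each slide."""
    if mode == "per_slide":
        return [0.0] * len(durations)  # the video restarts on every slide
    offsets, elapsed = [], 0.0
    for duration in durations:
        offsets.append(elapsed)  # continue where the previous slide stopped
        elapsed += duration
    return offsets
```

With slide durations of 5, 12 and 3 seconds, continuous mode starts the three slides at 0, 5 and 17 seconds into the shared file.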
slide_style.layouts.pip (crop_y_ratio, zoom_ratio, shape) merges into video options when not set under video_export.avatar.
Video overlay protocol (position, zoom, framing)
Compositor overlays use inch regions from layout engines, then apply the video overlay protocol (praisonaippt.video_protocol).
Precedence (later wins): video_export.avatar / media → slide_style.layouts.<slide_type> → layouts.pip → verse flat keys → verse.video_overlay.
| Layer | Keys | Purpose |
|---|---|---|
| Deck | video_export.avatar, video_export.media | Defaults for all slides |
| Layout | slide_style.layouts.pip or layouts.<slide_type> | PiP anchor, size, crop, zoom |
| Verse (short) | avatar_zoom_ratio, avatar_crop_y_ratio, avatar_fit, media_* | One-off tweaks |
| Verse (full) | video_overlay.avatar, video_overlay.media | Per-slide override |
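The "later wins" precedence amounts to a layered dictionary merge. The sketch below shows that idea with a hypothetical merge_overlay helper; treating None as "not set" is an assumption for this example, not documented behaviour of praisonaippt.video_protocol.

```python
from typing import Optional


def merge_overlay(*layers: Optional[dict]) -> dict:
    """Merge overlay settings in precedence order; later layers win.

    Hypothetical helper. Keys whose value is None are treated as unset.
    """
    merged: dict = {}
    for layer in layers:
        if not layer:
            continue  # a missing layer contributes nothing
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged


# Layers listed from lowest to highest precedence.
deck_defaults = {"fit": "cover", "zoom_ratio": 1.35, "crop_y_ratio": 0.10}
layout_pip = {"anchor": "bottom_right", "width_ratio": 0.20}
verse_overlay = {"zoom_ratio": 1.30}
effective = merge_overlay(deck_defaults, layout_pip, verse_overlay)
```

Here the verse-level zoom_ratio of 1.30 replaces the deck default of 1.35, while fit and crop_y_ratio fall through from the deck layer.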
Placement fields (avatar or media block):
| Field | Type | Description |
|---|---|---|
| anchor / pip_position | enum | bottom_right, bottom_left, top_right, top_left, center |
| width_ratio / pip_width_ratio | 0–1 | PiP width vs slide width (anchor mode) |
| margin_in / pip_margin_in | inches | Inset from slide edge |
| box | mapping | Explicit {left_in, top_in, width_in, height_in} (overrides layout region) |
| left_in, top_in, width_in, height_in | inches | Shorthand explicit box |
| offset_px | {x, y} | Nudge overlay after pixel mapping (integers) |
| crop_y_ratio | 0–0.45 | Vertical crop bias when fit: cover |
| zoom_ratio | 0.5–3.0 | Cover scale (clamped ≥ 1.0 at render) |
| fit | enum | cover, contain, stretch |
| shape | enum | Avatar mask: circle, rect, auto, … |
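To make the anchor-mode fields concrete, the following sketch places a PiP box from anchor, width_ratio and margin_in, maps inches to pixels, and then applies offset_px. The function name pip_box_px, the default slide size of 13.333 × 7.5 inches, and the square box aspect are all assumptions for illustration; the real geometry comes from the layout engines.

```python
from typing import Tuple


def pip_box_px(
    anchor: str,
    width_ratio: float,
    margin_in: float,
    slide_in: Tuple[float, float] = (13.333, 7.5),  # assumed 16:9 slide, inches
    frame_px: Tuple[int, int] = (1920, 1080),
    aspect: float = 1.0,  # assumed width / height of the PiP box
    offset_px: Tuple[int, int] = (0, 0),
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) in pixels for an anchored PiP box."""
    slide_w, slide_h = slide_in
    box_w = slide_w * width_ratio  # width_ratio is relative to slide width
    box_h = box_w / aspect
    if anchor == "center":
        left = (slide_w - box_w) / 2
        top = (slide_h - box_h) / 2
    else:
        vertical, horizontal = anchor.split("_")  # e.g. "bottom_right"
        left = margin_in if horizontal == "left" else slide_w - margin_in - box_w
        top = margin_in if vertical == "top" else slide_h - margin_in - box_h
    scale_x = frame_px[0] / slide_w
    scale_y = frame_px[1] / slide_h
    # offset_px nudges the box after the inch-to-pixel mapping.
    return (
        round(left * scale_x) + offset_px[0],
        round(top * scale_y) + offset_px[1],
        round(box_w * scale_x),
        round(box_h * scale_y),
    )
```

On a 10 × 5.625 inch slide rendered at 1920×1080, a bottom_right box with width_ratio 0.2 and margin_in 0.5 lands at pixel (1440, 600) with a 384 px side.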
Example — per-slide PiP on a quote slide:
- slide_type: avatar_quote
  avatar_video_path: examples/heygen-article-50590.mp4
  video_overlay:
    avatar:
      anchor: bottom_right
      width_ratio: 0.20
      margin_in: 0.38
      zoom_ratio: 1.30
      crop_y_ratio: 0.08
Example — deck-wide defaults + explicit media framing:
video_export:
  avatar:
    fit: cover
    zoom_ratio: 1.35
    crop_y_ratio: 0.10
  media:
    fit: contain
Validation runs on load via yaml_validate.validate_video_export and validate_verse_options. slide_timestamps length is checked against the slide plan (warning if mismatched).
daily_single hook scroll: record-canonical-scroll crops side gutters (column + MSER + DOM), caps speed at 100 px/s, and writes merge/qa/canonical_capture/framing-diagram.png. Gate with validate-canonical-scroll before assemble.
Transcript-driven HeyGen decks
Generate YAML from Whisper JSON:
praisonaippt transcript-to-yaml \
-i examples/short-script-50590_timestamps.json \
-o examples/heygen-article-50590 \
--transcript-mode both \
--transcript-audio examples/short-script-50590.mp3 \
--align silence,karaoke
| Flag | Effect |
|---|---|
| --transcript-mode | full, thematic, or both deck variants |
| --transcript-audio | MP3 for silence/RMS alignment |
| --align | silence, emphasis, karaoke (comma-separated) |
| --variants all | Write media combination YAMLs |
Example deck: examples/heygen-article-50590-short.yaml. Full matrix and build steps: HeyGen article examples.
Timing: use wall-clock merge (last_segment.end - first_segment.start) so pauses between
Whisper segments are held on the correct slide. Sum of segment durations alone is shorter than
total audio length.
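A small worked example shows why the two timings differ. The segment list below is invented for illustration and uses the start/end keys found in Whisper's segment output; the helper names are hypothetical.

```python
# Three invented Whisper-style segments with pauses between them.
segments = [
    {"start": 0.0, "end": 2.5},
    {"start": 3.0, "end": 6.0},
    {"start": 7.5, "end": 9.0},
]


def wall_clock_duration(segs: list) -> float:
    """Span from the first segment's start to the last segment's end."""
    return segs[-1]["end"] - segs[0]["start"]


def summed_duration(segs: list) -> float:
    """Speech time only; pauses between segments are dropped."""
    return sum(seg["end"] - seg["start"] for seg in segs)


wall = wall_clock_duration(segments)   # 9.0 seconds, pauses included
spoken = summed_duration(segments)     # 7.0 seconds, pauses lost
```

The 2-second gap between the two figures is the silence that would otherwise be missing from the slide's hold time, so the slide would advance early.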
HeyGen audio: default export uses HeyGen MP4 embedded audio (narration_mode: avatar or
audio_source: heygen_video). Use audio_file / audio_source: external when video is visual-only
and a separate MP3 drives narration.
With narration_mode: auto, if both audio_path and avatar_video_path are set, HeyGen video
audio wins when the avatar file has an audio track; otherwise external audio_path is used.
Use explicit avatar or audio_file to override.
Narration modes
| Mode | Duration source | Primary audio |
|---|---|---|
| fixed | slide_duration_sec / duration_sec | none |
| audio_file | verse duration_sec, else slide_timestamps, else ffprobe | external file (trimmed with audio_start_sec) |
| avatar | verse duration_sec, else slide_timestamps, else ffprobe | avatar track |
| tts | ffprobe on generated MP3 | TTS (avatar muted) |
| auto | precedence: avatar (if audio) → audio_path → notes→TTS → fixed | per rules |
Optional alias in video_export: audio_source: heygen_video | external | tts (maps to
narration_mode when narration_mode is omitted).
Avatar video audio is muted when TTS or audio_file is primary to avoid double narration.
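The auto precedence and the audio_source alias can be summarised in a few lines. This is a sketch under assumptions: resolve_auto_mode and mode_from_audio_source are hypothetical names, avatar_has_audio stands in for an ffprobe stream check, and the tts alias mapping to tts is inferred from the names rather than stated explicitly.

```python
from typing import Optional

# Alias values from video_export.audio_source mapped to narration modes.
AUDIO_SOURCE_TO_MODE = {
    "heygen_video": "avatar",
    "external": "audio_file",
    "tts": "tts",  # assumed one-to-one mapping
}


def mode_from_audio_source(audio_source: Optional[str]) -> Optional[str]:
    """Translate the optional audio_source alias; None when absent or unknown."""
    return AUDIO_SOURCE_TO_MODE.get(audio_source)


def resolve_auto_mode(verse: dict, avatar_has_audio: bool) -> str:
    """Apply the auto precedence: avatar audio, audio_path, notes via TTS, fixed."""
    if verse.get("avatar_video_path") and avatar_has_audio:
        return "avatar"
    if verse.get("audio_path"):
        return "audio_file"
    if verse.get("notes"):
        return "tts"
    return "fixed"
```

A verse with both avatar_video_path and audio_path resolves to avatar only when the avatar file actually carries an audio track; otherwise the external file is used.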
sync_mode (per verse, optional)
When set explicitly on a verse, adjusts slide duration across sources:
| Value | Behaviour |
|---|---|
| avatar_lead | Duration follows avatar video (skipped when verse has explicit duration_sec) |
| notes_lead | Duration follows TTS of notes |
| longest | Maximum of resolved sources (skipped when verse has explicit duration_sec) |
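One way to read the sync_mode table is as an adjustment applied after the base duration has been resolved. The sketch below follows that reading; apply_sync_mode and its parameters are hypothetical, and falling back to the resolved duration when a source is missing is an assumption.

```python
from typing import Optional


def apply_sync_mode(
    sync_mode: Optional[str],
    resolved_sec: float,
    avatar_sec: Optional[float] = None,
    notes_tts_sec: Optional[float] = None,
    explicit_duration: bool = False,
) -> float:
    """Adjust a slide duration according to sync_mode (illustrative only)."""
    if sync_mode == "avatar_lead":
        # Skipped when the verse sets duration_sec explicitly.
        if explicit_duration or avatar_sec is None:
            return resolved_sec
        return avatar_sec
    if sync_mode == "notes_lead":
        return notes_tts_sec if notes_tts_sec is not None else resolved_sec
    if sync_mode == "longest":
        if explicit_duration:
            return resolved_sec
        candidates = [resolved_sec, avatar_sec, notes_tts_sec]
        return max(value for value in candidates if value is not None)
    return resolved_sec  # sync_mode not set: leave the duration untouched
```

For a slide resolved to 5 s with a 12 s avatar clip and 8 s of TTS, longest yields 12 s unless the verse pins duration_sec.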
Slide raster cache
PNG pages are cached under ~/.praisonaippt/video_cache/ keyed by PPTX mtime and DPI.
Disable it with slide_cache: false in video_export, or from the CLI by passing the same key as JSON in --video-options.
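A cache keyed by modification time and DPI can be sketched as follows. The function name cache_dir_for, the use of a SHA-256 digest, and the inclusion of the resolved path in the key are assumptions for this example; only the cache root and the mtime + DPI keying come from the text above.

```python
import hashlib
from pathlib import Path

DEFAULT_ROOT = Path.home() / ".praisonaippt" / "video_cache"


def cache_dir_for(pptx_path: Path, dpi: int, root: Path = DEFAULT_ROOT) -> Path:
    """Derive a cache directory from the PPTX path, its mtime, and the DPI.

    Hypothetical scheme: editing the deck or changing the DPI yields a new
    directory, so stale PNGs are never reused.
    """
    stat = pptx_path.stat()
    key = f"{pptx_path.resolve()}|{stat.st_mtime_ns}|{dpi}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return root / digest
```

Because the key includes the DPI, switching between the draft and standard presets on the same deck would populate two separate cache entries rather than overwrite one another.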
Compositor behaviour
LibreOffice PNG is static chrome (text, borders, baked deck images). FFmpeg overlays:
- avatar_video_path → regions["avatar"] when the region exists
- media_path → regions["media"] when present and skip_media_overlay is false
All deck_* slides set skip_media_overlay: true (images are already in the PPTX). Avatar layout slides overlay both regions when paths are set.
Split layouts (avatar_media_1 vs avatar_media_2) use distinct width ratios from
layout_tokens.py, visible in both PPTX and video.
Z-order: media → avatar → text (already in PNG).
Fidelity matrix (Phase 0 — LibreOffice vs PowerPoint)
Measured on Mac with LO headless PDF → pdftoppm vs PowerPoint slide view for avatar
layouts. Use this when judging export quality.
| Layout | LO static chrome | Embedded movies in LO PNG | FFmpeg overlay fix | Known delta |
|---|---|---|---|---|
| avatar_only | Good | Grey placeholder only | Avatar video in region | LO placeholder colour may differ slightly |
| media_only | Good | Image OK; video not played | Media file overlaid | Video must be overlaid, not embedded |
| avatar_media_1 (50/50) | Good split geometry | Placeholders only | Both regions overlaid | Split ratio matches YAML (~50/50) |
| avatar_media_2 (40/60) | Good | Placeholders only | Both regions overlaid | Wider media column vs _1 |
| avatar_media_3 (PiP) | Good | Placeholders only | PiP boxes overlaid | stacked: media band below panel; full_bleed: media fills slide; text panel in PNG; PiP from _slide_regions + verse text_panel |
| avatar_name_card | Good | Avatar placeholder | Avatar in region | Navy text panel may sit above avatar in PPTX; v1 square overlays |
| avatar_headline | Good | Same as name card | Same | Panel text in PNG only |
| avatar_quote | Moderate | Navy fill approximate | Avatar overlaid on quote area | LO may shift quote typography; use raster_mode: native (future) if drift matters |
| avatar_border / media_border | Good borders | Placeholders | Overlays in bordered rects | Rounded inner corners: square overlays in v1 |
| avatar_media_border_* | Good | Placeholders | Overlays | 60/40 vs 40/60 ratios preserved |
Invariants enforced: len(slides) == pdf_pages == png_count — export fails fast on mismatch.
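The fail-fast invariant can be expressed as a single guard before any encoding starts. The function name check_page_counts and the choice of RuntimeError are assumptions for this sketch; the actual exception type used by the exporter is not stated here.

```python
def check_page_counts(slides: int, pdf_pages: int, png_count: int) -> None:
    """Raise immediately if the slide plan, PDF, and PNG counts disagree."""
    if not (slides == pdf_pages == png_count):
        raise RuntimeError(
            "page count mismatch: "
            f"slides={slides}, pdf_pages={pdf_pages}, png_count={png_count}"
        )
```

Checking before FFmpeg runs means a dropped or extra page surfaces as a clear error rather than as overlays landing on the wrong slide.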
Slide transitions: default is none (hard cuts). Per-clip dip-to-black is segment_fade; true A→B blends use crossfade / wipes via the compositor xfade path. Full guide: Slide transitions.
slide_transitions:
  default: none
  edges:
    - after_slide: 2
      type: crossfade
      duration_sec: 0.35
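For readers unfamiliar with FFmpeg's xfade filter, the sketch below shows the timing arithmetic behind a single crossfade edge: the blend starts duration_sec before the outgoing clip ends, and the joined clip is shorter by the overlap. The helper names are hypothetical, and mapping the YAML type crossfade to xfade's fade transition is an assumption about how the compositor translates it.

```python
def xfade_filter(outgoing_sec: float, fade_sec: float, transition: str = "fade") -> str:
    """Build an xfade filter expression for one A-to-B edge (illustrative)."""
    if fade_sec > outgoing_sec:
        raise ValueError("fade cannot be longer than the outgoing clip")
    offset = outgoing_sec - fade_sec  # blend begins this far into clip A
    return f"xfade=transition={transition}:duration={fade_sec:.3f}:offset={offset:.3f}"


def joined_duration(outgoing_sec: float, incoming_sec: float, fade_sec: float) -> float:
    """Total length after the two clips overlap for fade_sec seconds."""
    return outgoing_sec + incoming_sec - fade_sec
```

For a 5-second slide followed by a 0.35-second crossfade, the expression is `xfade=transition=fade:duration=0.350:offset=4.650`.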
Showcase: examples/slide-transitions-showcase.yaml → examples/slide-transitions-showcase.mp4.
Not in v1: rounded overlay masks, Windows CreateVideo animations.
Python API
from praisonaippt import (
create_presentation,
load_verses_from_file,
VideoOptions,
convert_deck_to_video,
)
# Load the deck definition, build the PPTX, then render it to MP4.
data = load_verses_from_file("deck.yaml")
pptx = create_presentation(data, "deck.pptx")
convert_deck_to_video(data, pptx, video_options=VideoOptions(preset="draft"))
Optional extras
pip install praisonaippt[video-tts] # edge-tts for narration_mode: tts
pip install praisonaippt[video-windows] # Phase 3 stub only
Windows worker (deferred)
Phase 3 adds an on-prem FastAPI worker calling PowerPoint CreateVideo. It is not
multi-tenant SaaS-ready without Microsoft Office licensing review. See
praisonaippt/workers/ppt_com.py.
Legal note
SaaS redistributors using libx264 should review H.264 patent obligations. macOS
VideoToolbox is preferred where available.