Add target-side video preprocessing pipeline

Preprocesses a folder of video files into UUID-named clips suitable as
target inputs for roop-unleashed-style face-swap. Counterpart to the
faceset (source-side) tooling.

work/video_target_pipeline.py — orchestration with subcommands
  scan / scenes / stage / merge / track / score / cut / report. Quality
gates default to side-profile-tolerant values (yaw<=75°,
  pitch<=45°, face_short>=80px, det>=0.5). Cross-track segment merge
  fuses adjacent-in-time tracks within the same scene up to 2s gap.
  Output organized into <output_dir>/<source_stem>/<uuid>.mp4 +
  <uuid>.json sidecar with full provenance.

work/video_face_worker.py — Windows DML face detect+embed worker. Uses
  JSONL append-only for results.jsonl: a critical perf fix (re-
  serializing the monolithic 245MB results.json on every flush was the
  dominant cost in the first attempt, dropping throughput to 0.5 fps).
  Append-only got it to 13+ fps, ~7.5 fps cumulative across the first
  6.18h batch. Also uses seek-once-per-video + sequential cap.grab()
  between samples to dodge cv2 per-sample seek pathology on long H.264.
  Legacy results.json is auto-migrated to .jsonl on first load.

work/run_video_pipeline.sh — generic chain driver, parameterized via
  WORK / INPUT_DIR / OUTPUT_DIR / FILTER_FROM / SKIP_PATTERN / MAX_DUR /
  IDENTITY env vars. work/status_video_pipeline.sh — generic status
  helper.

First production batch (ct_src_00050..00062, 13 files, 6.18h input):
600 emitted segments, 239.5min accepted content (64.6% of input), 254
segments built from >=2 tracks (cross-track merge), 1h43m wall clock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


@@ -343,6 +343,7 @@ clean it up over time:
| `work/consolidate_facesets.py` | Merge duplicate identities (centroid cosine sim ≥ 0.55 with confident ≥ 0.65, **complete-linkage** to defeat single-link chaining). Pulls embeddings from cache, no GPU. See `docs/analysis/identity-consolidation-and-age-extend.md`. |
| `work/age_extend_001.py` | Slot newly-added PNGs into existing era buckets of `faceset_001` (anchor cosine distance ≤ 0.40 AND `\|year_delta\|` ≤ 5). Same anchor-fragment rule as `age_split_001.py`. |
| `work/dedup_optimize.py` (+ Windows `work/multiface_worker.py`) | (a) cross-family SHA256 byte-dedup, (b) within-faceset near-dup at cosine sim ≥ 0.95, (c) multi-face audit (re-detect via insightface, drop PNGs with face_count ≠ 1). Multi-face is the load-bearing roop invariant. See `docs/analysis/dedup-and-roop-optimization.md`. |
| `work/video_target_pipeline.py` (+ Windows `work/video_face_worker.py` + `work/run_video_pipeline.sh` chain) | Target-side preprocessing: scan a folder of videos → PySceneDetect shot-cuts → 2 fps frame sampling → DML face detection + embedding → IoU+embedding tracking → quality-gated segments (yaw≤75°, face≥80px, det≥0.5, ≥70% pass-rate, 1–120 s duration, 2 s cross-track merge gap) → ffmpeg stream-copy into UUID-named clips with sidecar JSON. Output organized into per-source subfolders. See `docs/analysis/video-target-preprocessing.md`. |
All four operate idempotently and reversibly: dropped PNGs go to
`<faceset>/faces/_dropped/`, quarantined whole facesets go to
@@ -382,6 +383,10 @@ Highly recommended at swap time: enable **Select post-processing = GFPGAN** with
├─ consolidate_facesets.py (duplicate-identity merger; complete-linkage)
├─ dedup_optimize.py (byte + near-dup + multi-face audit driver)
├─ multiface_worker.py (Windows DML multi-face audit worker)
├─ video_target_pipeline.py (video → swappable segment cuts orchestration)
├─ video_face_worker.py (Windows DML per-frame face worker; JSONL append-only)
├─ run_video_pipeline.sh (generic chain driver: scenes → stage → worker → cut)
├─ status_video_pipeline.sh (status helper for any video_pipeline log)
├─ synthetic_*_manifest.json (per-run synthetic refine manifests)
├─ immich/
│ ├─ users.json (label -> userId map; gitignored)

docs/analysis/video-target-preprocessing.md Normal file

@@ -0,0 +1,129 @@
# Video target preprocessing for roop-unleashed
_Initial design + first batch run: 2026-04-27. Driver scripts: `work/video_target_pipeline.py`, `work/video_face_worker.py`, `work/run_video_pipeline.sh`._
Companion to the face-set side of the project: instead of building per-identity .fsz bundles for the *source* of a swap, this pipeline preprocesses the *target* (videos to swap into). Given a folder of video files, it identifies "swappable" segments — continuous shots where a face is detectable, sufficiently visible, and roughly within inswapper_128's working envelope — and cuts them into UUID-named clips ready to feed into roop-unleashed.
## 1. Why build it
I checked the obvious open-source projects for an existing implementation:
- **FaceFusion** ([github.com/facefusion/facefusion](https://github.com/facefusion/facefusion)) — CLI has `run`, `headless-run`, `batch-run`, `job-*`, `force-download`, `benchmark`. No scene-detection or clip-extraction subcommand. Its own guides recommend "split your video manually first."
- **roop-unleashed** at `/opt/roop-unleashed/roop/util_ffmpeg.py` — has `cut_video(start_frame, end_frame)` for a manual GUI trim, no detection-driven segmentation.
- **Deep-Live-Cam** ([github.com/hacksider/Deep-Live-Cam](https://github.com/hacksider/Deep-Live-Cam)) — real-time / single-shot, no batch preprocessing.
- **DeepFaceLab** — `extract_video.bat` dumps every frame between user-supplied trim points; no quality gating.
Closest prior art for the cut-detection pattern is the two-stage hybrid in [SportSBD MMSys'26](https://dl.acm.org/doi/10.1145/3793853.3799803) (cheap detector for cuts, accurate net for verification), but the actual implementation has to be ours.
## 2. Pipeline architecture
```
WSL /opt/face-sets/work/ Windows C:\face_embed_venv\
───────────────────────────────────── ─────────────────────────────
run_video_pipeline.sh (chain driver)
├─ scan (ffprobe metadata)
├─ scenes (PySceneDetect AdaptiveDetector, CPU)
├─ stage (sampled frame queue.json @ 2 fps)
│ │
│ ▼
│ video_face_worker.py
│ insightface FaceAnalysis
│ on DmlExecutionProvider
│ output: results.jsonl
├─ merge (ingest results.jsonl)
├─ track (IoU + embedding stitching, ~30 LOC)
├─ score (track-level quality gate + cross-track merge)
├─ cut (ffmpeg -c copy → per-source subfolders)
└─ report (HTML preview)
Output: <output_dir>/<source_video_stem>/<uuid>.mp4
/<uuid>.json (sidecar)
```
`run_video_pipeline.sh` is parameterized via env vars (`WORK`, `INPUT_DIR`, `OUTPUT_DIR`, `FILTER_FROM`, `SKIP_PATTERN`, `MAX_DUR`, `IDENTITY`) so you can pin a particular batch without editing the script.
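For orientation, here is what one staged `queue.json` record looks like (a hypothetical entry; field names follow the contract in `work/video_face_worker.py`, and `win_video_path` comes from the pipeline's `wsl_to_win` helper):
```python
# Hypothetical staged-frame record, as consumed by video_face_worker.py.
# frame_idx is in source-video frames: int(round(time_s * source_fps)).
entry = {
    "queue_id": "q00000042",                            # resume key
    "video_path": "/mnt/x/src/vd/ct_src_00050.mp4",     # WSL-side path
    "win_video_path": "X:\\src\\vd\\ct_src_00050.mp4",  # what the DML worker opens
    "frame_idx": 1500,                                  # 50.0 s @ 30 fps
    "time_s": 50.0,
}
```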
## 3. Quality signals (matched to inswapper_128's working envelope)
inswapper_128 is trained near-frontal at 128×128. The score gate uses defaults that admit side profiles (since rich face-sets can absorb non-frontal swap targets):
| signal | threshold | rationale |
|--------|----------:|-----------|
| `\|yaw\|` | ≤ 75° | covers full 3/4 + side profile |
| `\|pitch\|` | ≤ 45° | covers extreme up/down looks |
| `face_short` | ≥ 80 px | inswapper resamples to 128; ≥80 still produces clean output |
| `det_score` | ≥ 0.5 | matches buffalo_l's MIN_DET; lower = unreliable detection |
| track-gate | ≥ 70 % frames pass | binary track filter rather than per-frame |
| duration | 1 s ≤ dur ≤ 120 s | below 1s = unusable slivers; above 120s probably contains a missed micro-cut |
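Spelled out in code, the per-frame gate is a four-way predicate over a worker face record (a minimal sketch with the defaults above; the production version lives in `_track_passes` inside `work/video_target_pipeline.py`):
```python
# Sketch of the per-frame quality gate. `face` is one entry of a
# results.jsonl record's "faces" list; pose is [pitch, yaw, roll].
def frame_passes(face: dict, yaw_max=75.0, pitch_max=45.0,
                 face_min=80, det_min=0.5) -> bool:
    pitch, yaw, _ = face.get("pose") or [0.0, 0.0, 0.0]
    return (abs(yaw) <= yaw_max
            and abs(pitch) <= pitch_max
            and face.get("face_short", 0) >= face_min
            and face.get("det_score", 0.0) >= det_min)
```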
Plus two segment-merging knobs:
- `--bridge-gap` (default 3 s) — within a single track, brief pose-failure gaps shorter than this get bridged so single bad frames don't fragment a good run
- `--merge-gap` (default 2 s) — across tracks within the same scene, segments closer than this get fused (cross-track merge fires when face detection briefly fails between adjacent good runs)
The defaults can be tightened (e.g. `--max-yaw 25` for portrait-only) or loosened (e.g. `--max-yaw 90 --merge-gap 5`) without re-running detection — `score` reads the existing `tracks.json`.
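As a toy illustration of the `--merge-gap` rule (simplified from `_merge_close_segments`; the real merging is per-scene and carries stats and track metadata along):
```python
# Fuse sorted (start_s, end_s) segments whose gap is <= merge_gap_s.
def merge_close(segments, merge_gap_s=2.0):
    merged = []
    for s, e in sorted(segments):
        if merged and s - merged[-1][1] <= merge_gap_s:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))  # fuse
        else:
            merged.append((s, e))
    return merged

# A 1.4 s detection dropout between two good runs gets bridged;
# a 5 s gap stays split:
assert merge_close([(10.0, 18.2), (19.6, 25.0), (30.0, 33.0)]) == \
    [(10.0, 25.0), (30.0, 33.0)]
```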
## 4. Performance + the JSONL append-only fix
This is where the engineering interest is. The first production run on 13 videos / 6.18 h of input went through three failure modes before settling at production speed:
| attempt | issue | rate observed |
|---|---|---:|
| 1. Original `cap.set(POS_FRAMES, N)` per sample | OpenCV seeks to nearest keyframe + decodes forward at every sample. Cost grows with depth into the video; on a 60-min H.264 it falls off a cliff. | 1.4 fps → degrading |
| 2. Sequential `cap.grab()` from frame 0 | On resume, grab-walking from frame 0 to a deep target is unbounded. | 0.08 fps |
| 3. Hybrid: seek-once-per-video + sequential within | Better in principle. But hit a different bug: `flush()` was re-serializing the entire `results.json` (245 MB at this point) every 100 frames or 30 sec. Save dominated wall-clock. | 0.5 fps |
| 4. **JSONL append-only** | One result per line. Each flush is O(new records), not O(total records). | **13.77 fps** smoke / 7.57 fps cumulative across the full batch |
Lesson: when the output is large + grows monotonically + needs frequent checkpointing, *do not* re-serialize the whole structure on each flush. Append-only line-delimited JSON is the right tool. The legacy `results.json` is auto-converted to `.jsonl` on first load (one-time migration), so resumes survive the format switch.
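Condensed, the worker's hot loop combines the seek-once/grab strategy with append-only flushing. A minimal sketch (assumes an ascending `targets` list and a `detect` callback; the real loop in `work/video_face_worker.py` adds resume, per-frame error records, and the compat pointer file):
```python
import json
import cv2

def process_video(path, targets, detect, out_jsonl):
    """targets: ascending frame indices; detect(frame) -> JSON-serializable record."""
    cap = cv2.VideoCapture(path)
    cur = -1
    if targets and targets[0] > 0:
        cap.set(cv2.CAP_PROP_POS_FRAMES, targets[0])  # seek ONCE per video
        cur = targets[0] - 1
    buffer = []
    with open(out_jsonl, "a") as f:                   # append-only store
        for target in targets:
            while cur < target:                       # sequential decode-advance,
                if not cap.grab():                    # no BGR conversion yet
                    break
                cur += 1
            if cur != target:
                continue                              # video ran out of frames
            ok, frame = cap.retrieve()                # convert only sampled frames
            if ok:
                buffer.append(detect(frame))
            if len(buffer) >= 100:                    # flush is O(new records)
                f.writelines(json.dumps(r) + "\n" for r in buffer)
                f.flush()
                buffer.clear()
        f.writelines(json.dumps(r) + "\n" for r in buffer)
    cap.release()
```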
## 5. Hardware decode/encode on AMD Vega + WSL
Skipped. Per [Microsoft's WSL D3D12 video acceleration post](https://devblogs.microsoft.com/commandline/d3d12-gpu-video-acceleration-in-the-windows-subsystem-for-linux-now-available/), VAAPI-via-Mesa-D3D12 exists but is fragile on older AMD. AMF on Windows would mean a Windows-side ffmpeg leg, doubling boundary crossings. CPU software decode of 1280×720 H.264 in WSL ffmpeg is faster than realtime, and the bottleneck is buffalo_l detection on DML, not decode.
For cutting we use `-c copy` stream-copy — no re-encode, hardware codecs are moot.
## 6. First batch run results (ct_src_00050..00062)
| | |
|---|---:|
| input videos | 13 |
| input duration | 6.18 h |
| sampled frames | 44,635 (@ 2 fps) |
| accepted tracks | 1,193 / 2,564 (47 %) |
| **emitted segments** | **600** |
| segments built from ≥2 tracks (cross-track merge fired) | 254 |
| accepted content total | 239.5 min (64.6 % of input) |
| segment duration min/median/mean/max | 1 / 12 / 24 / 119 s |
| output size | 3.63 GB |
Phase timings:
- scenes: 25 min (cached on later runs)
- stage: instant
- worker: 78 min @ ~7.5 fps cumulative
- merge: 73 s
- track: 77 s
- score: 21 s
- cut (600 ffmpeg stream-copies): 19 min
- report (600 thumbs + HTML): 3 min
- **total wall-clock: 1h43m**
## 7. Re-running
```bash
# choose a per-batch workdir + log
WORK=/opt/face-sets/work/video_preprocess_<batch_name> \
FILTER_FROM=ct_src_00050.mp4 \
bash work/run_video_pipeline.sh > work/logs/video_run_<batch_name>.log 2>&1 &
# check status anytime
bash work/status_video_pipeline.sh work/logs/video_run_<batch_name>.log
```
Skip patterns can exclude already-processed inputs:
```bash
SKIP_PATTERN='^ct_src_(0001[015]|005[0-9]|006[0-9])\.mp4$' \
WORK=/opt/face-sets/work/video_preprocess_rest \
bash work/run_video_pipeline.sh > work/logs/video_run_rest.log 2>&1 &
```
`scenes` outputs are cached in the batch's `WORK/scenes/` dir, so re-running the chain after an edit-to-score step doesn't redo detection. The worker is also resumable per `queue_id` — if killed mid-flight, just relaunch.

work/run_video_pipeline.sh Executable file

@@ -0,0 +1,123 @@
#!/bin/bash
# Generic chain driver for the video target preprocessing pipeline.
#
# Usage:
# WORK=/path/to/workdir SKIP_PATTERN='ct_src_(0001[015]|005[0-9]|006[0-9])\.mp4' \
# bash run_video_pipeline.sh > /opt/face-sets/work/logs/<name>.log 2>&1
#
# Required env vars:
# WORK per-batch workdir (will hold scenes/, queue.json, results.jsonl, plan.json, review/)
#
# Optional env vars:
# INPUT_DIR default /mnt/x/src/vd
# OUTPUT_DIR default /mnt/x/src/vd/ct
# FILTER_FROM basename floor; only files with name >= this go in (e.g. ct_src_00050.mp4)
# SKIP_PATTERN regex of basenames to exclude (Python `re` syntax). Applied AFTER FILTER_FROM.
# MAX_DUR score --max-dur (default 120)
# IDENTITY "yes" to enable identity tagging; default "no"
set -e
: ${WORK:?WORK env var must point at a workdir}
: ${INPUT_DIR:=/mnt/x/src/vd}
: ${OUTPUT_DIR:=/mnt/x/src/vd/ct}
: ${MAX_DUR:=120}
: ${IDENTITY:=no}
mkdir -p "$WORK" "$WORK/scenes"
PY_WSL=/home/peter/face_sort_env/bin/python
PY_WIN="/mnt/c/face_embed_venv/Scripts/python.exe"
PIPELINE=/opt/face-sets/work/video_target_pipeline.py
WORKER=/opt/face-sets/work/video_face_worker.py
INVENTORY_FULL=/opt/face-sets/work/video_preprocess/inventory_full.json
ts() { date +"%Y-%m-%d %H:%M:%S"; }
log() { echo "[$(ts)] [$PHASE] $*"; }
PHASE="setup"
log "STARTED — host=$(hostname) pid=$$ work=$WORK"
log "config: input=$INPUT_DIR output=$OUTPUT_DIR filter_from=${FILTER_FROM:-<none>} skip_pattern=${SKIP_PATTERN:-<none>} max_dur=$MAX_DUR identity=$IDENTITY"
PHASE="inventory"
log "building subset inventory"
T0=$(date +%s)
# rebuild full inventory if missing
if [ ! -f "$INVENTORY_FULL" ]; then
log "(no full inventory cached — running fresh scan)"
$PY_WSL $PIPELINE scan --input "$INPUT_DIR" --output-dir "$OUTPUT_DIR" --out "$INVENTORY_FULL"
fi
$PY_WSL <<EOF
import json, re
from pathlib import Path
inv = json.load(open('$INVENTORY_FULL'))
subset = list(inv['videos'])
filter_from = '${FILTER_FROM}'
skip_pat = '${SKIP_PATTERN}'
if filter_from:
subset = [v for v in subset if Path(v['path']).name >= filter_from]
if skip_pat:
pat = re.compile(skip_pat)
subset = [v for v in subset if not pat.search(Path(v['path']).name)]
subset.sort(key=lambda v: v['path'])
inv['videos'] = subset
json.dump(inv, open('$WORK/inventory.json','w'), indent=2)
total_dur = sum(v.get('duration_s', 0) for v in inv['videos'] if 'error' not in v)
print(f' {len(inv["videos"])} videos, total {total_dur/3600:.2f}h input')
EOF
log "done in $(($(date +%s)-T0))s"
PHASE="scenes"
log "PySceneDetect AdaptiveDetector across all videos (cached entries skipped)"
T0=$(date +%s)
$PY_WSL $PIPELINE scenes --inventory "$WORK/inventory.json" --out-dir "$WORK/scenes"
log "done in $(($(date +%s)-T0))s"
PHASE="stage"
log "building frame queue @ 2 fps within scenes"
T0=$(date +%s)
$PY_WSL $PIPELINE stage --inventory "$WORK/inventory.json" --scenes-dir "$WORK/scenes" --out "$WORK/queue.json"
log "done in $(($(date +%s)-T0))s"
PHASE="worker"
log "Windows DML face detect+embed (resumable; the slow one)"
T0=$(date +%s)
$PY_WIN $WORKER "$WORK/queue.json" "$WORK/results.json"
log "done in $(($(date +%s)-T0))s"
PHASE="merge"
log "ingesting worker output (jsonl)"
T0=$(date +%s)
$PY_WSL $PIPELINE merge --results "$WORK/results.json" --out "$WORK/frames.json"
log "done in $(($(date +%s)-T0))s"
PHASE="track"
log "stitching detections into tracks"
T0=$(date +%s)
$PY_WSL $PIPELINE track --frames "$WORK/frames.json" --scenes-dir "$WORK/scenes" \
--inventory "$WORK/inventory.json" --out "$WORK/tracks.json"
log "done in $(($(date +%s)-T0))s"
PHASE="score"
log "scoring with relaxed gates + max-dur=$MAX_DUR identity=$IDENTITY"
T0=$(date +%s)
ID_FLAG=""
if [ "$IDENTITY" != "yes" ]; then ID_FLAG="--no-identity"; fi
$PY_WSL $PIPELINE score --tracks "$WORK/tracks.json" --inventory "$WORK/inventory.json" \
--out "$WORK/plan.json" --max-dur "$MAX_DUR" $ID_FLAG
log "done in $(($(date +%s)-T0))s"
PHASE="cut"
log "ffmpeg stream-copy into per-source subfolders (no --clean)"
T0=$(date +%s)
$PY_WSL $PIPELINE cut --plan "$WORK/plan.json" --output-dir "$OUTPUT_DIR"
log "done in $(($(date +%s)-T0))s"
PHASE="report"
log "rendering HTML"
T0=$(date +%s)
$PY_WSL $PIPELINE report --plan "$WORK/plan.json" --output-dir "$OUTPUT_DIR" --out "$WORK/review"
log "done in $(($(date +%s)-T0))s"
PHASE="done"
log "PIPELINE COMPLETE — review at file://$WORK/review/index.html"

work/status_video_pipeline.sh Executable file

@@ -0,0 +1,32 @@
#!/bin/bash
# Generic status helper for run_video_pipeline.sh.
# Usage: bash status_video_pipeline.sh <log_file>
# Defaults to /opt/face-sets/work/logs/video_run.log if no arg.
LOG="${1:-/opt/face-sets/work/logs/video_run.log}"
if [ ! -f "$LOG" ]; then
echo "no log at $LOG yet"
exit 0
fi
echo "=== last 8 log lines ==="
tail -8 "$LOG"
echo
# worker progress
last=$(grep -E "^\[scan\] [0-9]+/[0-9]+" "$LOG" | tail -1)
if [ -n "$last" ]; then
echo "=== DML worker progress ==="
echo " $last"
fi
# total elapsed
start_epoch=$(head -1 "$LOG" | sed 's/.*\[\(.*\)\].*\[setup\].*/\1/' | xargs -I{} date -d "{}" +%s 2>/dev/null)
now_epoch=$(date +%s)
if [ -n "$start_epoch" ]; then
elapsed=$((now_epoch - start_epoch))
h=$((elapsed / 3600))
m=$(( (elapsed % 3600) / 60 ))
echo " elapsed: ${h}h${m}m"
fi

work/video_face_worker.py Normal file

@@ -0,0 +1,274 @@
"""Windows / DirectML video frame face worker.
Reads a queue.json from /opt/face-sets/work/video_target_pipeline.py:stage
(WSL side), each entry: {video_path, win_video_path, frame_idx, time_s,
queue_id}. Decodes frame N from the video, runs insightface FaceAnalysis,
emits per-face records (bbox, det_score, pose, embedding, face_short).
CLI:
py -3.12 video_face_worker.py <queue.json> <out_results.json> [--limit N]
Resumable: queue entries whose queue_id already appears in the results store
(the sister .jsonl, or legacy .json) are skipped.
"""
from __future__ import annotations
import argparse
import json
import os
import sys
import time
from pathlib import Path
import numpy as np
import cv2
from insightface.app import FaceAnalysis
MODEL_ROOT = r"C:\face_embed_venv\models"
MIN_DET = 0.5
MIN_FACE_PIX = 40
FLUSH_EVERY = 100
def jsonl_path_for(out_path: Path) -> Path:
"""Sister JSONL file: one result-record per line, append-only."""
return out_path.with_suffix(".jsonl")
def load_existing(out_path: Path):
"""Load existing results from .jsonl (preferred) or legacy .json (one-time conversion).
Returns (records_list, processed_set)."""
jsonl = jsonl_path_for(out_path)
if jsonl.exists():
records = []
processed = set()
with open(jsonl) as f:
for line_num, line in enumerate(f, 1):
line = line.strip()
if not line:
continue
try:
r = json.loads(line)
records.append(r)
if r.get("queue_id"):
processed.add(r["queue_id"])
except json.JSONDecodeError:
print(f"[warn] {jsonl}:{line_num} corrupt; skipping", file=sys.stderr)
return records, processed
# legacy JSON support: load once, convert to JSONL
if out_path.exists():
try:
d = json.loads(out_path.read_text())
records = d.get("results", [])
processed = set(d.get("processed", []))
print(f"[migrate] converting {len(records)} legacy JSON records to JSONL", file=sys.stderr)
with open(jsonl, "w") as f:
for r in records:
f.write(json.dumps(r) + "\n")
return records, processed
except Exception as e:
print(f"[warn] could not parse {out_path}: {e}; starting fresh", file=sys.stderr)
return [], set()
def append_records(out_path: Path, new_records: list):
"""Append-only write to the sister .jsonl file. No re-serialization of prior records."""
if not new_records:
return
jsonl = jsonl_path_for(out_path)
with open(jsonl, "a") as f:
for r in new_records:
f.write(json.dumps(r) + "\n")
def write_compat_summary(out_path: Path, total_records: int, processed: set):
"""Write a tiny JSON pointer file at the legacy out_path so older consumers
still see *something*, but the canonical store is the .jsonl. Cheap."""
summary = {
"_format": "jsonl-pointer",
"_jsonl": str(jsonl_path_for(out_path).name),
"results_count": total_records,
"processed_count": len(processed),
}
tmp = out_path.with_suffix(".tmp.json")
tmp.write_text(json.dumps(summary, indent=2))
os.replace(tmp, out_path)
def main():
ap = argparse.ArgumentParser()
ap.add_argument("queue", type=Path)
ap.add_argument("out", type=Path)
ap.add_argument("--limit", type=int, default=None)
args = ap.parse_args()
queue = json.loads(args.queue.read_text())
print(f"[queue] {len(queue)} entries from {args.queue}", flush=True)
args.out.parent.mkdir(parents=True, exist_ok=True)
results, processed = load_existing(args.out)
if processed:
print(f"[resume] {len(processed)} already scored", flush=True)
pending = [e for e in queue if e["queue_id"] not in processed]
if args.limit is not None:
pending = pending[: args.limit]
print(f"[pending] {len(pending)} entries", flush=True)
if not pending:
print("[done] nothing to do")
return
print("[load] FaceAnalysis with DmlExecutionProvider", flush=True)
app = FaceAnalysis(
name="buffalo_l",
root=MODEL_ROOT,
providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
app.prepare(ctx_id=0, det_size=(640, 640))
# group queue by video so we can keep one VideoCapture open and seek
from collections import defaultdict
by_video = defaultdict(list)
for e in pending:
by_video[e["win_video_path"]].append(e)
n_done = 0
n_load_err = 0
last_flush = time.time()
t_start = time.time()
new_buffer: list = []
def flush():
# append-only: only NEW records since last flush get written. O(new_records),
# not O(total_records). Was 11s/flush at 9k records; now <50ms.
if new_buffer:
append_records(args.out, new_buffer)
new_buffer.clear()
write_compat_summary(args.out, len(results), processed)
for vidpath, entries in by_video.items():
# entries are already sorted by frame_idx. Hybrid decode strategy:
# 1. Seek ONCE to the first pending target (cheap keyframe-seek).
# 2. Sequential cap.grab() between subsequent targets (decode without
# BGR conversion until we reach a target, then cap.retrieve()).
# This avoids per-sample seek cost (the original pathology that
# caused 1.4 fps deep in long videos) AND avoids grab-walking from
# frame 0 on resume (the over-correction that gave 0.08 fps).
entries.sort(key=lambda e: e["frame_idx"])
cap = cv2.VideoCapture(vidpath)
if not cap.isOpened():
print(f"[err] cannot open {vidpath}", flush=True)
for e in entries:
rec = {
"queue_id": e["queue_id"], "video_path": e["video_path"],
"frame_idx": e["frame_idx"], "time_s": e["time_s"],
"faces": [], "error": "cap_open",
}
results.append(rec); new_buffer.append(rec)
processed.add(e["queue_id"])
n_done += 1
n_load_err += 1
continue
first_target = entries[0]["frame_idx"]
if first_target > 0:
cap.set(cv2.CAP_PROP_POS_FRAMES, first_target)
cur_frame_idx = first_target - 1
else:
cur_frame_idx = -1
for e in entries:
target = e["frame_idx"]
if target < cur_frame_idx + 1:
# backward jump (only triggers for unsorted entries — defensive)
cap.set(cv2.CAP_PROP_POS_FRAMES, target)
cur_frame_idx = target - 1
# advance up to (but not including) target via grab()-only
ran_out = False
while cur_frame_idx + 1 < target:
ok = cap.grab()
if not ok:
ran_out = True
break
cur_frame_idx += 1
if not ran_out:
ok = cap.grab()
if not ok:
ran_out = True
else:
cur_frame_idx = target
if ran_out:
rec = {
"queue_id": e["queue_id"], "video_path": e["video_path"],
"frame_idx": e["frame_idx"], "time_s": e["time_s"],
"faces": [], "error": "frame_read",
}
results.append(rec); new_buffer.append(rec)
processed.add(e["queue_id"])
n_done += 1
n_load_err += 1
continue
ok, bgr = cap.retrieve()
if not ok or bgr is None:
rec = {
"queue_id": e["queue_id"], "video_path": e["video_path"],
"frame_idx": e["frame_idx"], "time_s": e["time_s"],
"faces": [], "error": "frame_read",
}
results.append(rec); new_buffer.append(rec)
processed.add(e["queue_id"])
n_done += 1
n_load_err += 1
continue
faces = app.get(bgr)
kept_faces = []
H, W = bgr.shape[:2]
for f in faces:
if float(f.det_score) < MIN_DET:
continue
x1, y1, x2, y2 = [int(round(v)) for v in f.bbox]
x1 = max(x1, 0); y1 = max(y1, 0)
x2 = min(x2, W); y2 = min(y2, H)
w, h = x2 - x1, y2 - y1
short = min(w, h)
if short < MIN_FACE_PIX:
continue
rec = {
"bbox": [x1, y1, x2, y2],
"det_score": float(f.det_score),
"face_short": int(short),
}
if hasattr(f, "pose") and f.pose is not None:
rec["pose"] = [float(x) for x in f.pose] # pitch, yaw, roll
if hasattr(f, "normed_embedding") and f.normed_embedding is not None:
rec["embedding"] = f.normed_embedding.astype(np.float32).tolist()
kept_faces.append(rec)
rec = {
"queue_id": e["queue_id"], "video_path": e["video_path"],
"frame_idx": e["frame_idx"], "time_s": e["time_s"],
"frame_w": W, "frame_h": H,
"faces": kept_faces,
}
results.append(rec); new_buffer.append(rec)
processed.add(e["queue_id"])
n_done += 1
if (n_done % FLUSH_EVERY == 0) or (time.time() - last_flush) > 30.0:
flush()
last_flush = time.time()
el = time.time() - t_start
rate = n_done / max(0.1, el)
eta = (len(pending) - n_done) / max(0.1, rate) / 60.0
print(f"[scan] {n_done}/{len(pending)} rate={rate:.2f} fps eta={eta:.1f}min "
f"errs={n_load_err}", flush=True)
cap.release()
flush()
el = time.time() - t_start
print(f"[done] {n_done} scored, {n_load_err} errors, {el:.1f}s "
f"({n_done/max(0.1,el):.2f} fps) -> {args.out}", flush=True)
if __name__ == "__main__":
main()

work/video_target_pipeline.py Normal file

@@ -0,0 +1,917 @@
"""Video target preprocessing pipeline for roop-unleashed.
Discovers video files in an input folder, runs scene-cut detection, samples
frames within each scene, runs face detection + embedding via Windows DML
worker, stitches per-frame detections into face tracks, applies quality
gates, cuts approved segments out with ffmpeg stream-copy, and writes a
report. Output clips have generic UUID names + a sidecar JSON with full
provenance.
Subcommands:
scan list input videos, run ffprobe, write per-video index
scenes PySceneDetect AdaptiveDetector per video; write scenes_<basename>.json
stage write frame queue.json (sampled @ 2 fps within scenes)
merge ingest worker results.jsonl (or legacy .json) into per-video frame_results
track IoU+embedding stitching of per-frame detections into tracks
score track-level quality gating + segment plan
cut ffmpeg -c copy each accepted segment to <out_dir>/<uuid>.mp4
report HTML preview with thumbnails + identity tags
"""
from __future__ import annotations
import argparse
import json
import math
import re
import shutil
import subprocess
import sys
import time
import uuid
from collections import defaultdict
from pathlib import Path
import numpy as np
DEFAULT_INPUT = Path("/mnt/x/src/vd")
DEFAULT_OUTPUT = Path("/mnt/x/src/vd/ct")
WORK_DIR = Path("/opt/face-sets/work/video_preprocess")
# defaults — first set was strict-portrait; second set loosened for side-profile + segment merging
SAMPLE_FPS = 2.0
QUALITY_YAW_MAX = 75.0 # was 25; allow full 3/4 + profile (face-sets handle it)
QUALITY_PITCH_MAX = 45.0 # was 30
QUALITY_FACE_MIN = 80 # was 96
QUALITY_BLUR_MIN = 50.0
QUALITY_DET_MIN = 0.5 # was 0.6
TRACK_GATE_FRAC = 0.7 # >=70% of frames in track must pass per-frame gates
SEGMENT_MIN_S = 1.0
SEGMENT_MAX_S = 30.0 # was 10
SEGMENT_BRIDGE_S = 3.0 # was 1.0 — within-track pose-failure bridging
SEGMENT_MERGE_GAP_S = 2.0 # NEW — across-track merge if same scene + within this gap
TRACK_IOU_MIN = 0.3
TRACK_EMB_MIN = 0.5
CACHES = [
Path("/opt/face-sets/work/cache/nl_full.npz"),
Path("/opt/face-sets/work/cache/immich_peter.npz"),
Path("/opt/face-sets/work/cache/immich_nic.npz"),
]
FACESETS_ROOT = Path("/mnt/e/temp_things/fcswp/nl_sorted/facesets_swap_ready")
IDENTITY_TAG_THRESHOLD = 0.6 # cosine sim to faceset centroid
def wsl_to_win(p: str) -> str:
s = str(p)
if s.startswith("/mnt/"):
return f"{s[5].upper()}:\\{s[7:].replace('/', chr(92))}"
return s
# ----------------------------- ffprobe / scan -----------------------------
def ffprobe(video: Path) -> dict:
cmd = [
"ffprobe", "-v", "error", "-print_format", "json",
"-show_format", "-show_streams", str(video),
]
r = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
if r.returncode != 0:
return {"error": r.stderr.strip()}
return json.loads(r.stdout)
def parse_video_meta(probe: dict) -> dict:
if "error" in probe:
return {"error": probe["error"]}
fmt = probe.get("format", {})
duration = float(fmt.get("duration", 0))
vstream = next((s for s in probe.get("streams", []) if s.get("codec_type") == "video"), None)
if vstream is None:
return {"error": "no video stream"}
fps_str = vstream.get("avg_frame_rate", "0/1")
try:
num, den = (int(x) for x in fps_str.split("/"))
fps = num / den if den else 0.0
except Exception:
fps = 0.0
nb_frames = int(vstream.get("nb_frames", 0)) or int(round(duration * fps))
return {
"duration_s": duration,
"fps": fps,
"frames": nb_frames,
"width": int(vstream.get("width", 0)),
"height": int(vstream.get("height", 0)),
"codec": vstream.get("codec_name"),
}
def cmd_scan(args):
in_dir = Path(args.input)
out = Path(args.out)
out.parent.mkdir(parents=True, exist_ok=True)
extensions = {".mp4", ".mov", ".mkv", ".m4v", ".avi", ".webm"}
out_root = Path(args.output_dir).resolve()
videos = []
for p in sorted(in_dir.iterdir() if not args.recursive else in_dir.rglob("*")):
if not p.is_file():
continue
if out_root in p.parents or p.resolve() == out_root:
continue # never include the output dir
if p.suffix.lower() not in extensions:
continue
videos.append(p)
print(f"[scan] {len(videos)} candidate videos", file=sys.stderr)
inventory = []
for p in videos:
meta = parse_video_meta(ffprobe(p))
meta["path"] = str(p)
meta["win_path"] = wsl_to_win(str(p))
meta["size"] = p.stat().st_size
inventory.append(meta)
if "error" not in meta:
print(f" {p.name}: {meta['duration_s']:.1f}s @ {meta['fps']:.1f}fps "
f"{meta['width']}x{meta['height']} {meta['codec']}", file=sys.stderr)
else:
print(f" {p.name}: ERROR {meta['error']}", file=sys.stderr)
out.write_text(json.dumps({"input": str(in_dir), "videos": inventory}, indent=2))
print(f"[scan] inventory -> {out}", file=sys.stderr)
# ----------------------------- scenes -----------------------------
def cmd_scenes(args):
from scenedetect import open_video, SceneManager
from scenedetect.detectors import AdaptiveDetector
inv = json.loads(Path(args.inventory).read_text())
out_dir = Path(args.out_dir)
out_dir.mkdir(parents=True, exist_ok=True)
only = set(args.only.split(",")) if args.only else None
for v in inv["videos"]:
if "error" in v:
continue
path = Path(v["path"])
if only and path.name not in only:
continue
out_file = out_dir / (path.stem + ".scenes.json")
if out_file.exists() and not args.force:
continue
print(f"[scenes] {path.name} ...", file=sys.stderr, flush=True)
t0 = time.time()
try:
video = open_video(str(path))
sm = SceneManager()
sm.add_detector(AdaptiveDetector(min_scene_len=int(round(v.get("fps", 30) or 30) * 0.5)))
sm.detect_scenes(video, show_progress=False)
scenes = sm.get_scene_list()
entries = []
for s, e in scenes:
entries.append({
"start_frame": s.frame_num, "end_frame": e.frame_num,
"start_s": s.get_seconds(), "end_s": e.get_seconds(),
"duration_s": e.get_seconds() - s.get_seconds(),
})
# if no cuts found, treat the whole video as one scene
if not entries:
entries = [{
"start_frame": 0, "end_frame": v["frames"],
"start_s": 0.0, "end_s": v["duration_s"],
"duration_s": v["duration_s"],
}]
out_file.write_text(json.dumps({"video": str(path), "scenes": entries}, indent=2))
print(f" {len(entries)} scenes in {time.time()-t0:.1f}s -> {out_file.name}",
file=sys.stderr)
except Exception as e:
print(f" ERROR: {e}", file=sys.stderr)
# ----------------------------- stage -----------------------------
def cmd_stage(args):
inv = json.loads(Path(args.inventory).read_text())
scenes_dir = Path(args.scenes_dir)
queue = []
qid = 0
sample_every = 1.0 / args.sample_fps
for v in inv["videos"]:
if "error" in v:
continue
p = Path(v["path"])
sf = scenes_dir / (p.stem + ".scenes.json")
if not sf.exists():
print(f"[warn] no scenes file for {p.name}; skipping", file=sys.stderr)
continue
scenes = json.loads(sf.read_text()).get("scenes", [])
fps = v.get("fps", 30) or 30
for sc in scenes:
t = sc["start_s"]
while t < sc["end_s"] - 0.01:
fidx = int(round(t * fps))
if fidx >= v["frames"]:
break
queue.append({
"queue_id": f"q{qid:08d}",
"video_path": str(p),
"win_video_path": v["win_path"],
"frame_idx": fidx,
"time_s": t,
})
qid += 1
t += sample_every
out = Path(args.out)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(queue, indent=2))
print(f"[stage] {len(queue)} sampled frames @ {args.sample_fps} fps -> {out}",
file=sys.stderr)
print(f"[stage] win path for worker: {wsl_to_win(str(out))}", file=sys.stderr)
# ----------------------------- merge + track -----------------------------
def cmd_merge(args):
"""Read worker output and group by video_path. Supports either JSONL (one record
per line, the new format) or legacy JSON (results.json with `results` list)."""
src_path = Path(args.results)
records = []
# try JSONL first (sister .jsonl file or .results passed directly)
jsonl_candidate = src_path.with_suffix(".jsonl")
if jsonl_candidate.exists():
with open(jsonl_candidate) as f:
for line in f:
line = line.strip()
if line:
records.append(json.loads(line))
elif src_path.suffix == ".jsonl":
with open(src_path) as f:
for line in f:
line = line.strip()
if line:
records.append(json.loads(line))
else:
# legacy: monolithic JSON
src = json.loads(src_path.read_text())
records = src.get("results", [])
by_video: dict[str, list] = {}
for r in records:
by_video.setdefault(r["video_path"], []).append(r)
for v in by_video:
by_video[v].sort(key=lambda x: x["frame_idx"])
out = Path(args.out)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps({"by_video": by_video}, indent=2))
print(f"[merge] {sum(len(v) for v in by_video.values())} frames across {len(by_video)} videos "
f"-> {out}", file=sys.stderr)
def _iou(a, b):
ax1, ay1, ax2, ay2 = a
bx1, by1, bx2, by2 = b
ix1 = max(ax1, bx1); iy1 = max(ay1, by1)
ix2 = min(ax2, bx2); iy2 = min(ay2, by2)
iw = max(ix2 - ix1, 0); ih = max(iy2 - iy1, 0)
inter = iw * ih
ua = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
return inter / ua if ua > 0 else 0.0
def cmd_track(args):
"""Stitch per-frame face detections into tracks within each scene of each video.
Track = list of (frame_idx, face_idx) where adjacent samples have IoU>=0.3 OR
cosine(emb)>=0.5. New face → new track. No cross-scene merging."""
fr = json.loads(Path(args.frames).read_text())
scenes_dir = Path(args.scenes_dir)
inv = json.loads(Path(args.inventory).read_text())
inv_by_path = {v["path"]: v for v in inv["videos"]}
all_video_tracks: dict[str, list] = {}
for video_path, frames in fr["by_video"].items():
v = inv_by_path.get(video_path, {})
sf = scenes_dir / (Path(video_path).stem + ".scenes.json")
scenes = json.loads(sf.read_text()).get("scenes", []) if sf.exists() else []
# group frames by scene
scene_for_frame = {}
for si, sc in enumerate(scenes):
for f in frames:
if f["frame_idx"] >= sc["start_frame"] and f["frame_idx"] < sc["end_frame"]:
scene_for_frame.setdefault(si, []).append(f)
video_tracks = []
for si, scene_frames in scene_for_frame.items():
scene_frames.sort(key=lambda x: x["frame_idx"])
# tracks = list of dict{ "members": [(frame_idx, face_idx, face_dict)], "last_bbox", "last_emb" }
tracks = []
for f in scene_frames:
claimed = set()
for face_idx, face in enumerate(f.get("faces", [])):
bbox = face["bbox"]
emb = np.array(face.get("embedding", []), dtype=np.float32) if face.get("embedding") else None
best_track = None
best_score = 0.0
for ti, tr in enumerate(tracks):
if ti in claimed:
continue
# staleness in TIME (sample period independent of source fps)
last_time = tr["members"][-1][3]
if f["time_s"] - last_time > 1.5: # stale if >1.5s gap (3 sample periods @ 2fps)
continue
iou = _iou(tr["last_bbox"], bbox)
emb_sim = 0.0
if emb is not None and tr.get("last_emb") is not None:
emb_sim = float(np.dot(tr["last_emb"], emb))
# documented rule: associate if IoU >= TRACK_IOU_MIN OR cosine >= TRACK_EMB_MIN
if iou < TRACK_IOU_MIN and emb_sim < TRACK_EMB_MIN:
continue
score = max(iou, emb_sim)
if score > best_score:
best_score = score
best_track = ti
if best_track is not None:
tr = tracks[best_track]
tr["members"].append((f["frame_idx"], face_idx, face, f["time_s"]))
tr["last_bbox"] = bbox
if emb is not None:
tr["last_emb"] = emb
claimed.add(best_track)
else:
tracks.append({
"members": [(f["frame_idx"], face_idx, face, f["time_s"])],
"last_bbox": bbox,
"last_emb": emb,
})
for tr in tracks:
if len(tr["members"]) < 2:
continue
video_tracks.append({
"scene_idx": si,
"members": [
{"frame_idx": m[0], "face_idx": m[1], "time_s": m[3], "face": m[2]}
for m in tr["members"]
],
})
all_video_tracks[video_path] = video_tracks
print(f"[track] {Path(video_path).name}: {sum(len(s) for s in scene_for_frame.values())} frames "
f"-> {len(video_tracks)} tracks across {len(scene_for_frame)} scenes",
file=sys.stderr)
out = Path(args.out)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps({"by_video": all_video_tracks}, indent=2))
print(f"[track] -> {out}", file=sys.stderr)
# ----------------------------- score (quality gates) -----------------------------
def _track_passes(track, cfg):
"""Per-frame quality gating; return list of bool (does each member pass) +
aggregate stats. cfg: dict with yaw_max, pitch_max, face_min, det_min."""
passes = []
yaws, pitches, sizes, dets = [], [], [], []
for m in track["members"]:
f = m["face"]
yaw = abs(f.get("pose", [0, 0, 0])[1]) if f.get("pose") else 0
pitch = abs(f.get("pose", [0, 0, 0])[0]) if f.get("pose") else 0
size = f.get("face_short", 0)
det = f.get("det_score", 0)
ok = (yaw <= cfg["yaw_max"] and pitch <= cfg["pitch_max"]
and size >= cfg["face_min"] and det >= cfg["det_min"])
passes.append(ok)
yaws.append(yaw); pitches.append(pitch); sizes.append(size); dets.append(det)
return passes, {
"n": len(passes), "n_pass": sum(passes), "frac_pass": sum(passes) / max(1, len(passes)),
"yaw_med": float(np.median(yaws)) if yaws else None,
"pitch_med": float(np.median(pitches)) if pitches else None,
"size_med": float(np.median(sizes)) if sizes else None,
"det_med": float(np.median(dets)) if dets else None,
}
def _build_segments(track, cfg):
"""Return list of (start_s, end_s) accepted sub-segments of this track:
contiguous runs of passing frames meeting min/max duration. Pose-failure
spans <= cfg['bridge_s'] long get bridged across (handles momentary head
turns / detection misses)."""
passes, stats = _track_passes(track, cfg)
members = track["members"]
if not members:
return [], stats
# bridge gaps of failing frames (any width) up to cfg["bridge_s"] seconds
bridged = list(passes)
n = len(bridged)
i = 0
while i < n:
if bridged[i]:
i += 1
continue
# find run of consecutive False starting at i
j = i
while j < n and not bridged[j]:
j += 1
# bridge if surrounded by True on both sides AND time gap <= bridge_s
if i > 0 and j < n and bridged[i - 1] and bridged[j]:
t_left = members[i - 1]["time_s"]
t_right = members[j]["time_s"]
if t_right - t_left <= cfg["bridge_s"]:
for k in range(i, j):
bridged[k] = True
i = j
# find runs of True
runs = []
i = 0
while i < n:
if not bridged[i]:
i += 1; continue
j = i
while j + 1 < n and bridged[j + 1]:
j += 1
s = members[i]["time_s"]
# end is the time of the last passing sample plus one sample-period
e = members[j]["time_s"] + 1.0 / max(SAMPLE_FPS, 1e-3)
runs.append((s, e))
i = j + 1
return runs, stats
def _merge_close_segments(segs_with_meta, merge_gap_s: float):
"""Merge segments within the same scene that are within merge_gap_s of each other.
segs_with_meta: list of dicts with start_s, end_s, scene_idx, track_idx, stats.
Returns list of merged dicts (one per merged group). Identity-tag and stats
aggregation happen later."""
by_scene: dict[int, list] = {}
for s in segs_with_meta:
by_scene.setdefault(s["scene_idx"], []).append(s)
merged_all = []
for scene_idx, segs in by_scene.items():
segs.sort(key=lambda x: x["start_s"])
cur = None
for s in segs:
if cur is None:
cur = {**s, "track_idxs": [s["track_idx"]], "member_count": s["stats"]["n"],
"pass_count": s["stats"]["n_pass"]}
continue
gap = s["start_s"] - cur["end_s"]
if gap <= merge_gap_s:
# merge
cur["end_s"] = max(cur["end_s"], s["end_s"])
cur["track_idxs"].append(s["track_idx"])
cur["member_count"] += s["stats"]["n"]
cur["pass_count"] += s["stats"]["n_pass"]
# take the better-quality stats for display
if s["stats"]["n_pass"] > cur["stats"]["n_pass"]:
cur["stats"] = s["stats"]
else:
merged_all.append(cur)
cur = {**s, "track_idxs": [s["track_idx"]], "member_count": s["stats"]["n"],
"pass_count": s["stats"]["n_pass"]}
if cur is not None:
merged_all.append(cur)
return merged_all
def _split_long_segments(segs_with_meta, min_s: float, max_s: float):
"""Apply min/max duration: drop too-short, split too-long evenly."""
out = []
for s in segs_with_meta:
dur = s["end_s"] - s["start_s"]
if dur < min_s:
continue
if dur <= max_s:
out.append(s)
continue
n = int(math.ceil(dur / max_s))
chunk = dur / n
base_start = s["start_s"]
for k in range(n):
piece = dict(s)
piece["start_s"] = base_start + k * chunk
piece["end_s"] = base_start + (k + 1) * chunk
out.append(piece)
return out
# identity tagging via cached arcface centroids
def load_caches_index():
rec_index = {}
alias_map = {}
for c in CACHES:
if not c.exists():
continue
d = np.load(c, allow_pickle=True)
emb = d["embeddings"]
meta = json.loads(str(d["meta"]))
face_records = [m for m in meta if not m.get("noface")]
if "path_aliases" in d.files:
paliases = json.loads(str(d["path_aliases"]))
for canon, alist in paliases.items():
alias_map.setdefault(canon, canon)
for a in alist:
alias_map[a] = canon
for i, rec in enumerate(face_records):
v = emb[i].astype(np.float32)
n = float(np.linalg.norm(v))
if n > 0:
v = v / n
rec_index[(rec["path"], tuple(int(x) for x in rec["bbox"]))] = v
alias_map.setdefault(rec["path"], rec["path"])
return rec_index, alias_map
def load_faceset_centroids():
"""Return dict faceset_name -> normalized centroid embedding."""
rec_index, alias_map = load_caches_index()
centroids = {}
for fs_dir in sorted(FACESETS_ROOT.iterdir()):
if not fs_dir.is_dir() or fs_dir.name.startswith("_"):
continue
# exclude era splits to avoid double-tagging within a family
if re.match(r"^faceset_\d+_(?:\d{4}-\d{2,4}|\d{4}|undated)", fs_dir.name):
continue
mp = fs_dir / "manifest.json"
if not mp.exists():
continue
m = json.loads(mp.read_text())
vecs = []
for f in m.get("faces", []):
src = f.get("source"); bbox = f.get("bbox")
if not src or not bbox:
continue
canon = alias_map.get(src, src)
v = rec_index.get((canon, tuple(int(x) for x in bbox)))
if v is None and canon != src:
v = rec_index.get((src, tuple(int(x) for x in bbox)))
if v is not None:
vecs.append(v)
if len(vecs) < 3:
continue
c = np.stack(vecs).mean(axis=0)
n = float(np.linalg.norm(c))
if n > 0:
c = c / n
centroids[fs_dir.name] = c
return centroids
def _track_centroid(track):
embs = [m["face"].get("embedding") for m in track["members"] if m["face"].get("embedding")]
if not embs:
return None
arr = np.array(embs, dtype=np.float32)
c = arr.mean(axis=0)
n = float(np.linalg.norm(c))
return c / n if n > 0 else c
def cmd_score(args):
tr = json.loads(Path(args.tracks).read_text())
inv = json.loads(Path(args.inventory).read_text())
inv_by_path = {v["path"]: v for v in inv["videos"]}
cfg = {
"yaw_max": args.max_yaw, "pitch_max": args.max_pitch,
"face_min": args.min_face, "det_min": args.min_det,
"bridge_s": args.bridge_gap,
}
centroids = {}
if not args.no_identity:
print("[score] loading faceset centroids ...", file=sys.stderr)
t0 = time.time()
centroids = load_faceset_centroids()
print(f"[score] {len(centroids)} active faceset centroids loaded in {time.time()-t0:.1f}s",
file=sys.stderr)
n_total_tracks = 0
n_accepted_tracks = 0
# collect per-track candidate segments first; merging happens per-video below
per_video_candidates: dict[str, list] = {}
track_centroids_by_video: dict[str, dict] = {}
for video_path, tracks in tr["by_video"].items():
per_video_candidates.setdefault(video_path, [])
track_centroids_by_video.setdefault(video_path, {})
for ti, track in enumerate(tracks):
n_total_tracks += 1
runs, stats = _build_segments(track, cfg)
if stats["frac_pass"] < args.track_gate_frac:
continue
if not runs:
continue
n_accepted_tracks += 1
track_centroids_by_video[video_path][ti] = _track_centroid(track)
for (s, e) in runs:
per_video_candidates[video_path].append({
"video_path": video_path,
"track_idx": ti,
"scene_idx": track["scene_idx"],
"start_s": s,
"end_s": e,
"stats": stats,
})
plan = []
for video_path, segs in per_video_candidates.items():
if not segs:
continue
# merge across tracks within the same scene if gap <= merge_gap_s
merged = _merge_close_segments(segs, args.merge_gap)
# apply min/max duration (split long, drop short)
merged = _split_long_segments(merged, args.min_dur, args.max_dur)
for s in merged:
tag = None
tag_sim = None
# identity from union of contributing tracks' centroids
if centroids:
track_centroid_list = [
track_centroids_by_video[video_path].get(ti)
for ti in s.get("track_idxs", [s.get("track_idx")])
]
track_centroid_list = [c for c in track_centroid_list if c is not None]
if track_centroid_list:
union = np.stack(track_centroid_list).mean(axis=0)
nm = float(np.linalg.norm(union))
if nm > 0:
union = union / nm
sims = {name: float(np.dot(c, union)) for name, c in centroids.items()}
best = max(sims, key=sims.get)
if sims[best] >= IDENTITY_TAG_THRESHOLD:
tag = best; tag_sim = round(sims[best], 4)
plan.append({
"video_path": video_path,
"track_idxs": s.get("track_idxs", [s.get("track_idx")]),
"scene_idx": s["scene_idx"],
"start_s": round(s["start_s"], 3),
"end_s": round(s["end_s"], 3),
"duration_s": round(s["end_s"] - s["start_s"], 3),
"member_count": s.get("member_count", s["stats"]["n"]),
"pass_count": s.get("pass_count", s["stats"]["n_pass"]),
"stats": s["stats"],
"identity_tag": tag,
"identity_sim": tag_sim,
"uuid": uuid.uuid4().hex[:12],
})
plan.sort(key=lambda p: (p["video_path"], p["start_s"]))
out = Path(args.out)
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps({
"thresholds": {
"yaw_max": args.max_yaw, "pitch_max": args.max_pitch,
"face_min": args.min_face, "blur_min": QUALITY_BLUR_MIN,
"det_min": args.min_det, "track_gate_frac": args.track_gate_frac,
"bridge_s": args.bridge_gap, "merge_gap_s": args.merge_gap,
"min_dur_s": args.min_dur, "max_dur_s": args.max_dur,
"identity_tag_threshold": IDENTITY_TAG_THRESHOLD,
},
"totals": {
"tracks_total": n_total_tracks, "tracks_accepted": n_accepted_tracks,
"segments": len(plan),
},
"plan": plan,
}, indent=2))
print(f"[score] {n_accepted_tracks}/{n_total_tracks} tracks accepted -> {len(plan)} segments "
f"-> {out}", file=sys.stderr)
# ----------------------------- cut -----------------------------
def cmd_cut(args):
plan = json.loads(Path(args.plan).read_text())
out_dir = Path(args.output_dir)
out_dir.mkdir(parents=True, exist_ok=True)
if args.clean:
# remove only existing UUID-named clips + sidecars (12-char hex), keeping any other files
import re as _re
uuid_pat = _re.compile(r"^[0-9a-f]{12}\.(mp4|json)$")
n_removed = 0
for child in out_dir.iterdir():
if child.is_file() and uuid_pat.match(child.name):
child.unlink()
n_removed += 1
elif child.is_dir() and _re.match(r"^[A-Za-z0-9_.-]+$", child.name):
# subfolder of prior runs — clear UUID files inside, then remove if empty
for inner in child.iterdir():
if inner.is_file() and uuid_pat.match(inner.name):
inner.unlink()
n_removed += 1
try:
child.rmdir()
except OSError:
pass
if n_removed:
print(f"[clean] removed {n_removed} prior UUID clips/sidecars", file=sys.stderr)
n_done = 0
n_err = 0
sidecars = []
for seg in plan["plan"]:
sub = Path(seg["video_path"]).stem
seg_dir = out_dir / sub
seg_dir.mkdir(parents=True, exist_ok=True)
out_video = seg_dir / f"{seg['uuid']}.mp4"
if out_video.exists() and not args.force:
continue
s = seg["start_s"]; d = seg["duration_s"]
cmd = [
"ffmpeg", "-y", "-loglevel", "error",
"-ss", f"{s}",
"-i", seg["video_path"],
"-t", f"{d}",
"-c", "copy",
"-avoid_negative_ts", "make_zero",
str(out_video),
]
r = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if r.returncode != 0 or not out_video.exists() or out_video.stat().st_size < 1024:
print(f"[cut-err] {seg['uuid']} {seg['video_path']}@{s}+{d}: {r.stderr.strip()[:200]}",
file=sys.stderr)
n_err += 1
if out_video.exists() and out_video.stat().st_size < 1024:
out_video.unlink()
continue
# sidecar (alongside the clip in the source-named subfolder)
sidecar = seg_dir / f"{seg['uuid']}.json"
sidecar.write_text(json.dumps({
"uuid": seg["uuid"],
"source_video": seg["video_path"],
"source_basename": Path(seg["video_path"]).name,
"start_s": s, "end_s": seg["end_s"], "duration_s": d,
"scene_idx": seg["scene_idx"],
"track_idxs": seg.get("track_idxs", [seg.get("track_idx")]),
"member_count": seg.get("member_count"),
"pass_count": seg.get("pass_count"),
"stats": seg["stats"],
"identity_tag": seg["identity_tag"],
"identity_sim": seg["identity_sim"],
"thresholds": plan["thresholds"],
}, indent=2))
sidecars.append(sidecar)
n_done += 1
print(f"[cut] {n_done} clips written, {n_err} errors -> {out_dir}", file=sys.stderr)
# ----------------------------- report -----------------------------
def cmd_report(args):
plan = json.loads(Path(args.plan).read_text())
out_dir = Path(args.out)
out_dir.mkdir(parents=True, exist_ok=True)
thumbs_dir = out_dir / "thumbs"
thumbs_dir.mkdir(exist_ok=True)
output_dir = Path(args.output_dir)
# group by video
by_video: dict[str, list] = {}
for seg in plan["plan"]:
by_video.setdefault(seg["video_path"], []).append(seg)
# generate thumbs from each segment's first frame via ffmpeg
print(f"[report] generating thumbs for {len(plan['plan'])} segments", file=sys.stderr)
for seg in plan["plan"]:
thumb = thumbs_dir / f"{seg['uuid']}.jpg"
if thumb.exists():
continue
s = seg["start_s"] + 0.1
cmd = [
"ffmpeg", "-y", "-loglevel", "error",
"-ss", f"{s}",
"-i", seg["video_path"],
"-frames:v", "1",
"-vf", "scale=240:-1",
str(thumb),
]
subprocess.run(cmd, capture_output=True, timeout=30)
# render
rows = []
rows.append("<h1>Video target preprocessing &mdash; review</h1>")
t = plan["totals"]
th = plan["thresholds"]
rows.append(f"<p>Tracks accepted: {t['tracks_accepted']}/{t['tracks_total']}; "
f"segments emitted: {t['segments']}.<br>"
f"Thresholds: pose &le;{th['yaw_max']}&deg;yaw / {th['pitch_max']}&deg;pitch, "
f"face_short &ge;{th['face_min']}px, det &ge;{th['det_min']}, "
f"track-gate &ge;{int(100*th['track_gate_frac'])}%, "
f"duration {th['min_dur_s']}{th['max_dur_s']}s. "
f"Output dir: <code>{output_dir}</code></p>")
nav = " · ".join(f"<a href='#v{i}'>{Path(v).name}</a>"
for i, v in enumerate(by_video.keys()))
rows.append(f"<div class='nav'>{nav}</div>")
for vi, (video_path, segs) in enumerate(by_video.items()):
rows.append(f"<section id='v{vi}' class='vid'>")
rows.append(f"<h2>{Path(video_path).name} <small>({len(segs)} segments)</small></h2>")
rows.append("<div class='cells'>")
for seg in sorted(segs, key=lambda x: x["start_s"]):
stats = seg["stats"]
tag = seg["identity_tag"] or ""
tag_sim = seg["identity_sim"]
tag_html = (f"<span class='tag'>{tag} ({tag_sim:.2f})</span>" if tag else "<span class='tag none'>untagged</span>")
sub_name = Path(seg['video_path']).stem
rows.append(
f"<div class='cell'>"
f"<a href='{output_dir}/{sub_name}/{seg['uuid']}.mp4'><img src='thumbs/{seg['uuid']}.jpg' loading='lazy'></a>"
f"<div class='meta'>"
f"<code>{sub_name}/{seg['uuid']}.mp4</code><br>"
f"{seg['start_s']:.1f}s &rarr; {seg['end_s']:.1f}s ({seg['duration_s']:.1f}s)<br>"
f"yaw={stats['yaw_med']:.0f}&deg; size={stats['size_med']:.0f}px det={stats['det_med']:.2f}<br>"
f"pass {stats['n_pass']}/{stats['n']}<br>"
f"{tag_html}"
f"</div></div>"
)
rows.append("</div></section>")
html = f"""<!doctype html>
<html><head><meta charset='utf-8'><title>Video targets review</title>
<style>
body {{ font-family: system-ui, sans-serif; background:#111; color:#eee; padding:1em; }}
h1, h2 {{ margin-top: 1em; }} h2 {{ border-bottom: 1px solid #333; padding-bottom: 4px; }}
small {{ color:#999; font-weight:normal; }}
section.vid {{ background:#1a1a1a; border-radius:6px; padding:12px; margin:12px 0; }}
.cells {{ display:flex; flex-wrap:wrap; gap:8px; }}
.cell {{ background:#222; border-radius:4px; padding:6px; width:260px; font-size:11px; font-family:monospace; }}
.cell img {{ width:100%; height:auto; border-radius:3px; }}
.meta {{ padding-top:4px; line-height:1.4; }}
.tag {{ display:inline-block; padding:1px 6px; background:#5fa05f; color:#000; border-radius:2px; }}
.tag.none {{ background:#444; color:#aaa; }}
.nav {{ position:sticky; top:0; background:#111; padding:.5em 0; border-bottom:1px solid #333; font-size:12px; }}
a {{ color:#6cf; }}
code {{ background:#000; padding:1px 4px; border-radius:2px; }}
</style></head>
<body>
{''.join(rows)}
</body></html>"""
out_html = out_dir / "index.html"
out_html.write_text(html)
print(f"[report] -> {out_html}", file=sys.stderr)
# ----------------------------- main -----------------------------
def main():
ap = argparse.ArgumentParser()
sub = ap.add_subparsers(dest="cmd", required=True)
s = sub.add_parser("scan")
s.add_argument("--input", default=str(DEFAULT_INPUT))
s.add_argument("--output-dir", default=str(DEFAULT_OUTPUT))
s.add_argument("--recursive", action="store_true")
s.add_argument("--out", required=True)
s.set_defaults(func=cmd_scan)
sc = sub.add_parser("scenes")
sc.add_argument("--inventory", required=True)
sc.add_argument("--out-dir", required=True)
sc.add_argument("--only", default=None, help="comma-separated basenames to limit run")
sc.add_argument("--force", action="store_true")
sc.set_defaults(func=cmd_scenes)
st = sub.add_parser("stage")
st.add_argument("--inventory", required=True)
st.add_argument("--scenes-dir", required=True)
st.add_argument("--sample-fps", type=float, default=SAMPLE_FPS)
st.add_argument("--out", required=True)
st.set_defaults(func=cmd_stage)
m = sub.add_parser("merge")
m.add_argument("--results", required=True)
m.add_argument("--out", required=True)
m.set_defaults(func=cmd_merge)
tr = sub.add_parser("track")
tr.add_argument("--frames", required=True)
tr.add_argument("--scenes-dir", required=True)
tr.add_argument("--inventory", required=True)
tr.add_argument("--sample-fps", type=float, default=SAMPLE_FPS)
tr.add_argument("--out", required=True)
tr.set_defaults(func=cmd_track)
sc2 = sub.add_parser("score")
sc2.add_argument("--tracks", required=True)
sc2.add_argument("--inventory", required=True)
sc2.add_argument("--out", required=True)
sc2.add_argument("--no-identity", action="store_true")
sc2.add_argument("--max-yaw", type=float, default=QUALITY_YAW_MAX)
sc2.add_argument("--max-pitch", type=float, default=QUALITY_PITCH_MAX)
sc2.add_argument("--min-face", type=int, default=QUALITY_FACE_MIN)
sc2.add_argument("--min-det", type=float, default=QUALITY_DET_MIN)
sc2.add_argument("--track-gate-frac", type=float, default=TRACK_GATE_FRAC)
sc2.add_argument("--bridge-gap", type=float, default=SEGMENT_BRIDGE_S,
help="bridge within-track failure gaps up to this many seconds")
sc2.add_argument("--merge-gap", type=float, default=SEGMENT_MERGE_GAP_S,
help="merge across-track segments in same scene if within this gap")
sc2.add_argument("--min-dur", type=float, default=SEGMENT_MIN_S)
sc2.add_argument("--max-dur", type=float, default=SEGMENT_MAX_S)
sc2.set_defaults(func=cmd_score)
cu = sub.add_parser("cut")
cu.add_argument("--plan", required=True)
cu.add_argument("--output-dir", default=str(DEFAULT_OUTPUT))
cu.add_argument("--force", action="store_true")
cu.add_argument("--clean", action="store_true",
help="remove prior UUID-named clips before cutting (preserves non-UUID files)")
cu.set_defaults(func=cmd_cut)
rp = sub.add_parser("report")
rp.add_argument("--plan", required=True)
rp.add_argument("--output-dir", default=str(DEFAULT_OUTPUT))
rp.add_argument("--out", required=True)
rp.set_defaults(func=cmd_report)
args = ap.parse_args()
args.func(args)
if __name__ == "__main__":
main()