Make per-clip sidecar JSONs opt-in (default off)

Previously every video_target_pipeline cut wrote a <uuid>.json provenance
sidecar alongside each <uuid>.mp4. The same provenance is already in the
per-batch plan.json, so the per-clip sidecars are redundant unless a
downstream tool wants each clip self-describing in isolation.

- video_target_pipeline.py cut: new --write-sidecar flag, default off.
- run_video_pipeline.sh: new SIDECAR env var (default "no"), passes
  --write-sidecar when SIDECAR=yes.
- README + docs/analysis/video-target-preprocessing.md updated.

The 1,984 already-emitted sidecars in /mnt/x/src/vd/ct/ct_src_*/ have
been deleted (1.5 MB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-28 12:44:27 +02:00
parent 998fa79f81
commit 7960dec350
4 changed files with 29 additions and 22 deletions

View File

@@ -15,6 +15,7 @@
# SKIP_PATTERN regex of basenames to exclude (Python `re` syntax). Applied AFTER FILTER_FROM.
# MAX_DUR score --max-dur (default 120)
# IDENTITY "yes" to enable identity tagging; default "no"
# SIDECAR "yes" to emit <uuid>.json provenance sidecars; default "no"
set -e
@@ -23,6 +24,7 @@ set -e
: ${OUTPUT_DIR:=/mnt/x/src/vd/ct}
: ${MAX_DUR:=120}
: ${IDENTITY:=no}
: ${SIDECAR:=no}
mkdir -p "$WORK" "$WORK/scenes"
@@ -37,7 +39,7 @@ log() { echo "[$(ts)] [$PHASE] $*"; }
PHASE="setup"
log "STARTED — host=$(hostname) pid=$$ work=$WORK"
log "config: input=$INPUT_DIR output=$OUTPUT_DIR filter_from=${FILTER_FROM:-<none>} skip_pattern=${SKIP_PATTERN:-<none>} max_dur=$MAX_DUR identity=$IDENTITY"
log "config: input=$INPUT_DIR output=$OUTPUT_DIR filter_from=${FILTER_FROM:-<none>} skip_pattern=${SKIP_PATTERN:-<none>} max_dur=$MAX_DUR identity=$IDENTITY sidecar=$SIDECAR"
PHASE="inventory"
log "building subset inventory"
@@ -110,7 +112,9 @@ log "done in $(($(date +%s)-T0))s"
PHASE="cut"
log "ffmpeg stream-copy into per-source subfolders (no --clean)"
T0=$(date +%s)
$PY_WSL $PIPELINE cut --plan "$WORK/plan.json" --output-dir "$OUTPUT_DIR"
SIDECAR_FLAG=""
if [ "$SIDECAR" = "yes" ]; then SIDECAR_FLAG="--write-sidecar"; fi
$PY_WSL $PIPELINE cut --plan "$WORK/plan.json" --output-dir "$OUTPUT_DIR" $SIDECAR_FLAG
log "done in $(($(date +%s)-T0))s"
PHASE="report"