Update video preprocessing doc with full-corpus results

After completing the rest-of-corpus run, update docs/analysis to reflect the final numbers across all three batches (test + 13-file + 45-file) and surface the numerical lessons:

- 1,984 segments / 10.78h accepted content from 19.76h / 61 input videos
- 0 worker errors across 143,137 sampled frames
- rest batch sustained 15.78 fps from a fresh JSONL start (vs 7.5 fps for the migrated batch), confirming the append-only fix is the right steady-state design
- skip-pattern note: 5-digit basename numbers need full padding (0005[0-9] not 005[0-9]); bit me on the first relaunch
- documented SIDECAR=yes opt-in for the chain script

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -82,30 +82,34 @@ Skipped. Per [Microsoft's WSL D3D12 video acceleration post](https://devblogs.mi
For cutting we use `-c copy` stream-copy — no re-encode, hardware codecs are moot.
## 6. First batch run results (ct_src_00050..00062)
## 6. Full corpus run results

| | |
|---|---:|
| input videos | 13 |
| input duration | 6.18 h |
| sampled frames | 44,635 (@ 2 fps) |
| accepted tracks | 1,193 / 2,564 (47 %) |
| **emitted segments** | **600** |
| segments built from ≥2 tracks (cross-track merge fired) | 254 |
| accepted content total | 239.5 min (64.6 % of input) |
| segment duration min/median/mean/max | 1 / 12 / 24 / 119 s |
| output size | 3.63 GB |
Three runs across the 61-video corpus at `/mnt/x/src/vd/`:

Phase timings:
- scenes: 25 min (cached on later runs)
| | test (3 videos) | first batch (13 videos, 50–62) | rest (45 videos, 02–49 minus test) | **total** |
|---|---:|---:|---:|---:|
| input duration | 0.6 h | 6.18 h | 12.98 h | **19.76 h** |
| sampled frames @ 2 fps | 4,472 | 44,635 | 94,030 | 143,137 |
| tracks | 187 | 2,564 | 3,823 | 6,574 |
| accepted tracks | 94 (50 %) | 1,193 (47 %) | 1,905 (50 %) | 3,192 (49 %) |
| **emitted segments** | **83** | **600** | **1,301** | **1,984** |
| cross-track-merged segments | 14 | 254 | 382 | 650 |
| accepted content | 13 min | 239 min | 395 min | **647 min (= 10.78 h)** |
| acceptance rate by time | 36 % | 64.6 % | 50.7 % | **54.6 %** |
| output size | 0.135 GB | 3.63 GB | 4.84 GB | **8.6 GB** |

Phase timings (rest batch, the most representative run since it ran fully under JSONL append-only from a fresh start):
- scenes: 117 min (PySceneDetect, 45 × ~3 min/video)
- stage: instant
- worker: 78 min @ ~7.5 fps cumulative
- merge: 73 s
- track: 77 s
- score: 21 s
- cut (600 ffmpeg stream-copies): 19 min
- report (600 thumbs + HTML): 3 min
- **total wall-clock: 1h43m**
- worker: 100 min @ **15.78 fps** sustained (vs 7.5 fps for the first batch, which migrated mid-run)
- merge: 90 s
- track: 92 s
- score: 23 s
- cut (1,301 ffmpeg stream-copies): 30 min
- report (1,301 thumbs + HTML): 5.5 min
- **total wall-clock: 4h16m**

Across all three runs, **0 worker errors on 143,137 sampled frames**.
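The headline totals and the rest-batch worker time can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes only the per-batch figures quoted in the table and timings above; the variable names are illustrative and are not part of the pipeline scripts.

```bash
#!/usr/bin/env bash
# Cross-check of the corpus totals; every input is copied from the figures above.
set -euo pipefail

segments=$(( 83 + 600 + 1301 ))       # emitted segments: test + first batch + rest
frames=$(( 4472 + 44635 + 94030 ))    # sampled frames @ 2 fps, same three batches

# Acceptance rate by time: accepted minutes / input minutes, as a percentage.
accept_pct=$(awk 'BEGIN { printf "%.1f", (13 + 239 + 395) / (19.76 * 60) * 100 }')

# Worker wall-clock implied by the rest batch: frames / fps, converted to minutes.
worker_min=$(awk 'BEGIN { printf "%.0f", 94030 / 15.78 / 60 }')

echo "segments=$segments frames=$frames accept=${accept_pct}% worker=${worker_min}min"
# prints: segments=1984 frames=143137 accept=54.6% worker=99min
```

The implied worker time of roughly 99 min agrees with the reported 100 min for the rest batch.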
## 7. Re-running
@@ -119,12 +123,20 @@ WORK=/opt/face-sets/work/video_preprocess_<batch_name> \
bash work/status_video_pipeline.sh work/logs/video_run_<batch_name>.log
```
Skip patterns can exclude already-processed inputs:
Skip patterns can exclude already-processed inputs (note that 5-digit numbers need full padding in the regex, e.g. `0005[0-9]` not `005[0-9]`):
```bash
SKIP_PATTERN='^ct_src_(0001[015]|005[0-9]|006[0-9])\.mp4$' \
SKIP_PATTERN='^ct_src_(0001[015]|0005[0-9]|0006[0-2])\.mp4$' \
WORK=/opt/face-sets/work/video_preprocess_rest \
bash work/run_video_pipeline.sh > work/logs/video_run_rest.log 2>&1 &
```
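The padding pitfall is easy to reproduce before launching a run. The sketch below assumes `SKIP_PATTERN` is applied as an extended regular expression against each input basename. How `run_video_pipeline.sh` actually consumes the variable is not shown here, so the `grep -E` call and the `matches` helper are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if basename $2 matches the ERE skip pattern $1.
matches() { printf '%s\n' "$2" | grep -Eq -- "$1"; }

old='^ct_src_(0001[015]|005[0-9]|006[0-9])\.mp4$'    # 4-digit alternatives for 50-69
new='^ct_src_(0001[015]|0005[0-9]|0006[0-2])\.mp4$'  # padded to the full 5 digits

for f in ct_src_00050.mp4 ct_src_00062.mp4 ct_src_00049.mp4; do
  if matches "$old" "$f"; then o=skip; else o=keep; fi
  if matches "$new" "$f"; then n=skip; else n=keep; fi
  echo "$f old=$o new=$n"
done
# prints:
# ct_src_00050.mp4 old=keep new=skip
# ct_src_00062.mp4 old=keep new=skip
# ct_src_00049.mp4 old=keep new=keep
```

Because the pattern is anchored at both ends, the 4-digit alternatives can never match a 5-digit basename, so under the old pattern the already-processed files are not skipped.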
To also emit per-clip provenance sidecars (off by default):
```bash
SIDECAR=yes \
WORK=/opt/face-sets/work/video_preprocess_<batch> \
bash work/run_video_pipeline.sh > work/logs/video_run_<batch>.log 2>&1 &
```
`scenes` outputs are cached in the batch's `WORK/scenes/` directory, so re-running the chain after an edit to the score step doesn't redo detection. The worker is also resumable per `queue_id`: if it is killed mid-flight, just relaunch.