diff --git a/README.md b/README.md
index 7fe0ea0..ff2c943 100644
--- a/README.md
+++ b/README.md
@@ -266,6 +266,19 @@ that didn't get marked seen), **7,834 staged** (30% face-bearing-with-big-face,
 denser than peter's 19%), 519 byte-deduped vs `nl_full.npz`, **0 internal
 byte-duplicates** (cleaner library than peter's 2,976), 54 transient errors.
 
+Embed + cluster on the nic queue:
+
+| step | result |
+|------|------|
+| Windows DML embed | 15,627 face records + 1 noface in **59 min** (2.2 img/s end-to-end), 7 load errors |
+| matched existing identities | **6,770 of 15,627 (43%)** at cos-dist ≤ 0.45; biggest hits faceset_002 (+3,261), faceset_008 (+1,461), faceset_001 (+955), faceset_007 (+408) |
+| new clusters | 3,787 at threshold 0.55 → 129 surviving refine gates → **95 emitted** as `faceset_265..NNN` (gaps where export-swap's 0.45 outlier dropped clusters below the export bar) |
+
+Top-level `facesets_swap_ready/manifest.json` after both Immich runs:
+**311 substantive facesets** (12 auto-cluster nl/lzbkp + 7 hand-sorted +
+6 era splits + 6 osrc-discovered + 185 peter-Immich + 95 nic-Immich) +
+68 thin_eras under `_thin/`.
+
 `work/immich_stage.py` carries a built-in **outage circuit breaker**: after
 12 consecutive HTTP errors it probes Immich; if that probe also fails, the
 script exits cleanly with code 2, state preserved. This made
diff --git a/docs/analysis/immich-import-pipeline.md b/docs/analysis/immich-import-pipeline.md
index fee6a89..e720e63 100644
--- a/docs/analysis/immich-import-pipeline.md
+++ b/docs/analysis/immich-import-pipeline.md
@@ -144,6 +144,69 @@ emitted as new facesets: 185 (54 dropped by export-swap's 0.45 outlier)
 Top-level `facesets_swap_ready/manifest.json` after this run: **216
 facesets** (up from 31; ~7× growth) + 68 thin_eras under `_thin/`.
 
+## 4d. Result of the 2026-04-26..27 run (nic, with per-user API key)
+
+After issuing nic a per-user API key, the same pipeline ran end-to-end
+with no code changes (only the `IMMICH_API_KEY` env var changed). The
+run survived one Immich outage mid-stage thanks to the circuit breaker
+added in `work/immich_stage.py` (12 consecutive HTTP errors → probe →
+exit 2 with state preserved → resume on same command).
+
+### Stage
+
+```
+total_assets_seen:         25777  (matches /server/statistics 25,786)
+staged_count:              7834   (30% face-bearing-with-big-face;
+                                   peter was 19%)
+deduped_against_existing:  519    (sha256 in nl_full.npz already)
+deduped_against_staged:    0      (nic's library has zero internal
+                                   byte-dupes; peter had 2,976)
+skipped_no_big_face:       725
+skipped_no_faces:          16695
+skipped_download_error:    54     (transient; not marked seen ->
+                                   would be retried on resume)
+elapsed:                   ~75 min wall (across two pause/resume sessions
+                                   bracketing one Immich outage)
+```
+
+### Embed (Windows DML)
+
+```
+queue:               7834 entries
+new face records:    15627
+new noface records:  1
+load errors:         7
+elapsed:             3538.9s (59 min, 2.2 img/s end-to-end)
+```
+
+### Cluster
+
+```
+existing canonical centroids:             25
+faces already covered (cos-dist <= 0.45): 6770/15627 (43%)
+  faceset_002: 3261   (the dominant family identity)
+  faceset_008: 1461   (cross-match to hand-sorted 'sab')
+  faceset_001: 955
+  faceset_007: 408    (cross-match to hand-sorted 's')
+  faceset_006: 114
+  ...
+unmatched:                                8857
+clusters at threshold 0.55:               3787 (top sizes [165, 134, 106, 99, 92,
+                                                67, 62, 61, 58, 53])
+survived refine gates:                    129
+emitted as new facesets:                  95 (faceset_265..NNN with gaps)
+```
+
+Top-level `facesets_swap_ready/manifest.json` after the nic run: **311
+substantive facesets** + 68 thin_eras. Two-day cumulative growth:
+
+| date | event | facesets total |
+|------|------|------:|
+| 2026-04-25 | hand-sorted folder import | 19 |
+| 2026-04-26 morning | osrc + age split + cleanup | 31 |
+| 2026-04-26 afternoon | Immich peter run | 216 |
+| 2026-04-27 (overnight) | Immich nic run | 311 |
+
 ## 5. Surprises and caveats
 
 ### 5a. `/search/metadata`'s `userIds` filter is silently ignored (Immich v2.7.2)