Document Immich nic run: 95 new facesets, manifest 216 -> 311

Overnight 2026-04-27 nic finalize completed. Per-user API key worked as
expected. The pipeline survived one mid-stage Immich outage via the
circuit breaker added in 62dba3d -- script paused, operator confirmed
connectivity, same command resumed from saved state.json.

Embed (Windows DML): 7,834 images -> 15,627 face records + 1 noface in
59 minutes (2.2 img/s end-to-end).

Cluster: 6,770 of 15,627 faces (43%) matched existing canonical
identities at cos-dist <= 0.45; biggest hits faceset_002 (+3,261),
faceset_008 (+1,461), faceset_001 (+955), faceset_007 (+408). The
faceset_008 and faceset_007 hits are noteworthy cross-matches: those
are hand-sorted "sab" and "s" identities, recurring frequently in nic's
library.

The 8,857 unmatched faces formed 3,787 raw clusters at threshold 0.55;
129 survived the refine gates and 95 were emitted as new facesets
starting at faceset_265.

Top-level facesets_swap_ready/manifest.json: 216 -> 311 substantive
facesets + 68 thin_eras unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:32:11 +02:00
parent 62dba3ddb3
commit e66c97fd58
2 changed files with 76 additions and 0 deletions

@@ -266,6 +266,19 @@ that didn't get marked seen), **7,834 staged** (30% face-bearing-with-big-face,
denser than peter's 19%), 519 byte-deduped vs `nl_full.npz`, **0 internal
byte-duplicates** (cleaner library than peter's 2,976), 54 transient errors.
Embed + cluster on the nic queue:
| step | result |
|------|------|
| Windows DML embed | 15,627 face records + 1 noface in **59 min** (2.2 img/s end-to-end), 7 load errors |
| matched existing identities | **6,770 of 15,627 (43%)** at cos-dist ≤ 0.45; biggest hits faceset_002 (+3,261), faceset_008 (+1,461), faceset_001 (+955), faceset_007 (+408) |
| new clusters | 3,787 at threshold 0.55 → 129 surviving refine gates → **95 emitted** as `faceset_265..NNN` (gaps where export-swap's 0.45 outlier threshold dropped clusters below the export bar) |
Top-level `facesets_swap_ready/manifest.json` after both Immich runs:
**311 substantive facesets** (12 auto-cluster nl/lzbkp + 7 hand-sorted +
6 era splits + 6 osrc-discovered + 185 peter-Immich + 95 nic-Immich) +
68 thin_eras under `_thin/`.
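The breakdown can be checked with a few lines of arithmetic. This is only a sanity-check sketch: the dictionary keys are labels invented here for illustration, and the counts are copied from the paragraph above.

```python
# Per-source faceset counts as quoted in the text (keys are illustrative labels).
breakdown = {
    "auto-cluster nl/lzbkp": 12,
    "hand-sorted": 7,
    "era splits": 6,
    "osrc-discovered": 6,
    "peter-Immich": 185,
    "nic-Immich": 95,
}

total = sum(breakdown.values())

# The parts should add up to the manifest total after both Immich runs.
assert total == 311

# Removing the nic run should give the manifest total before it.
assert total - breakdown["nic-Immich"] == 216
```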
`work/immich_stage.py` carries a built-in **outage circuit breaker**:
after 12 consecutive HTTP errors it probes Immich; if that probe also
fails, the script exits cleanly with code 2, state preserved. This made

@@ -144,6 +144,69 @@ emitted as new facesets: 185 (54 dropped by export-swap's 0.45 outlier)
Top-level `facesets_swap_ready/manifest.json` after this run: **216
facesets** (up from 31; ~7× growth) + 68 thin_eras under `_thin/`.
## 4d. Result of the 2026-04-26..27 run (nic, with per-user API key)
After issuing nic a per-user API key, the same pipeline ran end-to-end
with no code changes (only the `IMMICH_API_KEY` env var changed). The
run survived one Immich outage mid-stage thanks to the circuit breaker
added in `work/immich_stage.py` (12 consecutive HTTP errors → probe →
exit 2 with state preserved → resume on same command).
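The breaker logic described above can be sketched roughly as follows. This is not the code in `work/immich_stage.py`: the `fetch`, `probe` and `save_state` callables and the `OutageDetected` exception are hypothetical stand-ins, and only the threshold of 12 and exit code 2 come from the text.

```python
from typing import Callable, Iterable

MAX_CONSECUTIVE_ERRORS = 12  # threshold stated in the text
EXIT_OUTAGE = 2              # exit code stated in the text


class OutageDetected(Exception):
    """Raised when an error streak is confirmed by a failed probe."""


def run_with_breaker(
    items: Iterable[str],
    fetch: Callable[[str], bool],     # hypothetical: True on success, False on HTTP error
    probe: Callable[[], bool],        # hypothetical: True if the server answers
    save_state: Callable[[], None],   # hypothetical: persists progress for resume
    max_errors: int = MAX_CONSECUTIVE_ERRORS,
) -> int:
    """Process items and return the success count, or raise OutageDetected."""
    streak = 0
    done = 0
    for item in items:
        if fetch(item):
            streak = 0  # any success resets the streak
            done += 1
            continue
        streak += 1
        if streak >= max_errors:
            if not probe():
                # Server looks down: persist progress so the same command can resume.
                save_state()
                raise OutageDetected(f"{streak} consecutive errors and probe failed")
            streak = 0  # server is reachable, so the errors were item-specific
    save_state()
    return done
```

A caller would typically catch `OutageDetected` at the top level and call `sys.exit(EXIT_OUTAGE)`, so that the operator can confirm connectivity and rerun the same command.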
### Stage
```
total_assets_seen: 25777 (vs. 25,786 from /server/statistics; 9 fewer)
staged_count: 7834 (30% face-bearing-with-big-face;
peter was 19%)
deduped_against_existing: 519 (sha256 in nl_full.npz already)
deduped_against_staged: 0 (nic's library has zero internal
byte-dupes; peter had 2,976)
skipped_no_big_face: 725
skipped_no_faces: 16695
skipped_download_error: 54 (transient; not marked seen ->
would be retried on resume)
elapsed: ~75 min wall (across two pause/resume sessions
bracketing one Immich outage)
```
### Embed (Windows DML)
```
queue: 7834 entries
new face records: 15627
new noface records: 1
load errors: 7
elapsed: 3538.9s (59 min, 2.2 img/s end-to-end)
```
### Cluster
```
existing canonical centroids: 25
faces already covered (cos-dist <= 0.45): 6770/15627 (43%)
faceset_002: 3261 (the dominant family identity)
faceset_008: 1461 (cross-match to hand-sorted 'sab')
faceset_001: 955
faceset_007: 408 (cross-match to hand-sorted 's')
faceset_006: 114
...
unmatched: 8857
clusters at threshold 0.55: 3787 (top sizes [165, 134, 106, 99, 92,
67, 62, 61, 58, 53])
survived refine gates: 129
emitted as new facesets: 95 (faceset_265..NNN with gaps)
```
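The "already covered" step can be illustrated with a minimal sketch, assuming each face is assigned to its nearest canonical centroid and counted as matched when the cosine distance is at most 0.45. The function names and the nearest-centroid rule are assumptions made for illustration; the actual clustering script may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

MATCH_THRESHOLD = 0.45  # cos-dist cutoff quoted in the text


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign_faces(
    faces: Sequence[Sequence[float]],
    centroids: Dict[str, Sequence[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Tuple[Dict[str, int], List[int]]:
    """Count matches per centroid and collect indices of unmatched faces.

    Assumes at least one centroid is given.
    """
    matched = {name: 0 for name in centroids}
    unmatched: List[int] = []
    for idx, emb in enumerate(faces):
        # Nearest centroid by cosine distance (an assumed assignment rule).
        best_name, best_dist = min(
            ((name, cosine_distance(emb, c)) for name, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if best_dist <= threshold:
            matched[best_name] += 1
        else:
            unmatched.append(idx)  # left over for the threshold-0.55 clustering pass
    return matched, unmatched
```

With this split, the unmatched indices correspond to the 8,857 faces that go on to fresh clustering, while the per-centroid counts correspond to the `faceset_NNN: count` lines in the output above.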
Top-level `facesets_swap_ready/manifest.json` after the nic run: **311
substantive facesets** + 68 thin_eras. Two-day cumulative growth:
| date | event | facesets total |
|------|------|------:|
| 2026-04-25 | hand-sorted folder import | 19 |
| 2026-04-26 morning | osrc + age split + cleanup | 31 |
| 2026-04-26 afternoon | Immich peter run | 216 |
| 2026-04-27 (overnight) | Immich nic run | 311 |
## 5. Surprises and caveats
### 5a. `/search/metadata`'s `userIds` filter is silently ignored (Immich v2.7.2)