Add Immich outage circuit breaker; document nic run + Tailscale quirk

work/immich_stage.py:
- Startup probe of /server/version (exit 2 if unreachable).
- Outage circuit breaker: after OUTAGE_FAIL_STREAK=12 consecutive
  faces_error/download_error results, run a quick probe; if the probe
  also fails, persist state and exit with code 2 so a long unattended
  run can pause rather than silently churning through tens of thousands
  of retries during an upstream outage. Resume by re-running the same
  command -- state.json + queue.json are intact.

README:
- Document the nic run (per-user API key necessary; second pipeline
  invocation confirmed expected behavior; cleaner library than peter's
  with 0 internal byte-dupes vs 2,976).
- Mention the circuit breaker as the mechanism that keeps long
  unattended runs safe under the known Tailscale flicker pattern at
  this site.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:36:11 +02:00
parent 321fed01cc
commit 62dba3ddb3
2 changed files with 52 additions and 0 deletions


@@ -259,6 +259,20 @@ v2.7.2), with the admin API key:
| matched existing identities | **8,103 of 19,480 (42%)** at cos-dist ≤ 0.45; biggest hits faceset_002 (+2,666), faceset_001 (+1,856), faceset_003 (+670) |
| new clusters | 2,534 at threshold 0.55 → 239 surviving refine gates → **185 emitted** as `faceset_026..264` (gaps where export-swap's tighter outlier filter dropped clusters below the export quality bar) |

A second 2026-04-26 run with **nic's per-user API key** confirmed the
expected behavior: 25,777 of nic's IMAGE assets were enumerated, against
her `/server/statistics` count of 25,786. The gap of 9 roughly matches the
transient errors that were never marked seen. Of those assets, **7,834
were staged** (30% face-bearing-with-big-face, denser than peter's 19%),
519 were byte-deduped vs `nl_full.npz`, and there were **0 internal
byte-duplicates** (a cleaner library than peter's, which had 2,976) and
54 transient errors.

`work/immich_stage.py` carries a built-in **outage circuit breaker**:
after 12 consecutive `faces_error`/`download_error` results it probes
Immich; if that probe also fails, the script persists its state and exits
with code 2. This let the nic run survive a mid-stage Immich outage: the
script paused, the operator confirmed connectivity was back, and the same
command resumed from the saved `state.json` without re-fetching what was
already done.
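
The breaker logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in `work/immich_stage.py`: the function `run_with_breaker` and its `process`, `probe`, and `save_state` callbacks are hypothetical names, and only the streak length (12), the two error result names, and exit code 2 come from the commit itself.

```python
from typing import Callable, Iterable

OUTAGE_FAIL_STREAK = 12                            # from the commit message
ERROR_RESULTS = {"faces_error", "download_error"}  # results that count toward the streak
EXIT_OUTAGE = 2                                    # exit code used when pausing


def run_with_breaker(
    items: Iterable[object],
    process: Callable[[object], str],   # hypothetical: returns a result label per asset
    probe: Callable[[], bool],          # hypothetical: True if the server answers
    save_state: Callable[[], None],     # hypothetical: persists progress to disk
    streak_limit: int = OUTAGE_FAIL_STREAK,
) -> int:
    """Process items; return 0 when finished, EXIT_OUTAGE if an outage is detected."""
    streak = 0
    for item in items:
        result = process(item)
        # Any non-error result breaks the streak.
        streak = streak + 1 if result in ERROR_RESULTS else 0
        if streak >= streak_limit:
            if not probe():
                # Server looks down: persist progress and stop instead of retrying.
                save_state()
                return EXIT_OUTAGE
            # Server is reachable, so the failures are per-asset; keep going.
            streak = 0
    save_state()
    return 0
```

A caller would pass the return value to `sys.exit()`, so that re-running the same command after the outage picks up from the saved state. Resetting the streak after a successful probe is a design assumption made here; the real script may handle that case differently.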
**Important caveats for Immich v2.7.2**:
- The `userIds` filter on `/search/metadata` is **silently ignored** when
the API key is bound to a different user. The "import everything the