Overnight 2026-04-27 nic finalize completed. Per-user API key worked as
expected. The pipeline survived one mid-stage Immich outage via the
circuit breaker added in 62dba3d -- script paused, operator confirmed
connectivity, same command resumed from saved state.json.
Embed (Windows DML): 7,834 images -> 15,627 face records + 1 noface in
59 minutes (2.2 img/s end-to-end).
Cluster: 6,770 of 15,627 faces (43%) matched existing canonical
identities at cos-dist <= 0.45; biggest hits faceset_002 (+3,261),
faceset_008 (+1,461), faceset_001 (+955), faceset_007 (+408). The
faceset_008 and faceset_007 hits are noteworthy cross-matches: those
are hand-sorted "sab" and "s" identities, recurring frequently in nic's
library.
The 8,857 unmatched faces formed 3,787 raw clusters at threshold 0.55;
129 survived the refine gates and 95 were emitted as new facesets,
numbered from faceset_265 upward.
Top-level facesets_swap_ready/manifest.json: 216 -> 311 substantive
facesets + 68 thin_eras unchanged.
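
The match-then-cluster split described above can be illustrated with a small sketch. This is not the code in work/cluster_immich.py; the function names, the plain-list embeddings, and the toy 2-D vectors are assumptions made for illustration. Only the 0.45 cosine-distance threshold comes from the run. It assumes non-zero vectors and at least one centroid.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def split_by_centroid(faces, centroids, max_dist=0.45):
    """Return (matched, unmatched).

    matched maps identity name -> indices of faces whose nearest centroid
    lies within max_dist; unmatched lists the remaining face indices,
    which would go on to clustering.
    """
    matched, unmatched = {}, []
    for i, emb in enumerate(faces):
        name, dist = min(
            ((n, cosine_distance(emb, c)) for n, c in centroids.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_dist:
            matched.setdefault(name, []).append(i)
        else:
            unmatched.append(i)
    return matched, unmatched

# Toy data: two existing identities, three incoming faces.
centroids = {"faceset_001": [1.0, 0.0], "faceset_002": [0.0, 1.0]}
faces = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]
matched, unmatched = split_by_centroid(faces, centroids)
```

In the toy data the first two faces attach to existing identities and the third stays unmatched, mirroring the 6,770 / 8,857 split at a much smaller scale.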
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
work/immich_stage.py:
- Startup probe of /server/version (exit 2 if unreachable).
- Outage circuit breaker: after OUTAGE_FAIL_STREAK=12 consecutive
faces_error/download_error results, run a quick probe; if the probe
also fails, persist state and exit with code 2 so a long unattended
run can pause rather than silently churning through tens of thousands
of retries during an upstream outage. Resume by re-running the same
command -- state.json + queue.json are intact.
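
A minimal sketch of the breaker logic, for readers who want the shape without opening the script. The class name, the probe callable, and the choice to reset the streak after a successful probe are assumptions; only OUTAGE_FAIL_STREAK=12, the two error statuses, and exit code 2 come from the change itself.

```python
OUTAGE_FAIL_STREAK = 12  # threshold named in this commit
EXIT_OUTAGE = 2          # exit code used when the run pauses

class OutageBreaker:
    """Counts consecutive per-asset failures and probes the server on a streak."""

    def __init__(self, probe, threshold=OUTAGE_FAIL_STREAK):
        self.probe = probe          # callable: True when the server answers
        self.threshold = threshold
        self.streak = 0

    def record(self, status):
        """Feed one per-asset result; return True when the run should pause."""
        if status in ("faces_error", "download_error"):
            self.streak += 1
        else:
            self.streak = 0
        if self.streak < self.threshold:
            return False
        if self.probe():
            # Server is reachable, so the failures are per-asset problems.
            # Resetting here is an assumption of this sketch.
            self.streak = 0
            return False
        return True  # caller persists state and exits with EXIT_OUTAGE

# Simulated outage: one success, then twelve download errors, probe fails.
breaker = OutageBreaker(probe=lambda: False)
results = ["ok"] + ["download_error"] * 12
tripped_at = next((i for i, s in enumerate(results) if breaker.record(s)), None)

# Healthy server: per-asset errors alone never trip the breaker.
healthy = OutageBreaker(probe=lambda: True)
healthy_trips = any(healthy.record("faces_error") for _ in range(50))
```

When record() returns True, the caller would write state.json and queue.json and call sys.exit(EXIT_OUTAGE), so re-running the same command resumes from the saved state.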
README:
- Document the nic run (per-user API key necessary; second pipeline
invocation confirmed expected behavior; cleaner library than peter's
with 0 internal byte-dupes vs 2,976).
- Mention the circuit breaker as the mechanism that keeps long
unattended runs safe under the known Tailscale flicker pattern at
this site.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-piece workflow (plus a driver script) that imports a self-hosted
Immich library and emits new facesets without disturbing existing
identity numbering:
- work/immich_stage.py (WSL): pages /search/metadata, parallel-fetches
/faces?id= per asset, prefilters by face_short>=90 against bbox scaled
to original-image coords, downloads originals, sha256-dedups against
nl_full.npz and same-run staged files. 8-worker ThreadPoolExecutor
doing the full /faces->filter->/original chain per asset; resumable
via state.json. API URL + key come from IMMICH_URL / IMMICH_API_KEY
env vars, label->UUID map from work/immich/users.json (gitignored).
- work/embed_worker.py (Windows venv at C:\face_embed_venv): runs
insightface.FaceAnalysis(buffalo_l) with the DmlExecutionProvider on
AMD Radeon Vega via onnxruntime-directml. Produces a cache file in
the same .npz schema as sort_faces.cmd_embed (loadable via
load_cache). ~7.5x speedup over CPU end-to-end; embeddings match the
CPU output (cosine similarity 1.0000 across 8 sample faces).
- work/cluster_immich.py (WSL): mirrors cluster_osrc.py against an
immich_<user>.npz. Builds existing identity centroids from canonical
faceset_NNN/ in facesets_swap_ready/, drops matches at <=0.45,
clusters the rest at 0.55, applies refine gates, hands off to
cmd_export_swap. Numbers new facesets past the existing maximum.
- work/finalize_immich.sh: chains queue->Windows embed->cache copy->
cluster_immich, with logging.
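
The face_short>=90 prefilter in immich_stage.py depends on scaling each bounding box back to original-image coordinates before measuring it. A rough sketch of that step follows; the dict keys ("bbox", "detect_size") and function names are hypothetical and do not reflect the Immich response schema or the script's internals.

```python
def face_short_edge(bbox, detect_size, original_size):
    """Short edge of a face box, in original-image pixels.

    bbox: (x1, y1, x2, y2) in the coordinate space the detector used.
    detect_size, original_size: (width, height) tuples.
    """
    sx = original_size[0] / detect_size[0]
    sy = original_size[1] / detect_size[1]
    x1, y1, x2, y2 = bbox
    return min((x2 - x1) * sx, (y2 - y1) * sy)

def passes_prefilter(faces, original_size, min_short=90):
    """True if at least one face is large enough to be worth downloading."""
    return any(
        face_short_edge(f["bbox"], f["detect_size"], original_size) >= min_short
        for f in faces
    )

# A 30x40 box detected on a 1440x960 preview of a 5760x3840 original
# scales by 4 to 120x160, so its short edge is 120 px.
big = {"bbox": (100, 100, 130, 140), "detect_size": (1440, 960)}
small = {"bbox": (0, 0, 20, 30), "detect_size": (1440, 960)}
short_big = face_short_edge(big["bbox"], big["detect_size"], (5760, 3840))
keep_asset = passes_prefilter([small, big], (5760, 3840))
skip_asset = passes_prefilter([small], (5760, 3840))
```

Filtering before the /original download is what keeps the staged set small relative to the number of assets paged.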
The 2026-04-26 run on https://fotos.computerliebe.org (Immich v2.7.2)
processed 53,842 admin-accessible assets, staged 10,261, embedded
19,462 face records on Vega DML in 64.6 min, matched 8,103 (42%) to
existing identities, and emitted 185 new facesets (faceset_026..264
with gaps). facesets_swap_ready/ went from 31 to 216 substantive
facesets.
Important caveat surfaced: /search/metadata's userIds filter is
silently ignored when the API key is bound to a different user, so
this run can't enumerate other users' libraries from the admin key.
A per-user API key would be required for nic.
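
Because the filter is ignored silently, a cheap guard is to compare the owner of each returned asset with the user that was requested. The sketch below assumes each asset dict carries an "ownerId" field; that field name and the helper are assumptions for illustration, not something this run verified.

```python
def filter_was_honoured(assets, requested_user_id, owner_field="ownerId"):
    """True when every returned asset belongs to the requested user.

    An empty result is treated as honoured, since it proves nothing
    either way.
    """
    owners = {a.get(owner_field) for a in assets}
    return owners <= {requested_user_id}

# Hypothetical pages: one scoped correctly, one that fell back to the
# key owner's library.
good_page = [{"id": "a1", "ownerId": "u-nic"}, {"id": "a2", "ownerId": "u-nic"}]
bad_page = [{"id": "b1", "ownerId": "u-admin"}]
ok = filter_was_honoured(good_page, "u-nic")
leaked = not filter_was_honoured(bad_page, "u-nic")
```

Running such a check on the first page would let the script abort early instead of staging the wrong library.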
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
work/cluster_osrc.py mirrors build_folders.py's shape (synthesize a
refine_manifest, hand off to cmd_export_swap, relocate, merge top-level
manifest) but discovers identities by clustering rather than asserting
them by folder. Drops faces already covered by existing identity
centroids, clusters the rest at 0.55, applies refine-equivalent gates
with min_faces=6, numbers new facesets past the existing maximum so
faceset_001..NNN are never disturbed.
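
Numbering past the existing maximum is simple but easy to get wrong when the output directory also holds non-faceset entries. A minimal sketch, assuming three-digit faceset_NNN names as used in this repo; the helper name and the directory listing are illustrative.

```python
import re

FACESET_RE = re.compile(r"^faceset_(\d+)$")

def next_faceset_names(existing, count, width=3):
    """Allocate `count` new names after the highest existing faceset number.

    Entries that do not look like faceset_NNN (for example _candidates)
    are ignored; gaps in the existing numbering are never reused.
    """
    used = [int(m.group(1)) for m in map(FACESET_RE.match, existing) if m]
    start = max(used, default=0) + 1
    return [f"faceset_{n:0{width}d}" for n in range(start, start + count)]

existing = ["faceset_001", "faceset_019", "_candidates", "faceset_007"]
names = next_faceset_names(existing, 2)
first_ever = next_faceset_names([], 1)
```

Starting from the maximum rather than filling gaps is what guarantees that previously assigned numbers keep their meaning.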
The 2026-04-26 run on /mnt/x/src/osrc produced faceset_020..025 (sizes
4-26 exported PNGs); analysis writeup in docs/analysis/.
README also notes the refine-renumbers caveat in passing -- extend +
orchestration script is the safe pattern; cmd_refine is for fresh
clusters only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The README documents work/build_folders.py as the orchestration script
for hand-sorted-folder identity import, but it was excluded by the
work/ gitignore. Force-track it for parity with the other orchestration
scripts (age_split_001.py, check_faceset001_age.py) so the documented
workflow points at code that exists in the repo.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the 2026-04-26 split of faceset_001 (707 curated faces) into
6 substantive era buckets + 68 thin fragments, including the readiness
probe evidence, the anchor-based assignment rationale (replaces
transitive union-find that caused year-drift), and the re-run / apply-
to-other-identity workflow.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README: document work/build_folders.py (hand-sorted folder identities)
and the new age-split workflow for splitting a long-running identity
into era-specific facesets after clustering.
- Force-track work/age_split_001.py and work/check_faceset001_age.py;
these are the worked example + readiness probe for faceset_001 and
the template for splitting any other identity by EXIF era.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README.md now covers all seven subcommands (embed, cluster, refine, dedup,
extend, enrich, export-swap), an end-to-end pipeline recipe, the delta
recipe for merging a new source into an existing result, the quality-
weight formula used by export-swap, and the GFPGAN blend recommendation
at swap time (0.85, overriding roop-unleashed's 0.65 default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- enrich: re-detects each cached face with buffalo_l (detection +
landmark_2d_106 + landmark_3d_68, recognition module skipped for speed)
and persists landmarks + pose into the cache so per-face frontality and
landmark-symmetry quality signals become available.
- compute_quality: composite score combining det_score, face short-edge,
blur, frontality (from pose pitch/yaw), and 2D-landmark symmetry with
tunable weights. Default weighting 0.30/0.20/0.20/0.15/0.15.
- export-swap: builds facesets_swap_ready/ from an existing refine
manifest. Per identity: tighter outlier gate (default 0.45), visual-
near-dupe collapse (keep best representative per group), multi-face-
per-source-image collapse (keep best bbox), rank by composite score,
single-face-per-PNG crops at 512x512 with 0.5 bbox padding, ready-to-
drop .fsz bundles (top-N + full), per-faceset manifest.json, NAME.txt
placeholder for the operator. The single-face-per-PNG crop is the
critical fix: roop-unleashed's .fsz loader appends every detected face
in each PNG to the FaceSet, so any multi-face crop would contaminate
the averaged embedding.
- Optional --candidates rescues raw_full singletons: matches against the
final per-faceset centroids and routes to _candidates/to_<faceset>/
for manual review; orphaned singletons that still cluster among
themselves land in _candidates/new_<NNN>/.
- docs/analysis/: evaluation document captures the evidence, downstream
requirements (FaceSet averaging, inswapper_128), opportunity matrix
(R1-R14), and the recommended target state this export implements.
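
The composite score from compute_quality can be pictured as a weighted sum. The sketch below assumes every signal has already been normalised to [0, 1] and that the default weights apply in the order the signals are listed above; the actual normalisation in sort_faces is not reproduced here, and the dict keys are illustrative.

```python
DEFAULT_WEIGHTS = {
    "det_score": 0.30,
    "face_short": 0.20,
    "blur": 0.20,
    "frontality": 0.15,
    "symmetry": 0.15,
}

def composite_quality(signals, weights=DEFAULT_WEIGHTS):
    """Weighted mean of per-face signals, each clamped to [0, 1]."""
    total = sum(weights.values())
    score = sum(w * min(max(signals[k], 0.0), 1.0) for k, w in weights.items())
    return score / total

# A sharp frontal face with asymmetric landmarks and a mid-sized box.
face = {
    "det_score": 0.9,
    "face_short": 0.5,
    "blur": 0.5,
    "frontality": 1.0,
    "symmetry": 0.0,
}
score = composite_quality(face)
perfect = composite_quality({k: 1.0 for k in DEFAULT_WEIGHTS})
```

Dividing by the weight total keeps the score in [0, 1] even when the weights are retuned and no longer sum to 1.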
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- embed: sha256-based dedup at listing (embed each unique hash once, carry
other paths as aliases via a top-level path_aliases dict); resumable from
any existing cache; atomic incremental flush every 50 files; explicit
skip-ext filtering; schema bumped with processed_paths + path_aliases.
- extend: new subcommand that merges new embeddings into an existing raw +
facesets output without renumbering. Nearest person-centroid match above
threshold, unmatched faces re-clustered into new person_NNN / _singletons.
Optional --refine-out also extends facesets by centroid + quality gate.
- dedup: new subcommand producing byte-identical + visual near-duplicate
groups as a JSON report.
- cluster/refine: fan every placement across canonical + aliases so each
on-disk location gets represented.
- safe_dst_name now always flattens the absolute path so filenames stay
stable across runs when src_root shifts (fixes duplicate-copy bug that
surfaced during the lzbkp_red extend).
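
The listing-time dedup in embed can be sketched as follows. The shape of path_aliases used here (canonical path mapped to a list of duplicate paths) and the choice of the first path in sorted order as canonical are assumptions of this sketch, not a statement about the cache schema.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large images are not read whole."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def dedup_listing(paths):
    """Return (canonical_paths, path_aliases) for a list of file paths."""
    first_seen = {}    # digest -> canonical path
    path_aliases = {}  # canonical path -> other paths with the same bytes
    for p in sorted(paths):
        digest = sha256_of(p)
        if digest in first_seen:
            path_aliases.setdefault(first_seen[digest], []).append(p)
        else:
            first_seen[digest] = p
    return list(first_seen.values()), path_aliases

# Demo on throwaway files: a.jpg and b.jpg share bytes, c.jpg differs.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("a.jpg", b"same"), ("b.jpg", b"same"), ("c.jpg", b"other")]:
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(data)
    files = [os.path.join(tmp, n) for n in ("a.jpg", "b.jpg", "c.jpg")]
    canonical, aliases = dedup_listing(files)
    canonical_names = [os.path.basename(p) for p in canonical]
    alias_names = {
        os.path.basename(k): [os.path.basename(v) for v in vs]
        for k, vs in aliases.items()
    }
```

Only the canonical paths would be embedded; cluster and refine can then fan each placement out across the aliases.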
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l
embeddings and agglomerative clustering, migrated in from the ad-hoc
/home/peter/face_sort/ directory so this repo is the canonical home for
faceset preparation feeding roop-unleashed and similar tools.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>