face-sets

Author	SHA1	Message	Date
Peter	321fed01cc	Add Immich import pipeline (WSL stage + Windows DML embed + cluster) Three-piece workflow that imports a self-hosted Immich library and emits new facesets without disturbing existing identity numbering: - work/immich_stage.py (WSL): pages /search/metadata, parallel-fetches /faces?id= per asset, prefilters by face_short>=90 against bbox scaled to original-image coords, downloads originals, sha256-dedups against nl_full.npz and same-run staged files. 8-worker ThreadPoolExecutor doing the full /faces->filter->/original chain per asset; resumable via state.json. API URL + key come from IMMICH_URL / IMMICH_API_KEY env vars, label->UUID map from work/immich/users.json (gitignored). - work/embed_worker.py (Windows venv at C:\face_embed_venv): runs insightface.FaceAnalysis(buffalo_l) with the DmlExecutionProvider on AMD Radeon Vega via onnxruntime-directml. Produces a cache file in the same .npz schema as sort_faces.cmd_embed (loadable via load_cache). ~7.5x speedup over CPU end-to-end; embeddings bit-identical to CPU (cosine similarity 1.0000 across 8 sample faces). - work/cluster_immich.py (WSL): mirrors cluster_osrc.py against an immich_<user>.npz. Builds existing identity centroids from canonical faceset_NNN/ in facesets_swap_ready/, drops matches at <=0.45, clusters the rest at 0.55, applies refine gates, hands off to cmd_export_swap. Numbers new facesets past the existing maximum. - work/finalize_immich.sh: chains queue->Windows embed->cache copy->cluster_immich, with logging. The 2026-04-26 run on https://fotos.computerliebe.org (Immich v2.7.2) processed 53,842 admin-accessible assets, staged 10,261, embedded 19,462 face records on Vega DML in 64.6 min, matched 8,103 (42%) to existing identities, and emitted 185 new facesets (faceset_026..264 with gaps). facesets_swap_ready/ went from 31 to 216 substantive facesets. Important caveat surfaced: /search/metadata's userIds filter is silently ignored when the API key is bound to a different user, so this run can't enumerate other users' libraries from the admin key. A per-user API key would be required for nic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 18:14:26 +02:00
Peter	7ecbfae981	Add osrc identity-discovery pipeline + run analysis work/cluster_osrc.py mirrors build_folders.py's shape (synthesize a refine_manifest, hand off to cmd_export_swap, relocate, merge top-level manifest) but discovers identities by clustering rather than asserting them by folder. Drops faces already covered by existing identity centroids, clusters the rest at 0.55, applies refine-equivalent gates with min_faces=6, numbers new facesets past the existing maximum so faceset_001..NNN are never disturbed. The 2026-04-26 run on /mnt/x/src/osrc produced faceset_020..025 (sizes 4-26 exported PNGs); analysis writeup in docs/analysis/. README also notes the refine-renumbers caveat in passing: extend plus an orchestration script is the safe pattern; cmd_refine is for fresh clusters only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:40:19 +02:00
Peter	03a0c75531	Document hand-sorted-folder import + age-split workflow - README: document work/build_folders.py (hand-sorted folder identities) and the new age-split workflow for splitting a long-running identity into era-specific facesets after clustering. - Force-track work/age_split_001.py and work/check_faceset001_age.py; these are the worked example + readiness probe for faceset_001 and the template for splitting any other identity by EXIF era. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 12:08:25 +02:00
Peter	4d7a8780de	Document enrich + export-swap + extend; add swap-ready usage guide README.md now covers all seven subcommands (embed, cluster, refine, dedup, extend, enrich, export-swap), an end-to-end pipeline recipe, the delta recipe for merging a new source into an existing result, the quality-weight formula used by export-swap, and the GFPGAN blend recommendation at swap time (0.85, overriding roop-unleashed's 0.65 default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:09:01 +02:00
Peter	484278e70e	Rewrite pipeline: resumable embed, byte-dedup, extend, dedup report - embed: sha256-based dedup at listing (embed each unique hash once, carry other paths as aliases via a top-level path_aliases dict); resumable from any existing cache; atomic incremental flush every 50 files; explicit skip-ext filtering; schema bumped with processed_paths + path_aliases. - extend: new subcommand that merges new embeddings into an existing raw + facesets output without renumbering. Nearest person-centroid match above threshold, unmatched faces re-clustered into new person_NNN / _singletons. Optional --refine-out also extends facesets by centroid + quality gate. - dedup: new subcommand producing byte-identical + visual near-duplicate groups as a JSON report. - cluster/refine: fan every placement across canonical + aliases so each on-disk location gets represented. - safe_dst_name now always flattens the absolute path so filenames stay stable across runs when src_root shifts (fixes duplicate-copy bug that surfaced during the lzbkp_red extend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 19:21:50 +02:00
Peter	c5a4e2dfdb	Add face-sort pipeline as the repo's base Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l embeddings and agglomerative clustering, migrated in from the ad-hoc /home/peter/face_sort/ directory so this repo is the canonical home for faceset preparation feeding roop-unleashed and similar tools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 11:20:00 +02:00
Procuria	01ae516b54	Add README.md	2026-04-23 09:08:59 +00:00
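
Commit 484278e70e describes sha256-based dedup at listing time: each unique hash is embedded once, and the other paths with the same bytes are carried as aliases in a `path_aliases` dict. The following is a minimal sketch of that idea, not the repository's code. The function names `sha256_of` and `dedup_listing`, and the choice of the lexicographically first path as the canonical one, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedup_listing(paths):
    """Return (canonical_paths, path_aliases) for a list of file paths.

    Hypothetical helper: the first path (in sorted order) seen for each
    hash becomes canonical; every later byte-identical path is recorded
    as an alias of that canonical path.
    """
    canonical_by_hash = {}
    path_aliases = {}
    for path in sorted(Path(p) for p in paths):
        key = sha256_of(path)
        canonical = canonical_by_hash.setdefault(key, path)
        if canonical != path:
            path_aliases.setdefault(str(canonical), []).append(str(path))
    canonical_paths = [str(p) for p in canonical_by_hash.values()]
    return canonical_paths, path_aliases
```

Only the entries in `canonical_paths` would need to be embedded; a later step could fan results out to the aliases so that every on-disk location is still represented.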

7 Commits
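
Several of the commits above stress that new facesets are numbered past the existing maximum so that existing `faceset_NNN` identities are never renumbered. A minimal sketch of that numbering rule follows. The helper name `next_faceset_names`, its parameters, and the three-digit zero padding are assumptions for illustration, not the repository's implementation.

```python
import re

# Matches directory names such as "faceset_001"; anything else is ignored.
FACESET_RE = re.compile(r"^faceset_(\d+)$")


def next_faceset_names(existing_names, count, width=3):
    """Return `count` new faceset names numbered after the highest existing one.

    Hypothetical helper: gaps in the existing numbering are left alone,
    so no existing identity is ever reused or shifted.
    """
    numbers = [
        int(m.group(1))
        for name in existing_names
        if (m := FACESET_RE.match(name))
    ]
    start = max(numbers, default=0) + 1
    return [f"faceset_{n:0{width}d}" for n in range(start, start + count)]
```

For example, with existing directories `faceset_001`, `faceset_007`, and `faceset_019`, a request for three names yields `faceset_020`, `faceset_021`, and `faceset_022`; the gaps below 019 stay unused.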