Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l embeddings and agglomerative clustering, migrated in from the ad-hoc /home/peter/face_sort/ directory so this repo is the canonical home for faceset preparation feeding roop-unleashed and similar tools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
face-sets
Sort photos by similar face using InsightFace embeddings + agglomerative clustering, then refine into faceset-ready folders for downstream face-swap tooling (roop-unleashed, etc.).
Pipeline
sort_faces.py is a single-file CLI with three subcommands:
| step | what it does |
|---|---|
| embed | Recursively scan a source tree, detect + embed every face, write .npz cache |
| cluster | Raw agglomerative clustering of the cache into person_NNN/, _singletons/, and _noface/ folders |
| refine | Initial cluster → centroid merge → quality gate → outlier rejection → size filter → faceset_NNN/ |
Cache and outputs are kept out of the repo via .gitignore; defaults live under work/.
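To make the cluster step more concrete, here is a minimal sketch of threshold-based agglomerative clustering on cosine distance. It is not taken from sort_faces.py: the average linkage, the function names, and the toy 2-D vectors are all assumptions for illustration, and the 0.55 cut-off is simply borrowed from the refine default for --initial-threshold. Real buffalo_l embeddings are much longer vectors, and a library implementation would be far faster than this naive loop.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(embeddings, threshold):
    """Naive average-linkage clustering (linkage choice is an assumption).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of face indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise cosine distance between the members of two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        # a < b is guaranteed by combinations(), so deleting b keeps a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: two pairs of similar "faces" and one face unlike the rest.
faces = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0], [0.1, 0.99], [-1.0, 0.0]]
clusters = agglomerative(faces, threshold=0.55)
people = [c for c in clusters if len(c) > 1]       # would map to person_NNN/
singletons = [c for c in clusters if len(c) == 1]  # would map to _singletons/
print(people, singletons)
```

In this sketch, clusters with more than one face stand in for person_NNN/ folders and single-face clusters for _singletons/; images with no detected face never reach the clustering stage at all.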
Typical run
# 1. Embed (CPU; InsightFace buffalo_l). Caches faces + metadata.
python sort_faces.py embed "/mnt/x/src/nl/Neuer Ordner (2)/New Folder" work/cache/nl_all.npz
# 2. Raw clusters (every multi-face cluster -> a person_NNN/ folder).
python sort_faces.py cluster work/cache/nl_all.npz /mnt/e/temp_things/fcswp/nl_sorted/raw
# 3. Refined facesets (filters for faceset-ready quality).
python sort_faces.py refine work/cache/nl_all.npz /mnt/e/temp_things/fcswp/nl_sorted/facesets
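The same three steps can also be driven from Python, which avoids shell quoting problems with source paths that contain spaces or parentheses. This is only a sketch: build_pipeline and run_pipeline are hypothetical helpers, not part of sort_faces.py, and they assume the positional argument order shown in the commands above.

```python
import subprocess
import sys


def build_pipeline(source_dir, cache, raw_out, facesets_out, script="sort_faces.py"):
    """Return argv lists for the embed, cluster, and refine steps, in order."""
    base = [sys.executable, script]
    return [
        base + ["embed", source_dir, cache],
        base + ["cluster", cache, raw_out],
        base + ["refine", cache, facesets_out],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True stops at the first failing step."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = build_pipeline(
    "/mnt/x/src/nl/Neuer Ordner (2)/New Folder",
    "work/cache/nl_all.npz",
    "/mnt/e/temp_things/fcswp/nl_sorted/raw",
    "/mnt/e/temp_things/fcswp/nl_sorted/facesets",
)
# run_pipeline(commands)  # uncomment to actually execute the three steps
```

Passing each argument as its own list element means no quoting is needed, because no shell is involved.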
Refine defaults
| flag | default | meaning |
|---|---|---|
| --initial-threshold | 0.55 | cosine distance for stage-1 clustering |
| --merge-threshold | 0.40 | centroid-level merge of over-split clusters |
| --outlier-threshold | 0.55 | drop face if cosine dist from cluster centroid exceeds this (only if cluster ≥ 4) |
| --min-faces | 15 | minimum unique images per faceset |
| --min-short | 90 | minimum short-edge pixels of face bbox |
| --min-blur | 40.0 | Laplacian-variance blur gate |
| --min-det-score | 0.6 | InsightFace detector score gate |
| --mode | copy | copy / move / symlink |
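How the later refine stages might combine these defaults can be sketched as follows, in the order quality gate, outlier rejection, size filter. This is an illustration under stated assumptions rather than the code in sort_faces.py: the Face record and its field names are hypothetical, the comparisons are assumed to be inclusive, and the centroid is assumed to be a plain mean of the embeddings. The initial clustering and centroid-merge stages are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    # Hypothetical record; the real cache layout is not documented here.
    image: str         # path of the source image
    embedding: list    # face embedding vector
    short_edge: int    # shorter side of the face bbox, in pixels
    blur: float        # Laplacian variance of the face crop
    det_score: float   # detector confidence


def passes_quality(face, min_short=90, min_blur=40.0, min_det_score=0.6):
    """Quality gate: bbox size, sharpness, and detector score must all pass."""
    return (face.short_edge >= min_short
            and face.blur >= min_blur
            and face.det_score >= min_det_score)


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a))
                        * math.sqrt(sum(y * y for y in b)))


def centroid(faces):
    """Component-wise mean of the embeddings (an assumed centroid definition)."""
    dims = len(faces[0].embedding)
    return [sum(f.embedding[d] for f in faces) / len(faces) for d in range(dims)]


def reject_outliers(faces, outlier_threshold=0.55, min_cluster=4):
    """Drop faces too far from the centroid; clusters under 4 faces are left alone."""
    if len(faces) < min_cluster:
        return list(faces)
    center = centroid(faces)
    return [f for f in faces
            if cosine_distance(f.embedding, center) <= outlier_threshold]


def refine_cluster(faces, min_faces=15):
    """Quality gate, then outlier rejection, then the unique-image size filter."""
    kept = reject_outliers([f for f in faces if passes_quality(f)])
    unique_images = {f.image for f in kept}
    return kept if len(unique_images) >= min_faces else []
```

Counting unique images rather than faces in the last step follows the table's wording for --min-faces, so several faces cropped from one photo count only once toward the minimum.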
Prior runs (as of 2026-04-22)
- work/cache/kos11.npz: 181 images, 333 faces from Kos '11/ → kos11_sorted/
- work/cache/nl_all.npz: 916 images, 1396 faces from Neuer Ordner (2)/New Folder/ → nl_sorted/raw/, refined to 6 facesets (197, 120, 91, 47, 23, 18 images)
Output lives outside the repo at /mnt/e/temp_things/fcswp/.