Peter c5a4e2dfdb Add face-sort pipeline as the repo's base
Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l
embeddings and agglomerative clustering, migrated in from the ad-hoc
/home/peter/face_sort/ directory so this repo is the canonical home for
faceset preparation feeding roop-unleashed and similar tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:20:00 +02:00

face-sets

Sort photos by face similarity using InsightFace embeddings and agglomerative clustering, then refine the clusters into faceset-ready folders for downstream face-swap tooling (roop-unleashed, etc.).

Pipeline

sort_faces.py is a single-file CLI with three subcommands:

| step | what it does |
| --- | --- |
| embed | Recursively scan a source tree, detect + embed every face, write .npz cache |
| cluster | Raw agglomerative clustering of the cache into person_NNN/, _singletons/, and _noface/ |
| refine | Initial cluster → centroid merge → quality gate → outlier rejection → size filter → faceset_NNN/ |
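
To make the clustering step concrete, here is a minimal pure-Python sketch of agglomerative clustering over cosine distance with a distance cut-off. It is not the code in sort_faces.py: the average-linkage rule, the function names, and the inclusive threshold comparison are assumptions chosen for illustration, and a real run would operate on the InsightFace embeddings stored in the cache rather than on toy vectors.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering with a distance cut-off.

    Starts with one cluster per vector and keeps merging the closest pair
    of clusters until the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a sorted list of input indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        # (assumption: the real tool may use a different linkage).
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters by index.
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(agglomerative_clusters(toy, threshold=0.55))
```

In this sketch a cluster with more than one member would correspond to a person_NNN/ folder, and a cluster of length 1 to an entry in _singletons/.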

Cache and outputs are kept out of the repo via .gitignore; defaults live under work/.

Typical run

# 1. Embed (CPU; InsightFace buffalo_l). Caches faces + metadata.
python sort_faces.py embed "/mnt/x/src/nl/Neuer Ordner (2)/New Folder" work/cache/nl_all.npz

# 2. Raw clusters (every multi-face cluster -> a person_NNN/ folder).
python sort_faces.py cluster work/cache/nl_all.npz /mnt/e/temp_things/fcswp/nl_sorted/raw

# 3. Refined facesets (filters for faceset-ready quality).
python sort_faces.py refine  work/cache/nl_all.npz /mnt/e/temp_things/fcswp/nl_sorted/facesets
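
The three steps above can also be chained from Python. The sketch below only uses the subcommands and positional arguments shown in the commands above; the helper names build_commands and run_pipeline are hypothetical and not part of sort_faces.py. Passing each command as an argument list avoids shell quoting problems with source paths that contain spaces or parentheses.

```python
import subprocess
import sys


def build_commands(source_dir, cache_path, raw_dir, faceset_dir, script="sort_faces.py"):
    """Return the argv lists for embed, cluster and refine, in run order."""
    base = [sys.executable, script]
    return [
        base + ["embed", source_dir, cache_path],
        base + ["cluster", cache_path, raw_dir],
        base + ["refine", cache_path, faceset_dir],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True aborts on the first failing step."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = build_commands(
        "/mnt/x/src/nl/Neuer Ordner (2)/New Folder",
        "work/cache/nl_all.npz",
        "/mnt/e/temp_things/fcswp/nl_sorted/raw",
        "/mnt/e/temp_things/fcswp/nl_sorted/facesets",
    )
    for argv in cmds:
        print(argv)  # inspect first; call run_pipeline(cmds) to execute
```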

Refine defaults

| flag | default | meaning |
| --- | --- | --- |
| --initial-threshold | 0.55 | cosine distance for stage-1 clustering |
| --merge-threshold | 0.40 | centroid-level merge of over-split clusters |
| --outlier-threshold | 0.55 | drop face if cosine dist from cluster centroid exceeds this (only if cluster ≥ 4) |
| --min-faces | 15 | minimum unique images per faceset |
| --min-short | 90 | minimum short-edge pixels of face bbox |
| --min-blur | 40.0 | Laplacian-variance blur gate |
| --min-det-score | 0.6 | InsightFace detector score gate |
| --mode | copy | copy / move / symlink |
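
As an illustration of how the later refine gates could fit together, the sketch below applies the quality gate, outlier rejection, and size filter to a single cluster, using the defaults from the table. It is a sketch under assumptions, not the code in sort_faces.py: the Face record and its field names are hypothetical, the centroid is taken as a plain mean of the embeddings, and the gates are treated as inclusive minimums. Stage-1 clustering and the centroid merge are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Face:
    # Hypothetical record; the real cache layout may differ.
    image: str        # path of the source image
    embedding: list   # face embedding vector
    short_edge: int   # shorter side of the face bbox, in pixels
    blur: float       # Laplacian variance of the face crop (higher = sharper)
    det_score: float  # detector confidence


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def passes_quality(face, min_short=90, min_blur=40.0, min_det_score=0.6):
    """Quality gate: bbox size, sharpness and detector score must all pass."""
    return (face.short_edge >= min_short
            and face.blur >= min_blur
            and face.det_score >= min_det_score)


def centroid(faces):
    """Component-wise mean of the embeddings (assumed centroid definition)."""
    dim = len(faces[0].embedding)
    return [sum(f.embedding[k] for f in faces) / len(faces) for k in range(dim)]


def reject_outliers(faces, outlier_threshold=0.55, min_cluster=4):
    """Drop faces farther than the threshold from the centroid.

    Clusters with fewer than `min_cluster` faces are returned unchanged.
    """
    if len(faces) < min_cluster:
        return list(faces)
    c = centroid(faces)
    return [f for f in faces if cosine_distance(f.embedding, c) <= outlier_threshold]


def refine_cluster(faces, min_faces=15):
    """Quality gate, then outlier rejection, then the unique-image size filter."""
    kept = reject_outliers([f for f in faces if passes_quality(f)])
    unique_images = {f.image for f in kept}
    return kept if len(unique_images) >= min_faces else []
```

A cluster that comes back empty from refine_cluster would not produce a faceset_NNN/ folder in this sketch.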

Prior runs (as of 2026-04-22)

  • work/cache/kos11.npz — 181 images, 333 faces from Kos '11/kos11_sorted/
  • work/cache/nl_all.npz — 916 images, 1396 faces from Neuer Ordner (2)/New Folder/nl_sorted/raw/, refined to 6 facesets (197, 120, 91, 47, 23, 18 images)
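
To check what a cache file such as the ones listed above contains, note that an .npz file is a zip archive of .npy arrays, so the array names can be listed with the standard library alone. The sketch below makes no assumption about which keys sort_faces.py writes; list_npz_arrays is a hypothetical helper name.

```python
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz archive, without loading them."""
    suffix = ".npy"
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[:-len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        )


if __name__ == "__main__":
    import sys

    # Usage: python list_cache.py work/cache/nl_all.npz
    for array_name in list_npz_arrays(sys.argv[1]):
        print(array_name)
```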

Output lives outside the repo at /mnt/e/temp_things/fcswp/.
