README.md now covers all seven subcommands (embed, cluster, refine, dedup,
extend, enrich, export-swap), an end-to-end pipeline recipe, the delta
recipe for merging a new source into an existing result, the quality-
weight formula used by export-swap, and the GFPGAN blend recommendation
at swap time (0.85, overriding roop-unleashed's 0.65 default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- embed: sha256-based dedup at listing (embed each unique hash once, carry
other paths as aliases via a top-level path_aliases dict); resumable from
any existing cache; atomic incremental flush every 50 files; explicit
skip-ext filtering; schema bumped with processed_paths + path_aliases.
- extend: new subcommand that merges new embeddings into an existing raw +
facesets output without renumbering. Each new face is assigned to its
nearest person centroid when the match clears the threshold; unmatched
faces are re-clustered into new person_NNN / _singletons. Optional
--refine-out also extends facesets by centroid + quality gate.
- dedup: new subcommand producing byte-identical + visual near-duplicate
groups as a JSON report.
- cluster/refine: fan every placement out across the canonical path and
its aliases so each on-disk location is represented.
- safe_dst_name now always flattens the absolute path so filenames stay
stable across runs when src_root shifts (fixes duplicate-copy bug that
surfaced during the lzbkp_red extend).
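
For reviewers, a minimal sketch of the two embed ideas above: grouping
paths by sha256 so each unique hash is embedded once, and flushing the
cache atomically. The function names (sha256_of, group_by_hash,
atomic_write_json) and the choice of the first sorted path as canonical
are assumptions for illustration, not the code in this commit.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are never fully loaded."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_hash(paths):
    """Return (canonical_paths, path_aliases).

    The first path (in sorted order) seen for a hash is treated as
    canonical (an assumption); later byte-identical paths become aliases.
    """
    canonical_by_hash = {}
    path_aliases = {}
    for p in sorted(str(x) for x in paths):
        digest = sha256_of(p)
        if digest in canonical_by_hash:
            path_aliases.setdefault(canonical_by_hash[digest], []).append(p)
        else:
            canonical_by_hash[digest] = p
    return sorted(canonical_by_hash.values()), path_aliases


def atomic_write_json(obj, dst):
    """Write to a temp file in the same directory, then rename over dst.

    os.replace is atomic on the same filesystem, so an interrupted run
    leaves either the previous cache or the new one, never a partial file.
    """
    dst = Path(dst)
    fd, tmp = tempfile.mkstemp(dir=dst.parent, prefix=dst.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(obj, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, dst)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

In this sketch only the canonical paths would be passed to the embedder;
the path_aliases dict would be stored at the top level of the cache and
flushed with atomic_write_json at whatever interval the caller chooses.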
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l
embeddings and agglomerative clustering, migrated in from the ad-hoc
/home/peter/face_sort/ directory so this repo is the canonical home for
faceset preparation feeding roop-unleashed and similar tools.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>