diff --git a/README.md b/README.md
index 2e0cd7c..f1fe9f7 100644
--- a/README.md
+++ b/README.md
@@ -67,6 +67,92 @@ python sort_faces.py export-swap "$CACHE" \
     --raw-manifest "$OUT/raw_full/manifest.json" --candidates
 ```
 
+### Importing hand-sorted folders as identities
+
+When source folders are already hand-sorted by person (one folder per identity), the
+clustering path is the wrong tool — the identity is asserted, not inferred. The
+orchestration script `work/build_folders.py` covers this case:
+
+- For each trusted folder, it filters cache records that fall under it, builds an
+  identity centroid via two-pass outlier rejection (cos-dist 0.55 → 0.45) so
+  bystanders in group photos drop out, and writes a synthetic `refine_manifest.json`.
+- It then routes each face record from a *mixed* folder (e.g. `osrc/`) into every
+  identity centroid within a tight cosine cutoff (default 0.45). A multi-identity
+  photo lands in multiple facesets; `export-swap`'s per-bbox outlier filter ensures
+  each faceset crops only its matching face.
+- Finally it invokes `cmd_export_swap` against the synthetic manifest, renames the
+  emitted `.fsz` bundles after the source folder, drops a `