Commit Graph

7 Commits

Author SHA1 Message Date
e48dd8aec7 Add age-split run analysis for faceset_001
Documents the 2026-04-26 split of faceset_001 (707 curated faces) into
6 substantive era buckets + 68 thin fragments, including the readiness
probe evidence, the anchor-based assignment rationale (replaces
transitive union-find that caused year-drift), and the re-run / apply-
to-other-identity workflow.
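
The year-drift problem mentioned above can be illustrated with a small sketch. It is not the code in work/age_split_001.py. The function names, the use of a single capture year per photo, and the thresholds are assumptions made for illustration. It contrasts transitive union-find chaining with assignment to fixed anchor years.

```python
from collections import defaultdict


def chain_by_union_find(years, max_gap):
    """Transitively link photos whose years differ by at most max_gap."""
    parent = list(range(len(years)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(years)):
        for j in range(i + 1, len(years)):
            if abs(years[i] - years[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, year in enumerate(years):
        groups[find(i)].append(year)
    return sorted(sorted(g) for g in groups.values())


def assign_to_anchors(years, anchors, max_dist):
    """Assign each photo to its nearest anchor year, or leave it unassigned."""
    buckets = {a: [] for a in anchors}
    unassigned = []
    for year in years:
        nearest = min(anchors, key=lambda a: abs(a - year))
        if abs(nearest - year) <= max_dist:
            buckets[nearest].append(year)
        else:
            unassigned.append(year)
    return buckets, unassigned


# Hypothetical capture years, each only two years from its neighbour.
years = [2001, 2003, 2005, 2007, 2009, 2011]
chained = chain_by_union_find(years, max_gap=2)
buckets, unassigned = assign_to_anchors(years, anchors=[2002, 2010], max_dist=2)
```

With these inputs the chained result is a single group spanning 2001 to 2011, because every neighbouring pair is within the gap. The anchor-based result keeps two bounded eras and leaves the in-between years unassigned for review.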

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:10:37 +02:00
03a0c75531 Document hand-sorted-folder import + age-split workflow
- README: document work/build_folders.py (hand-sorted folder identities)
  and the new age-split workflow for splitting a long-running identity
  into era-specific facesets after clustering.
- Force-track work/age_split_001.py and work/check_faceset001_age.py;
  these are the worked example + readiness probe for faceset_001 and
  the template for splitting any other identity by EXIF era.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 12:08:25 +02:00
4d7a8780de Document enrich + export-swap + extend; add swap-ready usage guide
README.md now covers all seven subcommands (embed, cluster, refine, dedup,
extend, enrich, export-swap), an end-to-end pipeline recipe, the delta
recipe for merging a new source into an existing result, the quality-
weight formula used by export-swap, and the GFPGAN blend recommendation
at swap time (0.85, overriding roop-unleashed's 0.65 default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:09:01 +02:00
d53ab9fbfc Add enrich + export-swap pipeline for downstream face-swap ready output
- enrich: re-detects each cached face with buffalo_l (detection +
  landmark_2d_106 + landmark_3d_68, recognition module skipped for speed)
  and persists landmarks + pose into the cache so per-face frontality and
  landmark-symmetry quality signals become available.
- compute_quality: composite score combining det_score, face short-edge,
  blur, frontality (from pose pitch/yaw), and 2D-landmark symmetry with
  tunable weights. Default weighting 0.30/0.20/0.20/0.15/0.15.
- export-swap: builds facesets_swap_ready/ from an existing refine
  manifest. Per identity: tighter outlier gate (default 0.45), visual-
  near-dupe collapse (keep best representative per group), multi-face-
  per-source-image collapse (keep best bbox), rank by composite score,
  single-face-per-PNG crops at 512x512 with 0.5 bbox padding, ready-to-
  drop .fsz bundles (top-N + full), per-faceset manifest.json, NAME.txt
  placeholder for the operator. Enforcing one face per PNG is the
  critical fix: roop-unleashed's .fsz loader appends every detected face
  in each PNG to the FaceSet, so any multi-face crop would contaminate
  the averaged embedding.
- Optional --candidates rescues raw_full singletons: matches against the
  final per-faceset centroids and routes to _candidates/to_<faceset>/
  for manual review; orphaned singletons that still cluster among
  themselves land in _candidates/new_<NNN>/.
- docs/analysis/: evaluation document captures the evidence, downstream
  requirements (FaceSet averaging, inswapper_128), opportunity matrix
  (R1-R14), and the recommended target state this export implements.
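
A minimal sketch of a weighted composite score like the one compute_quality describes. The five signals and the default weights 0.30/0.20/0.20/0.15/0.15 come from the message above. Pairing the weights with the signals in listed order, the normalisation constants (256 px, blur variance 200, 90 degrees), and all names are assumptions, not the repo's actual implementation.

```python
# Assumed mapping of the listed default weights to the listed signals, in order.
DEFAULT_WEIGHTS = {
    "det": 0.30,       # detector confidence
    "size": 0.20,      # face short edge
    "blur": 0.20,      # sharpness
    "frontal": 0.15,   # frontality from pose pitch/yaw
    "symmetry": 0.15,  # 2D-landmark symmetry
}


def clamp01(x):
    """Clamp a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def composite_quality(det_score, short_edge_px, blur_var, yaw_deg, pitch_deg,
                      symmetry, weights=DEFAULT_WEIGHTS):
    """Combine per-face signals into one score in [0, 1].

    The normalisation constants below are hypothetical placeholders.
    """
    signals = {
        "det": clamp01(det_score),
        "size": clamp01(short_edge_px / 256.0),
        "blur": clamp01(blur_var / 200.0),
        "frontal": clamp01(1.0 - max(abs(yaw_deg), abs(pitch_deg)) / 90.0),
        "symmetry": clamp01(symmetry),
    }
    total = sum(weights.values())
    # Dividing by the weight total keeps the score in [0, 1] for tuned weights.
    return sum(weights[k] * signals[k] for k in weights) / total


frontal = composite_quality(0.9, 200, 150.0, yaw_deg=5, pitch_deg=0, symmetry=0.9)
profile = composite_quality(0.9, 200, 150.0, yaw_deg=60, pitch_deg=0, symmetry=0.9)
```

Only yaw differs between the two calls, so the frontal face scores higher than the profile face. A face with every signal at its maximum scores 1.0.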

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:37:32 +02:00
484278e70e Rewrite pipeline: resumable embed, byte-dedup, extend, dedup report
- embed: sha256-based dedup at listing (embed each unique hash once, carry
  other paths as aliases via a top-level path_aliases dict); resumable from
  any existing cache; atomic incremental flush every 50 files; explicit
  skip-ext filtering; schema bumped with processed_paths + path_aliases.
- extend: new subcommand that merges new embeddings into an existing raw +
  facesets output without renumbering. Nearest person-centroid match above
  threshold, unmatched faces re-clustered into new person_NNN / _singletons.
  Optional --refine-out also extends facesets by centroid + quality gate.
- dedup: new subcommand producing byte-identical + visual near-duplicate
  groups as a JSON report.
- cluster/refine: fan every placement across canonical + aliases so each
  on-disk location gets represented.
- safe_dst_name now always flattens the absolute path so filenames stay
  stable across runs when src_root shifts (fixes duplicate-copy bug that
  surfaced during the lzbkp_red extend).
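
The byte-level dedup that embed performs at listing can be sketched as follows. The message above only states that each unique sha256 is embedded once and that other paths are carried in a top-level path_aliases dict. The function names, the choice of the lexicographically first path as canonical, and the canonical-path-to-list shape of path_aliases are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_embedding(paths):
    """Return (paths to embed, path_aliases) for a listing of files.

    One canonical path is kept per unique content hash. Every other path
    with the same bytes is recorded as an alias of that canonical path.
    """
    canonical_by_hash = {}
    path_aliases = {}
    # Sorting makes the choice of canonical path deterministic across runs.
    for p in sorted(str(x) for x in paths):
        canonical = canonical_by_hash.setdefault(sha256_of(p), p)
        if canonical != p:
            path_aliases.setdefault(canonical, []).append(p)
    return sorted(canonical_by_hash.values()), path_aliases
```

A later stage can then fan each placement across the canonical path and its aliases, as the cluster/refine bullet describes, without embedding the same bytes twice.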

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:21:50 +02:00
c5a4e2dfdb Add face-sort pipeline as the repo's base
Single-file CLI (embed / cluster / refine) using InsightFace buffalo_l
embeddings and agglomerative clustering, migrated in from the ad-hoc
/home/peter/face_sort/ directory so this repo is the canonical home for
faceset preparation feeding roop-unleashed and similar tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:20:00 +02:00
01ae516b54 Add README.md
2026-04-23 09:08:59 +00:00