Add enrich + export-swap pipeline for downstream face-swap ready output
- enrich: re-detects each cached face with buffalo_l (detection +
  landmark_2d_106 + landmark_3d_68; recognition module skipped for speed)
  and persists landmarks + pose into the cache, so per-face frontality and
  landmark-symmetry quality signals become available.
- compute_quality: composite score combining det_score, face short-edge,
  blur, frontality (from pose pitch/yaw), and 2D-landmark symmetry with
  tunable weights. Default weights: frontality 0.30, det_score 0.20,
  symmetry 0.20, short-edge 0.15, blur 0.15.
- export-swap: builds facesets_swap_ready/ from an existing refine manifest.
  Per identity: tighter outlier gate (default 0.45), visual-near-dupe
  collapse (keep best representative per group), multi-face-per-source-image
  collapse (keep best bbox), rank by composite score, single-face-per-PNG
  crops at 512x512 with 0.5 bbox padding, ready-to-drop .fsz bundles
  (top-N + full), per-faceset manifest.json, NAME.txt placeholder for the
  operator. The multi-face-per-PNG collapse is the critical fix:
  roop-unleashed's .fsz loader appends every detected face in each PNG to
  the FaceSet, so any multi-face crop would contaminate the averaged
  embedding.
- Optional --candidates rescues raw_full singletons: matches against the
  final per-faceset centroids and routes to _candidates/to_<faceset>/ for
  manual review; orphaned singletons that still cluster among themselves
  land in _candidates/new_<NNN>/.
- docs/analysis/: evaluation document captures the evidence, downstream
  requirements (FaceSet averaging, inswapper_128), opportunity matrix
  (R1-R14), and the recommended target state this export implements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs/analysis/facesets-downstream-refinement-evaluation.md (new file, +233 lines)
@@ -0,0 +1,233 @@
# Facesets → roop-unleashed: downstream refinement evaluation

_Analysis date: 2026-04-23. Author: Peter (with Claude Code)._

## 1. Scope

**Objective.** Evaluate how the existing face-clustering / person-sorted results in `facesets_full/` can be refined so that the downstream project `roop-unleashed` produces the best practical face-swap results.
**Repositories / folders inspected**

- `/opt/face-sets/` — the upstream project (this repo); code, `README.md`, `sort_faces.py`, `refine_manifest.json`, `duplicates.json`.
- `/mnt/e/temp_things/fcswp/nl_sorted/facesets_full/` and `.../raw_full/` — current output.
- `/opt/face-sets/work/cache/nl_full.npz` — the underlying embedding cache used to produce the output.
- `/opt/roop-unleashed/` — the downstream consumer.
- InsightFace 0.7.3 `Face` class (`/home/peter/face_sort_env/lib/...insightface/app/common.py`) to resolve an ambiguity about embedding averaging.

**Agent usage.** Subagents (Explore, Trend Researcher) were attempted but rejected by the operator. All investigation was done directly via Read, Grep, Bash, WebFetch, and WebSearch. `~/.claude/agents/` was enumerated; no face-swap-specific agent exists.

**Web research used.** Targeted WebSearch + WebFetch against the FaceSwapLab FAQ, FaceFusion docs, and the GitHub roop-unleashed discussion page for faceset creation. The original `C0untFloyd/roop-unleashed` GitHub repo has been disabled by GitHub Staff for a ToS violation, so the code in `/opt/roop-unleashed/` is the authoritative source for this analysis.
## 2. Evidence base

### 2.1 Files read in `facesets` / output

- `sort_faces.py` (full) — current pipeline, esp. `cmd_embed` (embed + sha256 dedup + resume), `cmd_cluster`, `cmd_refine` (centroid-merge + quality gate + outlier rejection), `cmd_extend` (centroid-preserving merge), `cmd_dedup` (byte + visual).
- `refine_manifest.json` at `facesets_full/` — post-extend state; `extended: true`; 12 facesets, params `{initial_threshold: 0.55, merge_threshold: 0.40, outlier_threshold: 0.55, min_faces: 15, min_short: 90, min_blur: 40.0, min_det_score: 0.6}`.
- `nl_full.npz` — 4756 face embeddings + 133 noface records across 2667 unique files; 113 byte-dupe alias paths; 103 byte-groups + 115 visual-dupe groups in `nl_full.duplicates.json`.
### 2.2 Files read in `roop-unleashed`

- `roop/FaceSet.py` — the downstream identity container; `AverageEmbeddings()` at lines 15–20.
- `roop/face_util.py` — `get_face_analyser()` builds InsightFace `buffalo_l` (lines 35–50); `extract_face_images()` at lines 72–144 implements the .fsz unpack + detect path.
- `roop/processors/FaceSwapInsightFace.py` — the actual inswapper swap; `Run()` at lines 42–52 uses `source_face.normed_embedding`.
- `roop/core.py:178–179` — identifies the swap model as `inswapper_128.onnx` (HuggingFace `countfloyd/deepfake` + Codeberg mirror).
- `roop/ProcessMgr.py:626–634` — `process_face` confirms only `face_datas[face_index].faces[0]` is used per identity.
- `ui/tabs/facemgr_tab.py` (full) — how the .fsz is created by users (cv2.imwrite PNGs → zip).
- `ui/tabs/faceswap_tab.py:651–710` — how a .fsz / image source is loaded into `INPUT_FACESETS`; `AverageEmbeddings()` is called iff `len(faces) > 1` at line 690.
- InsightFace `common.py:Face` — `normed_embedding` is a `@property`, so it re-derives from `self.embedding`; averaging therefore does propagate to the swap (resolves an ambiguity).
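
Because `normed_embedding` is a `@property` re-derived from `embedding` on every access, overwriting `faces[0]["embedding"]` (as `AverageEmbeddings()` does) changes what the swapper later reads. A minimal stand-in sketch of that behavior — not the real InsightFace class, just the relevant mechanics:

```python
import numpy as np

class Face(dict):
    """Minimal stand-in for insightface.app.common.Face (attribute access
    backed by dict storage); NOT the real class, only the relevant behavior."""
    def __getattr__(self, name):
        return self[name]

    @property
    def normed_embedding(self):
        # Re-derived on every access, so a later overwrite of
        # face["embedding"] (what AverageEmbeddings() does) flows through.
        emb = np.asarray(self["embedding"], dtype=np.float32)
        return emb / np.linalg.norm(emb)

face = Face(embedding=np.array([3.0, 4.0]))
before = face.normed_embedding            # unit vector of the original embedding
face["embedding"] = np.array([0.0, 2.0])  # simulate the averaging overwrite
after = face.normed_embedding             # reflects the overwrite
```

If `normed_embedding` were computed once and cached, the overwrite would be lost; the property semantics are exactly what makes the averaging take effect.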

### 2.3 External sources

- [FaceSwapLab FAQ](https://glucauze.github.io/sd-webui-faceswaplab/faq/) — practitioner-level guidance on multi-image reference and the checkpoint builder.
- [FaceFusion face-swapper docs](https://docs.facefusion.io/usage/cli-arguments/processors/face-swapper) — model list including `inswapper_128_fp16`, `hyperswap_1a_256`, etc.
- [InsightFace blog: evolution of face swapping](https://www.insightface.ai/blog/the-evolution-of-neural-network-face-swapping-from-deepfakes-to-one-shot-innovation-with-insightface) — inswapper internal face resolution is 128×128 RGB regardless of input.
- [DeepWiki: inswapper_128](https://deepwiki.com/deepinsight/inswapper-512-live/5.1-first-generation:-inswapper_128) — confirms encoder-decoder structure, identity taken from embedding, target appearance preserved.
- [SDD-FIQA CVPR 2021](https://openaccess.thecvf.com/content/CVPR2021/papers/Ou_SDD-FIQA_Unsupervised_Face_Image_Quality_Assessment_With_Similarity_Distribution_Distance_CVPR_2021_paper.pdf) — unsupervised face quality metric; a modern alternative to `det_score + blur`.
## 3. Current upstream output assessment

### 3.1 Structure of `facesets_full/`

- 12 faceset folders (`faceset_001` … `faceset_012`) selected by the refine step (`min_faces=15`).
- Each folder contains the full original images (jpg / jpeg / png) that contributed a face to that cluster, filename-flattened from the absolute path so each file is traceable to its on-disk source.
- One `refine_manifest.json` at the root with per-faceset `{face_count, image_count, alias_count, images[]}`.
- `facesets_full/extended=true` (merged after the lzbkp_red run via `cmd_extend`).

Counts (manifest):
| faceset      | images | face records | aliases |
|--------------|-------:|-------------:|--------:|
| faceset_001  |    771 |         1505 |      55 |
| faceset_002  |    238 |          543 |       6 |
| faceset_003  |    206 |          402 |       2 |
| faceset_004  |    103 |          273 |       2 |
| faceset_005  |     68 |          218 |       2 |
| faceset_006  |     51 |          153 |       1 |
| faceset_007  |     89 |          158 |       0 |
| faceset_008  |     44 |          131 |       1 |
| faceset_009  |     43 |          129 |       0 |
| faceset_010  |     25 |           73 |       0 |
| faceset_011  |     25 |           71 |       8 |
| faceset_012  |     17 |           55 |       0 |
### 3.2 Observed strengths

- **Identity grouping is directionally correct.** The top facesets are credibly large and coherent — the raw `raw_full/person_001` is 2.3 GB; refine extracted a 557→771-image faceset on top of that, which is a significant and useful identity pool by any standard.
- **Quality gate is applied.** `min_short=90`, `min_blur=40`, `min_det_score=0.6` are enforced; low-resolution and out-of-focus faces are rejected.
- **Outlier rejection is applied.** Faces with cosine distance > 0.55 from their cluster centroid are dropped (when the cluster has ≥ 4 faces).
- **Aliasing preserves provenance.** Every on-disk copy (byte-duplicates between iCloud / manual backups / etc.) is preserved in the folder, so the user can trace every file in a faceset back to its original location.
- **Quality metrics already captured per face.** `face_short`, `blur` (Laplacian variance), `det_score`, and `bbox` are persisted in the cache — available for any future ranking logic without re-embedding.
### 3.3 Observed weaknesses

Evidence is from direct computation on the cache (`nl_full.npz`) + the manifests.

**W1. face_records / image_count ratio ~2:1 in top facesets.**

- faceset_001: 1505 faces / 771 images = 1.95 faces per image.
- faceset_002: 543 / 238 = 2.28.
- faceset_003: 402 / 206 = 1.95.
- A healthy one-identity set should be ~1:1 (one face per image).
- **Interpretation**: many of these are multi-face photos (group / family shots) where multiple people's faces were placed into the same cluster, or the same image had multiple faces all passing the centroid gate for the same identity. Either way, the current facesets are contaminated with **faces of other people from the same photo**. This is the single biggest downstream risk — see §4.
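
The W1 ratio check can be reproduced directly from the manifest. A sketch, assuming a top-level `facesets` mapping with `face_count` / `image_count` per entry — adjust the keys if the actual `refine_manifest.json` layout differs:

```python
import json
from pathlib import Path

def contamination_report(manifest_path: Path, max_ratio: float = 1.2) -> dict:
    """Flag facesets whose face_count / image_count ratio suggests that
    multi-face photos contributed extra (foreign) faces to the cluster."""
    manifest = json.loads(manifest_path.read_text())
    flagged = {}
    for name, info in manifest.get("facesets", {}).items():
        images = info.get("image_count", 0)
        if not images:
            continue
        ratio = info["face_count"] / images
        if ratio > max_ratio:  # a healthy single-identity set sits near 1.0
            flagged[name] = round(ratio, 2)
    return flagged
```

Run against the manifest above, faceset_001 (1.95) through faceset_003 would all be flagged, while faceset_010 (73/25 = 2.92) shows the problem is not limited to the largest sets.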

**W2. Intra-faceset pairwise cosine distance is high.**

- Mean pairwise distance in faceset_001 = 0.835, p90 = 1.047, max = 1.242.
- For reference: same-identity ArcFace cosine distance typically clusters in [0.2, 0.6]. Pairs with distance > 1.0 (negative cosine similarity) are almost certainly not the same person.
- All 12 facesets have means in [0.82, 0.90] and p90 in [1.03, 1.07].
- **Interpretation**: the clusters were built with `linkage=average, threshold=0.55`, which admits chain effects — two points with direct distance > 1.0 can end up in the same cluster via intermediate points. Some of this spread is legitimate (the photo library spans 15+ years — same person at different ages and lighting), and some is contamination from W1.
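
The W2 statistics come from a straightforward pairwise computation over one faceset's embedding matrix (rows = faces); a sketch of that diagnostic:

```python
import numpy as np

def pairwise_cosine_distance_stats(emb: np.ndarray) -> tuple:
    """Mean / p90 / max pairwise cosine distance for one faceset's embeddings.

    Rows are L2-normalized first; distance = 1 - cosine similarity over all
    unordered pairs (upper triangle, diagonal excluded)."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    dist = 1.0 - sim[np.triu_indices(len(emb), k=1)]
    return float(dist.mean()), float(np.percentile(dist, 90)), float(dist.max())
```

Any faceset whose max exceeds 1.0 by this measure provably contains at least one pair with negative cosine similarity — a strong contamination signal on its own.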

**W3. Near-duplicates inflate the effective size.**

- `nl_full.duplicates.json`: 103 byte-identical groups (same file copied around) + 115 visual near-duplicate groups (cross-file cosine distance ≤ 0.03 with matching bbox size — likely re-encodes / resizes).
- faceset_001 alone carries 55 aliased paths.
- **Interpretation**: multiple copies of the same photograph contribute the same embedding (or a near-identical one) to the cluster's average. This adds no identity information — at best it is neutral, at worst it biases the average toward whatever pose/expression appears in the duplicate set.

**W4. Blur / quality gate is lax.**

- Cache-wide `blur` (Laplacian variance) p10/p25/p50 = 19/32/60. The refine gate is 40, so roughly the bottom ~35% of faces drop on blur.
- Per-faceset p10 blur is 36–90 — many included faces are visibly soft. For the downstream swap this is acceptable (the identity embedding tolerates modest softness), but tightening would improve the average.

**W5. No pose / frontality filtering.**

- Neither detect-time nor refine uses landmarks / yaw / pitch. A strong profile shot with a clear det_score + size still passes. ArcFace embeddings degrade for |yaw| > ~45°. The current set has no way to prefer frontal faces.
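
Once the enrich pass added in this commit persists `pose = [pitch, yaw, roll]` per face record, a frontality gate is a few lines. A hedged sketch — the thresholds come from the ~30°/45° guidance above, not from anything measured on this library:

```python
def is_near_frontal(record: dict, max_yaw: float = 30.0,
                    max_pitch: float = 25.0) -> bool:
    """Gate on the pose triple the enrich pass stores in the cache.

    Records without pose (image unreadable, bbox unmatched on re-detect)
    fail open rather than being silently dropped."""
    pose = record.get("pose")
    if pose is None:
        return True
    pitch, yaw, _roll = pose
    return abs(yaw) <= max_yaw and abs(pitch) <= max_pitch
```

Failing open on missing pose is a deliberate choice: the enrich pass is best-effort, and rejecting every unenriched record would throw away faces for a reason unrelated to their quality.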

**W6. 583 singletons + 133 noface drop to the floor.**

- `_singletons/` in raw_full has 583 face records (some of which are from legitimate subjects that just didn't cluster). `_noface/` has 133 files (hash-deduped images where detection failed). Some of these could belong to existing facesets under a looser centroid-match threshold.

**W7. Embedding-averaging quirk is latent but OK.**

- Investigated because `FaceSet.AverageEmbeddings()` at `FaceSet.py:15` overwrites `self.faces[0]["embedding"]` while the swapper reads `source_face.normed_embedding`. Confirmed via the InsightFace source that `normed_embedding` is a `@property` that re-normalizes from `embedding`. **So averaging does take effect in the swap.** No action needed; noted to avoid a future misdiagnosis.
### 3.4 Observed risks for downstream use

1. **Multi-face photos in a single-identity folder** (W1) → when zipped into a `.fsz` and loaded, roop-unleashed will detect and add ALL faces in each PNG to the FaceSet (`faceswap_tab.py:678–687` loops every face returned by `extract_face_images` into the set). This is identity contamination by design of the loader. **Highest-priority risk.**
2. **High intra-faceset variance** (W2, W5) → the averaged embedding becomes a diffuse "average face" rather than a crisp identity vector. The downstream swap will produce generic likenesses, with identity drift on hard frames.
3. **Near-dupes biasing the mean** (W3) → the identity average tilts toward over-represented poses (e.g., ten copies of one iPhone screenshot skew the mean).
4. **No per-face ranking** — users have no signal on which images to include / exclude when hand-curating a subset, and no way to pick "best representative" images for thumbnails.
## 4. Downstream consumer requirements

### 4.1 What `roop-unleashed` expects

- **Input format**: a `.fsz` file, which is a zip of `.png` files (one crop per reference face). Created by `ui/tabs/facemgr_tab.py:on_update_clicked()`:

  ```python
  filename = os.path.join(roop.globals.output_path, f"{index}.png")
  cv2.imwrite(filename, img)
  …
  util.zip(imgnames, finalzip)  # imgnames → "faceset.fsz"
  ```

  Files inside are named `0.png`, `1.png`, … — only indices.
- **Load path** (`ui/tabs/faceswap_tab.py:672–691`): unzip, iterate `*.png`, run `extract_face_images(filename, (False, 0))` (note: `extra_padding` default `-1.0` → plain bbox crop, no resize-to-512 dance). For **every** detected face in each PNG, append the InsightFace `Face` object (with its 512-dim embedding) to `face_set.faces`. If the resulting set has more than one face, call `face_set.AverageEmbeddings()`.
- **Use at swap time** (`ProcessMgr.py:626–634` + `processors/FaceSwapInsightFace.py:42–52`): only `face_set.faces[0]` is used; its `normed_embedding` is fed to `inswapper_128.onnx`. The other faces in the set exist only to contribute to the averaged embedding.
- **Swap backend**: `inswapper_128.onnx` (see `roop/core.py:178`). The internal face working resolution is 128×128 per the InsightFace blog and FaceSwapLab FAQ; identity is carried entirely in the 512-dim embedding.

### 4.2 Practical requirements derived from the code

1. **One identity per `.fsz`.** Anything else corrupts the averaged embedding.
2. **One face per PNG inside the `.fsz`.** In any multi-face PNG, every face gets appended to the set, polluting the average. This is enforced only by the PNG's content, not by the loader.
3. **Faces must be detectable by InsightFace `buffalo_l` at `det_size=(640,640)` or `(320,320)`.** Extremely small or cut-off faces will fail detection and be silently skipped on load.
4. **Input resolution**: there is no explicit requirement, but since inswapper works at 128×128 and InsightFace aligns on 5 landmarks, a face bbox with a short edge of at least ~100–150 px gives a reliable embedding. Below ~60 px, embedding quality drops measurably (literature). Our `min_short=90` gate is close to the lower end of useful.
5. **Frontality helps.** ArcFace embeddings are trained with some pose augmentation, so near-frontal (|yaw| ≤ 30°) is ideal; beyond ~45° the embedding starts to drift. Roop applies no compensation for this.
6. **Expression / lighting diversity is desirable but not required.** FaceSwapLab explicitly supports "face blending" and notes it "improves the face's representative accuracy" — so a diverse set of the same identity is better than 100 near-duplicate frames.
7. **No metadata is consumed.** roop-unleashed ignores everything outside the PNG bytes — filename, EXIF, and sidecar JSON are not read.
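
Requirements 1–2 plus the integer-filename convention make a compliant `.fsz` trivial to emit. A sketch, assuming crop paths arrive pre-sorted best-first (e.g. the ranked `faces/` output proposed in §6):

```python
import zipfile
from pathlib import Path

def write_fsz(crop_paths: list, out_path: Path) -> Path:
    """Write a flat zip of PNGs named "0.png", "1.png", ... in rank order —
    the exact shape the roop-unleashed loader iterates. No metadata is
    included because the loader reads nothing but the PNG bytes."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as zf:
        for index, crop in enumerate(crop_paths):
            zf.write(crop, arcname=f"{index}.png")
    return out_path
```

`ZIP_STORED` is used because PNGs are already compressed; re-deflating them buys nothing and slows the load.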

### 4.3 Constraints and uncertainties

- The `roop-unleashed` GitHub is unreachable (disabled), so the closest thing to community guidance is the in-repo `CLAUDE.md` and the code itself. Treat this code as authoritative.
- **Assumption**: the user will either provide the whole `facesets_full/faceset_NNN/` folder to roop-unleashed's Face Management tab (which accepts image files + a folder button — `faceswap_tab.py:644–647`), OR pre-build `.fsz` files. Both paths run through the same loader; the multi-face-per-PNG issue applies equally.
## 5. Refinement opportunity matrix

Each opportunity is scored qualitatively. "Automation feasibility" distinguishes fully automated (A), semi-automated with heuristics that need operator review (S), and manual-only (M). "Best place" is where implementation should live.

| # | Opportunity | Problem addressed | Evidence | Expected downstream benefit | Automation | Risk / downside | Best place | Priority | Confidence |
|---|---|---|---|---|---|---|---|---|---|
| R1 | **Pre-crop each faceset image to a single face (the identity's own face)** before export | W1 — multi-face photos pollute FaceSet on load | refine_manifest face/image ratio ~2:1 in top clusters; roop loader adds every detected face in a PNG (`faceswap_tab.py:678–687`) | Large. Cleans the single biggest identity-averaging contaminant | A (use the existing bbox per face record in the cache and cv2 crop with padding, save to a new `facesets_swap_ready/` mirror) | Must pick the correct face of multiple detected per image → use the bbox that the upstream cache already matched to this faceset | `facesets` | **P0** | High |
| R2 | **Split known multi-face photos so only the identity's own bbox is included**, alternative to full image export | Same as R1, more conservative | Same as R1 | Same as R1 | A | — | `facesets` | P0 | High |
| R3 | **Identity tightening — re-run refine with a stricter outlier threshold** (e.g. outlier_threshold=0.45) | W2 — intra-cluster spread too wide, chain effects from average-linkage | pairwise distance max > 1.2 in every faceset | Sharpens the averaged embedding; removes obviously-wrong faces | A | Some legitimate same-person faces (age / lighting extremes) may be dropped | `facesets` | P0 | High |
| R4 | **Drop visual near-duplicates from the set** (keep the highest-quality representative per dupe group) | W3 — duplicate images bias the average | `duplicates.json` has 115 visual groups (2–5 images each) across 4756 faces | Removes silent bias toward over-represented frames; shrinks set size for faster load | A | Deciding which copy to keep is a tiny judgement call (pick highest det_score × face_short × blur) | `facesets` | P1 | High |
| R5 | **Per-face composite quality score** (weighted `det_score · blur · face_short · frontality`) and **ranked export / top-N subset** | Need to give roop-unleashed a small, strong averaging pool rather than all 771 images | Cache already has det_score, blur, face_short; frontality = landmark symmetry, computable from `landmark_2d_106` which InsightFace already provides but we don't store | Smaller `.fsz` files, better average embedding, faster UI | A for the score; S for the top-N choice (operator picks N per identity) | Frontality adds a small extra compute step; needs a re-pass over the cache or a re-embed storing landmarks | `facesets` | P1 | Medium |
| R6 | **Produce `.fsz` directly** (zip the cropped PNGs with integer filenames) as an export mode | Saves the operator the manual zipping step; guarantees filename correctness | `facemgr_tab.py:242–255` is the reference implementation; trivially reproducible | Zero-friction import into roop-unleashed | A | — | `facesets` | P1 | High |
| R7 | **Pose / frontality filter at refine time** using `landmark_2d_106` symmetry or yaw estimation from `face.pose` (if available) | W5 — strong profile faces weaken the average | ArcFace literature; no measurement yet in our cache | Tighter identity average, especially for smaller facesets where one profile shot can dominate | A (compute from cached landmarks if we re-embed or store them; otherwise a one-off enrichment pass) | Landmarks not currently persisted in the cache; requires a small re-embed or enrichment command | `facesets` | P2 | Medium |
| R8 | **Singleton rescue pass** — re-classify `_singletons/` against final faceset centroids with a looser threshold + quality gate | W6 — some singletons are legit faceset members | 583 singletons with p50 face_short=149, p50 det_score=0.76 — many look usable | Recovers lost identity examples; modest expansion of useful facesets | A | Some true singletons will be mis-assigned; threshold choice matters | `facesets` | P2 | Medium |
| R9 | **Modern face-quality scorer** (SDD-FIQA / CR-FIQA) to replace the `det_score × blur` heuristic | More robust quality ranking than hand-rolled heuristics | Literature; current heuristic is crude | Marginal improvement over R5 for the same goal | A, but adds a new model dependency | Model weights to download, more CPU cost at ranking time | `facesets` | P3 | Medium |
| R10 | **Person-label sidecars** (e.g. `faceset_001/_label.txt` with an operator-provided name) | UX — the 12 facesets are anonymous; the operator has to peek to find "mom" | No evidence; workflow improvement | Operator quality-of-life; no effect on swap quality | M | — | `facesets` | P3 | Low |
| R11 | **Multi-image selection UI in roop-unleashed** (e.g. a "pick best 20 by quality" button on load) | Better use of large `.fsz` files | Not implemented downstream | Improvement happens at consumption time | A | Requires a roop-unleashed patch, and the upstream is disabled | `roop-unleashed` | P4 | Low |
| R12 | **Face alignment / crop standardization** (e.g. arcface-aligned 512×512 crops in the `.fsz`) | Some marginal consistency gain on detection | roop re-detects anyway on load (`extract_face_images`) so input alignment is discarded | Very small — roop's loader re-detects and re-aligns regardless | A | Extra compute for no practical gain | — (do not do) | **Not recommended** | High |
| R13 | **Increase resolution via upscaling of low-res crops** | Make small faces "bigger" | Identity comes from the embedding, not the pixels | None — GAN upscaling does not add identity info; inswapper reads 128×128 anyway | A | Can introduce synthetic artifacts | — (do not do) | **Not recommended** | High |
| R14 | **Destructive reorganization of `facesets_full/` in place** | Simpler final layout | Operator explicitly told us yesterday to preserve existing output | Marginal tidiness | M | Loses the current "full cluster" reference view, which has diagnostic value | — (do not do without explicit go-ahead) | **Not recommended by default** | High |
## 6. Recommended target state

Define a new output view, `facesets_swap_ready/`, produced by a new subcommand (e.g. `sort_faces.py export-swap`). Original `facesets_full/` stays intact. Per faceset:

```
facesets_swap_ready/
  faceset_001/
    manifest.json       # provenance + per-image score + rank
    previews/           # 4-image contact sheet thumbnail
      top_20_grid.jpg
    faces/              # cropped-to-single-face PNGs named "000.png", "001.png", ...
      000.png           # highest-ranked face, single face per PNG, 512x512 padded/aligned
      001.png
      ...
    faceset.fsz         # zip of faces/*.png — drop-in for roop-unleashed
  faceset_002/
    ...
```
Key properties:

1. **One face per PNG** — each PNG is a crop of a single face (R1/R2), padded to a consistent 512×512 with the identity's bbox centred. roop-unleashed's loader will re-detect exactly one face per file.
2. **Ranked by composite quality** — `faces/000.png` is the best representative; later indices are weaker. The operator can trivially truncate by dropping later files.
3. **Configurable top-N** — default `--top-n 30` per faceset with an `--include-all` flag for the current behaviour. 30 is conservative; FaceSwapLab's "face blending" tool (the most analogous public practitioner reference) shows that blending diverse but consistent images materially helps; 20–40 is a common practitioner range.
4. **Near-duplicates dropped** (R4) — one representative per visual-dupe group.
5. **Tighter outlier gate** (R3) — outlier_threshold reduced from 0.55 to ~0.45 for this export, keeping the refine defaults on `facesets_full/`.
6. **Ready-to-ship `.fsz`** (R6) in each folder.
7. **manifest.json per faceset** — cites every source path and score. Lets the operator see *why* a face was kept (or dropped, if we add a `_rejected/` sibling).
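
The single-face crop in property 1 reduces to expanding the cached bbox by a padding fraction and clamping to image bounds. A numpy-only sketch — the final resize to 512×512 (e.g. `cv2.resize`) is left out to keep it dependency-free:

```python
import numpy as np

def crop_face(img: np.ndarray, bbox: tuple, pad_frac: float = 0.5) -> np.ndarray:
    """Cut a padded crop around one face so the exported PNG contains only
    the identity's own face (bbox = (x1, y1, x2, y2) from the cache)."""
    x1, y1, x2, y2 = bbox
    pw = int((x2 - x1) * pad_frac)   # horizontal padding in pixels
    ph = int((y2 - y1) * pad_frac)   # vertical padding in pixels
    h, w = img.shape[:2]
    cx1, cy1 = max(0, x1 - pw), max(0, y1 - ph)
    cx2, cy2 = min(w, x2 + pw), min(h, y2 + ph)
    return img[cy1:cy2, cx1:cx2]
```

The 0.5 padding matches the export default in the commit message; enough context survives for the loader's re-detection, while neighbouring faces in group shots fall outside the crop in the common case.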

This lets the operator test swap quality end-to-end without any roop-unleashed modification, and preserves full fallback to the raw / full results if anything needs re-examination.

## 7. Recommended next steps
### 7.1 Quick wins (high value, low effort)

1. **R1 — single-face crop export** as part of `export-swap`. Uses the bbox already in the cache; zero new models. Delivers the biggest likely swap-quality improvement.
2. **R4 — drop visual near-duplicates** inside the export. Uses the `duplicates.json` already produced by `cmd_dedup`. Smaller sets, cleaner averages.
3. **R5 — composite quality score + rank + top-N**. Uses existing fields (`det_score`, `blur`, `face_short`). Deliver `.fsz` + `faces/` sorted by descending score.
4. **R6 — `.fsz` bundle emission** by simply zipping `faces/*.png` with integer names. Trivial given (1)–(3).

These four together give a clean, drop-in-usable export in one session of work.
### 7.2 Medium-effort improvements

5. **R3 — re-run refine with a stricter `outlier_threshold`** (e.g. 0.45) for the export path; keep `facesets_full/` at 0.55 for reference. Requires a re-cluster over existing embeddings — fast (seconds), no re-embed.
6. **R7 — pose/frontality filter** using landmarks. Requires either (a) a re-embed pass that persists `landmark_2d_106`, or (b) an enrichment pass that re-loads each image and computes yaw without redoing the full embed. Modest CPU cost; meaningful for small facesets.
7. **R8 — singleton rescue** against final centroids. Low code cost; likely yields a handful of additional good images per identity.
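
The R8 rescue is a nearest-centroid assignment with a distance cutoff. A sketch, assuming L2-normalizable ArcFace vectors from the cache for both singletons and final faceset centroids; the 0.6 threshold is illustrative, not tuned:

```python
import numpy as np

def rescue_singletons(singletons: np.ndarray, centroids: np.ndarray,
                      threshold: float = 0.6) -> list:
    """Route each leftover embedding to the index of the nearest faceset
    centroid if within `threshold` cosine distance, else None (→ the
    manual-review candidate bucket)."""
    s = singletons / np.linalg.norm(singletons, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    dist = 1.0 - s @ c.T            # cosine distance to every centroid
    nearest = dist.argmin(axis=1)
    return [int(j) if dist[i, j] <= threshold else None
            for i, j in enumerate(nearest)]
```

Returning `None` rather than forcing an assignment keeps the safer default from OQ3: uncertain singletons land in a candidate bucket instead of silently diluting a faceset.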

### 7.3 Items requiring operator decision

- **Target top-N per faceset** for the export (proposal: 30, override per run). Affects the average-embedding quality trade-off vs. UI load time.
- **Whether to name facesets** (R10) — purely workflow.
- **Whether `_singletons/` should be retired** or promoted to an "uncertain identity" export with a lower-confidence tag.
### 7.4 Not recommended

- **R11** — patching `roop-unleashed` itself. The upstream repo is disabled; touching it introduces fork-maintenance overhead for no proportional gain we can't already achieve upstream in `facesets`.
- **R12 / R13** — pre-aligning or up-scaling source crops. Roop re-detects/aligns on load and inswapper caps at 128×128 internally; the effort is wasted.
- **R14** — destructive reorganization of `facesets_full/`. The operator already told us (yesterday) to preserve existing results; no new evidence supports re-opening that.
## 8. Open questions

- **OQ1**. Is the operator willing to have the export step **drop** faces rather than just rank them? R5 top-N drops everything past rank N; if the operator prefers to keep the full set but marked, we should export ranked without truncation and let the user pick in the UI.
- **OQ2**. How many `.fsz` files does the operator actually plan to use? If only 3–4 identities will be used in practice, R5 can stay conservative (N=50) without cost. If all 12 are routinely used, leaner is better (N=20).
- **OQ3**. Should singletons (R8) be rescued into existing facesets or exported as their own `candidate_NNN/` bucket for manual triage? The safer default is a separate bucket; the operator may prefer direct merge.
- **OQ4**. Is frontality filtering (R7) worth a re-embed, or should we settle for a cheap "bbox aspect ratio" proxy? A proper yaw estimate needs landmarks; a crude proxy (bbox width/height ratio) is free but weaker.
- **OQ5**. Is there appetite for adding a modern FIQA model (R9) as a drop-in dependency? It adds a ~50 MB download and a small CPU cost per face; the benefit over the current heuristic is real but modest.
- **OQ6**. For the export, should the operator name (R10) be **required** before an `.fsz` is emitted (forces thought about which identity is which), or optional (pure convenience)?

---
_End of evaluation. No code has been changed as part of this analysis._
sort_faces.py (+541 lines)
@@ -943,6 +943,519 @@ def cmd_extend(
    print(f"Updated refine manifest -> {refine_manifest_path}")


# ---------- enrich (landmarks + pose per face record) ---------- #

def _pick_face_for_bbox(faces: list, stored_bbox: list[int]):
    """Given freshly-detected faces and a stored bbox, return the detected face whose
    bbox has the highest IoU with stored_bbox (or None if no overlap)."""
    if not faces:
        return None
    sx1, sy1, sx2, sy2 = stored_bbox
    sa = max(1, (sx2 - sx1) * (sy2 - sy1))
    best = None
    best_iou = 0.0
    for f in faces:
        x1, y1, x2, y2 = [int(round(v)) for v in f.bbox]
        ix1, iy1 = max(sx1, x1), max(sy1, y1)
        ix2, iy2 = min(sx2, x2), min(sy2, y2)
        if ix2 <= ix1 or iy2 <= iy1:
            continue
        inter = (ix2 - ix1) * (iy2 - iy1)
        fa = max(1, (x2 - x1) * (y2 - y1))
        union = sa + fa - inter
        iou = inter / union
        if iou > best_iou:
            best_iou = iou
            best = f
    return best if best_iou >= 0.3 else None

def cmd_enrich(cache_path: Path, force: bool, flush_every: int) -> None:
    """Re-detect every face record's source image to persist landmarks + pose.

    Skips the recognition module (we already have embeddings) so detection + the two
    landmark models are the only ones loaded.
    """
    emb, meta, src_root, processed, path_aliases = load_cache(cache_path)
    if src_root is None:
        src_root = Path("/")

    to_do: list[int] = []
    for i, m in enumerate(meta):
        if m.get("noface"):
            continue
        if force or not m.get("pose"):
            to_do.append(i)

    if not to_do:
        print("Enrich: nothing to do; every face record already has pose.")
        return

    # Group indices by source path so each image is decoded exactly once.
    path_to_indices: dict[str, list[int]] = {}
    for i in to_do:
        path_to_indices.setdefault(meta[i]["path"], []).append(i)

    print(f"Enrich: {len(to_do)} face records to enrich across {len(path_to_indices)} unique files")

    from insightface.app import FaceAnalysis
    app = FaceAnalysis(
        name="buffalo_l",
        providers=["CPUExecutionProvider"],
        allowed_modules=["detection", "landmark_2d_106", "landmark_3d_68"],
    )
    app.prepare(ctx_id=-1, det_size=(640, 640))

    since_flush = 0
    missing = 0
    ok = 0
    try:
        for path, idxs in tqdm(path_to_indices.items(), desc="enriching"):
            rgb, bgr = load_rgb_bgr(Path(path))
            if bgr is None:
                missing += len(idxs)
                continue
            faces = app.get(bgr)
            for i in idxs:
                match = _pick_face_for_bbox(faces, meta[i].get("bbox"))
                if match is None:
                    missing += 1
                    continue
                if match.landmark_2d_106 is not None:
                    meta[i]["landmark_2d_106"] = match.landmark_2d_106.astype(np.float32).tolist()
                if match.landmark_3d_68 is not None:
                    meta[i]["landmark_3d_68"] = match.landmark_3d_68.astype(np.float32).tolist()
                if match.pose is not None:
                    meta[i]["pose"] = match.pose.astype(np.float32).tolist()  # [pitch, yaw, roll]
                ok += 1
                since_flush += 1
                if since_flush >= flush_every:
                    save_cache(cache_path, emb, meta, src_root, processed, path_aliases)
                    since_flush = 0
    finally:
        save_cache(cache_path, emb, meta, src_root, processed, path_aliases)

    print(f"Enrich done: {ok} records enriched, {missing} could not be matched")

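The group-by-path step above is what keeps enrichment to one decode per file even when a file contributed several face records. Illustrated standalone (the `meta` records here are dummies, not real cache data):

```python
# Grouping face-record indices by source path, as cmd_enrich does, so a file
# that produced several face records is decoded only once.
meta = [{"path": "a.jpg"}, {"path": "b.jpg"}, {"path": "a.jpg"}]
to_do = [0, 1, 2]
path_to_indices: dict[str, list[int]] = {}
for i in to_do:
    path_to_indices.setdefault(meta[i]["path"], []).append(i)
print(path_to_indices)  # {'a.jpg': [0, 2], 'b.jpg': [1]}
```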
# ---------- quality scoring ---------- #


QUALITY_WEIGHTS = {
    "det": 0.20,
    "size": 0.15,
    "sharp": 0.15,
    "frontal": 0.30,
    "symmetry": 0.20,
}


def _norm01(x: float, lo: float, hi: float) -> float:
    if hi <= lo:
        return 0.0
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))


def _landmark_symmetry(lm: list[list[float]] | None, bbox: list[int] | None) -> float:
    """Score in [0, 1] based on how symmetric the 2D 106 landmarks are about the
    bbox vertical center. A head-on, un-occluded face has high symmetry; a strong
    profile or half-occluded face has low. Returns 0.5 (neutral) if landmarks are
    unavailable."""
    if not lm or not bbox:
        return 0.5
    try:
        arr = np.asarray(lm, dtype=np.float32)
        cx = 0.5 * (bbox[0] + bbox[2])
        width = max(1.0, bbox[2] - bbox[0])
        # Mirror each landmark around cx and measure the closest-landmark
        # distance (normalized by bbox width).
        mirrored = arr.copy()
        mirrored[:, 0] = 2 * cx - mirrored[:, 0]
        # For each mirrored point, find the nearest real landmark.
        d = np.linalg.norm(mirrored[:, None, :] - arr[None, :, :], axis=2).min(axis=1)
        mean_err = d.mean() / width
        # Empirically mean_err is ~0.02 for frontal, ~0.15 for a strong profile.
        score = 1.0 - _norm01(mean_err, 0.02, 0.15)
        return float(score)
    except Exception:
        return 0.5
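The mirror-and-match idea in `_landmark_symmetry` can be demonstrated on synthetic points (the bbox center and width are illustrative values, not real landmark data):

```python
import numpy as np

# Mirror-and-match symmetry: reflect x about the bbox center, take each
# reflected point's nearest-neighbour distance to the real set, and normalize
# by bbox width. A perfectly symmetric point set scores zero error.
pts = np.array([[40.0, 50.0], [60.0, 50.0], [50.0, 70.0]], dtype=np.float32)
cx, width = 50.0, 100.0  # bbox center x and width (illustrative)
mirrored = pts.copy()
mirrored[:, 0] = 2 * cx - mirrored[:, 0]
d = np.linalg.norm(mirrored[:, None, :] - pts[None, :, :], axis=2).min(axis=1)
mean_err = float(d.mean() / width)
print(mean_err)  # 0.0: each mirrored point lands exactly on a real one
```

For a profile face, landmarks bunch on one side, the mirrored set no longer overlaps the real set, and `mean_err` climbs toward the ~0.15 upper calibration point.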


def _frontality(pose: list[float] | None) -> float:
    if not pose or len(pose) < 2:
        return 0.5
    pitch, yaw = abs(pose[0]), abs(pose[1])
    # Yaw is the dominant signal for arcface-style embedding degradation.
    yaw_score = 1.0 - _norm01(yaw, 10.0, 45.0)
    pitch_score = 1.0 - _norm01(pitch, 10.0, 35.0)
    return 0.7 * yaw_score + 0.3 * pitch_score

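A quick check of the pose scoring, with `norm01`/`frontality` re-declared locally for illustration (same ranges and weights as `_frontality` above):

```python
def norm01(x: float, lo: float, hi: float) -> float:
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

def frontality(pitch: float, yaw: float) -> float:
    # Yaw dominates (0.7 vs 0.3): profile views hurt the embedding more
    # than a moderate chin-up/chin-down tilt.
    yaw_score = 1.0 - norm01(abs(yaw), 10.0, 45.0)
    pitch_score = 1.0 - norm01(abs(pitch), 10.0, 35.0)
    return 0.7 * yaw_score + 0.3 * pitch_score

print(frontality(0.0, 0.0))   # 1.0: head-on
print(frontality(0.0, 45.0))  # 0.3: at full yaw only the pitch term survives
```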


def compute_quality(rec: dict) -> dict:
    """Return a dict with per-signal sub-scores and a composite score in [0, 1]."""
    det = _norm01(float(rec.get("det_score", 0.0)), 0.50, 0.95)
    size = _norm01(float(rec.get("face_short", 0)), 90.0, 300.0)
    sharp = _norm01(float(rec.get("blur", 0.0)), 40.0, 250.0)
    frontal = _frontality(rec.get("pose"))
    symmetry = _landmark_symmetry(rec.get("landmark_2d_106"), rec.get("bbox"))
    w = QUALITY_WEIGHTS
    composite = (
        w["det"] * det + w["size"] * size + w["sharp"] * sharp
        + w["frontal"] * frontal + w["symmetry"] * symmetry
    )
    return {
        "composite": float(composite),
        "det": float(det), "size": float(size), "sharp": float(sharp),
        "frontal": float(frontal), "symmetry": float(symmetry),
    }

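Recomputing the composite by hand for a hypothetical face record (weights and normalization ranges are copied from `QUALITY_WEIGHTS` / `compute_quality` above; the record values are made up):

```python
# Weights copied from QUALITY_WEIGHTS; ranges copied from compute_quality.
WEIGHTS = {"det": 0.20, "size": 0.15, "sharp": 0.15, "frontal": 0.30, "symmetry": 0.20}

def norm01(x: float, lo: float, hi: float) -> float:
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))

det = norm01(0.90, 0.50, 0.95)      # strong detector confidence -> ~0.889
size = norm01(320.0, 90.0, 300.0)   # clamps to 1.0 above the top of the range
sharp = norm01(145.0, 40.0, 250.0)  # mid-range Laplacian blur metric -> 0.5
frontal, symmetry = 1.0, 0.5        # head-on pose, neutral symmetry (no landmarks)
composite = (WEIGHTS["det"] * det + WEIGHTS["size"] * size + WEIGHTS["sharp"] * sharp
             + WEIGHTS["frontal"] * frontal + WEIGHTS["symmetry"] * symmetry)
print(round(composite, 3))  # 0.803
```

Note that a record that was never enriched (no pose, no landmarks) gets the neutral 0.5 for both `frontal` and `symmetry`, so enrichment shifts rankings rather than zeroing un-enriched faces out.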
# ---------- export-swap ---------- #


def _crop_face_square(rgb: np.ndarray, bbox: list[int], pad_ratio: float, out_size: int) -> np.ndarray:
    """Pad bbox by `pad_ratio` on each side, clamp to the image, pad to square, resize to out_size."""
    import cv2
    h, w = rgb.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in bbox]
    bw, bh = x2 - x1, y2 - y1
    px = int(bw * pad_ratio)
    py = int(bh * pad_ratio)
    ex1 = max(0, x1 - px)
    ey1 = max(0, y1 - py)
    ex2 = min(w, x2 + px)
    ey2 = min(h, y2 + py)
    crop = rgb[ey1:ey2, ex1:ex2]
    ch, cw = crop.shape[:2]
    if ch == 0 or cw == 0:
        return np.zeros((out_size, out_size, 3), dtype=np.uint8)
    if ch != cw:
        sz = max(ch, cw)
        padded = np.zeros((sz, sz, 3), dtype=crop.dtype)
        y_off = (sz - ch) // 2
        x_off = (sz - cw) // 2
        padded[y_off:y_off + ch, x_off:x_off + cw] = crop
        crop = padded
    if crop.shape[0] != out_size:
        crop = cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_AREA)
    return crop
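The pad-to-square branch can be checked with a dummy crop (NumPy only; the cv2 resize step is omitted here):

```python
import numpy as np

# The letterbox (pad-to-square) branch of _crop_face_square, on a dummy
# 40x60 crop: the shorter axis is centered inside a square of black pixels.
crop = np.ones((40, 60, 3), dtype=np.uint8)
ch, cw = crop.shape[:2]
sz = max(ch, cw)
padded = np.zeros((sz, sz, 3), dtype=crop.dtype)
y_off, x_off = (sz - ch) // 2, (sz - cw) // 2
padded[y_off:y_off + ch, x_off:x_off + cw] = crop
print(padded.shape)  # (60, 60, 3): square, with 10 letterbox rows top and bottom
```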


def _zip_png_list(pngs: list[Path], zip_path: Path) -> None:
    """Write a .fsz (zip) with the given PNGs named 0000.png, 0001.png, ..."""
    import zipfile
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=4) as zf:
        for i, p in enumerate(pngs):
            zf.write(p, arcname=f"{i:04d}.png")
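A `.fsz` bundle is a plain zip whose members are renamed sequentially on write, regardless of their on-disk filenames. An in-memory sketch using `writestr` in place of `zf.write` (no files touched):

```python
import io
import zipfile

# Members are renamed 0000.png, 0001.png, ... on write, matching the arcname
# scheme in _zip_png_list; the payloads here are stand-ins for real PNG bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i, payload in enumerate([b"png-0", b"png-1", b"png-2"]):
        zf.writestr(f"{i:04d}.png", payload)
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)  # ['0000.png', '0001.png', '0002.png']
```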


def cmd_export_swap(
    cache_path: Path,
    refine_manifest_path: Path,
    raw_manifest_path: Path | None,
    out_dir: Path,
    top_n: int,
    outlier_threshold: float,
    pad_ratio: float,
    out_size: int,
    include_candidates: bool,
    candidate_match_threshold: float,
    candidate_min_score: float,
    min_face_short: int,
) -> None:
    import cv2
    emb, meta, src_root, _processed, path_aliases = load_cache(cache_path)
    rm = json.loads(refine_manifest_path.read_text())

    dup_path = cache_path.with_suffix(".duplicates.json")
    if not dup_path.exists():
        dup_path = cache_path.parent / (cache_path.stem + ".duplicates.json")
    visual_groups: list[list[str]] = []
    if dup_path.exists():
        visual_groups = json.loads(dup_path.read_text()).get("visual_groups", [])

    path_to_vgroup: dict[str, tuple[str, ...]] = {}
    for g in visual_groups:
        key = tuple(sorted(g))
        for p in g:
            path_to_vgroup[p] = key

    face_records = [m for m in meta if not m.get("noface")]
    if len(face_records) != len(emb):
        raise SystemExit(f"meta/embedding mismatch: {len(face_records)} vs {len(emb)}")
    path_idx: dict[str, list[int]] = {}
    for i, m in enumerate(face_records):
        path_idx.setdefault(m["path"], []).append(i)

    out_dir.mkdir(parents=True, exist_ok=True)
    faceset_summary = []
    final_centroids: dict[str, np.ndarray] = {}
    placed_cache_indices: set[int] = set()

    for fs in rm.get("facesets", []):
        name = fs["name"]
        paths = set(fs.get("images", []))
        indices = [i for p in paths for i in path_idx.get(p, [])]
        if not indices:
            continue

        # Initial centroid for this faceset from all its current members.
        init_vecs = emb[indices]
        init_cent = init_vecs.mean(axis=0)
        nrm = np.linalg.norm(init_cent)
        if nrm > 0:
            init_cent = init_cent / nrm

        # Tight outlier filter + quality.
        ranked: list[dict] = []
        dropped_outlier = 0
        for i in indices:
            cosd = 1.0 - float(emb[i] @ init_cent)
            if cosd > outlier_threshold:
                dropped_outlier += 1
                continue
            rec = face_records[i]
            if rec.get("face_short", 0) < min_face_short:
                continue
            q = compute_quality(rec)
            ranked.append({"cache_idx": i, "rec": rec, "cosd": cosd, "quality": q})

        # Visual-dupe collapse: keep the best score per group.
        groups_best: dict[tuple[str, ...], dict] = {}
        singletons: list[dict] = []
        for r in ranked:
            g = path_to_vgroup.get(r["rec"]["path"])
            if g is None:
                singletons.append(r)
                continue
            prev = groups_best.get(g)
            if prev is None or r["quality"]["composite"] > prev["quality"]["composite"]:
                groups_best[g] = r
        kept = singletons + list(groups_best.values())
        kept.sort(key=lambda r: -r["quality"]["composite"])
        dropped_vdupe = len(ranked) - len(kept)

        if not kept:
            print(f"[{name}] empty after filtering; skipping")
            continue

        # Recompute the centroid from the kept embeddings (used for singleton rescue).
        kept_vecs = np.stack([emb[r["cache_idx"]] for r in kept])
        final_cent = kept_vecs.mean(axis=0)
        nrm = np.linalg.norm(final_cent)
        if nrm > 0:
            final_cent = final_cent / nrm
        final_centroids[name] = final_cent
        for r in kept:
            placed_cache_indices.add(r["cache_idx"])

        # Materialize.
        fs_out = out_dir / name
        faces_dir = fs_out / "faces"
        faces_dir.mkdir(parents=True, exist_ok=True)

        # Deduplicate by source path: within the same faceset, a multi-face photo could
        # have produced 2 records with different bboxes; we want the one with the best
        # quality to win, and only crop that face.
        seen_path: dict[str, dict] = {}
        for r in kept:
            p = r["rec"]["path"]
            if p not in seen_path or r["quality"]["composite"] > seen_path[p]["quality"]["composite"]:
                seen_path[p] = r
        unique_kept = sorted(seen_path.values(), key=lambda r: -r["quality"]["composite"])

        written_pngs: list[Path] = []
        manifest_faces: list[dict] = []
        for rank, r in enumerate(unique_kept, start=1):
            rec = r["rec"]
            src = Path(rec["path"])
            rgb = None
            if src.exists():
                rgb, _ = load_rgb_bgr(src)
            if rgb is None:
                continue
            crop = _crop_face_square(rgb, rec["bbox"], pad_ratio, out_size)
            png = faces_dir / f"{rank:04d}.png"
            cv2.imwrite(str(png), cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
            written_pngs.append(png)
            manifest_faces.append({
                "rank": rank,
                "png": f"faces/{rank:04d}.png",
                "source": rec["path"],
                "aliases": path_aliases.get(rec["path"], []),
                "bbox": rec["bbox"],
                "face_short": rec.get("face_short"),
                "det_score": rec.get("det_score"),
                "blur": rec.get("blur"),
                "pose": rec.get("pose"),
                "cosd_centroid": float(r["cosd"]),
                "quality": r["quality"],
            })

        if not written_pngs:
            continue

        # Emit .fsz bundles.
        top_n_eff = min(top_n, len(written_pngs))
        _zip_png_list(written_pngs[:top_n_eff], fs_out / f"{name}_top{top_n_eff}.fsz")
        if len(written_pngs) > top_n_eff:
            _zip_png_list(written_pngs, fs_out / f"{name}_all.fsz")

        # Per-faceset manifest.
        manifest = {
            "name": name,
            "input_face_records": len(indices),
            "dropped_outlier": dropped_outlier,
            "dropped_visual_dupes": dropped_vdupe,
            "dropped_multi_face_same_source": len(kept) - len(unique_kept),
            "exported": len(written_pngs),
            "top_n": top_n_eff,
            "fsz_top": f"{name}_top{top_n_eff}.fsz",
            "fsz_all": f"{name}_all.fsz" if len(written_pngs) > top_n_eff else None,
            "quality_weights": QUALITY_WEIGHTS,
            "faces": manifest_faces,
        }
        (fs_out / "manifest.json").write_text(json.dumps(manifest, indent=2))

        # Convenience name placeholder.
        name_file = fs_out / "NAME.txt"
        if not name_file.exists():
            name_file.write_text(
                "# Optional: write the identity's name on the first line.\n"
                "# This file is for operator reference only - roop-unleashed ignores it.\n\n"
            )

        faceset_summary.append(manifest)
        print(
            f"[{name}] in={len(indices)} outlier_drop={dropped_outlier} vdupe_drop={dropped_vdupe} "
            f"multiface_drop={len(kept) - len(unique_kept)} exported={len(written_pngs)} "
            f"(top{top_n_eff}.fsz)"
        )

    # Singleton rescue -> _candidates/
    if include_candidates and raw_manifest_path is not None:
        raw = json.loads(raw_manifest_path.read_text())
        # Index singletons: face records in _singletons by (path, bbox) => cache index.
        bbox_key_to_cache = {
            (m["path"], tuple(m["bbox"]) if m.get("bbox") else None): i
            for i, m in enumerate(face_records)
        }
        singleton_cache_indices: list[int] = []
        for e in raw:
            if e.get("folder") != "_singletons":
                continue
            key = (e["source"], tuple(e["bbox"]) if e.get("bbox") else None)
            ci = bbox_key_to_cache.get(key)
            if ci is not None and ci not in placed_cache_indices:
                singleton_cache_indices.append(ci)

        if not final_centroids:
            print("No final centroids; skipping candidates.")
        elif not singleton_cache_indices:
            print("No singletons to rescue.")
        else:
            cand_root = out_dir / "_candidates"
            cand_root.mkdir(parents=True, exist_ok=True)
            cent_names = list(final_centroids.keys())
            cent_mat = np.stack([final_centroids[n] for n in cent_names])

            to_faceset: dict[str, list[int]] = {}
            unmatched: list[int] = []
            rescued_report: list[dict] = []

            for ci in singleton_cache_indices:
                rec = face_records[ci]
                if rec.get("face_short", 0) < min_face_short:
                    continue
                q = compute_quality(rec)
                if q["composite"] < candidate_min_score:
                    continue
                sims = cent_mat @ emb[ci]
                best = int(np.argmax(sims))
                dist = 1.0 - float(sims[best])
                if dist <= candidate_match_threshold:
                    to_faceset.setdefault(cent_names[best], []).append(ci)
                    rescued_report.append({
                        "cache_idx": ci, "source": rec["path"], "assigned": cent_names[best],
                        "cosd": dist, "quality": q,
                    })
                else:
                    unmatched.append(ci)

            # Cluster unmatched among themselves into new_NNN buckets.
            if len(unmatched) > 1:
                u_vecs = np.stack([emb[i] for i in unmatched])
                labels = _cluster_embeddings(u_vecs, 0.55)
                groups: dict[int, list[int]] = {}
                for ci, lbl in zip(unmatched, labels):
                    groups.setdefault(int(lbl), []).append(ci)
                groups_sorted = sorted(groups.items(), key=lambda kv: -len(kv[1]))
                new_buckets: dict[str, list[int]] = {}
                rank = 0
                for _gid, members in groups_sorted:
                    if len(members) == 1:
                        continue  # still a singleton, skip
                    rank += 1
                    new_buckets[f"new_{rank:03d}"] = members
                to_new = new_buckets
            else:
                to_new = {}

            # Materialize candidates.
            def materialize(bucket_name: str, ci_list: list[int]) -> None:
                bd = cand_root / bucket_name
                fd = bd / "faces"
                fd.mkdir(parents=True, exist_ok=True)
                written = []
                entries = []
                ranked_cis = sorted(ci_list, key=lambda i: -compute_quality(face_records[i])["composite"])
                for rk, ci in enumerate(ranked_cis, 1):
                    rec = face_records[ci]
                    src = Path(rec["path"])
                    if not src.exists():
                        continue
                    rgb, _ = load_rgb_bgr(src)
                    if rgb is None:
                        continue
                    crop = _crop_face_square(rgb, rec["bbox"], pad_ratio, out_size)
                    png = fd / f"{rk:04d}.png"
                    cv2.imwrite(str(png), cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
                    written.append(png)
                    entries.append({
                        "rank": rk,
                        "png": f"faces/{rk:04d}.png",
                        "source": rec["path"],
                        "bbox": rec["bbox"],
                        "quality": compute_quality(rec),
                    })
                if written:
                    (bd / "manifest.json").write_text(json.dumps({
                        "bucket": bucket_name,
                        "faces": entries,
                    }, indent=2))

            for fs_name, cis in to_faceset.items():
                materialize(f"to_{fs_name}", cis)
            for bname, cis in to_new.items():
                materialize(bname, cis)

            (cand_root / "rescue_report.json").write_text(json.dumps({
                "rescued_to_existing": len(rescued_report),
                "new_clusters": len(to_new),
                "unmatched_singletons_kept_as_singleton": len(unmatched) - sum(len(v) for v in to_new.values()),
                "assignments": rescued_report,
            }, indent=2))
            print(f"Candidates: rescued={len(rescued_report)} to existing facesets; new_clusters={len(to_new)}")

    # Top-level manifest.
    (out_dir / "manifest.json").write_text(json.dumps({
        "facesets": [{k: v for k, v in m.items() if k != "faces"} for m in faceset_summary],
        "quality_weights": QUALITY_WEIGHTS,
        "outlier_threshold": outlier_threshold,
        "top_n": top_n,
        "pad_ratio": pad_ratio,
        "out_size": out_size,
    }, indent=2))
    print(f"Wrote top-level manifest -> {out_dir / 'manifest.json'}")

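The per-faceset outlier gate above works on cosine distance to the L2-normalized centroid. A toy sketch with made-up 2-D vectors standing in for the real 512-D embeddings (threshold and values are illustrative):

```python
import numpy as np

# Outlier gating as in export-swap: normalize each embedding, build a
# normalized centroid, and drop members whose cosine distance exceeds the
# threshold (default 0.45 in the CLI).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
cent = emb.mean(axis=0)
cent = cent / np.linalg.norm(cent)
cosd = 1.0 - emb @ cent
keep = cosd <= 0.45
print(keep.tolist())  # [True, True, False]: the off-identity vector is gated out
```

Because the third vector pulls the centroid toward itself, a looser threshold would let it survive; the tighter 0.45 gate is what makes the exported set safe to average downstream.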
# ---------- main ---------- #


def main() -> None:
@@ -992,6 +1505,25 @@ def main() -> None:
    px.add_argument("--refine-min-det-score", type=float, default=0.6)
    px.add_argument("--refine-centroid-threshold", type=float, default=0.55)

    pn = sub.add_parser("enrich", help="Re-detect to persist landmark_2d_106, landmark_3d_68, pose into cache")
    pn.add_argument("cache", type=Path)
    pn.add_argument("--force", action="store_true", help="re-enrich even records that already have pose")
    pn.add_argument("--flush-every", type=int, default=100)

    pxs = sub.add_parser("export-swap", help="Build facesets_swap_ready/ with ranked single-face PNGs + .fsz per identity")
    pxs.add_argument("cache", type=Path)
    pxs.add_argument("refine_manifest", type=Path, help="path to refine_manifest.json of the source facesets dir")
    pxs.add_argument("out_dir", type=Path)
    pxs.add_argument("--raw-manifest", type=Path, default=None, help="raw_full/manifest.json (required for --candidates)")
    pxs.add_argument("--top-n", type=int, default=30)
    pxs.add_argument("--outlier-threshold", type=float, default=0.45)
    pxs.add_argument("--pad-ratio", type=float, default=0.5)
    pxs.add_argument("--out-size", type=int, default=512)
    pxs.add_argument("--min-face-short", type=int, default=100)
    pxs.add_argument("--candidates", action="store_true", help="rescue singletons into _candidates/")
    pxs.add_argument("--candidate-match-threshold", type=float, default=0.55)
    pxs.add_argument("--candidate-min-score", type=float, default=0.40)

    args = p.parse_args()
    if args.cmd == "embed":
        cmd_embed(args.src_dir, args.cache, resume=not args.no_resume, flush_every=args.flush_every)
@@ -1013,6 +1545,15 @@ def main() -> None:
            args.refine_min_short, args.refine_min_blur, args.refine_min_det_score,
            args.refine_centroid_threshold,
        )
    elif args.cmd == "enrich":
        cmd_enrich(args.cache, force=args.force, flush_every=args.flush_every)
    elif args.cmd == "export-swap":
        cmd_export_swap(
            args.cache, args.refine_manifest, args.raw_manifest, args.out_dir,
            args.top_n, args.outlier_threshold, args.pad_ratio, args.out_size,
            args.candidates, args.candidate_match_threshold, args.candidate_min_score,
            args.min_face_short,
        )


if __name__ == "__main__":
    main()