# CLIP zero-shot occlusion filter (masks + sunglasses)

_Run date: 2026-04-27. Driver scripts: `work/filter_occlusions.py`, `work/clip_worker.py`._

## 1. Why

`facesets_swap_ready/` ended the Immich import day with 311 substantive facesets and a long tail of identities whose clusters had latched onto *eyewear or mask appearance* instead of identity (COVID-era shots, vacation photos with sunglasses dominating the frame). Two failure modes:

1. **Pollution of the averaged identity** — roop's `FaceSet.AverageEmbeddings()` averages every face in the .fsz. A faceset where 40 % of images are sunglassed gives a biased centroid; the swap reproduces sunglass-shaped eye sockets.
2. **Whole-cluster identity drift** — clustering at the embedding level sometimes anchors on the eyewear silhouette rather than the face, producing clusters of "the same sunglasses across multiple people".

A targeted attribute scorer was the cleanest fix.

## 2. Model + prompts

**Model**: `open_clip` `ViT-L-14` / `dfn2b_s39b` (Apple Data Filtering Networks). Best public zero-shot accuracy at this size. Loads weights from HF Hub (~890 MB). Scores are bit-identical between WSL CPU and Windows DML.

**Prompt design**: per-attribute ensembles of 5–6 positive and 5–6 negative prompts. Positive ensembles are mean-pooled and L2-normalized before the softmax.

**Critical bug if forgotten**: CLIP cosine similarities are tiny (0.2–0.3 range), so raw `softmax([sim_pos, sim_neg])` collapses to ~0.5/0.5 on every image. **Multiply by `model.logit_scale.exp()` (~100) before the softmax.** Without that scale the entire scorer outputs a uniform 0.5.

**Sunglasses prompt pitfall**: the first prompt set scored faces with sunglasses *pushed up on the forehead* with the same probability as faces with sunglasses *covering the eyes* — CLIP detects "presence of sunglasses in frame", not "eyes occluded".
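The logit-scale bug above is easy to reproduce with plain arithmetic; a minimal sketch, using illustrative similarity values rather than measured ones:

```python
import math

def softmax2(a: float, b: float) -> float:
    """P(first class) under a two-class softmax, numerically stable."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)

# Illustrative cosine similarities in CLIP's typical 0.2-0.3 range.
sim_pos, sim_neg = 0.27, 0.23

p_raw = softmax2(sim_pos, sim_neg)                 # ~0.51: useless
p_scaled = softmax2(100 * sim_pos, 100 * sim_neg)  # ~0.98: usable
print(f"raw={p_raw:.3f}  scaled={p_scaled:.3f}")
```

Here `100` stands in for `model.logit_scale.exp()`; the learned value is clamped near `exp(4.6) ≈ 100` in released CLIP checkpoints.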
The forehead false positive was fixed by putting it into the *negative* class explicitly:

```
positive:
  "a face with dark sunglasses covering the eyes"
  "a portrait with the eyes hidden behind opaque sunglasses"
  ...
negative:
  "a face with sunglasses pushed up on the forehead, eyes visible below"
  "a face with sunglasses resting on top of the head, eyes visible"
  "a face wearing clear prescription eyeglasses with visible eyes"
  ...
```

Validation pair (faceset_005): sunglasses-on-eyes → 0.91, sunglasses-on-forehead → 0.39. A threshold of 0.7 cleanly separates them.

## 3. Architecture

```
┌───────────────────────────────────────────────┐
│ WSL  /opt/face-sets/work/filter_occlusions.py │
│   • stage:  walk facesets/, write queue.json  │
│   • merge:  ingest worker results             │
│   • report: HTML contact sheet                │
│   • apply:  prune + quarantine + re-zip       │
└────────────┬──────────────────────────────────┘
             │ queue.json (paths) via \\wsl.localhost\
             ▼
┌───────────────────────────────────────────────┐
│ Windows  C:\clip_dml_venv\                    │
│   /opt/face-sets/work/clip_worker.py          │
│   Python 3.12 + torch 2.4.1 CPU               │
│   + torch-directml 0.2.5 + open_clip_torch    │
│   Reads PNGs from native E:\, writes scores   │
└───────────────────────────────────────────────┘
```

A separate Windows venv (not the existing `C:\face_embed_venv\`) is needed because `torch-directml` brings ~1.5 GB of wheels plus version-pinned numpy/pillow that risk breaking the embed_worker venv's `onnxruntime-directml` + `insightface` stack.

## 4. DML throughput surprise

Measured on an AMD Radeon RX Vega:

| model | runtime | throughput | speedup vs WSL CPU |
|-------|---------|-----------:|-------------------:|
| ViT-L-14 (CLIP, this filter) | open_clip | **1.43 img/s** | **2.4×** |
| buffalo_l (insightface, embed_worker) | onnxruntime | 2.6 img/s | 7.5× |

Only 2.4× because `aten::_native_multi_head_attention` is not implemented in the DirectML plugin and falls back to CPU: the rest of the vision encoder runs on GPU, attention runs on CPU, and the two alternate every layer.
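At 1.43 img/s the scored corpus (6,318 PNGs, the section-5 validation count) works out to just over 73 minutes, which is where the one-shot run time comes from:

```python
corpus_pngs = 6318   # PNGs scored (section-5 validation count)
dml_rate = 1.43      # img/s for ViT-L-14 under torch-directml

minutes = corpus_pngs / dml_rate / 60
print(f"{minutes:.1f} min")  # 73.6 min
```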
A silenced UserWarning makes this fallback near-invisible. Workable for a one-shot 73-min corpus run, but the embed_worker pattern (pure ONNX) remains the gold standard for DML.

## 5. Thresholds (validated 2026-04-27 on 6,318 PNGs)

| level | threshold | semantics |
|-------|----------:|-----------|
| image | P(positive) ≥ 0.7 | drop the PNG |
| faceset | ≥ 40 % of images flagged for either attribute | quarantine the whole faceset to `_masked/` |
| min-survivors | < 5 surviving AND something pruned | quarantine to `_thin/` |

The `AND something pruned` guard is essential — without it, naturally small facesets (hand-sorted, ≤ 4 PNGs) would be incorrectly quarantined for being small even when they contain zero occlusions.

## 6. Run results

| action | count | net effect |
|--------|------:|------------|
| keep | 209 | unchanged |
| prune | 46 | 183 PNGs dropped within survivors |
| quarantine_masked | 51 | whole faceset → `_masked/` (11 mask-driven, 40 sunglasses-driven) |
| quarantine_thin | 3 | survivors < 5 → `_thin/` |

Net: 311 active facesets → 255 after the filter run. 763 PNGs were quarantined whole-faceset and 183 pruned within survivors. All dropped PNGs are preserved at `/faces/_dropped/` for reversibility. The master manifest gained a `masked[]` array parallel to `thin_eras[]`, plus an `occlusion_filter_run` provenance block.

## 7. Known limitations

- **Per-faceset manifests are NOT updated by `apply`** — only the master manifest is. Each faceset's own `/manifest.json` retains stale `faces[]` entries pointing at PNGs that moved into `_dropped/`. Harmless for `.fsz` consumers (the .fsz is re-zipped from current disk state), but downstream tools reading `faces[]` will see broken references. Discovered later by `age_extend_001.py`'s rebuild loop, which generated 42 missing-PNG warnings before being caught.

## 8. Re-running

```bash
# 1. Stage queue from current corpus state
python work/filter_occlusions.py stage --out work/clip_dml/queue.json

# 2.
#    Score on Windows DML (resumable)
"/mnt/c/clip_dml_venv/Scripts/python.exe" work/clip_worker.py \
    work/clip_dml/queue.json work/clip_dml/scores.json --batch 8

# 3. Reshape into per-faceset format, then HTML for visual approval
python work/filter_occlusions.py merge \
    --scores work/clip_dml/scores.json --out work/occlusion_scores.json
python work/filter_occlusions.py report \
    --scores work/occlusion_scores.json --out work/occlusion_review

# 4. Apply (always dry-run first)
python work/filter_occlusions.py apply \
    --scores work/occlusion_scores.json --out-plan work/occlusion_apply_plan.json --dry-run
python work/filter_occlusions.py apply \
    --scores work/occlusion_scores.json --out-plan work/occlusion_apply_plan.json
```
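The section-5 decision order can be sketched as pure logic. This is a minimal sketch, not the real `filter_occlusions.py` API; the function name and the `scores` shape (png path → per-attribute probability) are hypothetical:

```python
def classify_faceset(scores, img_thresh=0.7, set_frac=0.40, min_survivors=5):
    """Apply the section-5 thresholds, in order, to one faceset.

    scores: {png_path: {"mask": prob, "sunglasses": prob}} (hypothetical shape).
    Returns (action, pngs_to_drop).
    """
    # Image level: flagged if EITHER attribute crosses the 0.7 threshold.
    flagged = [png for png, attrs in scores.items()
               if max(attrs.values()) >= img_thresh]
    # Faceset level: >= 40 % flagged quarantines the whole set to _masked/.
    if len(flagged) / len(scores) >= set_frac:
        return "quarantine_masked", list(scores)
    # Min-survivors: the "AND something pruned" guard keeps naturally small,
    # occlusion-free facesets out of _thin/.
    if flagged and len(scores) - len(flagged) < min_survivors:
        return "quarantine_thin", list(scores)
    return ("prune" if flagged else "keep"), flagged

# Example: 6 images, 2 sunglassed -> too few survivors, whole set leaves.
demo = {f"img{i}.png": {"mask": 0.05, "sunglasses": 0.9 if i < 2 else 0.1}
        for i in range(6)}
print(classify_faceset(demo)[0])  # quarantine_thin
```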