Add Immich import pipeline (WSL stage + Windows DML embed + cluster)
Three-piece workflow that imports a self-hosted Immich library and emits new
facesets without disturbing existing identity numbering:

- work/immich_stage.py (WSL): pages /search/metadata, parallel-fetches
  /faces?id= per asset, prefilters by face_short>=90 against bbox scaled to
  original-image coords, downloads originals, sha256-dedups against
  nl_full.npz and same-run staged files. 8-worker ThreadPoolExecutor doing
  the full /faces -> filter -> /original chain per asset; resumable via
  state.json. API URL + key come from IMMICH_URL / IMMICH_API_KEY env vars,
  label->UUID map from work/immich/users.json (gitignored).
- work/embed_worker.py (Windows venv at C:\face_embed_venv): runs
  insightface.FaceAnalysis(buffalo_l) with the DmlExecutionProvider on AMD
  Radeon Vega via onnxruntime-directml. Produces a cache file in the same
  .npz schema as sort_faces.cmd_embed (loadable via load_cache). ~7.5x
  speedup over CPU end-to-end; embeddings bit-identical to CPU (cosine
  similarity 1.0000 across 8 sample faces).
- work/cluster_immich.py (WSL): mirrors cluster_osrc.py against an
  immich_<user>.npz. Builds existing identity centroids from canonical
  faceset_NNN/ in facesets_swap_ready/, drops matches at <=0.45, clusters
  the rest at 0.55, applies refine gates, hands off to cmd_export_swap.
  Numbers new facesets past the existing maximum.
- work/finalize_immich.sh: chains queue -> Windows embed -> cache copy ->
  cluster_immich, with logging.

The 2026-04-26 run on https://fotos.computerliebe.org (Immich v2.7.2)
processed 53,842 admin-accessible assets, staged 10,261, embedded 19,462
face records on Vega DML in 64.6 min, matched 8,103 (42%) to existing
identities, and emitted 185 new facesets (faceset_026..264 with gaps).
facesets_swap_ready/ went from 31 to 216 substantive facesets.

Important caveat surfaced: /search/metadata's userIds filter is silently
ignored when the API key is bound to a different user, so this run can't
enumerate other users' libraries from the admin key. A per-user API key
would be required for nic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README.md (84 changed lines)
@@ -204,6 +204,77 @@ existing identities), this produced 6 new facesets (`faceset_020..025`,
sizes 4–26 exported PNGs; the 7th candidate cluster lost all 6 faces to
export-swap's tighter `min_face_short=100` gate).

### Importing identities from a self-hosted Immich library

`work/immich_stage.py` + `work/embed_worker.py` + `work/cluster_immich.py`
together import an Immich library at scale, with the embed step running on
a Windows AMD GPU via DirectML and everything else on WSL. Three pieces:

1. **`work/immich_stage.py` (WSL)** — pages every IMAGE asset via
   `/search/metadata`, fetches each asset's `/faces?id=` to read Immich's
   own ML-driven bboxes, scales each bbox to original-image coordinates,
   and prefilters by `face_short ≥ 90`. For survivors it downloads the
   original, sha256-deduplicates against the canonical `nl_full.npz` and
   against same-run staged files, and saves to
   `/mnt/x/src/immich/<user>/<rel>`. Writes a `queue.json` that the embed
   worker consumes. 8 concurrent worker threads run the full per-asset
   I/O chain (`/faces` → filter → `/original`) so 8 workers ≈ 8× the
   serial throughput.
2. **`work/embed_worker.py` (Windows venv at `C:\face_embed_venv\`)** —
   loads `insightface.FaceAnalysis(buffalo_l)` with the
   `DmlExecutionProvider` and runs detection + landmarks + recognition
   over the queue. Produces a `.npz` cache that's bit-identical in
   schema to what `sort_faces.py:cmd_embed` writes, so the result is
   directly loadable by `load_cache()`. The cache already includes the
   post-`enrich` fields (`landmark_2d_106`, `landmark_3d_68`, `pose`)
   because FaceAnalysis returns them for free. AMD Vega gives ~7.5×
   real-pipeline speedup over CPU.
3. **`work/cluster_immich.py` (WSL)** — mirrors `cluster_osrc.py`'s
   shape but reads from `immich_<user>.npz`. Builds existing-identity
   centroids from every canonical `faceset_NNN/` in
   `facesets_swap_ready/` (skipping era splits and `_thin/`), drops
   immich faces matching at cos-dist ≤ 0.45, clusters the rest at 0.55,
   applies refine gates, numbers new facesets past the existing maximum,
   and feeds `cmd_export_swap` via a synthetic manifest.

`work/finalize_immich.sh <user>` chains queue → Windows embed → cache
copy back → cluster_immich, with logging.

The Immich admin API key + base URL come from environment variables:

```bash
export IMMICH_URL=https://your-immich.example.com
export IMMICH_API_KEY=...   # admin or per-user key
python work/immich_stage.py --user peter --workers 8
bash work/finalize_immich.sh peter
```

For the 2026-04-26 run against `https://fotos.computerliebe.org` (Immich
v2.7.2), with the admin API key:

| step | result |
|------|--------|
| stage | 53,842 assets seen, **10,261 staged** (~10 GB), 978 byte-deduped against `nl_full.npz`, 2,976 internal byte-duplicates, 39K skipped no-face / no-big-face |
| Windows DML embed | 19,462 face records + 1 noface in **64.6 min** (2.6 img/s end-to-end) |
| matched existing identities | **8,103 of 19,480 (42%)** at cos-dist ≤ 0.45; biggest hits faceset_002 (+2,666), faceset_001 (+1,856), faceset_003 (+670) |
| new clusters | 2,534 at threshold 0.55 → 239 surviving refine gates → **185 emitted** as `faceset_026..264` (gaps where export-swap's tighter outlier filter dropped clusters below the export quality bar) |

**Important caveats for Immich v2.7.2**:

- The `userIds` filter on `/search/metadata` is **silently ignored** when
  the API key is bound to a different user. The "import everything the
  API key can see" semantics are what you actually get; cross-user
  isolation is enforced server-side.
- `/server/statistics` reports counts that under-count what
  `/search/metadata` actually returns (e.g. external library
  thumbnail-dirs that got indexed because the import path included them).
  Don't trust the statistics number as a denominator.
- A meaningful fraction of `originalPath`-based assets are *Immich's own
  thumbnails* (`<library_root>/thumbs/.../-preview.jpeg`) — included if
  the external library's import path covers the thumbs directory and the
  exclusion patterns don't list `**/thumbs/**`. For our run, 5,563 of
  10,261 staged were thumbnails. They embed and cluster fine but the
  resulting faces are lower-resolution.

## Key defaults

`refine`:
@@ -248,15 +319,22 @@ Highly recommended at swap time: enable **Select post-processing = GFPGAN** with
├─ docs/
│  └─ analysis/
│     └─ facesets-downstream-refinement-evaluation.md
└─ work/  (gitignored except force-tracked .py)
└─ work/  (gitignored except force-tracked .py / .sh)
   ├─ build_folders.py             (hand-sorted-folder orchestration)
   ├─ check_faceset001_age.py      (age-split readiness probe)
   ├─ age_split_001.py             (age-split orchestration; faceset_001)
   ├─ cluster_osrc.py              (mixed-bucket identity discovery)
   ├─ synthetic_refine_manifest.json  (last build_folders.py output)
   ├─ synthetic_osrc_manifest.json    (last cluster_osrc.py output)
   ├─ immich_stage.py              (Immich library staging, parallel)
   ├─ embed_worker.py              (Windows DML embed worker, runs from C:\face_embed_venv\)
   ├─ cluster_immich.py            (Immich identity discovery + export)
   ├─ finalize_immich.sh           (chains queue → embed → cluster)
   ├─ synthetic_*_manifest.json    (per-run synthetic refine manifests)
   ├─ immich/
   │  ├─ users.json                (label -> userId map; gitignored)
   │  └─ <user>/{queue,state,aliases}.json  (per-user staging artifacts)
   ├─ cache/
   │  ├─ nl_full.npz               (canonical cache + duplicates.json)
   │  ├─ immich_<user>.npz         (per-user immich embeddings)
   │  └─ age_split_exif.json       (path → EXIF-year cache)
   └─ logs/
      └─ *.log                     (every long step writes here)

docs/analysis/immich-import-pipeline.md (new file, 216 lines)
@@ -0,0 +1,216 @@
# Importing identities from a self-hosted Immich library

_Run date: 2026-04-26. Target: Immich v2.7.2 at `https://fotos.computerliebe.org`.
Driver scripts: `work/immich_stage.py`, `work/embed_worker.py`,
`work/cluster_immich.py`, `work/finalize_immich.sh`._

## 1. Why a split workflow

InsightFace `buffalo_l` on the WSL CPU runs the full detection + landmarks +
recognition stack at ~3–4 faces/second. Re-detecting all 79K Immich photos
would have taken ~10–28 days. The available AMD Radeon RX Vega is unusable
under WSL (no `/dev/dri/`, no ROCm), but **DirectML on Windows native**
runs the same models bit-identically and ~7.5× faster end-to-end. The
pipeline therefore splits:

- **WSL side** (`/opt/face-sets/`) — orchestration: API listing, download,
  sha256 dedup, file management, clustering, faceset emission.
- **Windows side** (`C:\face_embed_venv\`) — the embed step only. A fresh
  Python 3.12 (installed via `winget install Python.Python.3.12`) with
  `numpy`, `pillow`, `opencv-python-headless`, `onnxruntime-directml`,
  `insightface`. Models copied from `/home/peter/.insightface/models/buffalo_l/`
  to `C:\face_embed_venv\models\buffalo_l\`.

A 30-iteration synthetic benchmark on Vega:

| model | DML | CPU | speedup |
|-------------|----:|----:|--------:|
| `det_10g.onnx` (640×640) | 10.0 ms | 183.5 ms | 18.4× |
| `w600k_r50.onnx` (112×112) | 8.2 ms | 90.5 ms | 11.0× |

End-to-end FaceAnalysis on 5 real Immich-sourced images (excluding the
first-call DML JIT warmup): ~7.5× speedup post-warmup. Per-face cosine
similarity DML vs CPU was 1.0000 across all 8 detected faces — DML is
bit-identical to CPU for arcface inference.

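The per-model numbers above come from timing the raw ONNX sessions under each provider. The exact benchmark script is not part of this commit; the following is a minimal sketch of that kind of micro-benchmark, assuming the buffalo_l model files live under `C:\face_embed_venv\models\buffalo_l\` as described above:

```python
# Hypothetical micro-benchmark sketch: det_10g.onnx under DML vs CPU.
# Assumes onnxruntime-directml is installed and the model path below exists.
import time
import numpy as np
import onnxruntime as ort

MODEL = r"C:\face_embed_venv\models\buffalo_l\det_10g.onnx"
x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # detector input size

def bench(providers, iters=30):
    sess = ort.InferenceSession(MODEL, providers=providers)
    name = sess.get_inputs()[0].name
    sess.run(None, {name: x})                  # warm-up (DML JIT happens here)
    t0 = time.perf_counter()
    for _ in range(iters):
        sess.run(None, {name: x})
    return (time.perf_counter() - t0) / iters * 1000.0  # ms per call

dml = bench(["DmlExecutionProvider", "CPUExecutionProvider"])
cpu = bench(["CPUExecutionProvider"])
print(f"DML {dml:.1f} ms   CPU {cpu:.1f} ms   speedup {cpu / dml:.1f}x")
```
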
## 2. Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ WSL /opt/face-sets/work/immich_stage.py │
|
||||
│ ┌──────────────────────────────────────────┐│
|
||||
│ │ ThreadPoolExecutor.map(_fetch_for_asset, ││
|
||||
│ │ list_assets(user)) ││
|
||||
│ │ ─ /faces?id= (Immich, parallel x8) ││
|
||||
│ │ ─ filter face_short >= 90 ││
|
||||
│ │ ─ /assets/.../original (parallel x8) ││
|
||||
│ └──────────────────────────────────────────┘│
|
||||
│ consumer (main thread): │
|
||||
│ sha256 → dedup vs nl_full.npz │
|
||||
│ save to /mnt/x/src/immich/<user>/<rel>/ │
|
||||
│ append to queue.json │
|
||||
└────────────────┬────────────────────────────┘
|
||||
│
|
||||
▼ queue.json (with WSL + Windows paths)
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ Windows embed_worker.py (C:\face_embed_venv) │
|
||||
│ insightface.FaceAnalysis( │
|
||||
│ providers=[DmlExecutionProvider, ...]) │
|
||||
│ per image: detection + landmarks + arcface │
|
||||
│ emit cache in sort_faces.py:cmd_embed │
|
||||
│ schema with embeddings + meta + processed │
|
||||
│ + path_aliases + schema=v2 │
|
||||
└────────────────┬────────────────────────────┘
|
||||
│
|
||||
▼ immich_<user>.npz
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ WSL cluster_immich.py │
|
||||
│ build centroids of canonical │
|
||||
│ faceset_NNN/ in facesets_swap_ready/ │
|
||||
│ drop matches at cos-dist <= 0.45 │
|
||||
│ cluster the rest at 0.55 │
|
||||
│ refine gates -> synthetic refine_manifest │
|
||||
│ cmd_export_swap -> facesets_swap_ready/ │
|
||||
│ merge top-level manifest │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
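Concretely, the hand-off artifact between the Windows and WSL sides is a single `.npz` with the keys listed in the diagram. A minimal sketch of writing and reading one, mirroring `save_cache()` in `work/embed_worker.py` (field list follows the `cmd_embed` schema described above):

```python
# Minimal sketch of the cache schema the embed worker emits (schema "v2").
import json
import numpy as np

def write_cache(path, embeddings, meta, processed_paths):
    np.savez(
        path,
        embeddings=embeddings,                  # (N, 512) float32, L2-normed
        meta=json.dumps(meta),                  # per-face dicts incl. noface rows
        src_root="/mnt/x/src/immich",
        processed_paths=json.dumps(sorted(processed_paths)),
        path_aliases=json.dumps({}),
        schema="v2",
    )

def read_cache(path):
    data = np.load(path, allow_pickle=True)
    return data["embeddings"], json.loads(str(data["meta"]))
```
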
Cache artifacts stay separate (per the architecture choice on this run):
each user's results live in their own `immich_<user>.npz`. A future
one-shot merge can fold them into `nl_full.npz` if needed; the existing
`extend` command would do the right thing once schemas align.

## 3. Path mapping

`/mnt/x/` ↔ `X:\`. Cache stores WSL form (matching `nl_full.npz`'s
existing convention). `wsl_to_win()` translates for the embed worker
which runs natively on Windows.

`work/cluster_immich.py` always uses the canonical `facesets_swap_ready/`
view to build identity centroids — meaning the comparison is against the
*current* set of canonical facesets in the swap-ready directory (skipping
era splits and `_thin/`), not against the older `facesets_full/` snapshot.

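The mapping is purely lexical. A tiny standalone sketch (the real implementation is `wsl_to_win()` in `work/immich_stage.py`; the example file name is made up):

```python
# Sketch only: /mnt/<drive>/rest -> <DRIVE>:\rest, as the embed worker expects.
def wsl_to_win_sketch(p: str) -> str:
    drive = p[5].upper()                       # "/mnt/x/..." -> "X"
    rest = p[6:].lstrip("/").replace("/", "\\")
    return f"{drive}:\\{rest}"

print(wsl_to_win_sketch("/mnt/x/src/immich/peter/2019/IMG_0001.jpg"))
# X:\src\immich\peter\2019\IMG_0001.jpg
```
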
## 4. Result of the 2026-04-26 run (peter / admin)

### 4a. Stage

```
total_assets_seen:        53842
staged_count:             10261  (~10 GB on /mnt/x/)
deduped_against_existing:   978  (sha256 in nl_full.npz already)
deduped_against_staged:    2976  (internal byte-dupes inside Immich)
skipped_no_big_face:       9539  (Immich detected only sub-90px faces)
skipped_no_faces:         29390  (Immich detected zero faces)
skipped_download_error:     698  (transient DNS / TLS, not seen-marked)
elapsed:                  ~70 min (6.4 assets/s end-to-end at 8 workers)
```

The 698 transient errors are recoverable on a re-run because
`immich_stage.py` does not add them to the `seen` set. Each transient
asset would be retried.

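A quick way to sanity-check what a re-run would redo is to read the per-user `state.json` the stager persists (paths assumed from the layout in section 2; this is a convenience sketch, not part of the pipeline):

```python
# Sketch: report the resume counters; transient errors never enter seen_asset_ids,
# so they are exactly the assets a re-run will revisit.
import json
from pathlib import Path

state = json.loads(Path("work/immich/peter/state.json").read_text())
print(f"seen-marked assets:       {len(state['seen_asset_ids'])}")
print(f"total assets seen so far: {state['total_assets_seen']}")
print(f"transient (will retry):   {state['skipped_download_error']}")
```
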
### 4b. Embed (Windows DML)

```
queue:              10261 entries
new face records:   19462
new noface records: 1
load errors:        125  (likely HEIC / unreadable)
elapsed:            3878.0s (64.6 min, 2.6 img/s end-to-end)
```

The 2.6 img/s end-to-end includes CIFS-share image load, image decode,
DML inference (~50 ms/face), and JSON / NPZ flushing. Pure DML inference
is faster; the rest of the pipeline dominates at scale.

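The rate is consistent with the queue size; a quick check of the arithmetic using the numbers above:

```python
# 10,261 queued images in 3,878 s -> ~2.6 img/s end-to-end;
# 19,462 face records over the same wall clock -> ~5.0 faces/s,
# i.e. roughly 200 ms of wall time per face, of which DML inference is
# only ~50 ms; the rest is image load/decode and flushing.
queue, faces, secs = 10_261, 19_462, 3_878.0
print(f"{queue / secs:.2f} img/s, {faces / secs:.2f} faces/s, "
      f"{1000 * secs / faces:.0f} ms/face wall time")
```
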
### 4c. Cluster

```
existing canonical centroids: 25
faces already covered (cos-dist <= 0.45): 8103/19480 (42%)
  faceset_001: 1856
  faceset_002: 2666
  faceset_003: 670
  faceset_004: 48
  faceset_005: 40
  ... (smaller hits to the remaining 20)
unmatched faces to cluster: 11377
clusters at threshold 0.55: 2534 (top sizes [469, 444, 342, 338, 262, ...])
survived refine gates: 239
emitted as new facesets: 185 (54 dropped by export-swap's 0.45 outlier)
```

Top-level `facesets_swap_ready/manifest.json` after this run: **216
facesets** (up from 31; ~7× growth) + 68 thin_eras under `_thin/`.

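The "already covered" test is a single matrix multiply against the stacked identity centroids, since both the cache embeddings and the centroids are L2-normalized. A condensed, runnable sketch of what `cluster_immich.py` does (random stand-in data; variable names illustrative):

```python
import numpy as np

def unit(a):  # L2-normalize rows
    return a / np.linalg.norm(a, axis=1, keepdims=True)

rng = np.random.default_rng(0)
emb = unit(rng.normal(size=(1000, 512)).astype(np.float32))   # stand-in for immich_<user>.npz
cents = unit(rng.normal(size=(25, 512)).astype(np.float32))    # stand-in for 25 canonical centroids

sims = emb @ cents.T                    # cosine similarity (rows are unit-norm)
nearest_dist = 1.0 - sims.max(axis=1)   # cos-dist to the closest existing identity
covered = nearest_dist <= 0.45          # EXISTING_MATCH_THRESHOLD
to_cluster = emb[~covered]              # only these go to the 0.55 clustering pass
print(int(covered.sum()), "faces already covered;", len(to_cluster), "left to cluster")
```
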
## 5. Surprises and caveats

### 5a. `/search/metadata`'s `userIds` filter is silently ignored (Immich v2.7.2)

When the admin API key is used, passing `userIds=[<other-user-uuid>]`
returns admin's own assets, not the other user's. The filter is
silently dropped. Verified by sampling 200 returned items and
confirming `ownerId` was admin for all of them.

To process another user's library, **a separate API key issued by that
user is required** — the admin key cannot enumerate cross-user
libraries through any documented endpoint we tried. `/timeline/buckets`
with a `userId` query parameter returns
`Not found or no timeline.read access`.

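The check described above is easy to reproduce against any Immich instance. A hedged sketch using the same endpoint and payload shape as `work/immich_stage.py` (the `userIds` value below is a placeholder UUID, not a real user):

```python
# Sketch: confirm whether /search/metadata honours userIds for the current key.
import json, os, urllib.request

api = os.environ["IMMICH_URL"].rstrip("/") + "/api"
payload = {"size": 200, "type": "IMAGE",
           "userIds": ["00000000-0000-0000-0000-000000000000"]}
req = urllib.request.Request(
    f"{api}/search/metadata",
    data=json.dumps(payload).encode("utf-8"),
    headers={"x-api-key": os.environ["IMMICH_API_KEY"],
             "Content-Type": "application/json", "Accept": "application/json"},
)
items = json.loads(urllib.request.urlopen(req).read())["assets"]["items"]
print({a["ownerId"] for a in items})  # on v2.7.2 this was always the key owner's id
```
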
### 5b. `/server/statistics` undercounts what the search returns

`/server/statistics` reported admin = 53,842 photos. Our
`/search/metadata` paginated through... **53,842** top-level. So the
header agrees with the body in this case. But `/server/statistics` does
NOT count items that live under external libraries' import paths —
yet `/search/metadata` does include them. For this Immich, two external
libraries (`/mnt/media/photos` and `/mnt/media/omv_photos`) are
configured but `/libraries` reports `assetCount=0` for both. Yet 80% of
our staged paths come from those library import paths. Don't trust
statistics-vs-search consistency.

### 5c. Indexed Immich thumbnails masquerading as assets

5,563 of our 10,261 staged paths are `<library>/thumbs/.../-preview.jpeg`
— Immich's own internally-generated thumbnails got indexed because the
external library import path included the thumbs subdirectory and the
exclusion patterns didn't list `**/thumbs/**`. They embed and cluster
fine but produce lower-resolution face records. The fix on the Immich
side is adding `**/thumbs/**` to the exclusion patterns.

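One way to measure how much of a staged queue is thumbnail-derived is to read the `queue.json` the stager writes (path assumed from the layout in section 2):

```python
# Sketch: count staged entries whose Immich originalPath points into a thumbs/ dir.
import json
from pathlib import Path

queue = json.loads(Path("work/immich/peter/queue.json").read_text())
thumbs = [q for q in queue if "/thumbs/" in (q.get("originalPath") or "")]
print(f"{len(thumbs)} of {len(queue)} staged assets are Immich-generated previews")
```
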
### 5d. Internal byte-duplicates (2,976)

Many Immich assets are byte-identical to other Immich assets — typically
because the same photo was uploaded both from a phone and from a
synced cloud folder. sha256 dedup catches all of these on the second
download (we still pay the bandwidth, but skip the disk write and
embed work). With Immich v2.7.2's own `assets/duplicates` endpoint we
could catch this earlier, but it's not currently used.

## 6. Re-running and applying to other Immich instances

```bash
export IMMICH_URL=https://your-immich.example.com
export IMMICH_API_KEY=...   # admin or per-user key

# Optional: populate work/immich/users.json with label -> UUID map.

# 1. Stage (parallel /faces + downloads, resumable).
python work/immich_stage.py --user peter --workers 8

# 2. End-to-end finalize: copy queue to /mnt/c/, run Windows embed worker,
#    copy the cache back, run cluster_immich.py.
bash work/finalize_immich.sh peter
```

For a different Immich instance, the only configuration is the env vars
and the `users.json` sidecar. `cluster_immich.py`'s tunables (matching
threshold, clustering threshold, refine gates, MIN_FACES) are at the
top of the script.

To process a *second* user's library, issue a per-user API key in the
Immich admin UI for that user, set `IMMICH_API_KEY` to that key, and
re-run with their `--user <label>`. The admin key cannot impersonate
other users via the search API.
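
After a run, the produced cache can be inspected from the repo root with the same loader the rest of the tooling uses (a small sketch; `load_cache` is the existing helper in `sort_faces.py`, and the cache path follows the `immich_<user>.npz` naming above):

```python
# Sketch: summarize an immich_<user>.npz the same way finalize_immich.sh does.
import sys
sys.path.insert(0, "/opt/face-sets")
from sort_faces import load_cache

emb, meta, _src, _proc, _aliases = load_cache("work/cache/immich_peter.npz")
faces = [m for m in meta if not m.get("noface")]
print(f"{len(emb)} embeddings, {len(faces)} face records, "
      f"{len(meta) - len(faces)} noface records")
```
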
work/cluster_immich.py (new file, 340 lines)
@@ -0,0 +1,340 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Discover new identities in an Immich-sourced cache and emit them as facesets.
|
||||
|
||||
Mirrors `work/cluster_osrc.py`, but the source corpus is an arbitrary
|
||||
Immich user's `immich_<user>.npz` cache produced by the Windows DML embed
|
||||
worker. Existing identity centroids come from the union of every faceset
|
||||
already in `facesets_swap_ready/` (faceset_001..NNN, both auto-clustered
|
||||
and hand-sorted).
|
||||
|
||||
Pipeline:
|
||||
1. Load immich_<user>.npz; restrict to face records (drop noface).
|
||||
2. Build centroids of every existing canonical faceset in
|
||||
facesets_swap_ready/ (skip era splits and _thin/).
|
||||
3. Drop immich faces whose nearest existing centroid is within
|
||||
EXISTING_MATCH_THRESHOLD; those are already covered by the canonical set.
|
||||
4. Cluster the remaining among themselves at INITIAL_THRESHOLD.
|
||||
5. Per cluster: refine-equivalent gates (face_short, blur, det_score),
|
||||
plus outlier rejection at OUTLIER_THRESHOLD for clusters of size >= 4.
|
||||
6. Keep clusters whose surviving unique source-path count is >= MIN_FACES.
|
||||
7. Number kept clusters past the existing facesets_swap_ready/ max.
|
||||
8. Synthesize a refine_manifest, hand off to cmd_export_swap, move dirs into
|
||||
facesets_swap_ready/, drop a provenance marker, append to top-level
|
||||
manifest.json (preserving facesets / thin_eras).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import shutil
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np
|
||||
|
||||
REPO = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(REPO))
|
||||
|
||||
from sort_faces import ( # noqa: E402
|
||||
_cluster_embeddings,
|
||||
cmd_export_swap,
|
||||
load_cache,
|
||||
)
|
||||
|
||||
# ---- config -------------------------------------------------------------- #
|
||||
|
||||
REPO_WORK = REPO / "work"
|
||||
SWAP_READY = Path("/mnt/e/temp_things/fcswp/nl_sorted/facesets_swap_ready")
|
||||
|
||||
EXISTING_MATCH_THRESHOLD = 0.45
|
||||
INITIAL_THRESHOLD = 0.55
|
||||
|
||||
MIN_FACES = 6
|
||||
MIN_SHORT = 90
|
||||
MIN_BLUR = 40.0
|
||||
MIN_DET_SCORE = 0.6
|
||||
OUTLIER_THRESHOLD = 0.55
|
||||
|
||||
TOP_N = 30
|
||||
EXPORT_OUTLIER_THRESHOLD = 0.45
|
||||
PAD_RATIO = 0.5
|
||||
OUT_SIZE = 512
|
||||
EXPORT_MIN_FACE_SHORT = 100
|
||||
|
||||
|
||||
# ---- helpers ------------------------------------------------------------- #
|
||||
|
||||
def _normalize(v: np.ndarray) -> np.ndarray:
|
||||
n = np.linalg.norm(v)
|
||||
return v / n if n > 0 else v
|
||||
|
||||
|
||||
def _existing_identity_centroids(
|
||||
nl_cache: Path,
|
||||
) -> tuple[np.ndarray, list[str]]:
|
||||
"""Build identity centroids from every canonical faceset_NNN/ in
|
||||
facesets_swap_ready/. Era-split sub-dirs (faceset_001_<era>) and the
|
||||
_thin/ quarantine are skipped. Each faceset's manifest.json provides
|
||||
(source, bbox) keys we use to look up rows in nl_full.npz."""
|
||||
emb, meta, _src, _proc, _aliases = load_cache(nl_cache)
|
||||
face_records = [m for m in meta if not m.get("noface")]
|
||||
if len(face_records) != len(emb):
|
||||
raise SystemExit(f"meta/embedding mismatch in {nl_cache}: {len(face_records)} vs {len(emb)}")
|
||||
bbox_idx = {(m["path"], tuple(m.get("bbox") or ())): i for i, m in enumerate(face_records)}
|
||||
|
||||
centroids: list[np.ndarray] = []
|
||||
names: list[str] = []
|
||||
for d in sorted(SWAP_READY.iterdir()):
|
||||
if not d.is_dir():
|
||||
continue
|
||||
if d.name.startswith("_"):
|
||||
continue
|
||||
# Skip era-split sub-facesets (faceset_NNN_*).
|
||||
if d.name.startswith("faceset_") and "_" in d.name[len("faceset_"):]:
|
||||
continue
|
||||
man = d / "manifest.json"
|
||||
if not man.exists():
|
||||
continue
|
||||
try:
|
||||
entries = json.loads(man.read_text()).get("faces", [])
|
||||
except Exception:
|
||||
continue
|
||||
keys = [(f["source"], tuple(f.get("bbox") or ())) for f in entries]
|
||||
idxs = [bbox_idx[k] for k in keys if k in bbox_idx]
|
||||
if not idxs:
|
||||
continue
|
||||
centroids.append(_normalize(emb[idxs].mean(axis=0)))
|
||||
names.append(d.name)
|
||||
if not centroids:
|
||||
raise SystemExit("no canonical identity centroids could be built; check facesets_swap_ready/")
|
||||
return np.stack(centroids), names
|
||||
|
||||
|
||||
def _next_faceset_number() -> int:
|
||||
nums = []
|
||||
for d in SWAP_READY.iterdir():
|
||||
if not d.is_dir() or not d.name.startswith("faceset_"):
|
||||
continue
|
||||
tail = d.name[len("faceset_"):]
|
||||
# Take only top-level numbered facesets (no era suffix).
|
||||
if "_" in tail:
|
||||
continue
|
||||
try:
|
||||
nums.append(int(tail))
|
||||
except ValueError:
|
||||
continue
|
||||
return (max(nums) + 1) if nums else 1
|
||||
|
||||
|
||||
# ---- phase 1: discover --------------------------------------------------- #
|
||||
|
||||
def discover_new_clusters(
|
||||
immich_cache: Path, nl_cache: Path, start_nnn: int, source_label: str
|
||||
) -> tuple[dict, list[dict]]:
|
||||
print(f"loading immich cache: {immich_cache}")
|
||||
emb, meta, _src, _proc, _aliases = load_cache(immich_cache)
|
||||
face_records = [m for m in meta if not m.get("noface")]
|
||||
if len(face_records) != len(emb):
|
||||
raise SystemExit(f"meta/embedding mismatch: {len(face_records)} vs {len(emb)}")
|
||||
print(f" {len(face_records)} face records, {sum(1 for m in meta if m.get('noface'))} noface")
|
||||
|
||||
print(f"building existing-identity centroids from {SWAP_READY}")
|
||||
cents, cent_names = _existing_identity_centroids(nl_cache)
|
||||
print(f" {len(cent_names)} canonical centroids")
|
||||
|
||||
sims = emb @ cents.T
|
||||
nearest_d = 1.0 - sims.max(axis=1)
|
||||
nearest_id = sims.argmax(axis=1)
|
||||
covered = nearest_d <= EXISTING_MATCH_THRESHOLD
|
||||
print(f"\nfaces already covered (cos-dist <= {EXISTING_MATCH_THRESHOLD}): "
|
||||
f"{int(covered.sum())}/{len(emb)}")
|
||||
for j, name in enumerate(cent_names):
|
||||
c = int(((nearest_id == j) & covered).sum())
|
||||
if c:
|
||||
print(f" -> {name}: {c}")
|
||||
|
||||
new_idx = [i for i in range(len(emb)) if not covered[i]]
|
||||
print(f"\nunmatched immich faces to cluster: {len(new_idx)}")
|
||||
if len(new_idx) <= 1:
|
||||
labels = np.zeros(len(new_idx), dtype=int)
|
||||
else:
|
||||
labels = _cluster_embeddings(emb[new_idx], INITIAL_THRESHOLD)
|
||||
n_clusters = len(set(int(l) for l in labels))
|
||||
sizes = sorted([int((labels == l).sum()) for l in set(labels)], reverse=True)
|
||||
print(f"clusters at threshold {INITIAL_THRESHOLD}: {n_clusters} "
|
||||
f"top sizes: {sizes[:10]}")
|
||||
|
||||
clusters: dict[int, list[int]] = {}
|
||||
for k, lab in enumerate(labels):
|
||||
clusters.setdefault(int(lab), []).append(new_idx[k])
|
||||
|
||||
kept: list[dict] = []
|
||||
drop_quality_total = 0
|
||||
drop_outlier_total = 0
|
||||
for cid, idxs in clusters.items():
|
||||
good: list[int] = []
|
||||
for i in idxs:
|
||||
r = face_records[i]
|
||||
if r.get("face_short", 0) < MIN_SHORT:
|
||||
drop_quality_total += 1; continue
|
||||
if r.get("blur", 0.0) < MIN_BLUR:
|
||||
drop_quality_total += 1; continue
|
||||
if r.get("det_score", 0.0) < MIN_DET_SCORE:
|
||||
drop_quality_total += 1; continue
|
||||
good.append(i)
|
||||
if not good:
|
||||
continue
|
||||
if len(good) >= 4:
|
||||
cent = _normalize(emb[good].mean(axis=0))
|
||||
d = 1.0 - emb[good] @ cent
|
||||
tight = [good[k] for k, dist in enumerate(d) if dist <= OUTLIER_THRESHOLD]
|
||||
drop_outlier_total += len(good) - len(tight)
|
||||
good = tight
|
||||
if not good:
|
||||
continue
|
||||
unique_paths = sorted({face_records[i]["path"] for i in good})
|
||||
if len(unique_paths) < MIN_FACES:
|
||||
continue
|
||||
kept.append({
|
||||
"indices": good,
|
||||
"unique_paths": unique_paths,
|
||||
"size_face": len(good),
|
||||
"size_paths": len(unique_paths),
|
||||
})
|
||||
|
||||
kept.sort(key=lambda c: -c["size_paths"])
|
||||
print(f"\nafter quality+outlier+min_faces: {len(kept)} clusters kept "
|
||||
f"(dropped: quality={drop_quality_total} outlier={drop_outlier_total})")
|
||||
for rank, c in enumerate(kept, start=start_nnn):
|
||||
print(f" faceset_{rank:03d}: faces={c['size_face']:3d} "
|
||||
f"unique_paths={c['size_paths']:3d}")
|
||||
|
||||
facesets = [
|
||||
{
|
||||
"name": f"faceset_{rank:03d}",
|
||||
"image_count": c["size_paths"],
|
||||
"face_count": c["size_face"],
|
||||
"images": c["unique_paths"],
|
||||
}
|
||||
for rank, c in enumerate(kept, start=start_nnn)
|
||||
]
|
||||
manifest = {
|
||||
"params": {
|
||||
"existing_match_threshold": EXISTING_MATCH_THRESHOLD,
|
||||
"initial_threshold": INITIAL_THRESHOLD,
|
||||
"outlier_threshold": OUTLIER_THRESHOLD,
|
||||
"min_faces": MIN_FACES,
|
||||
"min_short": MIN_SHORT,
|
||||
"min_blur": MIN_BLUR,
|
||||
"min_det_score": MIN_DET_SCORE,
|
||||
"source_label": source_label,
|
||||
"source_cache": str(immich_cache),
|
||||
},
|
||||
"facesets": facesets,
|
||||
}
|
||||
return manifest, kept
|
||||
|
||||
|
||||
# ---- phase 2: export + relocate ----------------------------------------- #
|
||||
|
||||
def export_and_relocate(manifest: dict, immich_cache: Path, source_label: str) -> None:
|
||||
synth_path = REPO_WORK / f"synthetic_{source_label}_manifest.json"
|
||||
synth_path.write_text(json.dumps(manifest, indent=2))
|
||||
print(f"\nsynthetic manifest -> {synth_path}")
|
||||
|
||||
out_tmp = SWAP_READY.parent / f"facesets_swap_ready_{source_label}_new"
|
||||
if out_tmp.exists():
|
||||
shutil.rmtree(out_tmp)
|
||||
out_tmp.mkdir(parents=True)
|
||||
|
||||
print(f"running cmd_export_swap -> {out_tmp}")
|
||||
cmd_export_swap(
|
||||
cache_path=immich_cache,
|
||||
refine_manifest_path=synth_path,
|
||||
raw_manifest_path=None,
|
||||
out_dir=out_tmp,
|
||||
top_n=TOP_N,
|
||||
outlier_threshold=EXPORT_OUTLIER_THRESHOLD,
|
||||
pad_ratio=PAD_RATIO,
|
||||
out_size=OUT_SIZE,
|
||||
include_candidates=False,
|
||||
candidate_match_threshold=0.55,
|
||||
candidate_min_score=0.40,
|
||||
min_face_short=EXPORT_MIN_FACE_SHORT,
|
||||
)
|
||||
|
||||
new_top = json.loads((out_tmp / "manifest.json").read_text())
|
||||
new_entries = new_top.get("facesets", [])
|
||||
|
||||
moved = 0
|
||||
for fs_meta in new_entries:
|
||||
name = fs_meta["name"]
|
||||
src_dir = out_tmp / name
|
||||
if not src_dir.exists():
|
||||
print(f"[{name}] export dir missing; skipping")
|
||||
continue
|
||||
dst_dir = SWAP_READY / name
|
||||
if dst_dir.exists():
|
||||
print(f"[{name}] {dst_dir} already exists; refusing to overwrite")
|
||||
continue
|
||||
(src_dir / f"immich_{source_label}.txt").write_text(
|
||||
f"{name}\n\nSource: Immich user {source_label} cluster (auto-discovered).\n"
|
||||
)
|
||||
shutil.move(str(src_dir), str(dst_dir))
|
||||
moved += 1
|
||||
print(f"[{name}] -> {dst_dir}")
|
||||
|
||||
final_manifest_path = SWAP_READY / "manifest.json"
|
||||
if final_manifest_path.exists():
|
||||
existing = json.loads(final_manifest_path.read_text())
|
||||
else:
|
||||
existing = {"facesets": []}
|
||||
existing.setdefault("facesets", [])
|
||||
existing_names = {fs["name"] for fs in existing["facesets"]}
|
||||
appended = 0
|
||||
for entry in new_entries:
|
||||
if entry["name"] in existing_names:
|
||||
print(f"[manifest] {entry['name']} already present; not duplicating")
|
||||
continue
|
||||
existing["facesets"].append(entry)
|
||||
appended += 1
|
||||
final_manifest_path.write_text(json.dumps(existing, indent=2))
|
||||
print(f"\nmerged manifest: appended {appended} entries -> {final_manifest_path}")
|
||||
print(f"moved {moved} faceset directories into {SWAP_READY}")
|
||||
if out_tmp.exists() and not list(out_tmp.iterdir()):
|
||||
out_tmp.rmdir()
|
||||
|
||||
|
||||
# ---- main ---------------------------------------------------------------- #
|
||||
|
||||
def main() -> None:
|
||||
p = argparse.ArgumentParser()
|
||||
p.add_argument("immich_cache", type=Path,
|
||||
help="path to immich_<user>.npz produced by the embed worker")
|
||||
p.add_argument("--nl-cache", type=Path, default=REPO_WORK / "cache" / "nl_full.npz",
|
||||
help="canonical cache for existing identity centroids")
|
||||
p.add_argument("--source-label", default=None,
|
||||
help="short label used in marker filenames; default = stem of immich_cache")
|
||||
p.add_argument("--start-nnn", type=int, default=None,
|
||||
help="first faceset number to assign; default = current max+1 in facesets_swap_ready/")
|
||||
p.add_argument("--dry-run", action="store_true")
|
||||
args = p.parse_args()
|
||||
|
||||
label = args.source_label or args.immich_cache.stem.removeprefix("immich_") or args.immich_cache.stem
|
||||
start_nnn = args.start_nnn if args.start_nnn is not None else _next_faceset_number()
|
||||
print(f"source label: {label!r}; first faceset number: {start_nnn:03d}")
|
||||
|
||||
manifest, kept = discover_new_clusters(args.immich_cache, args.nl_cache, start_nnn, label)
|
||||
if args.dry_run:
|
||||
print("\n--dry-run: stopping after cluster discovery (no exports written).")
|
||||
return
|
||||
if not manifest.get("facesets"):
|
||||
print("no new facesets to build.")
|
||||
return
|
||||
export_and_relocate(manifest, args.immich_cache, label)
|
||||
print("\nDone.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
work/embed_worker.py (new executable file, 244 lines)
@@ -0,0 +1,244 @@
|
||||
"""Windows / DirectML embed worker.
|
||||
|
||||
Reads a queue.json staged by /opt/face-sets/work/immich_stage.py (WSL side),
|
||||
runs InsightFace's FaceAnalysis on each image with the DmlExecutionProvider
|
||||
backed by the AMD Vega, and writes a cache file in the schema produced by
|
||||
sort_faces.py:cmd_embed (so it can be merged into nl_full.npz).
|
||||
|
||||
CLI:
|
||||
py -3.12 embed_worker.py <queue.json> <out_cache.npz> [--limit N]
|
||||
|
||||
The queue.json entry shape (each item) is:
|
||||
{
|
||||
"asset_id": "...",
|
||||
"sha256": "...",
|
||||
"wsl_path": "/mnt/x/src/immich/<user>/<rel>", # canonical path stored
|
||||
"win_path": "X:\\src\\immich\\<user>\\<rel>", # what we read from
|
||||
"size_bytes": int,
|
||||
"width": int, "height": int,
|
||||
...
|
||||
}
|
||||
|
||||
Per face record matches cmd_embed's schema:
|
||||
path, face_idx, det_score, bbox, face_short, face_area, blur, noface=False, hash
|
||||
plus landmark_2d_106, landmark_3d_68, pose (FaceAnalysis returns these for
|
||||
free and the existing cache already carries them after `enrich`).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np
|
||||
from PIL import Image, ImageOps
|
||||
from insightface.app import FaceAnalysis
|
||||
|
||||
MODEL_ROOT = r"C:\face_embed_venv\models"
|
||||
MIN_DET_SCORE = 0.5
|
||||
MIN_FACE_PIX = 40
|
||||
FLUSH_EVERY = 50
|
||||
|
||||
|
||||
def load_rgb_bgr(path: Path):
|
||||
try:
|
||||
with Image.open(path) as im:
|
||||
im = ImageOps.exif_transpose(im)
|
||||
im = im.convert("RGB")
|
||||
rgb = np.array(im)
|
||||
bgr = rgb[:, :, ::-1].copy()
|
||||
return rgb, bgr
|
||||
except Exception as e:
|
||||
print(f"[warn] failed to load {path}: {e}", file=sys.stderr)
|
||||
return None, None
|
||||
|
||||
|
||||
def laplacian_variance(gray: np.ndarray) -> float:
    """Blur metric: variance of a 4-neighbour Laplacian response, computed
    with plain numpy so the worker does not need OpenCV (same quantity as
    the classic cv2.Laplacian-variance blur score, modulo border handling)."""
    g = gray.astype(np.float32)
    lap = (
        -4.0 * g[1:-1, 1:-1]
        + g[:-2, 1:-1] + g[2:, 1:-1]
        + g[1:-1, :-2] + g[1:-1, 2:]
    )
    return float(lap.var())
|
||||
|
||||
|
||||
def save_cache(out_path: Path, emb_chunks: list, meta: list, processed: set, src_root: str):
|
||||
emb = np.concatenate(emb_chunks) if emb_chunks else np.zeros((0, 512), dtype=np.float32)
|
||||
tmp = out_path.with_suffix(".tmp.npz")
|
||||
np.savez(
|
||||
str(tmp),
|
||||
embeddings=emb,
|
||||
meta=json.dumps(meta),
|
||||
src_root=str(src_root),
|
||||
processed_paths=json.dumps(sorted(processed)),
|
||||
path_aliases=json.dumps({}),
|
||||
schema="v2",
|
||||
)
|
||||
os.replace(tmp, out_path)
|
||||
|
||||
|
||||
def load_cache_if_exists(out_path: Path):
|
||||
"""Resume helper. Returns (emb_chunks, meta, processed_set)."""
|
||||
if not out_path.exists():
|
||||
return [], [], set()
|
||||
data = np.load(out_path, allow_pickle=True)
|
||||
emb = data["embeddings"]
|
||||
meta = json.loads(str(data["meta"]))
|
||||
processed = set(json.loads(str(data["processed_paths"])))
|
||||
return [emb] if len(emb) else [], list(meta), processed
|
||||
|
||||
|
||||
def main():
|
||||
p = argparse.ArgumentParser()
|
||||
p.add_argument("queue", type=Path)
|
||||
p.add_argument("out", type=Path)
|
||||
p.add_argument("--limit", type=int, default=None)
|
||||
args = p.parse_args()
|
||||
|
||||
queue = json.loads(args.queue.read_text())
|
||||
print(f"queue: {len(queue)} entries from {args.queue}")
|
||||
|
||||
args.out.parent.mkdir(parents=True, exist_ok=True)
|
||||
emb_chunks, meta, processed = load_cache_if_exists(args.out)
|
||||
n_existing_records = len(meta)
|
||||
n_existing_emb = sum(e.shape[0] for e in emb_chunks)
|
||||
if n_existing_records:
|
||||
print(f"resume: {n_existing_records} existing meta records "
|
||||
f"({n_existing_emb} embeddings, {len(processed)} processed paths)")
|
||||
|
||||
print("initializing FaceAnalysis with DmlExecutionProvider")
|
||||
app = FaceAnalysis(
|
||||
name="buffalo_l",
|
||||
root=MODEL_ROOT,
|
||||
providers=["DmlExecutionProvider", "CPUExecutionProvider"],
|
||||
)
|
||||
app.prepare(ctx_id=0, det_size=(640, 640))
|
||||
|
||||
src_root = "/mnt/x/src/immich"
|
||||
|
||||
n_done = 0
|
||||
n_face_records_added = 0
|
||||
n_noface_added = 0
|
||||
n_skipped = 0
|
||||
n_load_err = 0
|
||||
t0 = time.perf_counter()
|
||||
last_flush = time.perf_counter()
|
||||
new_emb_chunks: list[np.ndarray] = []
|
||||
new_meta: list[dict] = []
|
||||
|
||||
def flush():
|
||||
nonlocal new_emb_chunks, new_meta, last_flush
|
||||
if not new_emb_chunks and not new_meta:
|
||||
return
|
||||
if new_emb_chunks:
|
||||
emb_chunks.append(np.concatenate(new_emb_chunks))
|
||||
new_emb_chunks = []
|
||||
for r in new_meta:
|
||||
meta.append(r)
|
||||
new_meta = []
|
||||
save_cache(args.out, emb_chunks, meta, processed, src_root)
|
||||
last_flush = time.perf_counter()
|
||||
|
||||
for i, entry in enumerate(queue):
|
||||
if args.limit is not None and n_done >= args.limit:
|
||||
break
|
||||
wsl_path = entry["wsl_path"]
|
||||
win_path = entry["win_path"]
|
||||
sha = entry["sha256"]
|
||||
|
||||
if wsl_path in processed:
|
||||
n_skipped += 1
|
||||
continue
|
||||
|
||||
rgb, bgr = load_rgb_bgr(Path(win_path))
|
||||
if bgr is None:
|
||||
new_meta.append({
|
||||
"path": wsl_path, "face_idx": -1, "noface": True,
|
||||
"hash": sha, "error": "load",
|
||||
})
|
||||
processed.add(wsl_path)
|
||||
n_load_err += 1
|
||||
n_done += 1
|
||||
continue
|
||||
|
||||
faces = app.get(bgr)
|
||||
kept_any = False
|
||||
for j, f in enumerate(faces):
|
||||
if float(f.det_score) < MIN_DET_SCORE:
|
||||
continue
|
||||
x1, y1, x2, y2 = [int(round(v)) for v in f.bbox]
|
||||
x1 = max(x1, 0); y1 = max(y1, 0)
|
||||
x2 = min(x2, rgb.shape[1]); y2 = min(y2, rgb.shape[0])
|
||||
w, h = x2 - x1, y2 - y1
|
||||
short = min(w, h)
|
||||
if short < MIN_FACE_PIX:
|
||||
continue
|
||||
crop = rgb[y1:y2, x1:x2]
|
||||
if crop.size == 0:
|
||||
continue
|
||||
gray = crop.mean(axis=2)
|
||||
blur = laplacian_variance(gray) if min(gray.shape) > 3 else 0.0
|
||||
|
||||
emb = f.normed_embedding.astype(np.float32)
|
||||
new_emb_chunks.append(emb[None, :])
|
||||
rec = {
|
||||
"path": wsl_path,
|
||||
"face_idx": j,
|
||||
"det_score": float(f.det_score),
|
||||
"bbox": [x1, y1, x2, y2],
|
||||
"face_short": int(short),
|
||||
"face_area": int(w * h),
|
||||
"blur": blur,
|
||||
"noface": False,
|
||||
"hash": sha,
|
||||
}
|
||||
# Enrichment-equivalent fields (FaceAnalysis returns these for free)
|
||||
if hasattr(f, "landmark_2d_106") and f.landmark_2d_106 is not None:
|
||||
rec["landmark_2d_106"] = f.landmark_2d_106.astype(np.float32).tolist()
|
||||
if hasattr(f, "landmark_3d_68") and f.landmark_3d_68 is not None:
|
||||
rec["landmark_3d_68"] = f.landmark_3d_68.astype(np.float32).tolist()
|
||||
if hasattr(f, "pose") and f.pose is not None:
|
||||
rec["pose"] = [float(x) for x in f.pose]
|
||||
new_meta.append(rec)
|
||||
kept_any = True
|
||||
n_face_records_added += 1
|
||||
if not kept_any:
|
||||
new_meta.append({
|
||||
"path": wsl_path, "face_idx": -1, "noface": True, "hash": sha,
|
||||
})
|
||||
n_noface_added += 1
|
||||
|
||||
processed.add(wsl_path)
|
||||
n_done += 1
|
||||
|
||||
if (n_done % FLUSH_EVERY == 0) or (time.perf_counter() - last_flush) > 30.0:
|
||||
flush()
|
||||
elapsed = time.perf_counter() - t0
|
||||
rate = n_done / max(0.1, elapsed)
|
||||
print(
|
||||
f"[embed] done={n_done:5d}/{len(queue)} faces+={n_face_records_added:5d} "
|
||||
f"noface+={n_noface_added:4d} skipped={n_skipped:4d} "
|
||||
f"load_err={n_load_err:3d} rate={rate:.1f} img/s "
|
||||
f"({elapsed:.1f}s elapsed)"
|
||||
)
|
||||
|
||||
flush()
|
||||
elapsed = time.perf_counter() - t0
|
||||
print()
|
||||
print("=== embed done ===")
|
||||
print(f" done: {n_done}")
|
||||
print(f" new face records: {n_face_records_added}")
|
||||
print(f" new noface records: {n_noface_added}")
|
||||
print(f" skipped (already done): {n_skipped}")
|
||||
print(f" load errors: {n_load_err}")
|
||||
print(f" elapsed: {elapsed:.1f}s ({n_done/max(0.1,elapsed):.1f} img/s)")
|
||||
print(f" cache: {args.out}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
work/finalize_immich.sh (new executable file, 50 lines)
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# Finalize an Immich user's stage:
#   1. Copy queue.json to /mnt/c so the Windows embed worker can read it
#   2. Run the embed worker on Windows (DML)
#   3. Copy the resulting cache back to /opt/face-sets/work/cache/
#   4. Run cluster_immich.py to discover + emit new facesets
#
# Usage: ./work/finalize_immich.sh <user-label>
set -euo pipefail

USER_LABEL="${1:?usage: $0 <user-label>}"

REPO="$(cd "$(dirname "$0")/.." && pwd)"
WSL_QUEUE="$REPO/work/immich/$USER_LABEL/queue.json"
WIN_QUEUE_DIR="/mnt/c/face_embed_venv/work/immich/$USER_LABEL"
WIN_QUEUE="$WIN_QUEUE_DIR/queue.json"
WIN_QUEUE_FOR_PS="C:\\face_embed_venv\\work\\immich\\$USER_LABEL\\queue.json"

WIN_CACHE_DIR="/mnt/c/face_embed_venv/work/cache"
WIN_CACHE="$WIN_CACHE_DIR/immich_${USER_LABEL}.npz"
WIN_CACHE_FOR_PS="C:\\face_embed_venv\\work\\cache\\immich_${USER_LABEL}.npz"
WSL_CACHE="$REPO/work/cache/immich_${USER_LABEL}.npz"

LOG="$REPO/work/logs/immich_finalize_${USER_LABEL}.log"

[ -f "$WSL_QUEUE" ] || { echo "missing queue: $WSL_QUEUE" >&2; exit 1; }

echo "=== finalize: $USER_LABEL ===" | tee -a "$LOG"
date | tee -a "$LOG"

mkdir -p "$WIN_QUEUE_DIR" "$WIN_CACHE_DIR" "$REPO/work/cache"

echo "[1/4] copying queue: $WSL_QUEUE -> $WIN_QUEUE" | tee -a "$LOG"
cp "$WSL_QUEUE" "$WIN_QUEUE"
echo "  $(wc -c < "$WIN_QUEUE") bytes; $(/home/peter/face_sort_env/bin/python3 -c "import json,sys; print(len(json.load(open('$WIN_QUEUE'))))") entries"

echo "[2/4] running Windows DML embed worker" | tee -a "$LOG"
powershell.exe -NoProfile -Command "C:\\face_embed_venv\\Scripts\\python.exe C:\\face_embed_venv\\bench\\embed_worker.py '$WIN_QUEUE_FOR_PS' '$WIN_CACHE_FOR_PS'" 2>&1 | tee -a "$LOG"

[ -f "$WIN_CACHE" ] || { echo "embed produced no cache file at $WIN_CACHE" | tee -a "$LOG"; exit 1; }

echo "[3/4] copying cache back: $WIN_CACHE -> $WSL_CACHE" | tee -a "$LOG"
cp "$WIN_CACHE" "$WSL_CACHE"
echo "  $(/home/peter/face_sort_env/bin/python3 -c "import sys,json; sys.path.insert(0,'$REPO'); from sort_faces import load_cache; e,m,_,_,_=load_cache('$WSL_CACHE'); print(f'{len(e)} embeddings, {sum(1 for x in m if x.get(\"noface\"))} noface, {sum(1 for x in m if not x.get(\"noface\"))} faces')")"

echo "[4/4] running cluster_immich.py" | tee -a "$LOG"
/home/peter/face_sort_env/bin/python3 "$REPO/work/cluster_immich.py" "$WSL_CACHE" 2>&1 | tee -a "$LOG"

echo "=== finalize done: $USER_LABEL ===" | tee -a "$LOG"
date | tee -a "$LOG"
work/immich_stage.py (new file, 409 lines)
@@ -0,0 +1,409 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Stage Immich assets for embedding (WSL side of the split workflow).
|
||||
|
||||
For one Immich user:
|
||||
1. Page through `/search/metadata` listing every IMAGE asset the user owns.
|
||||
2. For each asset, fetch `/faces?id=` and decide if any detected face has a
|
||||
scaled short side >= MIN_FACE_SHORT on the original. Skip assets that
|
||||
don't.
|
||||
3. Download the original. Compute sha256.
|
||||
4. Dedup against (a) the existing canonical cache `nl_full.npz` and
|
||||
(b) sha256s already staged in this run / earlier runs. If duplicate,
|
||||
do NOT save to disk; record the alias.
|
||||
5. Save survivors to /mnt/x/src/immich/<user>/<rel> mirroring the structure
|
||||
after Immich's `/upload/library/<owner>/` prefix.
|
||||
6. Write a queue file with WSL + Windows paths so the Windows DML embed
|
||||
worker can find them.
|
||||
7. Persist staging state continuously so the run is resumable.
|
||||
|
||||
Output artifacts:
|
||||
work/immich/<user>/queue.json - what the Windows worker should embed
|
||||
work/immich/<user>/state.json - resume state
|
||||
work/immich/<user>/aliases.json - asset_id -> existing canonical path
|
||||
when sha256 matched something already
|
||||
in nl_full.npz
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import time
|
||||
import urllib.error
|
||||
import urllib.request
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from pathlib import Path
|
||||
|
||||
import numpy as np
|
||||
|
||||
REPO = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(REPO))
|
||||
|
||||
from sort_faces import load_cache # noqa: E402
|
||||
|
||||
# ---- config -------------------------------------------------------------- #
|
||||
|
||||
API = os.environ.get("IMMICH_URL", "").rstrip("/") + "/api" if os.environ.get("IMMICH_URL") else None
|
||||
KEY = os.environ.get("IMMICH_API_KEY")
|
||||
if not API or not KEY:
|
||||
raise SystemExit(
|
||||
"set IMMICH_URL and IMMICH_API_KEY env vars before running, e.g.\n"
|
||||
" export IMMICH_URL=https://fotos.example.org\n"
|
||||
" export IMMICH_API_KEY=... # admin API key"
|
||||
)
|
||||
HEADERS = {"x-api-key": KEY, "Accept": "application/json"}
|
||||
|
||||
# Short-label -> Immich userId. The user is responsible for filling this in for
|
||||
# their own Immich instance. NOTE: as of Immich v2.7.2, /search/metadata's
|
||||
# `userIds` filter is silently ignored when the API key is bound to a different
|
||||
# user, so changing this label/UUID does not actually change which assets the
|
||||
# API returns; we keep it here for naming output dirs and as future-proofing.
|
||||
USERS_FILE = REPO / "work" / "immich" / "users.json"
|
||||
USERS: dict[str, str] = {}
|
||||
if USERS_FILE.exists():
|
||||
USERS = json.loads(USERS_FILE.read_text())
|
||||
|
||||
CACHE_PATH = REPO / "work" / "cache" / "nl_full.npz" # for sha256 dedup
|
||||
STAGE_DIR = REPO / "work" / "immich"
|
||||
DEST_ROOT = Path("/mnt/x/src/immich")
|
||||
WIN_DEST_ROOT = "X:\\src\\immich" # equivalent on the Windows side
|
||||
|
||||
PAGE_SIZE = 1000
|
||||
MIN_FACE_SHORT = 90 # match refine's gate
|
||||
MIN_DET_SCORE = 0.5 # weaker than refine's 0.6, since Immich's score scale differs
|
||||
HTTP_TIMEOUT = 60 # seconds, conservative for big originals
|
||||
HTTP_RETRIES = 3
|
||||
HTTP_BACKOFF = 2.0
|
||||
|
||||
# ---- helpers ------------------------------------------------------------- #
|
||||
|
||||
def http_get(url: str, accept_bytes: bool = False) -> bytes | dict:
|
||||
"""GET with retries. Returns parsed JSON unless accept_bytes is True."""
|
||||
last_err = None
|
||||
for attempt in range(HTTP_RETRIES):
|
||||
try:
|
||||
req = urllib.request.Request(url, headers=HEADERS)
|
||||
with urllib.request.urlopen(req, timeout=HTTP_TIMEOUT) as resp:
|
||||
data = resp.read()
|
||||
return data if accept_bytes else json.loads(data)
|
||||
except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError) as e:
|
||||
last_err = e
|
||||
if attempt + 1 < HTTP_RETRIES:
|
||||
time.sleep(HTTP_BACKOFF * (attempt + 1))
|
||||
raise RuntimeError(f"GET {url} failed after {HTTP_RETRIES} attempts: {last_err}")
|
||||
|
||||
|
||||
def http_post(url: str, payload: dict) -> dict:
|
||||
last_err = None
|
||||
body = json.dumps(payload).encode("utf-8")
|
||||
for attempt in range(HTTP_RETRIES):
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
url, data=body, headers={**HEADERS, "Content-Type": "application/json"}
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=HTTP_TIMEOUT) as resp:
|
||||
return json.loads(resp.read())
|
||||
except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError) as e:
|
||||
last_err = e
|
||||
if attempt + 1 < HTTP_RETRIES:
|
||||
time.sleep(HTTP_BACKOFF * (attempt + 1))
|
||||
raise RuntimeError(f"POST {url} failed after {HTTP_RETRIES} attempts: {last_err}")
|
||||
|
||||
|
||||
def sha256_bytes(b: bytes) -> str:
|
||||
return hashlib.sha256(b).hexdigest()
|
||||
|
||||
|
||||
def derive_relpath(original_path: str) -> str:
|
||||
"""Return a relative subpath rooted at the user dir, mirroring Immich.
|
||||
|
||||
/usr/src/app/upload/library/admin/2026/2026-02-18/foo.jpg
|
||||
-> 2026/2026-02-18/foo.jpg
|
||||
Anything that doesn't match the expected prefix falls back to the basename
|
||||
only.
|
||||
"""
|
||||
marker = "/upload/library/"
|
||||
i = original_path.find(marker)
|
||||
if i < 0:
|
||||
return Path(original_path).name
|
||||
rest = original_path[i + len(marker):]
|
||||
parts = rest.split("/", 1)
|
||||
return parts[1] if len(parts) == 2 else parts[0]
|
||||
|
||||
|
||||
def wsl_to_win(p: Path) -> str:
|
||||
"""Convert /mnt/x/.. -> X:\\.. for the embed worker that runs on Windows."""
|
||||
s = str(p)
|
||||
if s.startswith("/mnt/"):
|
||||
drive = s[5]
|
||||
rest = s[6:].lstrip("/")
|
||||
return f"{drive.upper()}:\\{rest.replace('/', chr(92))}"
|
||||
if s.startswith("/opt/face-sets/"):
|
||||
# /opt/face-sets/work/... is on the WSL ext4 filesystem; reachable from
|
||||
# Windows as \\wsl$\Ubuntu\opt\face-sets\... (slower than C:). For our
|
||||
# use we keep all stage outputs under /mnt/x or /mnt/c so this branch
|
||||
# should not be hit, but fall back rather than fail.
|
||||
return f"\\\\wsl$\\Ubuntu\\opt\\face-sets\\{s[len('/opt/face-sets/'):].replace('/', chr(92))}"
|
||||
return s
|
||||
|
||||
|
||||
def keep_asset(asset: dict, faces: list) -> tuple[bool, list[dict]]:
|
||||
"""Return (keep, eligible_face_records). A face is 'eligible' iff its
|
||||
scaled-to-original short side >= MIN_FACE_SHORT and source-type is
|
||||
machine-learning."""
|
||||
W, H = asset.get("width"), asset.get("height")
|
||||
if not W or not H:
|
||||
return False, []
|
||||
eligible = []
|
||||
for f in faces:
|
||||
if f.get("sourceType") and f["sourceType"] != "machine-learning":
|
||||
continue
|
||||
iw = f.get("imageWidth") or W
|
||||
ih = f.get("imageHeight") or H
|
||||
sx = (W / iw) if iw else 1.0
|
||||
sy = (H / ih) if ih else 1.0
|
||||
bw = (f["boundingBoxX2"] - f["boundingBoxX1"]) * sx
|
||||
bh = (f["boundingBoxY2"] - f["boundingBoxY1"]) * sy
|
||||
if min(bw, bh) >= MIN_FACE_SHORT:
|
||||
eligible.append({
|
||||
"id": f["id"],
|
||||
"x1": int(round(f["boundingBoxX1"] * sx)),
|
||||
"y1": int(round(f["boundingBoxY1"] * sy)),
|
||||
"x2": int(round(f["boundingBoxX2"] * sx)),
|
||||
"y2": int(round(f["boundingBoxY2"] * sy)),
|
||||
"person": (f.get("person") or {}).get("name") or None,
|
||||
})
|
||||
return (len(eligible) > 0), eligible
|
||||
|
||||
|
||||
# ---- main staging loop --------------------------------------------------- #
|
||||
|
||||
def list_assets(user_id: str):
|
||||
"""Yield every IMAGE asset owned by user_id, paginated."""
|
||||
page = 1
|
||||
while True:
|
||||
resp = http_post(f"{API}/search/metadata", {
|
||||
"size": PAGE_SIZE,
|
||||
"type": "IMAGE",
|
||||
"page": page,
|
||||
"userIds": [user_id],
|
||||
})
|
||||
items = resp["assets"]["items"]
|
||||
if not items:
|
||||
return
|
||||
for a in items:
|
||||
yield a
|
||||
nxt = resp["assets"].get("nextPage")
|
||||
if not nxt:
|
||||
return
|
||||
page = int(nxt)
|
||||
|
||||
|
||||
def stage(user_label: str, limit: int | None, workers: int) -> None:
|
||||
user_id = USERS[user_label]
|
||||
user_dir = STAGE_DIR / user_label
|
||||
user_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
state_path = user_dir / "state.json"
|
||||
queue_path = user_dir / "queue.json"
|
||||
aliases_path = user_dir / "aliases.json"
|
||||
|
||||
# ---- load existing state for resume ---- #
|
||||
state = {
|
||||
"started_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
|
||||
"user_label": user_label,
|
||||
"user_id": user_id,
|
||||
"seen_asset_ids": [],
|
||||
"staged_count": 0,
|
||||
"deduped_against_existing": 0,
|
||||
"deduped_against_staged": 0,
|
||||
"skipped_no_big_face": 0,
|
||||
"skipped_no_faces": 0,
|
||||
"skipped_download_error": 0,
|
||||
"total_assets_seen": 0,
|
||||
}
|
||||
queue: list[dict] = []
|
||||
aliases: dict[str, dict] = {} # asset_id -> {sha, canonical_path}
|
||||
staged_hashes: set[str] = set()
|
||||
if state_path.exists():
|
||||
prior = json.loads(state_path.read_text())
|
||||
state.update(prior)
|
||||
state["resumed_at"] = time.strftime("%Y-%m-%dT%H:%M:%S")
|
||||
if queue_path.exists():
|
||||
queue = json.loads(queue_path.read_text())
|
||||
staged_hashes = {q["sha256"] for q in queue}
|
||||
if aliases_path.exists():
|
||||
aliases = json.loads(aliases_path.read_text())
|
||||
print(f"[resume] {len(state['seen_asset_ids'])} asset_ids already seen, "
|
||||
f"{len(queue)} in queue, {len(aliases)} aliased to existing cache")
|
||||
seen = set(state["seen_asset_ids"])
|
||||
|
||||
# ---- load existing canonical cache hashes (sha256) ---- #
|
||||
print(f"[init] loading existing cache hashes from {CACHE_PATH}")
|
||||
_emb, meta, _src, _proc, _aliases = load_cache(CACHE_PATH)
|
||||
canonical_by_hash: dict[str, str] = {}
|
||||
for m in meta:
|
||||
h = m.get("hash")
|
||||
if h:
|
||||
canonical_by_hash.setdefault(h, m["path"])
|
||||
print(f"[init] {len(canonical_by_hash)} unique sha256s in nl_full.npz")
|
||||
|
||||
# ---- iterate assets ---- #
|
||||
# Each worker does the entire I/O chain for an asset: /faces -> filter ->
|
||||
# /original. That way 8 workers translate to ~8x parallelism end-to-end.
|
||||
# Main thread does sha256, dedup decisions, and writes (which are CPU/SMB
|
||||
# bound but cheap relative to two HTTPS round-trips per asset).
|
||||
# Worker result tuple:
|
||||
# (asset, faces|None, blob|None, eligible|None, error|None)
|
||||
def _fetch_for_asset(asset: dict):
|
||||
if asset.get("type") != "IMAGE":
|
||||
return asset, None, None, None, "not_image"
|
||||
aid = asset["id"]
|
||||
if aid in seen:
|
||||
return asset, None, None, None, "already_seen"
|
||||
try:
|
||||
faces = http_get(f"{API}/faces?id={aid}")
|
||||
except Exception as e:
|
||||
return asset, None, None, None, f"faces_error:{e}"
|
||||
if not faces:
|
||||
return asset, [], None, [], "no_faces"
|
||||
keep, eligible = keep_asset(asset, faces)
|
||||
if not keep:
|
||||
return asset, faces, None, eligible, "no_big_face"
|
||||
try:
|
||||
blob = http_get(f"{API}/assets/{aid}/original", accept_bytes=True)
|
||||
except Exception as e:
|
||||
return asset, faces, None, eligible, f"download_error:{e}"
|
||||
return asset, faces, blob, eligible, None
|
||||
|
||||
n = 0
|
||||
last_flush = time.time()
|
||||
t0 = time.time()
|
||||
pool = ThreadPoolExecutor(max_workers=workers)
|
||||
try:
|
||||
for asset, faces, blob, eligible, err in pool.map(_fetch_for_asset, list_assets(user_id)):
|
||||
if asset.get("type") != "IMAGE":
|
||||
continue
|
||||
n += 1
|
||||
state["total_assets_seen"] = n
|
||||
if limit is not None and n > limit:
|
||||
print(f"[stop] hit --limit {limit}")
|
||||
break
|
||||
aid = asset["id"]
|
||||
|
||||
# Already-seen / non-image: silently skip.
|
||||
if err == "already_seen":
|
||||
continue
|
||||
|
||||
# Transient: count, but DON'T mark as seen so resume retries.
|
||||
if err and (err.startswith("faces_error") or err.startswith("download_error")):
|
||||
kind = err.split(":", 1)[0]
|
||||
detail = err.split(":", 1)[1][:160] if ":" in err else err
|
||||
print(f"[err] {kind} {aid}: {detail}")
|
||||
state["skipped_download_error"] += 1
|
||||
continue
|
||||
|
||||
# Permanent classifications -> seen.
|
||||
if err == "no_faces":
|
||||
state["skipped_no_faces"] += 1
|
||||
seen.add(aid); state["seen_asset_ids"] = sorted(seen)
|
||||
continue
|
||||
if err == "no_big_face":
|
||||
state["skipped_no_big_face"] += 1
|
||||
seen.add(aid); state["seen_asset_ids"] = sorted(seen)
|
||||
continue
|
||||
|
||||
# Have faces + blob -> dedup + save.
|
||||
h = sha256_bytes(blob)
|
||||
if h in canonical_by_hash:
|
||||
aliases[aid] = {"sha256": h, "canonical": canonical_by_hash[h]}
|
||||
state["deduped_against_existing"] += 1
|
||||
seen.add(aid); state["seen_asset_ids"] = sorted(seen)
|
||||
continue
|
||||
if h in staged_hashes:
|
||||
state["deduped_against_staged"] += 1
|
||||
seen.add(aid); state["seen_asset_ids"] = sorted(seen)
|
||||
continue
|
||||
|
||||
rel = derive_relpath(asset.get("originalPath", asset.get("originalFileName", aid)))
|
||||
wsl_path = DEST_ROOT / user_label / rel
|
||||
wsl_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
wsl_path.write_bytes(blob)
|
||||
staged_hashes.add(h)
|
||||
|
||||
queue.append({
|
||||
"asset_id": aid,
|
||||
"sha256": h,
|
||||
"wsl_path": str(wsl_path),
|
||||
"win_path": wsl_to_win(wsl_path),
|
||||
"size_bytes": len(blob),
|
||||
"width": asset.get("width"),
|
||||
"height": asset.get("height"),
|
||||
"originalPath": asset.get("originalPath"),
|
||||
"originalFileName": asset.get("originalFileName"),
|
||||
"localDateTime": asset.get("localDateTime"),
|
||||
"immich_eligible_faces": eligible,
|
||||
})
|
||||
state["staged_count"] += 1
|
||||
seen.add(aid)
|
||||
state["seen_asset_ids"] = sorted(seen)
|
||||
|
||||
if time.time() - last_flush > 5.0 or len(queue) % 25 == 0:
|
||||
queue_path.write_text(json.dumps(queue, indent=2))
|
||||
state_path.write_text(json.dumps(state, indent=2))
|
||||
aliases_path.write_text(json.dumps(aliases, indent=2))
|
||||
last_flush = time.time()
|
||||
elapsed = time.time() - t0
|
||||
rate = state["total_assets_seen"] / max(0.1, elapsed)
|
||||
print(f"[stage] seen={state['total_assets_seen']:6d} "
|
||||
f"staged={state['staged_count']:5d} "
|
||||
f"dedup-existing={state['deduped_against_existing']:5d} "
|
||||
f"dedup-staged={state['deduped_against_staged']:5d} "
|
||||
f"no-big-face={state['skipped_no_big_face']:6d} "
|
||||
f"no-faces={state['skipped_no_faces']:6d} "
|
||||
f"errs={state['skipped_download_error']:3d} "
|
||||
f"({rate:.1f} assets/s)")
|
||||
finally:
|
||||
pool.shutdown(wait=False, cancel_futures=True)
|
||||
|
||||
# final flush
|
||||
queue_path.write_text(json.dumps(queue, indent=2))
|
||||
state_path.write_text(json.dumps(state, indent=2))
|
||||
aliases_path.write_text(json.dumps(aliases, indent=2))
|
||||
print()
|
||||
print(f"=== final state for user {user_label} ===")
|
||||
for k in [
|
||||
"total_assets_seen", "staged_count", "deduped_against_existing",
|
||||
"deduped_against_staged", "skipped_no_big_face", "skipped_no_faces",
|
||||
"skipped_download_error",
|
||||
]:
|
||||
print(f" {k}: {state[k]}")
|
||||
total_bytes = sum(q["size_bytes"] for q in queue)
|
||||
print(f" staged bytes: {total_bytes/1e9:.2f} GB across {len(queue)} files")
|
||||
print(f" queue: {queue_path}")
|
||||
print(f" state: {state_path}")
|
||||
print(f" aliases: {aliases_path}")
|
||||
|
||||
|
||||
# ---- cli ----------------------------------------------------------------- #
|
||||
|
||||
def main() -> None:
|
||||
p = argparse.ArgumentParser()
|
||||
if not USERS:
|
||||
p.add_argument("--user", required=True,
|
||||
help=f"label for output dir (USERS map empty; populate {USERS_FILE} to constrain)")
|
||||
else:
|
||||
p.add_argument("--user", choices=list(USERS.keys()), required=True)
|
||||
p.add_argument("--limit", type=int, default=None,
|
||||
help="stop after seeing N assets total (for testing)")
|
||||
p.add_argument("--workers", type=int, default=8,
|
||||
help="concurrent /faces fetches (default 8)")
|
||||
args = p.parse_args()
|
||||
stage(args.user, args.limit, args.workers)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||