
Digest pipeline

This page mirrors implementation notes maintained in CLAUDE.md. Update both when changing this subsystem.

The digest pipeline turns recent screen activity into a short narrative summary stored in digest.md (and SQL digests.summary). Two backends, one fail-closed policy gate, one PII output-scrub layer.

Architecture

keyframes + MCAP events
        │
        ▼
  cue.digest._build_digest()
        │
        ▼
  cue.frame_select.select_digest_frames(max=10)        ← anchors + transitions + dedupe
        │
        ├── ≤ 10 screenshots, downscaled (896 px max), JPEG q75, EXIF stripped
        └── timeline: window/title/event narrative
        ▼
  cue.llm.summarize_digest_with_policy()                ← policy gate
        ├── digest_backend=local                       → LocalVisionBackend
        │     │
        │     ├── llama-server subprocess (managed)
        │     ├── model.gguf + mmproj.gguf
        │     └── POST /v1/chat/completions (OpenAI-compatible, image_url data URLs)
        │     │
        │     └── LocalUnavailable / LocalTimeout
        │           ├── allow_cloud_fallback=False → SKIP (tombstone)
        │           └── allow_cloud_fallback=True  → CloudVisionBackend (opt-in)
        └── digest_backend=cloud                       → CloudVisionBackend (Haiku w/ images)
        ▼
  cue.pii.scrub() (Presidio + custom recognizers)
        ▼
  digest.md + SQL digests.summary
        ▼
  memory.py (Opus) / suggest.py (Opus)                 ← always sees scrubbed text

Two backends

| Backend | Default | What it sends |
| --- | --- | --- |
| Cloud (Anthropic Haiku) | yes | Selected screenshots + event timeline → messages.create with vision content blocks. |
| Local (bundled llama-server + Gemma 4 GGUF) | opt-in | Same prompt + screenshots → POST localhost:&lt;port&gt;/v1/chat/completions, OpenAI-compatible. Image bytes never leave the device unless allow_cloud_fallback is explicitly enabled. |

cue.llm.summarize_digest_with_policy(prompt, frames, timeline) is the single call site. It picks the backend based on the digest_backend config key, applies the fail-closed policy on LocalUnavailable / LocalTimeout, and returns the raw model output.
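The gate logic above can be sketched as follows. This is an illustration, not the real implementation: the exception names follow this page, but the backends are passed in as plain callables here (the real call site resolves them from config), and returning None stands in for the skip/tombstone path.

```python
# Sketch of the fail-closed policy gate. Exception names follow this page;
# backend internals are stand-ins injected as callables.

class LocalUnavailable(Exception):
    """llama-server missing, crashed, or model files absent."""

class LocalTimeout(Exception):
    """Local inference exceeded its deadline."""

def summarize_digest_with_policy(prompt, frames, timeline, *,
                                 digest_backend, allow_cloud_fallback,
                                 local_backend, cloud_backend):
    if digest_backend == "cloud":
        return cloud_backend(prompt, frames, timeline)
    try:
        return local_backend(prompt, frames, timeline)
    except (LocalUnavailable, LocalTimeout):
        if not allow_cloud_fallback:
            return None  # fail closed: caller writes the fixed-string tombstone
        return cloud_backend(prompt, frames, timeline)  # explicit opt-in only
```

The key property: with allow_cloud_fallback=False, a local failure can never reach the cloud path.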

Fail-closed policy

flowchart TD
    A[digest_backend?] -->|cloud| B[CloudVisionBackend]
    A -->|local| C[LocalVisionBackend]
    C -->|success| D[summary]
    C -->|LocalUnavailable / LocalTimeout| E{allow_cloud_fallback?}
    E -->|false| F[SKIP — tombstone row]
    E -->|true| B
    B --> D
    D --> G[pii.scrub]
    G --> H[persist digest.md + digests row]

When the local backend fails and allow_cloud_fallback=False, the cycle is skipped — no cloud call, no write. A fixed-string tombstone row goes into the database (summary="[skipped: local vision model unavailable]") so file paths / ports / stderr can't leak. The detailed reason (model path, exception type, stderr tail) is logged only to privacy.log, post-scrub.

Frame selection

cue.frame_select.select_digest_frames(keyframe_dir, window_secs, max_frames=10) returns up to 10 frames using:

  • Protected anchors — first / middle / last frames in the window are never deduped or dropped. Load-bearing for narrative continuity.
  • Score the remaining frames with four scorers:
      • window_app_title_change_score — foreground app / window title delta.
      • user_event_spike_score — keystroke / click burst near the frame.
      • visual_change_score — dhash distance to the previous keyframe (catches scene changes where the title doesn't move — switching tabs in the same browser, scrolling through a long doc).
      • text_density_score — OCR / dhash-text-region heuristic.
  • Dedupe non-anchors against each other via dhash Hamming distance ≤ 4. Anchors stay regardless.
  • Top up if dedupe under-fills the budget.
  • Sort chronologically before returning.

Frames inside paused intervals (read from privacy.is_paused() history) are excluded.
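The anchor + dedupe mechanics can be sketched in isolation. This toy version assumes frames arrive pre-scored as (timestamp, dhash_int, score) tuples; the real scorers and pause filtering are out of scope here.

```python
# Toy sketch of select_digest_frames: protected anchors, score-ordered fill,
# dhash dedupe among non-anchors, chronological return order.

def hamming(a: int, b: int) -> int:
    """Bit distance between two dhash integers."""
    return bin(a ^ b).count("1")

def select_digest_frames_sketch(frames, max_frames=10, dedupe_dist=4):
    frames = sorted(frames)                      # chronological
    if len(frames) <= max_frames:
        return frames
    # First / middle / last are protected anchors.
    anchors = {frames[0], frames[len(frames) // 2], frames[-1]}
    rest = [f for f in frames if f not in anchors]
    rest.sort(key=lambda f: f[2], reverse=True)  # best score first
    picked = list(anchors)
    for f in rest:
        if len(picked) >= max_frames:
            break
        # Dedupe only against already-picked non-anchors; anchors never dedupe.
        if all(hamming(f[1], p[1]) > dedupe_dist
               for p in picked if p not in anchors):
            picked.append(f)
    # Top up if dedupe under-filled the budget.
    for f in rest:
        if len(picked) >= max_frames:
            break
        if f not in picked:
            picked.append(f)
    return sorted(picked)                        # chronological before return
```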

Image preprocessing

cue.image_preprocess.prepare_for_digest(path, max_dim=896, quality=75) -> bytes:

  • Open via Pillow.
  • Short-circuit: return the original bytes unchanged if the image is already within max_dim, has no EXIF data, and no ICC profile.
  • Apply ImageOps.exif_transpose, then convert to RGB if needed.
  • Recompute width/height (the transpose can swap them).
  • Downscale to fit within max_dim (default 896 px = Gemma 4 vision tower native).
  • Re-encode as JPEG at quality 75 with optimize=True. EXIF + ICC are stripped by omission.
  • Return bytes — never written to disk.

Pixels go straight from Cue → llama-server (or Anthropic API) as data:image/jpeg;base64,... URLs. Original keyframe paths never appear in prompt text or in any log line. Frame metadata that does appear in the prompt is just frame_index + chronological_offset_seconds.
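Building the transport payload is a one-liner over the preprocessed bytes. A minimal sketch of an OpenAI-compatible image content block, as accepted by llama-server's /v1/chat/completions:

```python
import base64

def image_block(jpeg_bytes: bytes) -> dict:
    """OpenAI-compatible vision content block carrying a data URL.

    Only the encoded pixels travel; no file path appears anywhere.
    """
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    }
```

frame_index and chronological_offset_seconds travel in the prompt text, not in the block itself.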

PII output scrub

cue.pii.scrub() runs Presidio defaults plus Cue-specific custom recognizers (API keys / DB URLs / file paths / meeting URLs / Korean PII) on the digest text before writing to digest.md, the SQL row, or memory.md. Idempotence is a tested contract.

Defense in depth:

  • Output-scrub on the LocalVisionBackend / CloudVisionBackend return.
  • Output-scrub on memory._compute_memory() return.
  • One-time backfill on existing rows when _meta.scrub_version bumps.
  • All log sites that touch summary / prompt / payload are scrubbed or redacted. Image bytes are never written to SQL or log files.

See cue.pii for the recognizer catalog.
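The idempotence contract is worth making concrete. This toy stand-in uses two regexes instead of Presidio; the property it demonstrates is the real one: replacement tokens must never re-match any recognizer, so scrubbing twice equals scrubbing once.

```python
import re

# Toy stand-in for cue.pii.scrub() (real code: Presidio + custom recognizers).
_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{8,}\b"),  # illustrative key shape
}

def scrub_sketch(text: str) -> str:
    """Replace matches with angle-bracket tokens that no pattern can re-match."""
    for label, pat in _PATTERNS.items():
        text = pat.sub(f"<{label}>", text)
    return text
```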

Tombstone on skip

When the policy gate returns None (skip), mark_digest_skipped(reason) inserts a fixed-string row so digest.md reflects "[no recent activity]" rather than the previous summary. The detailed reason is logged to privacy.log only.
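A sketch of that insert against an assumed two-column digests schema (the real table has more columns). The point being illustrated: the row stores only the fixed string, never the reason.

```python
import sqlite3
import time

TOMBSTONE = "[skipped: local vision model unavailable]"  # fixed string, per above

def mark_digest_skipped_sketch(conn: sqlite3.Connection, reason: str) -> None:
    """Insert the tombstone row; the detailed reason never reaches SQL."""
    conn.execute(
        "INSERT INTO digests (created_at, summary) VALUES (?, ?)",
        (time.time(), TOMBSTONE),  # not the reason: no paths, ports, or stderr
    )
    conn.commit()
    # The real code then writes the scrubbed reason to privacy.log only.
```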

Latency budgets

| Backend | Target p95 | Notes |
| --- | --- | --- |
| Cloud Haiku w/ ≤10 images | ~2 s | Default. |
| Local Gemma 4 E2B Q8 (CPU) | 22–35 s | Apple Silicon M2 Pro, measured during the spike. |
| Local Gemma 4 E2B Q8 (M5+ Metal) | ~6 s (planned) | Hardware not yet generally available; the default flip is gated on this. See On-device vision. |

See also