# Digest pipeline
This page mirrors implementation notes maintained in CLAUDE.md.
Update both when changing this subsystem.
The digest pipeline turns recent screen activity into a short
narrative summary stored in `digest.md` (and the SQL
`digests.summary` column). Two backends, one fail-closed policy
gate, one PII output-scrub layer.
## Architecture

```
keyframes + MCAP events
        │
        ▼
cue.digest._build_digest()
        │
        ▼
cue.frame_select.select_digest_frames(max=10)   ← anchors + transitions + dedupe
        │
        ├── ≤ 10 screenshots, downscaled (896 px max), JPEG q75, EXIF stripped
        └── timeline: window/title/event narrative
        ▼
cue.llm.summarize_digest_with_policy()          ← policy gate
        ├── digest_backend=local → LocalVisionBackend
        │       │
        │       ├── llama-server subprocess (managed)
        │       ├── model.gguf + mmproj.gguf
        │       └── POST /v1/chat/completions (OpenAI-compatible, image_url data URLs)
        │       │
        │       └── LocalUnavailable / LocalTimeout
        │             ├── allow_cloud_fallback=False → SKIP (tombstone)
        │             └── allow_cloud_fallback=True  → CloudVisionBackend (opt-in)
        └── digest_backend=cloud → CloudVisionBackend (Haiku w/ images)
        ▼
cue.pii.scrub()  (Presidio + custom recognizers)
        ▼
digest.md + SQL digests.summary
        ▼
memory.py (Opus) / suggest.py (Opus)            ← always sees scrubbed text
```
## Two backends

| Backend | Default | What it sends |
|---|---|---|
| Cloud (Anthropic Haiku) | yes | Selected screenshots + event timeline → `messages.create` with vision content blocks. |
| Local (bundled llama-server + Gemma 4 GGUF) | opt-in | Same prompt + screenshots → `POST localhost:<port>/v1/chat/completions`, OpenAI-compatible. Image bytes never leave the device unless `allow_cloud_fallback` is explicitly enabled. |
`cue.llm.summarize_digest_with_policy(prompt, frames, timeline)` is
the single call site. It picks the backend from the
`digest_backend` config key, applies the fail-closed policy on
`LocalUnavailable` / `LocalTimeout`, and returns the raw model
output.
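The dispatch and fail-closed behavior can be sketched as follows. This is an illustrative stand-in, not the real `cue.llm` module: the backend objects, exception classes, and keyword parameters here are assumptions mirroring the description above.

```python
# Hypothetical sketch of the policy-gated dispatch. `local` / `cloud`
# are any objects with a summarize() method; real signatures may differ.

class LocalUnavailable(Exception):
    """Local llama-server missing, crashed, or failed to start."""

class LocalTimeout(Exception):
    """Local inference exceeded its deadline."""

def summarize_digest_with_policy(prompt, frames, timeline, *,
                                 digest_backend, allow_cloud_fallback,
                                 local, cloud):
    """Return raw model output, or None to signal a skipped cycle."""
    if digest_backend == "cloud":
        return cloud.summarize(prompt, frames, timeline)
    try:
        return local.summarize(prompt, frames, timeline)
    except (LocalUnavailable, LocalTimeout):
        if allow_cloud_fallback:      # explicit opt-in only
            return cloud.summarize(prompt, frames, timeline)
        return None                   # fail closed: no cloud call, no write
```

The key property: a local failure with fallback disabled returns `None` without ever touching the cloud backend, and the caller turns that `None` into the tombstone row.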
## Fail-closed policy

```mermaid
flowchart TD
    A[digest_backend?] -->|cloud| B[CloudVisionBackend]
    A -->|local| C[LocalVisionBackend]
    C -->|success| D[summary]
    C -->|LocalUnavailable / LocalTimeout| E{allow_cloud_fallback?}
    E -->|false| F[SKIP — tombstone row]
    E -->|true| B
    B --> D
    D --> G[pii.scrub]
    G --> H[persist digest.md + digests row]
```
When the local backend fails and `allow_cloud_fallback=False`, the
cycle is skipped — no cloud call, no write. A fixed-string
tombstone row goes into the database
(`summary="[skipped: local vision model unavailable]"`) so
file paths, ports, and stderr can't leak. The detailed reason
(model path, exception type, stderr tail) is logged only to
`privacy.log`, post-scrub.
## Frame selection

`cue.frame_select.select_digest_frames(keyframe_dir, window_secs, max_frames=10)` returns up to 10 frames using:

- Protected anchors — the first / middle / last frames in the window are never deduped or dropped. Load-bearing for narrative continuity.
- Score the rest with four scorers:
    - `window_app_title_change_score` — foreground app / window title delta.
    - `user_event_spike_score` — keystroke / click burst near the frame.
    - `visual_change_score` — dhash distance to the previous keyframe (catches scene changes where the title doesn't move — switching tabs in the same browser, scrolling through a long doc).
    - `text_density_score` — OCR / dhash-text-region heuristic.
- Dedupe non-anchors against each other via dhash Hamming distance ≤ 4. Anchors stay regardless.
- Top up if dedupe under-fills the budget.
- Sort chronologically before returning.

Frames inside paused intervals (read from `privacy.is_paused()`
history) are excluded.
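The selection steps can be sketched in miniature. This is not the real selector — frames here are simplified to `(timestamp, dhash)` tuples with precomputed scores, and the scorer functions themselves are out of scope — but the anchor-protection, dedupe, and top-up logic follows the description above.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two 64-bit dhash values."""
    return bin(a ^ b).count("1")

def select_digest_frames(frames, scores, max_frames=10, dedupe_bits=4):
    """frames: list of (timestamp, dhash); scores: dict frame -> float."""
    if len(frames) <= max_frames:
        return sorted(frames)
    # Protected anchors: first / middle / last are never deduped or dropped.
    anchors = {frames[0], frames[len(frames) // 2], frames[-1]}
    rest = [f for f in frames if f not in anchors]
    rest.sort(key=lambda f: scores[f], reverse=True)   # best-scored first
    picked = list(anchors)
    for f in rest:
        if len(picked) >= max_frames:
            break
        # Dedupe only against already-picked non-anchors.
        if all(hamming(f[1], p[1]) > dedupe_bits
               for p in picked if p not in anchors):
            picked.append(f)
    # Top up if dedupe under-filled the budget.
    for f in rest:
        if len(picked) >= max_frames:
            break
        if f not in picked:
            picked.append(f)
    return sorted(picked)   # chronological order
```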
## Image preprocessing

`cue.image_preprocess.prepare_for_digest(path, max_dim=896, quality=75) -> bytes`:

- Open via Pillow.
- Short-circuit: return the original bytes if the image is already small, has no EXIF, and no ICC profile.
- Apply `ImageOps.exif_transpose`; convert to RGB if needed.
- Recompute width/height (transpose can swap them).
- Downscale to fit within `max_dim` (default 896 px = Gemma 4 vision tower native).
- JPEG re-encode at quality 75 with `optimize=True`. Strip EXIF + ICC by omission.
- Return bytes — never written to disk.
Pixels go straight from Cue → llama-server (or the Anthropic API) as
`data:image/jpeg;base64,...` URLs. Original keyframe paths
never appear in prompt text or in any log line. The only frame
metadata that appears in the prompt is `frame_index` +
`chronological_offset_seconds`.
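A minimal sketch of this contract using Pillow, taking bytes in and bytes out. The real function takes a path and includes the small-image short-circuit; this stand-in omits that branch and exists only to show the transpose → downscale → metadata-stripping re-encode order and the data-URL shape.

```python
import base64
import io

from PIL import Image, ImageOps

def prepare_for_digest(data: bytes, max_dim: int = 896,
                       quality: int = 75) -> bytes:
    img = Image.open(io.BytesIO(data))
    img = ImageOps.exif_transpose(img)   # honor orientation before dropping EXIF
    if img.mode != "RGB":
        img = img.convert("RGB")
    img.thumbnail((max_dim, max_dim))    # in-place downscale, keeps aspect ratio
    out = io.BytesIO()
    # No exif= / icc_profile= kwargs -> metadata stripped by omission.
    img.save(out, format="JPEG", quality=quality, optimize=True)
    return out.getvalue()

def to_data_url(jpeg_bytes: bytes) -> str:
    """OpenAI-compatible image_url payload; the bytes never hit disk."""
    return "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode("ascii")
```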
## PII output scrub

`cue.pii.scrub()` runs Presidio defaults plus Cue-specific custom
recognizers (API keys / DB URLs / file paths / meeting URLs /
Korean PII) on the digest text before it is written to `digest.md`,
the SQL row, or `memory.md`. Idempotence is a tested contract.
Defense in depth:
- Output-scrub on the LocalVisionBackend / CloudVisionBackend return.
- Output-scrub on the `memory._compute_memory()` return.
- One-time backfill of existing rows when `_meta.scrub_version` bumps.
- All log sites that touch summary / prompt / payload are scrubbed or redacted. Image bytes are never written to SQL or log files.
See `cue.pii` for the recognizer catalog.
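The idempotence contract is worth making concrete. The toy scrubber below is not the Presidio pipeline — the recognizer patterns and placeholder tokens are invented for illustration — but it shows the property the tests enforce: placeholders must not themselves match any recognizer, so `scrub(scrub(x)) == scrub(x)`.

```python
import re

# Illustrative recognizers only; the real catalog lives in cue.pii.
RECOGNIZERS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "<API_KEY>"),
    (re.compile(r"postgres://\S+"), "<DB_URL>"),
    (re.compile(r"/Users/[\w./-]+"), "<PATH>"),
]

def scrub(text: str) -> str:
    """Replace every recognizer hit with its fixed placeholder token."""
    for pattern, token in RECOGNIZERS:
        text = pattern.sub(token, text)
    return text
```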
## Tombstone on skip

When the policy gate returns `None` (skip), `mark_digest_skipped(reason)`
inserts a fixed-string row so `digest.md` reflects "[no recent
activity]" rather than the previous summary. The detailed reason
is logged to `privacy.log` only.
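A hedged sketch of that write, assuming a `digests` table like the one described above; the function signature and schema here are illustrative, not the real cue API.

```python
import sqlite3

TOMBSTONE = "[skipped: local vision model unavailable]"

def mark_digest_skipped(conn: sqlite3.Connection, ts: float,
                        reason: str) -> None:
    # Only the fixed string reaches SQL; `reason` (model path, stderr
    # tail, ports) belongs in privacy.log and is deliberately unused here.
    conn.execute(
        "INSERT INTO digests (ts, summary) VALUES (?, ?)",
        (ts, TOMBSTONE),
    )
    conn.commit()
```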
## Latency budgets
| Backend | Target p95 | Notes |
|---|---|---|
| Cloud Haiku w/ ≤10 images | ~2 s | Default. |
| Local Gemma 4 E2B Q8 (CPU) | 22-35 s | Apple Silicon M2 Pro measured during the spike. |
| Local Gemma 4 E2B Q8 (M5+ Metal) | ~6 s (planned) | Hardware not yet generally available; default flip is gated on this. See On-device vision. |
## See also

- `cue.digest` — module API reference.
- `cue.llm` — backend abstraction.
- `cue.frame_select` — selector internals.
- `cue.image_preprocess` — pre-encode pipeline.
- `cue.pii` — PII scrub layer.
- Prompts — `DIGEST_HEADER` template + companions.
- On-device vision — local backend lifecycle.