
cue.pii / cue.pii_recognizers

The PII scrub layer: Presidio + Cue-specific custom recognizers.

cue.pii

PII detection and anonymization using Microsoft Presidio. Strips sensitive information (emails, phone numbers, credit cards, etc.) before sending text to the Claude API.

scrub

scrub(text: str, language: str = 'en') -> str

Remove PII from text, returning the anonymized version.

If Presidio is unavailable, falls back to regex-based scrubbing instead of returning the text unscrubbed (fail-closed).
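The fail-closed fallback can be sketched roughly as follows. This is a minimal illustration, not Cue's actual implementation: the names `regex_scrub` and `scrub_sketch`, the two fallback patterns, and the placeholder strings are all assumptions made for the example. Only the Presidio calls (`AnalyzerEngine.analyze`, `AnonymizerEngine.anonymize`) are real library APIs.

```python
import re

# Hypothetical fallback patterns; the real module's regexes may differ.
_FALLBACK_PATTERNS = [
    ("<EMAIL_ADDRESS>", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("<PHONE_NUMBER>", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def regex_scrub(text: str) -> str:
    """Replace every fallback-pattern match with its placeholder."""
    for placeholder, pattern in _FALLBACK_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def scrub_sketch(text: str, language: str = "en") -> str:
    """Try Presidio first; on any failure, still scrub with regexes."""
    try:
        # Imported lazily so a missing install is caught by the except below.
        from presidio_analyzer import AnalyzerEngine
        from presidio_anonymizer import AnonymizerEngine

        results = AnalyzerEngine().analyze(text=text, language=language)
        return AnonymizerEngine().anonymize(text=text, analyzer_results=results).text
    except Exception:
        # Fail-closed: never return the raw text when Presidio cannot run.
        return regex_scrub(text)
```

The point of the design is that every code path returns scrubbed text. An unavailable dependency degrades detection quality but does not let raw input through.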

scrub_strict

scrub_strict(text: str, language: str = 'en') -> tuple[str, list[str]]

Like scrub, but also returns the sorted, unique entity types that Presidio identified. The digest eval harness uses it to count residual PII leakage in a model summary that has already been scrubbed (possibly over-scrubbed) upstream.
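One way an eval harness might consume this return shape is sketched below. The helper `count_residual_pii` and the stand-in `fake_scrub_strict` are hypothetical names invented for the example. The stand-in detects emails only, so the snippet runs without Presidio. In practice the real `scrub_strict` would be passed in instead.

```python
import re
from collections import Counter
from typing import Callable, Iterable

_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fake_scrub_strict(text: str, language: str = "en") -> tuple[str, list[str]]:
    """Stand-in with the same return shape as scrub_strict (emails only)."""
    found = set()
    if _EMAIL.search(text):
        found.add("EMAIL_ADDRESS")
        text = _EMAIL.sub("<EMAIL_ADDRESS>", text)
    return text, sorted(found)


def count_residual_pii(
    summaries: Iterable[str],
    scrub_strict_fn: Callable[[str], tuple[str, list[str]]],
) -> Counter:
    """Count, per entity type, how many summaries still contain that type."""
    counts: Counter = Counter()
    for summary in summaries:
        _, entity_types = scrub_strict_fn(summary)
        # entity_types is unique per summary, so each summary counts once.
        counts.update(entity_types)
    return counts
```

Because the entity types are unique per call, the counter reports the number of affected summaries, not the number of individual matches.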

cue.pii_recognizers

Cue-specific Presidio custom recognizers.

Presidio's default recognizers cover names, emails, phone numbers, credit cards, and SSNs well, but Cue's typical screen content includes secret material that the defaults miss: API keys, tokens, private-key headers, DB URLs, local file paths, meeting URLs, and Korean PII (phone numbers, resident registration numbers, and email addresses on Hangul domains). This module declares those patterns; cue.pii._get_analyzer() registers them at first use.
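A pattern declaration of this kind might look like the sketch below. The `PatternSpec` class, the entity names, and both regexes are illustrative assumptions, not the module's actual definitions. `Pattern` and `PatternRecognizer` are real Presidio classes, imported lazily here so the rest of the snippet runs without Presidio installed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternSpec:
    """Hypothetical plain-data description of one custom pattern."""
    entity: str
    name: str
    regex: str
    score: float


# Illustrative specs only; the real patterns may be stricter.
EXAMPLE_SPECS = [
    PatternSpec("PRIVATE_KEY_HEADER", "pem_private_key",
                r"-----BEGIN [A-Z ]*PRIVATE KEY-----", 0.9),
    PatternSpec("KR_RRN", "kr_resident_registration_number",
                r"\b\d{6}-\d{7}\b", 0.85),
]


def to_presidio_recognizer(spec: PatternSpec):
    """Wrap a spec in a Presidio PatternRecognizer (requires presidio_analyzer)."""
    from presidio_analyzer import Pattern, PatternRecognizer

    return PatternRecognizer(
        supported_entity=spec.entity,
        patterns=[Pattern(name=spec.name, regex=spec.regex, score=spec.score)],
    )


def matching_entities(text: str) -> list[str]:
    """Stdlib-only check of which example specs match the text."""
    return sorted({s.entity for s in EXAMPLE_SPECS if re.search(s.regex, text)})
```

With Presidio available, a recognizer built this way can be attached to an analyzer through `analyzer.registry.add_recognizer(...)`. Whether `_get_analyzer()` uses exactly that call is not stated here.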

Each recognizer assigns its matches a high score (0.85 or above), so that when a weaker default recognizer overlaps the same span, Presidio keeps the Cue-specific match instead of dropping it.
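To see why the score matters, consider a deliberately simplified model of overlap resolution in which the highest-scoring match wins any overlapping span. This is not Presidio's actual conflict-handling algorithm, which differs in detail. The `Match` class and `resolve_overlaps` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical detection result: entity type, span, and confidence."""
    entity: str
    start: int
    end: int
    score: float


def overlaps(a: Match, b: Match) -> bool:
    """True when the two half-open spans [start, end) share any character."""
    return a.start < b.end and b.start < a.end


def resolve_overlaps(matches: list[Match]) -> list[Match]:
    """Greedily keep the highest-scoring match for each overlapping region."""
    kept: list[Match] = []
    # Visit matches from highest to lowest score; ties go to the earlier span.
    for m in sorted(matches, key=lambda m: (-m.score, m.start)):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m.start)
```

Under this model, a custom match scored 0.9 over a database URL survives a generic URL match scored 0.5 on the same span, while a non-overlapping email match is unaffected.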

all_recognizers

all_recognizers() -> list[PatternRecognizer]

All Cue-specific recognizers, in registration order.