cue.pii / cue.pii_recognizers
The PII scrub layer: Presidio + Cue-specific custom recognizers.
cue.pii
PII detection and anonymization using Microsoft Presidio. Strips sensitive information (emails, phone numbers, credit cards, etc.) before sending text to the Claude API.
scrub
scrub(text: str, language: str = 'en') -> str
Remove PII from text, returning the anonymized version.
If Presidio is unavailable, falls back to regex-based scrubbing rather than returning the text unscrubbed (fail-closed).
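The fail-closed behavior can be illustrated with a minimal sketch. This is not the module's actual code: `scrub_sketch`, `regex_scrub`, the `presidio_scrub` parameter, and both regexes are hypothetical names and patterns chosen for illustration. The point is only that a missing or failing Presidio path routes to the regex scrubber and never returns the raw text.

```python
import re
from typing import Callable, Optional

# Hypothetical fallback patterns (assumption: the real module's regexes
# are not shown in this reference and are likely broader than these).
_FALLBACK_PATTERNS = [
    ("<EMAIL_ADDRESS>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<PHONE_NUMBER>", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def regex_scrub(text: str) -> str:
    """Replace every fallback-pattern match with its placeholder."""
    for placeholder, pattern in _FALLBACK_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def scrub_sketch(
    text: str,
    presidio_scrub: Optional[Callable[[str], str]] = None,
) -> str:
    """Fail-closed wrapper: prefer Presidio, never return unscrubbed text."""
    if presidio_scrub is not None:
        try:
            return presidio_scrub(text)
        except Exception:
            # Deliberately broad: any failure must still end in a scrub.
            pass
    return regex_scrub(text)


print(scrub_sketch("mail bob@example.com now"))
# mail <EMAIL_ADDRESS> now
```

A fail-open design would return `text` unchanged in the `except` branch; for a layer that sits in front of an external API, that is the outcome this pattern is meant to rule out.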
scrub_strict
scrub_strict(text: str, language: str = 'en') -> tuple[str, list[str]]
Like scrub, but also returns the sorted unique entity types Presidio
identified. Used by the digest eval harness to count residual PII leakage
after a model summary has already been (over-)scrubbed upstream.
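To make the return shape concrete, here is a small sketch of how an anonymized string and a sorted list of unique entity types could be derived from a set of detections. `Span` and `anonymize_with_types` are hypothetical names, the spans are hand-written rather than produced by Presidio, and the sketch assumes the spans do not overlap.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """One detection: entity type plus [start, end) character offsets."""
    entity_type: str
    start: int
    end: int


def anonymize_with_types(text: str, spans: list[Span]) -> tuple[str, list[str]]:
    """Return (anonymized text, sorted unique entity types).

    Assumes non-overlapping spans. Replacing from right to left keeps the
    offsets of earlier spans valid while the string changes length.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"<{span.entity_type}>" + text[span.end:]
    return text, sorted({s.entity_type for s in spans})


text = "Ann: ann@x.io, Bob: bob@y.io"
spans = [
    Span("PERSON", 0, 3),
    Span("EMAIL_ADDRESS", 5, 13),
    Span("PERSON", 15, 18),
    Span("EMAIL_ADDRESS", 20, 28),
]
scrubbed, types = anonymize_with_types(text, spans)
print(scrubbed)  # <PERSON>: <EMAIL_ADDRESS>, <PERSON>: <EMAIL_ADDRESS>
print(types)     # ['EMAIL_ADDRESS', 'PERSON']
```

An eval harness could treat a non-empty type list on an already-scrubbed summary as residual leakage, for example by counting `len(types)` per summary; how the digest harness actually aggregates these is not specified here.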
cue.pii_recognizers
Cue-specific Presidio custom recognizers.
Presidio's default recognizers cover names / emails / phone numbers /
credit cards / SSN well, but Cue's typical screen content includes
secret material that the defaults miss: API keys, tokens, private-key
headers, DB URLs, local file paths, meeting URLs, and Korean PII
(phone numbers, resident registration numbers, and email addresses
on Hangul domains). This module declares
those patterns; cue.pii._get_analyzer() registers them at first use.
Each recognizer assigns a high score (0.85+) so that, when a weaker default recognizer matches an overlapping span, Presidio keeps the Cue match instead of dropping it.
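The reasoning behind the high score can be shown with a simplified model of score-based conflict resolution. This is an illustration only: Presidio's own conflict handling is more nuanced, and `Detection`, `resolve_by_score`, and the `API_KEY` entity name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int  # inclusive
    end: int    # exclusive
    score: float


def overlaps(a: Detection, b: Detection) -> bool:
    """True when the two half-open spans share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve_by_score(detections: list[Detection]) -> list[Detection]:
    """Greedy resolution: visit detections from highest to lowest score
    and keep one only if it overlaps nothing already kept."""
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if not any(overlaps(det, k) for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


candidates = [
    # A weak default-style match on a digit run inside a longer secret.
    Detection("PHONE_NUMBER", 15, 25, 0.40),
    # A custom match covering the whole secret, scored at 0.85 or above.
    Detection("API_KEY", 8, 40, 0.90),
]
print([d.entity_type for d in resolve_by_score(candidates)])
# ['API_KEY']
```

If the custom pattern were scored below the default match, the same procedure would keep `PHONE_NUMBER` and leave most of the secret unredacted, which is the failure the 0.85+ floor is meant to prevent.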
all_recognizers
all_recognizers() -> list[PatternRecognizer]
All Cue-specific recognizers, in registration order.