OpenAI Releases Bidirectional PII Detection Model for On-Premises Use
OpenAI's privacy-filter model supplies on-premises PII detection to meet tightening regulatory requirements for AI data pipelines.
OpenAI published a token-classification model on Hugging Face for detecting and masking personally identifiable information (PII) in text. The model targets high-throughput on-premises sanitization, with a 128,000-token context window and 1.5B total parameters, of which 50M are active (https://huggingface.co/openai/privacy-filter). It starts from an autoregressively pretrained gpt-oss checkpoint, converted to a bidirectional classifier through supervised token-level fine-tuning. Predictions cover an 8-category privacy taxonomy, and constrained Viterbi decoding enforces coherent BIOES spans. OpenAI released the model under Apache 2.0.

The release lands against a tightening regulatory backdrop. The EU AI Act classifies certain AI systems that process personal data as high-risk and mandates appropriate technical safeguards (https://artificialintelligenceact.eu/), and a 2023 arXiv survey of privacy attacks on LLMs documented training-data extraction risks that persist across production pipelines (https://arxiv.org/abs/2310.10078). Coverage of the release described the architecture but did not connect it to the regulatory compliance timeline or to the shift toward local data-fabric tooling now required by enterprise deployment standards.
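The source does not include usage code. As a minimal sketch, the masking workflow might look like the following, assuming the checkpoint loads through Hugging Face's standard token-classification pipeline and exposes character offsets and aggregated entity labels in the usual way; the label names printed are whatever the model's taxonomy defines, not confirmed here.

```python
from transformers import pipeline

# Assumption: the checkpoint works with the standard token-classification
# pipeline; this is an illustrative sketch, not a confirmed API.
detector = pipeline(
    "token-classification",
    model="openai/privacy-filter",
    aggregation_strategy="simple",  # merge subword tokens into entity spans
)

def mask_pii(text: str) -> str:
    """Replace each detected PII span with its category label."""
    spans = sorted(detector(text), key=lambda s: s["start"], reverse=True)
    for s in spans:  # right-to-left so earlier character offsets stay valid
        text = text[:s["start"]] + f"[{s['entity_group']}]" + text[s["end"]:]
    return text

print(mask_pii("Contact Jane Doe at jane@example.com or +1-555-0100."))
```

Replacing spans right-to-left by character offset avoids recomputing positions after each substitution, which matters once spans are guaranteed not to overlap.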
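The constrained decoding step can also be sketched. The model's actual decoder is not published; the following is a self-contained illustration of Viterbi decoding under a BIOES transition grammar, using a hypothetical three-type taxonomy (the released model reportedly uses eight categories) and uniform transition scores.

```python
import numpy as np

# Hypothetical three-type taxonomy for illustration only.
TYPES = ["NAME", "EMAIL", "PHONE"]
TAGS = ["O"] + [f"{p}-{t}" for t in TYPES for p in "BIES"]

def allowed(prev: str, nxt: str) -> bool:
    """BIOES grammar: a span is B I* E or a single S; O separates spans."""
    if prev == "O" or prev[0] in "ES":       # outside, or a span just closed
        return nxt == "O" or nxt[0] in "BS"  # only O or a fresh span follows
    # prev is B-X or I-X: the span must continue with the same entity type
    etype = prev.split("-", 1)[1]
    return nxt in (f"I-{etype}", f"E-{etype}")

def viterbi_decode(log_probs: np.ndarray) -> list[str]:
    """Max-score tag path through (seq_len, n_tags) log-probabilities,
    with grammar-violating transitions scored -inf so decoded spans are
    always well-formed. Transition scores are uniform in this sketch."""
    n, k = log_probs.shape
    trans = np.full((k, k), -np.inf)
    for i, p in enumerate(TAGS):
        for j, q in enumerate(TAGS):
            if allowed(p, q):
                trans[i, j] = 0.0
    starts = np.array([t == "O" or t[0] in "BS" for t in TAGS])
    ends = np.array([t == "O" or t[0] in "ES" for t in TAGS])
    score = np.where(starts, log_probs[0], -np.inf)  # cannot start mid-span
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans                # (prev_tag, next_tag)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_probs[t]
    score = np.where(ends, score, -np.inf)           # cannot end mid-span
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [TAGS[i] for i in reversed(path)]
```

Disallowing transitions such as O→I-NAME or B-NAME→E-EMAIL at decode time guarantees every emitted span opens, continues, and closes with a single entity type, which is what makes downstream masking by character offsets safe.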
AXIOM: OpenAI's on-premises PII filter lets enterprises sanitize data locally at scale before it reaches model pipelines, closing an infrastructural gap as the EU AI Act and similar rules mandate explicit privacy controls.
Sources (3)
- [1] New model for detecting and masking PII from OpenAI (https://huggingface.co/openai/privacy-filter)
- [2] The EU Artificial Intelligence Act (https://artificialintelligenceact.eu/)
- [3] Privacy in Large Language Models (https://arxiv.org/abs/2310.10078)