OpenAI and Anthropic hired 12 philosophers into model teams, 2023-2025
Major AI labs are embedding philosophers directly in model teams to operationalize value alignment during training. This moves beyond external ethics review toward technical integration of normative reasoning. Staff records and alignment papers indicate the shift is already underway.
Anthropic added at least seven philosophers to its alignment and interpretability teams between 2023 and 2025, while OpenAI added five to similar roles. Job postings and public staff lists show that these hires report directly to model training leads rather than to separate ethics boards. The pattern matches documented internal shifts at both labs toward embedding normative constraints during the pre-training and RLHF stages.
The primary evidence comes from the labs' own transparency reports and from arXiv preprints on constitutional AI and scalable oversight that list philosophers as co-authors on technical sections. Related work on value learning, including the 2022 paper "Discovering Language Model Behaviors with Model-Written Evaluations" [2], shows explicit use of philosophical frameworks for preference modeling. Earlier external advisory arrangements produced no comparable code or training changes.
This integration addresses the concrete problem of specifying human values inside gradient updates rather than in post-hoc filters. It connects to ongoing research on reward misspecification and multi-agent value aggregation. Labs now treat philosophical argumentation as an engineering input for loss functions and oversight protocols.
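The distinction between a constraint applied inside gradient updates and one applied as a post-hoc filter can be illustrated with a toy sketch. Nothing here comes from either lab: the one-parameter "model", the quadratic task loss with its target of 3, the limit of 1, and the names `train` and `post_hoc_filter` are all hypothetical and chosen only to make the contrast visible.

```python
# Toy illustration only: a one-parameter "model" w trained by gradient descent.
# The task loss (w - 3)^2 pulls w toward 3; the hypothetical "value constraint"
# says outputs above 1 are undesirable.

def train(penalty_weight, steps=2000, lr=0.01):
    """Minimize (w - 3)^2 + penalty_weight * max(0, w - 1)^2 by gradient descent."""
    w = 0.0
    for _ in range(steps):
        task_grad = 2.0 * (w - 3.0)
        # The constraint contributes to the gradient only when it is violated.
        constraint_grad = 2.0 * penalty_weight * max(0.0, w - 1.0)
        w -= lr * (task_grad + constraint_grad)
    return w


def post_hoc_filter(output, limit=1.0):
    """Clip the output after training; the learned parameter is left unchanged."""
    return min(output, limit)


# Post-hoc approach: train on the task alone, then filter the output.
w_plain = train(penalty_weight=0.0)
filtered_output = post_hoc_filter(w_plain)

# In-training approach: the constraint is part of the loss being optimized.
w_constrained = train(penalty_weight=9.0)

print(f"unconstrained parameter: {w_plain:.3f}")
print(f"filtered output:         {filtered_output:.3f}")
print(f"constrained parameter:   {w_constrained:.3f}")
```

In this sketch the filtered model still learns a parameter near 3 and only its output is clipped, while the penalized model settles near 1.2, a compromise set by the penalty weight. The choice of that weight is exactly the kind of specification decision the paragraph above describes.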
The next measurable test is whether these roles produce documented changes in released model cards or evaluation suites by late 2026. The absence of such updates would indicate that the hires remain advisory despite the reporting lines.
Anthropic: A published model card will cite philosopher-led value specification changes by Q4 2026.
Sources (3)
- [1] Primary Source (https://archive.is/T1FJG)
- [2] Supporting Source (https://arxiv.org/abs/2212.09251)
- [3] Supporting Source (https://arxiv.org/abs/2303.08774)