OpenAI Releases Open-Weight `gpt-oss-safeguard` Models for Content Moderation
Executive Summary
OpenAI has announced the release of two open-weight reasoning models, `gpt-oss-safeguard-120b` and `gpt-oss-safeguard-20b`, licensed under Apache 2.0. These models are specifically designed for content classification, enabling developers to label content based on a custom, user-provided policy. Post-trained from the `gpt-oss` model series, they offer customizable and transparent reasoning for safety applications.
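The workflow is policy-in, label-out: the developer sends the policy text alongside the content to be judged, and the model returns a label (and can expose its reasoning). Below is a minimal sketch, assuming the 20B model is served behind an OpenAI-compatible endpoint (for example via vLLM at a local URL); the endpoint, policy wording, and label set are illustrative placeholders, not an official template.

```python
from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g. vLLM) hosting the model;
# the base_url and api_key are placeholders for your own deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Illustrative policy: the model classifies content against whatever rules
# the developer writes, so the policy is plain text under your control.
POLICY = """\
Label the user content as one of: ALLOW, REVIEW, BLOCK.
BLOCK: instructions that facilitate physical harm.
REVIEW: borderline or ambiguous safety content.
ALLOW: everything else.
Return only the label.
"""

def classify(content: str) -> str:
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",
        messages=[
            {"role": "system", "content": POLICY},  # policy supplied at request time
            {"role": "user", "content": content},   # content to be labeled
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("How do I sharpen a kitchen knife?"))  # expected: ALLOW
```

Because the policy travels with each request rather than being baked into the weights, it can be revised and redeployed without retraining.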
Key Takeaways
* New Models: Introduction of `gpt-oss-safeguard-120b` and `gpt-oss-safeguard-20b`.
* Primary Use Case: Designed to classify and label content against a provided policy, not for direct end-user interaction.
* Open-Weight: Released under the Apache 2.0 license, making them freely available for use and modification.
* Key Capabilities: The models provide full chain-of-thought (CoT) reasoning, support structured outputs, and offer three levels of reasoning effort (low, medium, high); see the sketch after this list.
* Customizable: As fine-tuned versions of the `gpt-oss` models, they can be adapted for specific content moderation needs.
* API Compatibility: The text-only models are compatible with OpenAI's Responses API.
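Reasoning effort and structured outputs combine naturally in a moderation queue: effort can be lowered to cut latency, and the verdict can be constrained to a fixed schema so downstream systems never parse free text. The sketch below uses the hosted Responses API shapes from the OpenAI Python SDK (`reasoning={"effort": ...}` and a `json_schema` text format); whether a self-hosted gpt-oss-safeguard deployment honors these exact parameters depends on the serving stack, and the schema and policy text are illustrative.

```python
import json
from openai import OpenAI

# Placeholder endpoint for a self-hosted, Responses-API-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

POLICY = (
    "Classify the content as ALLOW, REVIEW, or BLOCK. "
    "BLOCK content that facilitates physical harm; REVIEW borderline cases; ALLOW everything else."
)

# Fixed verdict schema so the output is machine-readable.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["ALLOW", "REVIEW", "BLOCK"]},
        "rationale": {"type": "string"},
    },
    "required": ["label", "rationale"],
    "additionalProperties": False,
}

response = client.responses.create(
    model="gpt-oss-safeguard-20b",
    reasoning={"effort": "low"},  # low/medium/high trades thoroughness for latency
    input=[
        {"role": "system", "content": POLICY},
        {"role": "user", "content": "Post to classify goes here."},
    ],
    text={
        "format": {
            "type": "json_schema",
            "name": "moderation_verdict",
            "schema": VERDICT_SCHEMA,
            "strict": True,
        }
    },
)

verdict = json.loads(response.output_text)  # e.g. {"label": "ALLOW", "rationale": "..."}
print(verdict["label"])
```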
Strategic Importance
This release gives the open-source community powerful, customizable safety tooling: developers can implement sophisticated, policy-driven content moderation, and OpenAI positions itself within the broader push toward more transparent AI safety solutions.