Model data and training
The o1 large language model family is trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers: it can produce a long chain of thought before responding to the user. OpenAI o1 is the next model in this series (previously OpenAI o1-preview), while OpenAI o1-mini is a faster version of this model that is particularly effective at coding. Through training, the models learn to refine their thinking process, try different strategies, and recognize their mistakes. Reasoning allows o1 models to follow the specific guidelines and model policies we have set, helping them act in line with our safety expectations. As a result, they are better at providing helpful answers and at resisting attempts to bypass safety rules that would lead to unsafe or inappropriate content.
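Because the chain of thought is generated before the visible answer, its length is reported to API users as separate reasoning tokens. As a rough illustration (not part of the system card itself), a minimal sketch using the openai Python library might look like the following; the model name and the usage fields reflect the public Chat Completions API at the time of writing and should be treated as assumptions.

```python
# Minimal sketch: querying an o1-series model and inspecting how many
# hidden reasoning tokens were spent before the visible answer.
# Assumes the `openai` Python package and an OPENAI_API_KEY env var;
# the model name and usage fields may change over time.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

# The chain of thought itself is not returned, but its token count is
# reported separately from the visible completion tokens.
usage = response.usage
print("visible answer:", response.choices[0].message.content)
print("reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)
print("total completion tokens:", usage.completion_tokens)
```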
The two models were pre-trained on diverse datasets, including a mix of publicly available data, proprietary data accessed through partnerships, and custom datasets developed in-house, which collectively contribute to the models’ robust reasoning and conversational capabilities.
Select Public Data: Both models were trained on a variety of publicly available datasets, including web data and open-source datasets. Key components include reasoning data and scientific literature. This ensures that the models are well-versed in both general knowledge and technical topics, enhancing their ability to perform complex reasoning tasks.
Proprietary Data from Data Partnerships: To further enhance the capabilities of o1 and o1-mini, we formed partnerships to access high-value non-public datasets. These proprietary data sources include paywalled content, specialized archives, and other domain-specific datasets that provide deeper insights into industry-specific knowledge and use cases.
Data Filtering and Refinement: Our data processing pipeline includes rigorous filtering to maintain data quality and mitigate potential risks. We use advanced data filtering processes to reduce the amount of personal information in training data. We also employ a combination of our Moderation API and safety classifiers to prevent the use of harmful or sensitive content, including explicit materials such as CSAM.
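To make this filtering step concrete, here is a minimal sketch of how a single pre-training document could be screened with the Moderation API alongside a simple pattern-based personal-information scrub. The PII regexes and the keep/drop logic are illustrative assumptions, not the actual production pipeline; only the `client.moderations.create` call reflects the public API.

```python
# Illustrative sketch of a document-screening step: a Moderation API
# check for harmful content plus a simple pattern-based PII scrub.
# The regexes and keep/drop logic are hypothetical stand-ins for the
# pipeline described above, not OpenAI's actual internal filters.
import re
from openai import OpenAI

client = OpenAI()

# Hypothetical PII patterns: email addresses and US-style phone numbers.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),          # email addresses
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # phone numbers
]

def scrub_pii(text: str) -> str:
    """Replace matched PII spans with a placeholder token."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def keep_document(text: str) -> bool:
    """Drop any document the moderation model flags as harmful."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return not result.results[0].flagged

def filter_corpus(docs):
    """Yield scrubbed versions of documents that pass moderation."""
    for doc in docs:
        if keep_document(doc):
            yield scrub_pii(doc)
```

In a real pipeline the moderation check would typically be batched and combined with purpose-built safety classifiers, but the structure above captures the two stages named in this section: removing personal information and excluding harmful or sensitive content.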