GPT-OSS Model

Released August 5, 2025

GPT-OSS is OpenAI's open-weight large language model family, built on a Mixture-of-Experts (MoE) architecture. Released under the Apache 2.0 license, the models deliver performance comparable to o4-mini and o3-mini while remaining deployable on consumer-grade hardware, a significant milestone in opening up the AI ecosystem.

GPT-OSS-120B

Large-scale version • Powerful performance for complex task processing

GPT-OSS-20B

Lightweight, efficient version • Fast responses and resource-friendly deployment

Model Specifications & Performance

Detailed technical specifications and performance metrics of GPT-OSS series models

GPT-OSS-120B

120B Parameters

Large GPT-OSS version, suitable for complex task processing

Technical Specifications

  • 120B parameter scale
  • Mixture-of-Experts (MoE) architecture
  • 128K context length
  • Tool usage capabilities
  • Chain-of-thought reasoning support

Performance Features

  • Performance comparable to o4-mini
  • Multi-turn conversation support
  • Code generation optimization
  • Multilingual understanding

GPT-OSS-20B

20B Parameters

Lightweight GPT-OSS version with fast responses and efficient deployment

Technical Specifications

  • 20B parameter scale
  • Optimized inference speed
  • 128K context length
  • Basic tool usage
  • Standard reasoning capabilities

Performance Features

  • Performance comparable to o3-mini
  • Fast response time
  • Resource usage optimization
  • Consumer hardware friendly

Core Features

MoE Architecture

Mixture-of-Experts routing activates only a few experts per token, keeping compute proportional to active rather than total parameters (see the toy sketch at the end of this section)

128K Context

A 128K-token context window for processing long inputs

Tool Usage

Support for external tool calling and integration

Chain-of-Thought

Step-by-step problem solving for complex issues

Consumer Hardware

Runs on consumer GPUs; the 20B model fits within 16GB of memory

Apache 2.0

Open weights under a business-friendly license
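
To make the routing idea concrete, here is a toy top-k MoE layer in PyTorch. This is illustrative only, not the actual GPT-OSS implementation; the expert count and layer sizes below are made up for the example.

import torch

# Toy top-k MoE layer: only k experts run per token, so compute scales with
# active parameters (e.g. 5.1B of 117B for GPT-OSS-120B), not total size.
num_experts, top_k, dim = 8, 2, 64
router = torch.nn.Linear(dim, num_experts)
experts = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(num_experts))

x = torch.randn(1, dim)                           # one token's hidden state
weights, idx = router(x).softmax(-1).topk(top_k)  # gate scores for top-k experts
weights = weights / weights.sum()                 # renormalize the gate weights
y = sum(w * experts[int(i)](x) for w, i in zip(weights[0], idx[0]))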

Deployment Guide

Multiple deployment methods to fit different use cases and hardware configurations

System Requirements

GPT-OSS-20B

Memory: 16GB+ RAM
GPU: Consumer GPU (RTX 3080 or newer)
Storage: 50GB available space
Model Size: 21B total parameters (3.6B active per token)

GPT-OSS-120B

Memory: 32GB+ RAM
GPU: Single H100 (80GB) or equivalent
Storage: 200GB available space
Model Size: 117B total parameters (5.1B active per token)
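
These footprints are enabled by MXFP4 quantization of the MoE weights, roughly 4.25 bits per parameter. A back-of-the-envelope check (ignoring activations, KV cache, and the unquantized non-MoE weights):

# rough weight-footprint estimate at ~4.25 bits per parameter (MXFP4)
for name, params in [("gpt-oss-20b", 21e9), ("gpt-oss-120b", 117e9)]:
    gib = params * 4.25 / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB of weights")

This yields roughly 10 GiB for the 20B model and 58 GiB for the 120B model, consistent with the 16GB-card and single-H100 requirements above.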

Transformers

Quick deployment using Hugging Face Transformers library

1. Install Dependencies

pip install -U transformers kernels torch

2. Run the Model

from transformers import pipeline
import torch

model_id = "openai/gpt-oss-20b"  # or "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Hello, how are you?"},
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])  # the last message is the assistant's reply

3. Start Web Server

# launch an OpenAI-compatible server (listens on localhost:8000 by default)
transformers serve

# in a second terminal, open an interactive chat session against it
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b

vLLM

High-performance inference server for production environments

1. Install vLLM

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

2. Start Server

vllm serve openai/gpt-oss-20b  # or openai/gpt-oss-120b
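
vLLM exposes an OpenAI-compatible API (on port 8000 by default), so the official openai Python package can talk to it. A minimal client sketch; the api_key value is a placeholder, since a local server accepts any string:

from openai import OpenAI

# point the client at the local vLLM server instead of api.openai.com
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Summarize MoE routing in one sentence."}],
)
print(response.choices[0].message.content)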

Ollama

Consumer-hardware-friendly local deployment solution

1. Pull Model

# GPT-OSS-20B
ollama pull gpt-oss:20b

# GPT-OSS-120B  
ollama pull gpt-oss:120b

2. Run Model

# Run 20B version
ollama run gpt-oss:20b

# Run 120B version
ollama run gpt-oss:120b
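
A running Ollama instance also serves a local API on port 11434; one way to call it from Python is the ollama package (pip install ollama). A minimal sketch:

import ollama

# chat against the local Ollama server (default: http://localhost:11434)
response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["message"]["content"])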

Direct Download

Download model weights locally for custom deployment

1. Using Hugging Face CLI

# GPT-OSS-20B
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/

# GPT-OSS-120B
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/

2. Install GPT-OSS Package

pip install gpt-oss

3. Run Chat Interface

# point at the downloaded checkpoint directory, e.g. gpt-oss-20b/original/
python -m gpt_oss.chat model/

Important Notes

Harmony Format: The models were trained on the harmony response format and must be prompted in that format to function correctly
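
In practice the chat template that ships with the checkpoints applies harmony formatting for you; a quick sketch to inspect the rendered prompt:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

# render a conversation through the bundled chat template and print the
# raw harmony-formatted prompt text instead of token ids
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)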

Reasoning Levels: Supports three reasoning levels (Low/Medium/High), configurable in system prompts
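
For example, the level can be set with a single line in the system message:

messages = [
    # accepted values: "Reasoning: low", "Reasoning: medium", "Reasoning: high"
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]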

Tool Support: Built-in support for web browsing, function calling, Python code execution, and more
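
Tool definitions can be surfaced to the model through the same chat template; a sketch assuming the bundled template renders tool schemas, with get_weather as a hypothetical example function:

from transformers import AutoTokenizer

# hypothetical tool: Transformers converts the signature and docstring
# into a JSON schema that appears in the rendered prompt
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # the rendered prompt now advertises get_weather to the model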

Fine-tuning Support: 20B version can be fine-tuned on consumer hardware, 120B version requires H100-class GPUs
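
For the consumer-hardware path, one common route is parameter-efficient LoRA fine-tuning with Hugging Face's peft library; a minimal config sketch (hyperparameters are illustrative, not recommendations):

from peft import LoraConfig

# illustrative LoRA config: train small low-rank adapters on the linear
# layers instead of the full weights, keeping memory requirements modest
lora_config = LoraConfig(
    r=8,                          # adapter rank
    lora_alpha=16,                # adapter scaling factor
    target_modules="all-linear",  # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)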

Open Source License: The Apache 2.0 license permits commercial use and free modification

Frequently Asked Questions

Everything you need to know about GPT-OSS