GPT-OSS Models
Model Specifications & Performance
Detailed technical specifications and performance metrics for the GPT-OSS model family
GPT-OSS-120B
Large GPT-OSS version, suitable for complex task processing
Technical Specifications
- 120B parameters
- Mixture-of-Experts (MoE) architecture
- 128K context length
- Tool usage capabilities
- Chain-of-thought reasoning support
Performance Features
- ✓ Performance comparable to o4-mini
- ✓ Multi-turn conversation support
- ✓ Code generation optimization
- ✓ Multilingual understanding
GPT-OSS-20B
Lightweight GPT-OSS version offering fast responses and efficient deployment
Technical Specifications
- 20B parameters
- Optimized inference speed
- 128K context length
- Basic tool usage
- Standard reasoning capabilities
Performance Features
- ✓ Performance comparable to o3-mini
- ✓ Fast response times
- ✓ Resource usage optimization
- ✓ Consumer hardware friendly
Core Features
MoE Architecture
Mixture-of-Experts routing for efficient parameter utilization (see the sketch after this feature list)
128K Context
Extended context processing capabilities
Tool Usage
Support for external tool calling and integration
Chain-of-Thought
Step-by-step problem solving for complex issues
Consumer Hardware
Runs on consumer-grade GPUs for local deployment
Apache 2.0
Fully open source with business-friendly license
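To make the MoE idea concrete, here is a minimal, hypothetical sketch of top-k expert routing in PyTorch. The layer sizes, expert count, and top_k value are illustrative assumptions, not GPT-OSS's actual configuration.

# Minimal top-k Mixture-of-Experts sketch (illustrative only; the sizes
# below are NOT GPT-OSS's real configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep the best k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                     # only the chosen experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])

Only top_k of the n_experts MLPs execute per token, which is how an MoE model with a large total parameter count can keep per-token compute close to that of a much smaller dense model.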
Deployment Guide
Multiple deployment methods to fit different use cases and hardware configurations
System Requirements
- GPT-OSS-20B: runs within roughly 16 GB of memory, fitting high-end consumer GPUs
- GPT-OSS-120B: fits on a single 80 GB GPU (e.g., NVIDIA H100)
Transformers
Quick deployment using Hugging Face Transformers library
1. Install Dependencies
pip install -U transformers kernels torch
2. Run the Model
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-20b"  # or "openai/gpt-oss-120b"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Hello, how are you?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
3. Start Web Server
transformers serve
transformers chat localhost:8000 --model-name-or-path openai/gpt-oss-20b
vLLM
High-performance inference server for production environments
1. Install vLLM
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match
2. Start Server
vllm serve openai/gpt-oss-20b  # or openai/gpt-oss-120b
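Once the server is up, vLLM exposes an OpenAI-compatible API. Below is a minimal sketch of querying it with the official openai Python client; the base_url, port, and placeholder API key assume a default local setup.

# Query the local vLLM server via its OpenAI-compatible endpoint.
# vLLM does not check the API key locally, so any placeholder string works.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Explain MoE in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)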
Ollama
Consumer-hardware-friendly local deployment solution
1. Pull Model
# GPT-OSS-20B
ollama pull gpt-oss:20b
# GPT-OSS-120B
ollama pull gpt-oss:120b
2. Run Model
# Run 20B version
ollama run gpt-oss:20b
# Run 120B version
ollama run gpt-oss:120b
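Besides the interactive CLI, Ollama also serves a local HTTP API (port 11434 by default). A minimal sketch using the requests library; the prompt is illustrative.

# Call the local Ollama HTTP API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gpt-oss:20b",
        "prompt": "Write a haiku about open models.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])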
Direct Download
Download model weights locally for custom deployment
1. Using Hugging Face CLI
# GPT-OSS-20B
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
# GPT-OSS-120B
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/
2. Install GPT-OSS Package
pip install gpt-oss
3. Run Chat Interface
python -m gpt_oss.chat model/
Important Notes
• Harmony Format: the models use the dedicated harmony response format and will not behave correctly unless prompted in it (chat templates and the official tooling apply it automatically)
• Reasoning Levels: three reasoning levels (low/medium/high), selectable in the system prompt; see the sketch after this list
• Tool Support: built-in support for web browsing, function calling, Python code execution, and more
• Fine-tuning Support: the 20B version can be fine-tuned on consumer hardware, while the 120B version requires H100-class GPUs
• Open Source License: Apache 2.0, permitting commercial use and free modification
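As a sketch of selecting a reasoning level, the snippet below follows the convention of stating the level in the system prompt; treat the exact wording ("Reasoning: high") as an assumption to verify against the harmony format documentation.

# Select a reasoning level via the system prompt (wording follows the
# harmony convention; verify against the official docs).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "How many r's are in 'strawberry'?"},
]
print(pipe(messages, max_new_tokens=512)[0]["generated_text"][-1])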
Frequently Asked Questions
Everything you need to know about GPT-OSS