
Documentation Index

Fetch the complete documentation index at: https://docs.generalcompute.com/llms.txt

Use this file to discover all available pages before exploring further.

Available Models

General Compute offers a wide range of open-source and open-weight models. All models are served via our OpenAI-compatible API. These are our top picks for the best balance of quality, speed, and cost.
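
Because every model sits behind the same OpenAI-compatible surface, a request is an ordinary chat-completions POST. Below is a minimal stdlib sketch; the base URL and the GC_API_KEY environment-variable name are assumptions, so substitute the values from your own dashboard:

```python
import json
import os
import urllib.request

# Assumed base URL -- replace with the endpoint shown in your dashboard.
BASE_URL = "https://api.generalcompute.com/v1"

def build_chat_payload(model: str, messages: list, **params) -> dict:
    """Assemble a standard OpenAI-style chat-completions request body."""
    return {"model": model, "messages": messages, **params}

def chat_completion(payload: dict) -> dict:
    """POST the payload to /chat/completions and decode the JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # GC_API_KEY is a placeholder env var name, not an official one.
            "Authorization": f"Bearer {os.environ['GC_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_payload(
    "minimax-m2.7",
    [{"role": "user", "content": "Summarize the OpenAI chat format."}],
    max_tokens=256,
)
# With a valid key, chat_completion(payload) returns the usual shape:
# response["choices"][0]["message"]["content"]
```

Any official OpenAI SDK works the same way once its base URL is pointed at the endpoint above; only the model ID and credentials change.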

MiniMax M2.7

Our best general-purpose model. Exceptional quality with a massive 196k context window at an unbeatable price. Perfect for production workloads.

Model ID: minimax-m2.7

DeepSeek V3.2

State-of-the-art reasoning model. Best-in-class performance on complex tasks with built-in chain-of-thought reasoning.

Model ID: deepseek-v3.2

All Models

| Model                | Model ID             | Context | Input / 1M tokens | Output / 1M tokens | Capabilities |
|----------------------|----------------------|---------|-------------------|--------------------|--------------|
| MiniMax M2.7         | minimax-m2.7         | 196k    | $0.40             | $2.34              |              |
| MiniMax M2.5         | minimax-m2.5         | 160k    | $0.20             | $1.17              |              |
| DeepSeek V3.2        | deepseek-v3.2        | 32k     | $3.00             | $4.50              | Reasoning    |
| DeepSeek V3.1        | deepseek-v3.1        | 128k    | $3.00             | $4.50              | Reasoning    |
| DeepSeek V3.1 CB     | deepseek-v3.1-cb     | 128k    | $0.15             | $0.75              | Reasoning    |
| Llama 3.3 70B        | llama-3.3-70b        | 128k    | $0.60             | $1.20              |              |
| Llama 4 Maverick 17B | llama-4-maverick-17b | 128k    | $0.63              | $1.80              | Vision       |
| GPT-OSS 120B         | gpt-oss-120b         | 128k    | $0.21             | $0.79              |              |
| Gemma 3 12B          | gemma-3-12b-it       | 131k    | $0.04             | $0.13              | Vision       |

Model Capabilities

  • Reasoning — Models with built-in chain-of-thought reasoning. These models think step-by-step before producing a final answer, leading to significantly better results on complex tasks like math, code, and analysis.
  • Vision — Models that accept image inputs alongside text. Pass images via the standard OpenAI-compatible image_url content type.
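
For vision-capable models, an image travels as an image_url content part inside an otherwise ordinary user message. A small sketch of building such a message (the image URL is illustrative):

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing text with an image_url content part,
    following the standard OpenAI-compatible multimodal message shape."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = vision_message(
    "What is pictured here?",
    "https://example.com/photo.png",  # illustrative URL
)
# Send msg in the messages array with a vision model such as
# llama-4-maverick-17b or gemma-3-12b-it.
```
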

Choosing a Model

| Use Case                          | Recommended Model    | Why                                        |
|-----------------------------------|----------------------|--------------------------------------------|
| General-purpose chat & generation | minimax-m2.7         | Best quality-to-cost ratio, 196k context   |
| Complex reasoning & analysis      | deepseek-v3.2        | State-of-the-art reasoning capabilities    |
| Budget-friendly reasoning         | deepseek-v3.1-cb     | Strong reasoning at $0.15/M input          |
| Fast & cheap                      | gemma-3-12b-it       | $0.04/M input, supports vision too         |
| Vision tasks                      | llama-4-maverick-17b | Multimodal with strong image understanding |
| Large context windows             | minimax-m2.7         | 196k context at low cost                   |
| Cost-optimized bulk processing    | gemma-3-12b-it       | Lowest per-token price in the catalog      |

Custom checkpoints

Bring your own LoRA, GGUF, or full-finetuned checkpoints and run them on the same ultra-low latency infrastructure:
  • Share the model artifact (S3, Hugging Face, or direct upload) with the General Compute team.
  • We containerize the checkpoint, attach accelerators in us-west-2, and expose it behind a private model ID (for example acme/my-custom-model).
  • Your private IDs behave exactly like any other model parameter in the OpenAI-compatible API, including streaming, tool calling, and function execution.
  • Enterprise plans layer on SLAs, dedicated pools, and per-org allow lists.
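
Because a private ID is just another value of the model parameter, moving a workload onto a custom checkpoint is a one-field change. A sketch of a request body using the acme/my-custom-model example ID from above, with streaming and an illustrative tool definition:

```python
# A private checkpoint ID drops into the same request body as any catalog
# model, including the stream and tools fields of the OpenAI-compatible API.
payload = {
    "model": "acme/my-custom-model",  # private ID issued at onboarding (example)
    "messages": [{"role": "user", "content": "Ping"}],
    "stream": True,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a built-in
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```

Nothing else in the client changes: the same endpoint, headers, and response handling used for catalog models apply to private IDs.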
Contact support@generalcompute.com to schedule onboarding or to request deployments in additional regions.
All prices are in USD. Pricing is based on token usage with no minimum commitment on the Pay As You Go plan.