
Documentation Index

Fetch the complete documentation index at: https://docs.generalcompute.com/llms.txt

Use this file to discover all available pages before exploring further.

Available Models

General Compute offers a wide range of open-source and open-weight models. All models are served via our OpenAI-compatible API. These are our top picks for the best balance of quality, speed, and cost.
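
Because every model sits behind the same OpenAI-compatible surface, a request is an ordinary chat-completions POST. Below is a minimal stdlib sketch; the base URL and the GC_API_KEY environment-variable name are assumptions, so substitute the values from your own dashboard:

```python
import json
import os
import urllib.request

# Assumed base URL -- replace with the endpoint shown in your dashboard.
BASE_URL = "https://api.generalcompute.com/v1"

def build_chat_payload(model: str, messages: list, **params) -> dict:
    """Assemble a standard OpenAI-style chat-completions request body."""
    return {"model": model, "messages": messages, **params}

def chat_completion(payload: dict) -> dict:
    """POST the payload to /chat/completions and decode the JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # GC_API_KEY is a placeholder env var name, not an official one.
            "Authorization": f"Bearer {os.environ['GC_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_payload(
    "minimax-m2.7",
    [{"role": "user", "content": "Summarize the OpenAI chat format."}],
    max_tokens=256,
)
# With a valid key, chat_completion(payload) returns the usual shape:
# response["choices"][0]["message"]["content"]
```

Any official OpenAI SDK works the same way once its base URL is pointed at the endpoint above; only the model ID and credentials change.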

MiniMax M2.7

Our best general-purpose model. Exceptional quality with a massive 196k context window at an unbeatable price. Perfect for production workloads.

Model ID: minimax-m2.7

DeepSeek V3.2

State-of-the-art reasoning model. Best-in-class performance on complex tasks with built-in chain-of-thought reasoning.

Model ID: deepseek-v3.2

All Models

| Model                | Model ID             | Context | Input / 1M tokens | Output / 1M tokens | Capabilities |
|----------------------|----------------------|---------|-------------------|--------------------|--------------|
| MiniMax M2.7         | minimax-m2.7         | 196k    | $0.40             | $2.34              |              |
| MiniMax M2.5         | minimax-m2.5         | 160k    | $0.20             | $1.17              |              |
| DeepSeek V3.2        | deepseek-v3.2        | 32k     | $3.00             | $4.50              | Reasoning    |
| DeepSeek V3.1        | deepseek-v3.1        | 128k    | $3.00             | $4.50              | Reasoning    |
| DeepSeek V3.1 CB     | deepseek-v3.1-cb     | 128k    | $0.15             | $0.75              | Reasoning    |
| Llama 3.3 70B        | llama-3.3-70b        | 128k    | $0.60             | $1.20              |              |
| Llama 4 Maverick 17B | llama-4-maverick-17b | 128k    | $0.63              | $1.80              | Vision       |
| GPT-OSS 120B         | gpt-oss-120b         | 128k    | $0.21             | $0.79              |              |
| Gemma 3 12B          | gemma-3-12b-it       | 131k    | $0.04             | $0.13              | Vision       |

Model Capabilities

  • Reasoning — Models with built-in chain-of-thought reasoning. These models think step-by-step before producing a final answer, leading to significantly better results on complex tasks like math, code, and analysis.
  • Vision — Models that accept image inputs alongside text. Pass images via the standard OpenAI-compatible image_url content type.
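
For vision-capable models, an image travels as an image_url content part inside an otherwise ordinary user message. A small sketch of building such a message (the image URL is illustrative):

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing text with an image_url content part,
    following the standard OpenAI-compatible multimodal message shape."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = vision_message(
    "What is pictured here?",
    "https://example.com/photo.png",  # illustrative URL
)
# Send msg in the messages array with a vision model such as
# llama-4-maverick-17b or gemma-3-12b-it.
```
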

Choosing a Model

| Use Case                          | Recommended Model    | Why                                        |
|-----------------------------------|----------------------|--------------------------------------------|
| General-purpose chat & generation | minimax-m2.7         | Best quality-to-cost ratio, 196k context   |
| Complex reasoning & analysis      | deepseek-v3.2        | State-of-the-art reasoning capabilities    |
| Budget-friendly reasoning         | deepseek-v3.1-cb     | Strong reasoning at $0.15/M input          |
| Fast & cheap                      | gemma-3-12b-it       | $0.04/M input, supports vision too         |
| Vision tasks                      | llama-4-maverick-17b | Multimodal with strong image understanding |
| Large context windows             | minimax-m2.7         | 196k context at low cost                   |
| Cost-optimized bulk processing    | gemma-3-12b-it       | Lowest per-token price in the catalog      |

Custom checkpoints

Bring your own LoRA, GGUF, or full-finetuned checkpoints and run them on the same ultra-low latency infrastructure:
  • Share the model artifact (S3, Hugging Face, or direct upload) with the General Compute team.
  • We containerize the checkpoint, attach accelerators in us-west-2, and expose it behind a private model ID (for example acme/my-custom-model).
  • Your private IDs behave exactly like any other model parameter in the OpenAI-compatible API, including streaming, tool calling, and function execution.
  • Enterprise plans layer on SLAs, dedicated pools, and per-org allow lists.
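
Because a private ID is just another value of the model parameter, moving a workload onto a custom checkpoint is a one-field change. A sketch of a request body using the acme/my-custom-model example ID from above, with streaming and an illustrative tool definition:

```python
# A private checkpoint ID drops into the same request body as any catalog
# model, including the stream and tools fields of the OpenAI-compatible API.
payload = {
    "model": "acme/my-custom-model",  # private ID issued at onboarding (example)
    "messages": [{"role": "user", "content": "Ping"}],
    "stream": True,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a built-in
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
```

Nothing else in the client changes: the same endpoint, headers, and response handling used for catalog models apply to private IDs.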
Contact support@generalcompute.com to schedule onboarding or to request deployments in additional regions.
All prices are in USD. Pricing is based on token usage with no minimum commitment on the Pay As You Go plan.