Available Models

NEAR AI Cloud provides access to leading AI models, each optimized for different use cases, from advanced reasoning and tool calling to long-context processing and multilingual tasks. Models run in secure TEE environments where supported (some proprietary endpoints are anonymized rather than TEE-protected), with transparent, pay-per-use pricing.

Quick Reference

| Model | Model ID | Context | Input Price | Output Price |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | anthropic/claude-opus-4-6 | 200K | $5.00/M | $25.00/M |
| Claude Sonnet 4.5 | anthropic/claude-sonnet-4-5 | 200K | $3.00/M | $15.50/M |
| FLUX.2-klein-4B | black-forest-labs/FLUX.2-klein-4B | 128K | $1.00/M | $1.00/M |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | 128K | $1.05/M | $3.10/M |
| Gemini 3 Pro Preview | google/gemini-3-pro | 1M | $1.25/M | $15.00/M |
| OpenAI GPT-5.2 | openai/gpt-5.2 | 400K | $1.80/M | $15.50/M |
| GPT OSS 120B | openai/gpt-oss-120b | 131K | $0.15/M | $0.55/M |
| Qwen3 30B A3B Instruct 2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 262K | $0.15/M | $0.55/M |
| GLM 4.7 | zai-org/GLM-4.7 | 131K | $0.85/M | $3.30/M |
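Any model ID from the table can be dropped into a chat request. Assuming the platform exposes an OpenAI-compatible chat-completions endpoint (an assumption to verify against the NEAR AI Cloud API reference; the base URL below is a placeholder, not a confirmed endpoint), a request payload might look like:

```python
# Hypothetical sketch: assumes an OpenAI-compatible /chat/completions API.
# BASE_URL is a placeholder, NOT a confirmed NEAR AI Cloud endpoint.
import json

BASE_URL = "https://cloud-api.near.ai/v1"  # placeholder; check the official docs

def chat_request(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a catalog model."""
    return {
        "model": model_id,  # any Model ID from the table above
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = chat_request("Qwen/Qwen3-30B-A3B-Instruct-2507",
                       "Summarize TEEs in one sentence.")
print(json.dumps(payload, indent=2))
```

Swapping models is then a one-line change: only the `model` field differs between requests.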

Model Details

Claude Opus 4.6

Anthropic's most intelligent model for building agents and coding.

200K context · $5.00/M input · $25.00/M output

Model ID: anthropic/claude-opus-4-6
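Because pricing is per million tokens, the cost of a call can be estimated directly from the listed rates. A minimal sketch using the Claude Opus 4.6 rates above (token counts are illustrative):

```python
# Cost estimate from per-million-token rates (prices from the card above).

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of one request at the given $/M-token rates."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# Claude Opus 4.6: $5.00/M input, $25.00/M output
cost = estimate_cost(input_tokens=10_000, output_tokens=2_000,
                     input_per_m=5.00, output_per_m=25.00)
print(f"${cost:.4f}")  # prints "$0.1000"
```

The same helper works for any row in the quick-reference table; only the two rates change.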
Claude Sonnet 4.5

Anthropic's Claude Sonnet 4.5: a powerful, efficient model that balances intelligence and speed. Excels at complex reasoning, coding, and creative tasks within a 200K context window. Anonymized, not TEE-protected.

200K context · $3.00/M input · $15.50/M output

Model ID: anthropic/claude-sonnet-4-5
FLUX.2-klein-4B

The FLUX.2 [klein] family comprises Black Forest Labs' fastest image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second. Built for applications that require real-time image generation without sacrificing quality.

128K context · $1.00/M input · $1.00/M output

Model ID: black-forest-labs/FLUX.2-klein-4B
DeepSeek V3.1

DeepSeek V3.1 is a hybrid model that supports both a thinking mode and a non-thinking mode, and brings improvements in multiple areas over the previous version.

128K context · $1.05/M input · $3.10/M output

Model ID: deepseek-ai/DeepSeek-V3.1
Gemini 3 Pro Preview

Google's Gemini 3 Pro Preview: a highly capable multimodal model with an industry-leading 1M-token context window. Optimized for complex reasoning, code generation, and long-document analysis. Anonymized, not TEE-protected.

1M context · $1.25/M input · $15.00/M output

Model ID: google/gemini-3-pro
OpenAI GPT-5.2

OpenAI GPT-5.2 with a 400K context window. Anonymized endpoint optimized for deep reasoning and large-context workflows.

400K context · $1.80/M input · $15.50/M output

Model ID: openai/gpt-5.2
GPT OSS 120B

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

131K context · $0.15/M input · $0.55/M output

Model ID: openai/gpt-oss-120b
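Since this model advertises native function calling, a tool-enabled request is a natural fit. A sketch assuming the OpenAI-style "tools" schema is accepted (an assumption; the `get_weather` tool and its fields are purely illustrative):

```python
# Illustrative function-calling payload for gpt-oss-120b.
# Assumes an OpenAI-style "tools" schema; the tool itself is hypothetical.

def weather_tool_request(city: str) -> dict:
    """Build a chat payload that offers the model one callable tool."""
    return {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user",
                      "content": f"What's the weather in {city}?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, not a real API
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

req = weather_tool_request("Lisbon")
```

If the model decides to call the tool, the response would carry a tool-call message for the application to execute and feed back.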
Qwen3 30B A3B Instruct 2507

Qwen3-30B-A3B-Instruct-2507 is a mixture-of-experts (MoE) causal language model with 30.5 billion total parameters and 3.3 billion activated per inference. It supports ultra-long context up to 262K tokens and operates exclusively in non-thinking mode, delivering strong gains in instruction following, reasoning, logical comprehension, mathematics, coding, multilingual understanding, and alignment with user preferences.

262K context · $0.15/M input · $0.55/M output

Model ID: Qwen/Qwen3-30B-A3B-Instruct-2507
GLM 4.7

GLM 4.7 is the latest model from Z.ai: an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct-response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieved strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified.

131K context · $0.85/M input · $3.30/M output

Model ID: zai-org/GLM-4.7