Releases: tensorzero/tensorzero
2026.2.2
Caution
Breaking Changes
- The `--config-file` globbing behavior has changed: single-level wildcards (`*`) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (`**`). This aligns the behavior with standard glob semantics. For example:
  - `--config-file *.toml` matches `tensorzero.toml`, but not `subdir/tensorzero.toml`.
  - `--config-file **/*.toml` matches both `tensorzero.toml` and `subdir/tensorzero.toml`.
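The same distinction can be illustrated with Python's standard `glob` module, which follows the same standard glob semantics (this demonstrates glob behavior in general, not TensorZero's implementation):

```python
import glob
import os
import tempfile

# Create a config file at the top level and one in a subdirectory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subdir"))
for path in ["tensorzero.toml", os.path.join("subdir", "tensorzero.toml")]:
    open(os.path.join(root, path), "w").close()

os.chdir(root)

# A single-level wildcard stops at directory boundaries...
print(sorted(glob.glob("*.toml")))  # ['tensorzero.toml']

# ...while a recursive wildcard crosses them.
print(sorted(glob.glob("**/*.toml", recursive=True)))
# ['subdir/tensorzero.toml', 'tensorzero.toml']
```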
Warning
Completed Deprecations
- Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.
New Features
- Add cost tracking and cost-based rate limiting.
- Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
- Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
- Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
- Report provider errors when `include_raw_response` is enabled.
- Add `include_aggregated_response` to streaming inferences. When enabled, the final chunk includes an aggregated output `aggregated_response` that combines previous chunks.
- Allow users to kill ongoing evaluation runs from the UI.
- Allow custom gateway bind addresses with the environment variable `TENSORZERO_GATEWAY_BIND_ADDRESS`.
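For example, to bind the gateway to a non-default address before starting it (the address below is an illustrative value, not a documented default):

```shell
# Bind the gateway to all interfaces on port 3000 (illustrative value).
export TENSORZERO_GATEWAY_BIND_ADDRESS="0.0.0.0:3000"
```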
& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!
2026.2.1
Caution
Breaking Changes
- The default value for `cache_options.enabled` changed from `write_only` to `off`.
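To keep the previous behavior, the cache mode can be set explicitly per request. A hypothetical inference request body sketch, where the function name and input are placeholders and only `cache_options.enabled` reflects this change:

```json
{
  "function_name": "my_function",
  "input": {
    "messages": [{ "role": "user", "content": "Hello!" }]
  },
  "cache_options": { "enabled": "write_only" }
}
```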
New Features
- Support reasoning models from Groq, Mistral, and vLLM.
- Support multi-turn reasoning with Gemini and OpenAI-compatible models.
- Support embedding models from Together AI.
- Add configurable `total_ms` timeout to streaming inferences.
- Display charts with top-k evaluation results in the TensorZero Autopilot UI.
- Add "Ask Autopilot" buttons throughout the UI.
- Allow TensorZero Autopilot to edit your local configuration files.
- Return `thought` and `unknown` content blocks in the OpenAI-compatible endpoint (`tensorzero_extra_content`).
& multiple under-the-hood and UI improvements!
2026.2.0
Warning
Planned Deprecations
- Anthropic's structured output feature is out of beta, so the TensorZero configuration field `beta_structured_outputs` is now ignored and deprecated. It'll be removed in a future release.
Bug Fixes
- Fix a regression in the `aws_bedrock` provider that affected long-term bearer API keys.
- Fix a horizontal overflow issue for tool calls and results in the inference detail UI page.
New Features
- Add YOLO Mode for TensorZero Autopilot.
- Add interruption feature for TensorZero Autopilot sessions.
- Add summary to the TensorZero Autopilot session table in the UI.
& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!
2026.1.8
Bug Fixes
- Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
- Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).
& multiple under-the-hood and UI improvements!
2026.1.7
New Features
- [Preview] TensorZero Autopilot: an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
- Support multi-turn reasoning for xAI (`reasoning_content` only).
& multiple under-the-hood and UI improvements!
2026.1.6
Caution
Breaking Changes
- Moving forward, TensorZero will use the OpenAI API's error format (`{"error": {"message": "Bad!"}}`) instead of TensorZero's error format (`{"error": "Bad!"}`) in the OpenAI-compatible endpoints.
Warning
Planned Deprecations
- When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.
New Features
- Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use `extra_body` to handle these tools.
- Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
- Handle inferences with missing `usage` fields gracefully in the OpenAI model provider.
- Improve error handling across the UI.
& multiple under-the-hood and UI improvements!
2026.1.5
Caution
Breaking Changes
- TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
Warning
Planned Deprecations
- Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
- Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
- Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. Example:
  - Before: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"`
  - Now: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"`
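As a sketch, a custom Anthropic provider entry using the new `api_base` form could look like the following (the model and provider names and the `model_name` value are placeholders; only `api_base` reflects the change described above):

```toml
# Hypothetical model/provider names for illustration.
[models.my_model.providers.my_anthropic]
type = "anthropic"
model_name = "claude-sonnet-4-5"
api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
```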
Bug Fixes
- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
New Features
- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
- Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill.
- Improve handling of reasoning content in the OpenRouter and xAI model providers.
- Add `extra_headers` support for embedding models. (thanks @jonaylor89!)
- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!
2026.1.2
New Features
- Support appending to arrays with `extra_body` using the `/my_array/-` notation.
- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
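The trailing `-` is JSON Pointer's append token (RFC 6901): it pushes a value onto the end of an array in the outgoing provider request body. A hypothetical variant-level sketch, where the function and variant names and the `stop` array are placeholders:

```toml
[functions.my_function.variants.my_variant]
# ... other variant fields ...
extra_body = [
  # Append "END" to the provider request's `stop` array.
  { pointer = "/stop/-", value = "END" },
]
```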
& multiple under-the-hood and UI improvements (thanks @ecalifornica!)
2026.1.1
Warning
Planned Deprecations
- In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
Bug Fixes
- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
New Features
- Support `stream_options.include_usage` for every model under the Azure provider.
& multiple under-the-hood and UI improvements!
2026.1.0
Caution
Breaking Changes
- The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms).
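For example, to widen the bucket range beyond the defaults (a sketch using the field path given above; the bucket values are illustrative and, following Prometheus convention, expressed in seconds):

```toml
[gateway.metrics]
# Defaults are 1ms, 10ms, and 100ms; values are in seconds.
tensorzero_inference_latency_overhead_seconds_buckets = [0.001, 0.01, 0.1, 1.0]
```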
Warning
Planned Deprecations
- Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
- Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
- Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
New Features
- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
- Add optional `--bind-address` CLI flag to the gateway.
- Add optional `description` field to metrics in the configuration.
- Add option to fine-tune Fireworks models without automatic deployment.
& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!