Releases: tensorzero/tensorzero
2026.2.2
Caution
Breaking Changes
- The `--config-file` globbing behavior has changed: single-level wildcards (`*`) no longer match files across directory boundaries. To match files across directory boundaries, use recursive wildcards (`**`). This aligns the behavior with standard glob semantics. For example:
  - `--config-file *.toml` matches `tensorzero.toml`, but not `subdir/tensorzero.toml`.
  - `--config-file **/*.toml` matches both `tensorzero.toml` and `subdir/tensorzero.toml`.
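The same distinction can be illustrated with Python's standard `glob` module, which follows the same standard glob semantics (this demonstrates glob behavior in general, not TensorZero's implementation):

```python
import glob
import os
import tempfile

# Create a config file at the top level and one in a subdirectory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subdir"))
for path in ["tensorzero.toml", os.path.join("subdir", "tensorzero.toml")]:
    open(os.path.join(root, path), "w").close()

os.chdir(root)

# A single-level wildcard stops at directory boundaries...
print(sorted(glob.glob("*.toml")))  # ['tensorzero.toml']

# ...while a recursive wildcard crosses them.
print(sorted(glob.glob("**/*.toml", recursive=True)))
# ['subdir/tensorzero.toml', 'tensorzero.toml']
```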
Warning
Completed Deprecations
- Removed deprecated legacy endpoints for dataset management. The functionality is fully covered by the new endpoints.
New Features
- Add cost tracking and cost-based rate limiting.
- Add namespaces: the ability to set up multiple granular experiments (A/B tests) for the same TensorZero function.
- Improve reasoning support for Anthropic (including adaptive thinking), Fireworks AI, SGLang, and Together AI.
- Allow users to whitelist automatic tool approvals for TensorZero Autopilot.
- Report provider errors when `include_raw_response` is enabled.
- Add `include_aggregated_response` to streaming inferences. When enabled, the final chunk includes an aggregated output `aggregated_response` that combines previous chunks.
- Allow users to kill ongoing evaluation runs from the UI.
- Allow custom gateway bind addresses with the environment variable `TENSORZERO_GATEWAY_BIND_ADDRESS`.
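For example, to bind the gateway to a non-default address before starting it (the address below is an illustrative value, not a documented default):

```shell
# Bind the gateway to all interfaces on port 3000 (illustrative value).
export TENSORZERO_GATEWAY_BIND_ADDRESS="0.0.0.0:3000"
```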
& multiple under-the-hood and UI improvements (thanks @Nfemz @greg80303)!
2026.2.1
Caution
Breaking Changes
- The default value for `cache_options.enabled` changed from `write_only` to `off`.
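To keep the previous behavior, the cache mode can be set explicitly per request. A hypothetical inference request body sketch, where the function name and input are placeholders and only `cache_options.enabled` reflects this change:

```json
{
  "function_name": "my_function",
  "input": {
    "messages": [{ "role": "user", "content": "Hello!" }]
  },
  "cache_options": { "enabled": "write_only" }
}
```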
New Features
- Support reasoning models from Groq, Mistral, and vLLM.
- Support multi-turn reasoning with Gemini and OpenAI-compatible models.
- Support embedding models from Together AI.
- Add configurable `total_ms` timeout to streaming inferences.
- Display charts with top-k evaluation results in the TensorZero Autopilot UI.
- Add "Ask Autopilot" buttons throughout the UI.
- Allow TensorZero Autopilot to edit your local configuration files.
- Return `thought` and `unknown` content blocks in the OpenAI-compatible endpoint (`tensorzero_extra_content`).
& multiple under-the-hood and UI improvements!
2026.2.0
Warning
Planned Deprecations
- Anthropic's structured output feature is out of beta, so the TensorZero configuration field `beta_structured_outputs` is now ignored and deprecated. It'll be removed in a future release.
Bug Fixes
- Fix a regression in the `aws_bedrock` provider that affected long-term bearer API keys.
- Fix a horizontal overflow issue for tool calls and results in the inference detail UI page.
New Features
- Add YOLO Mode for TensorZero Autopilot.
- Add interruption feature for TensorZero Autopilot sessions.
- Add summary to the TensorZero Autopilot session table in the UI.
& multiple under-the-hood and UI improvements (thanks @pratikbuilds)!
2026.1.8
Bug Fixes
- Fix a race condition in the TensorZero Autopilot UI that could disable the chat input.
- Increase timeouts for slow tool calls triggered by TensorZero Autopilot (e.g. evaluations).
& multiple under-the-hood and UI improvements!
2026.1.7
New Features
- [Preview] TensorZero Autopilot: an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests.
- Support multi-turn reasoning for xAI (`reasoning_content` only).
& multiple under-the-hood and UI improvements!
2026.1.6
Caution
Breaking Changes
- Moving forward, TensorZero will use the OpenAI API's error format (`{"error": {"message": "Bad!"}}`) instead of TensorZero's error format (`{"error": "Bad!"}`) in the OpenAI-compatible endpoints.
Warning
Planned Deprecations
- When using `unstable_error_json` with the OpenAI-compatible inference endpoint, use `tensorzero_error_json` instead of `error_json`. For now, TensorZero will emit both fields with identical data. The TensorZero inference endpoint is not affected.
New Features
- Add native support for provider tools (e.g. web search) to the Anthropic and GCP Vertex AI Anthropic model providers. Previously, clients had to use `extra_body` to handle these tools.
- Improve handling of reasoning content blocks when streaming with the OpenAI Responses API.
- Handle inferences with missing `usage` fields gracefully in the OpenAI model provider.
- Improve error handling across the UI.
& multiple under-the-hood and UI improvements!
2026.1.5
Caution
Breaking Changes
- TensorZero will normalize the reported `usage` from different model providers. Moving forward, `input_tokens` and `output_tokens` include all token variations (provider prompt caching, reasoning, etc.), just like OpenAI. Tokens cached by TensorZero remain excluded. You can still access the raw usage reported by providers with `include_raw_usage`.
Warning
Planned Deprecations
- Migrate `include_original_response` to `include_raw_response`. For advanced variant types, the former only returned the last model inference, whereas the latter returns every model inference with associated metadata.
- Migrate `allow_auto_detect_region = true` to `region = "sdk"` when configuring AWS model providers. The behavior is identical.
- Provide the proper API base rather than the full endpoint when configuring custom Anthropic providers. Example:
  - Before: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/messages"`
  - Now: `api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"`
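As a sketch, a custom Anthropic provider entry using the new `api_base` form could look like the following (the model and provider names and the `model_name` value are placeholders; only `api_base` reflects the change described above):

```toml
# Hypothetical model/provider names for illustration.
[models.my_model.providers.my_anthropic]
type = "anthropic"
model_name = "claude-sonnet-4-5"
api_base = "https://YOUR-RESOURCE-NAME.services.ai.azure.com/anthropic/v1/"
```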
Bug Fixes
- Fix a regression that triggered incorrect warnings about usage reporting for streaming inferences with Anthropic models.
- Fix a bug in the TensorZero Python SDK that discarded some request fields in certain multi-turn inferences with tools.
New Features
- Improve error handling across many areas: TensorZero UI, JSON deserialization, AWS providers, streaming inferences, timeouts, etc.
- Support Valkey (Redis) for improving performance of rate limiting checks (recommended at 100+ QPS).
- Support `reasoning_effort` for Gemini 3 models (mapped to `thinkingLevel`).
- Improve handling of Anthropic reasoning models in TensorZero JSON functions. Moving forward, `json_mode = "strict"` will use the beta structured outputs feature; `json_mode = "on"` still uses the legacy assistant message prefill.
- Improve handling of reasoning content in the OpenRouter and xAI model providers.
- Add `extra_headers` support for embedding models. (thanks @jonaylor89!)
- Support dynamic credentials for AWS Bedrock and AWS SageMaker model providers.
& multiple under-the-hood and UI improvements (thanks @ndoherty-xyz)!
2026.1.2
New Features
- Support appending to arrays with `extra_body` using the `/my_array/-` notation.
- Handle cross-model thought signatures in GCP Vertex AI Gemini and Google AI Studio.
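The trailing `-` is JSON Pointer's append token (RFC 6901): it pushes a value onto the end of an array in the outgoing provider request body. A hypothetical variant-level sketch, where the function and variant names and the `stop` array are placeholders:

```toml
[functions.my_function.variants.my_variant]
# ... other variant fields ...
extra_body = [
  # Append "END" to the provider request's `stop` array.
  { pointer = "/stop/-", value = "END" },
]
```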
& multiple under-the-hood and UI improvements (thanks @ecalifornica!)
2026.1.1
Warning
Planned Deprecations
- In a future release, the parameter `model` will be required when initializing `DICLOptimizationConfig`. The parameter remains optional (defaults to `openai::gpt-5-mini`) in the meantime.
Bug Fixes
- Stop buffering `raw_usage` when streaming with the OpenAI-compatible inference endpoint; instead, emit `raw_usage` as soon as possible, just like in the native endpoint.
- Stop reporting zero usage in every chunk when streaming a cached inference; instead, report zero usage only in the final chunk, as expected.
New Features
- Support `stream_options.include_usage` for every model under the Azure provider.
& multiple under-the-hood and UI improvements!
2026.1.0
Caution
Breaking Changes
- The Prometheus metric `tensorzero_inference_latency_overhead_seconds` will report a histogram instead of a summary. You can customize the buckets using `gateway.metrics.tensorzero_inference_latency_overhead_seconds_buckets` in the configuration (default: 1ms, 10ms, 100ms).
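For example, to widen the bucket range beyond the defaults (a sketch using the field path given above; the bucket values are illustrative and, following Prometheus convention, expressed in seconds):

```toml
[gateway.metrics]
# Defaults are 1ms, 10ms, and 100ms; values are in seconds.
tensorzero_inference_latency_overhead_seconds_buckets = [0.001, 0.01, 0.1, 1.0]
```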
Warning
Planned Deprecations
- Deprecate the `TENSORZERO_CLICKHOUSE_URL` environment variable from the UI. Moving forward, the UI will query data through the gateway and does not communicate directly with ClickHouse.
- Rename the Prometheus metric `tensorzero_inference_latency_overhead_seconds_histogram` to `tensorzero_inference_latency_overhead_seconds`. Both metrics will be emitted for now.
- Rename the configuration field `tensorzero_inference_latency_overhead_seconds_histogram_buckets` to `tensorzero_inference_latency_overhead_seconds_buckets`. Both fields are available for now.
New Features
- Add optional `include_raw_usage` parameter to inference requests. If enabled, the gateway returns the raw usage objects from model provider responses in addition to the normalized `usage` response field.
- Add optional `--bind-address` CLI flag to the gateway.
- Add optional `description` field to metrics in the configuration.
- Add option to fine-tune Fireworks models without automatic deployment.
& multiple under-the-hood and UI improvements (thanks @ecalifornica @achaljhawar @rguilmont)!