GraTAG: Production AI Search via Graph-Based Query Decomposition and Triplet-Aligned Generation with Rich Multimodal Representations
GraTAG is an end-to-end production-ready AI search engine framework that addresses key challenges in relevance, comprehensiveness, and presentation through three core innovations:
- Graph-Based Query Decomposition (GQD) — decomposes complex queries into atomic sub-queries represented as a directed acyclic graph (DAG), capturing parallel and joint dependencies for finer-grained reasoning. The GQD model is post-trained via SFT followed by GRPO alignment with RAG task performance.
- Triplet-Aligned Generation (TAG) — extracts relation triplets from retrieved documents and aligns them with the answer generation process to bridge missing logic across chunks, enhancing coherence and mitigating hallucination. TAG employs a cold-start triplet extraction stage followed by REINFORCE-based triplet alignment.
- Rich Multimodal Presentations — integrates timeline visualization and textual-visual choreography (image-text matching via the Hungarian algorithm) to reduce cognitive load and enhance information verification.
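The image-text matching step is a linear assignment problem. A minimal sketch with hypothetical similarity scores (in practice `scipy.optimize.linear_sum_assignment` implements the Hungarian algorithm; the brute-force search below gives the same optimum for small matrices and keeps the example dependency-free):

```python
from itertools import permutations

def best_assignment(sim):
    """Exhaustively find the image-to-paragraph assignment that maximizes
    total similarity (equivalent to the Hungarian algorithm's result
    for small score matrices)."""
    n = len(sim)  # sim[i][j]: similarity of image i to paragraph j
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best:
            best, best_perm = score, perm
    return list(best_perm), best

# Hypothetical image-paragraph similarity scores
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]
assignment, total = best_assignment(sim)
# assignment[i] is the paragraph index matched to image i
```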
Evaluated on 1,000 recent real-world queries with over 243,000 expert ratings across 9 criteria, GraTAG outperforms eight existing systems in human expert assessments. Compared to the strongest baseline, GraTAG improves comprehensiveness by 10.8%, insightfulness by 7.9%, and the overall average score by 4.8%. On the public benchmark BrowseComp, GraTAG outperforms the best baseline by 17.3%.
- Architecture Overview
- Infrastructure Dependencies
- Deployment Guide
- Model Training
- Documentation
- License
- Disclaimer
GraTAG adopts a three-tier architecture. The services communicate via HTTP: Frontend → Backend API → Algorithm Service.
| Directory | Description |
|---|---|
| alg/ | Algorithm service — core AI pipeline implementing GQD, TAG, multi-source retrieval, and multimodal presentation (Flask + NetworkX + Transformers) |
| backend/ | Backend service — RESTful API layer handling user management, QA session persistence, and algorithm service orchestration (Flask + MongoEngine + JWT) |
| frontend/ | Frontend application — search interface with streaming answer display, timeline visualization, and document preview (Nuxt 3 + TypeScript + SCSS) |
| experiments/ | Evaluation benchmarks (SearchBench-1000, BrowseComp) and baseline answer collection via browser automation (Playwright + GPT-4o) |
| docs/ | Technical documentation — module reference, API reference, and configuration guide |
| assets/ | Static resources — images and figures used in documentation |
The algorithm service exposes two endpoints (/execute and /stream_execute) for synchronous and streaming invocations respectively. Key sub-directories under alg/src/ include pipeline/ (orchestration), modules/ (GQD, TAG, retrieval, timeline), model_training/ (GQD and TAG training scripts), and include/ (shared config and context management).
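A synchronous call to /execute can be sketched with the standard library (the `{"function": ..., "body": ...}` schema follows the curl verification example in the deployment guide; host and port are deployment-specific assumptions):

```python
import json
import urllib.request

def build_payload(function: str, **body) -> dict:
    """Assemble the request body expected by /execute and /stream_execute."""
    return {"function": function, "body": body}

def call_execute(host: str, payload: dict, port: int = 10051) -> dict:
    """Synchronously invoke the algorithm service's /execute endpoint."""
    req = urllib.request.Request(
        f"http://{host}:{port}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload("recommend_query", query="test")
# result = call_execute("localhost", payload)  # requires a running service
```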
| Service | Purpose | Version |
|---|---|---|
| MongoDB | Application data storage (QA sessions, users, subscriptions) | 4.x+ |
| Elasticsearch | Context persistence, search indexing, and full-text retrieval | 7.10+ |
| Milvus | Vector similarity search for dense retrieval | 2.4+ |
| LLM Inference | Language model serving (vLLM / HuggingFace TGI compatible) | — |
| Nacos | Service discovery and registration (optional) | 1.x |
| OSS / MinIO | Object storage for documents and images | — |
Before deploying GraTAG, ensure the following infrastructure services are running and accessible:
| Service | Required | Default Port | Setup |
|---|---|---|---|
| MongoDB | Yes | 27017 | Stores user data, QA sessions, subscriptions. Create a database (e.g. gratag) with authentication. |
| Elasticsearch | Yes | 9200 | Stores QA context for multi-turn conversations and provides full-text retrieval. Version 7.10+ recommended. |
| Milvus | Optional | 19530 | Provides dense vector retrieval. Required for vector similarity search in the recall stage. Version 2.4+ recommended. |
| LLM Inference | Yes | — | vLLM or HuggingFace TGI compatible endpoint serving models such as Qwen2.5-72B-Instruct. |
| OSS / MinIO | Optional | 9000 (MinIO) | Object storage for uploaded documents and images. Required for document QA mode. |
| Nacos | Optional | 8848 | Service discovery and centralized configuration management. Can be replaced by local config files. |
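Connectivity to these services can be verified up front with a small preflight sketch (hostnames are placeholders; ports come from the table above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder hosts; replace with your deployment's endpoints
SERVICES = {
    "MongoDB": ("mongodb-host", 27017),
    "Elasticsearch": ("es-host", 9200),
    "Milvus": ("milvus-host", 19530),  # optional
}

def preflight(services: dict) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: port_open(h, p) for name, (h, p) in services.items()}
```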
1.1 Environment Setup
cd alg/src
conda create -n gratag python=3.9 -y
conda activate gratag
pip install -r requirements.txt
# Chinese NLP model (required for spaCy tokenization)
pip install zh_core_web_sm-3.8.0.tar.gz
1.2 Configuration
Edit include/config/common_config.py with your infrastructure endpoints:
CommonConfig = {
# LLM inference endpoints
"FSCHAT": {
"vllm_url": "http://<llm-host>:8000/v1",
"hf_url": "http://<llm-host>:8001"
},
# Elasticsearch (QA context storage)
"ES_QA": {
"url": "http://<es-host>:9200",
"index": "gratag_qa_context",
"auth": "<username>",
"passwd": "<password>"
},
# MongoDB
"MONGODB": {
"Host": "<mongodb-host>",
"Port": 27017,
"DB": "gratag",
"Username": "<username>",
"Password": "<password>",
"authDB": "admin"
},
# Milvus (vector search, optional)
"MILVUS": {
"host": "<milvus-host>",
"port": 19530,
"collection": "gratag_vectors"
},
# Reranking thresholds
"RERANK": {
"topk_es": 1000,
"topk_vec": 500,
"topk_rerank": 150
},
# External search API
"IAAR_DataBase": {
"url": "http://<search-api-host>/search"
}
}
Also configure include/config/query_recommend_config.py for query recommendation settings.
1.3 Launch
# Development
python run.py --host 0.0.0.0 --port 10051
# Production (with gunicorn)
gunicorn -w 4 -b 0.0.0.0:10051 --timeout 300 run:app
1.4 Docker Deployment
cd alg/src
docker build -t gratag-alg .
docker run -d \
--name gratag-alg \
-p 10051:10051 \
gratag-alg
The algorithm Dockerfile (alg/src/Dockerfile) is based on the iaar/ainews:v4.1 image with Python 3.9, and automatically installs dependencies and the Chinese spaCy model. The exposed port range is 10000–20000.
1.5 Verify
curl -X POST http://localhost:10051/execute \
-H "Content-Type: application/json" \
-d '{"function": "recommend_query", "body": {"query": "test"}}'
2.1 Environment Setup
cd backend
pip install -r requirements.txt
2.2 Configuration
The backend supports two configuration modes: local config file or Nacos centralized config.
Option A: Local Configuration
Copy Backend/config/config.ini to Backend/config/config_local.ini and fill in all fields:
[DEFAULT]
Host = 0.0.0.0
Port = 5000
LOG_DIR = ./logs
TOKEN_KEY = <your-jwt-secret-key>
ALGORITHM_URL = http://<alg-host>:10051
[MONGO]
Host = <mongodb-host>
Port = 27017
DB = gratag
Username = <username>
Password = <password>
authDB = admin
[ES]
url = http://<es-host>:9200
auth = <username>
passwd = <password>
search_index = gratag_search
[PROMETHEUS]
enable_flask = True
process_name = gratag-backend
[MINIO]
url = http://<minio-host>:9000
access_key = <access-key>
secret_key = <secret-key>
[OSS]
endpoint = <oss-endpoint>
access_key_id = <access-key-id>
access_key_secret = <access-key-secret>
bucket_name = <bucket-name>
img_bucket = <img-bucket>
oss_env = prod
Then set Nacos to use local mode in Backend/config/nacos_config.ini:
[NACOS]
REGISTRATION_SWITCH = false
LOCAL_CONFIG = true
Option B: Nacos Centralized Configuration
Edit Backend/config/nacos_config.ini:
[NACOS]
REGISTRATION_SWITCH = true
LOCAL_CONFIG = false
SERVER_ADDRESSES = <nacos-host>:8848
NAMESPACE = <namespace-id>
AK = <access-key>
SK = <secret-key>
DATA_ID = gratag-backend-config
GROUP = DEFAULT_GROUP
LOG_DIR = ./logs
SERVICE_NAME = gratag-backend
CLUSTER_NAME = DEFAULT
Push the configuration JSON to Nacos with the same structure as the config.ini sections (keys: default, mongo, ES, PROMETHEUS, MINIO, OSS, etc.).
2.3 Launch
cd Backend
# Development
python run.py
# Production (with gunicorn)
gunicorn -w 4 -b 0.0.0.0:5000 --timeout 300 "app:app"
When Nacos registration is enabled, the backend automatically registers itself and sends heartbeats every 3 seconds. The environment variable NACOS_HOST_IP must be set to the host's accessible IP.
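The heartbeat can be approximated with a sketch against Nacos's open REST API (the service name and group mirror the config above; the beat parameter names follow Nacos 1.x conventions and should be treated as an assumption, not the backend's actual code):

```python
import json
import os

def build_beat(service_name: str, ip: str, port: int,
               group: str = "DEFAULT_GROUP") -> dict:
    """Query parameters for Nacos 1.x's PUT /nacos/v1/ns/instance/beat
    (field names are an assumption based on the Nacos open API)."""
    beat = {"ip": ip, "port": port, "serviceName": service_name}
    return {"serviceName": service_name, "groupName": group,
            "beat": json.dumps(beat)}

params = build_beat("gratag-backend",
                    os.environ.get("NACOS_HOST_IP", "127.0.0.1"), 5000)

# A daemon thread would PUT these parameters to
#   http://<nacos-host>:8848/nacos/v1/ns/instance/beat
# every 3 seconds, matching the backend's heartbeat interval.
```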
2.4 Docker Deployment
Using python:3.9-slim base image (recommended for fresh builds):
cd backend
docker build -f Dockerfile1 -t gratag-backend .
docker run -d \
--name gratag-backend \
-p 5000:5000 \
-e NACOS_HOST_IP=<host-ip> \
-v $(pwd)/Backend/logs:/app/Backend/logs \
gratag-backend \
python run.py
This Dockerfile installs Java Runtime, LibreOffice (for document conversion), and additional dependencies including Azure Cognitive Services and the MinIO client.
Using pre-built base image:
cd backend
docker build -t gratag-backend .
docker run -d \
--name gratag-backend \
-p 5000:5000 \
-e NACOS_HOST_IP=<host-ip> \
gratag-backend \
python run.py
2.5 Verify
# Health check
curl http://localhost:5000/api/heartbeat
# Expected: {"status": "1"}
2.6 Key Backend Features
- JWT Authentication: 30-day access token expiry, blacklist-enabled, login status validation with 7-day rolling expiry.
- CORS: Fully open (*) for cross-origin requests.
- Prometheus Monitoring: Enabled via config; exposes a /metrics endpoint.
- Request Logging: All requests/responses logged to logs/response.log with user identity, timing, and full payloads.
- Admin Access Control: /admin/* routes require access_type != 'normal'.
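The token lifecycle (HS256 signing, 30-day expiry, blacklist revocation) can be sketched with the standard library; the claim names and secret below are illustrative, not the backend's actual schema:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, secret: str, days: int = 30) -> str:
    """Minimal HS256 JWT: header.payload.signature (illustrative only)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + days * 86400}).encode())
    sig = b64url(hmac.new(secret.encode(),
                          f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

BLACKLIST: set = set()  # revoked tokens

def is_valid(token: str, secret: str) -> bool:
    """Check signature, expiry, and the revocation blacklist."""
    if token in BLACKLIST:
        return False
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    expected = b64url(hmac.new(secret.encode(),
                               f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["exp"] > time.time()
```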
3.1 Environment Setup
cd frontend
# Uses pnpm-compatible .npmrc; npm/pnpm both work
npm install
3.2 Environment Configuration
The frontend uses environment files to configure the backend API endpoint:
Development (.env.dev):
VITE_API=https://<dev-api-gateway>/back/
VITE_ENV=sit
Production (.env.prod):
VITE_API=https://<prod-api-gateway>/back/
VITE_ENV=prod
3.3 Launch
# Development (with hot-reload)
npm run dev
# Production build
npm run build
# Production server (Nuxt 3 SSR)
npm run start
3.4 Docker Deployment
The project includes a .dockerignore for the frontend. A typical Dockerfile for the Nuxt 3 frontend:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["node", ".output/server/index.mjs"]
cd frontend
docker build -t gratag-frontend .
docker run -d \
--name gratag-frontend \
-p 3000:3000 \
gratag-frontend
3.5 Reverse Proxy (Nginx Example)
In production, use a reverse proxy to unify frontend and backend under one domain:
server {
listen 80;
server_name your-domain.com;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location /api/ {
proxy_pass http://localhost:5000/api/;
proxy_http_version 1.1;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# SSE streaming support
proxy_set_header Connection '';
proxy_buffering off;
proxy_cache off;
chunked_transfer_encoding on;
proxy_read_timeout 300s;
}
}
- Infrastructure services (MongoDB, Elasticsearch, Milvus, LLM)
- Algorithm service (alg/src/run.py)
- Backend service (backend/Backend/run.py)
- Frontend (frontend/)
| Variable | Service | Description |
|---|---|---|
| NACOS_HOST_IP | Backend | Host IP for Nacos service registration |
| VITE_API | Frontend | Backend API gateway URL |
| VITE_ENV | Frontend | Environment identifier (sit / prod) |
GraTAG maximizes parallelism (e.g., executing lateral sub-queries in GQD concurrently) and employs model fine-tuning and quantization to reduce latency.
| System | Latency (s) |
|---|---|
| GraTAG | 14.2 |
| Perplexity AI | 13.9 |
| Tiangong AI | 4.0 |
| Ernie Bot | 6.0 |
| KIMI | 2.8 |
| Metaso | 10.4 |
| ChatGLM | 7.9 |
| Baichuan | 6.1 |
| Tongyi | 10.4 |
GraTAG latency is measured on a cluster of 16 Muxi MXC500 GPUs (each delivering roughly 70% of the compute of an NVIDIA A800). Baselines are measured via their publicly available interfaces.
GQD adopts a two-stage training procedure: supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO).
Prerequisites:
cd alg/src/model_training/GQD
pip install -r requirements.txt
Stage 1 — Supervised Fine-Tuning:
Training data format (JSONL):
{"query": "user query", "decomposition": "{'is_complex': True, 'sub_queries': [...], 'parent_child': [...]}"}
python GQD_Stage_1_SFT.py \
--model_path Qwen/Qwen2.5-72B-Instruct \
--dataset_path ./data/sft_data.jsonl \
--output_dir ./outputs/stage1 \
--model_type qwen \
--lr 5e-5 \
--use_lora
| Hyperparameter | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct |
| Learning Rate | 5e-5 |
| Batch Size | 2–4 |
| Gradient Accumulation | 4–8 |
| Epochs | 3 |
| Max Sequence Length | 2048 |
| LoRA r / alpha | 16 / 32 |
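Since decompositions are DAGs, SFT training samples can be sanity-checked before training. A sketch (field names follow the JSONL format above; the Kahn's-algorithm check mirrors what NetworkX's `is_directed_acyclic_graph` would report):

```python
def is_dag(sub_queries: list, parent_child: list) -> bool:
    """Verify that a decomposition's parent_child edges form a DAG
    over the sub-queries (Kahn's topological-sort algorithm)."""
    n = len(sub_queries)
    indegree = [0] * n
    adjacency = [[] for _ in range(n)]
    for parent, child in parent_child:
        adjacency[parent].append(child)
        indegree[child] += 1
    queue = [i for i in range(n) if indegree[i] == 0]
    seen = 0
    while queue:
        node = queue.pop()
        seen += 1
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return seen == n  # every node popped exactly once iff acyclic

# Illustrative sample in the format above
sample = {
    "is_complex": True,
    "sub_queries": ["q0", "q1", "q2"],
    "parent_child": [(0, 1), (0, 2), (1, 2)],
}
```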
Stage 2 — GRPO Alignment:
Training data format (JSONL):
{"query": "user query", "answer": "reference answer"}
python GQD_Stage_2_GRPO.py \
--model_path ./outputs/stage1/final_model \
--dataset_path ./data/grpo_data.jsonl \
--output_dir ./outputs/stage2 \
--lr 5e-7 \
--K_samples 8
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-7 |
| K (sampled GQDs per query) | 8 |
| C (independent retrievals per GQD) | multiple |
| Beta KL (β) | 0.01 |
| Epsilon Clip (ε) | 0.2 |
| Temperature | 0.7 |
| Top-p | 0.9 |
| Evidence Cache Similarity | 0.95 |
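The group-relative part of GRPO can be illustrated in a few lines: K candidate decompositions are sampled per query, each scored by downstream RAG performance, and advantages are the within-group z-scores of those rewards (a generic sketch of standard GRPO, not the repository's training code; reward values are toy numbers):

```python
def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize K sampled rewards within their group to mean 0, unit std."""
    k = len(rewards)
    mean = sum(rewards) / k
    std = (sum((r - mean) ** 2 for r in rewards) / k) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# K = 8 sampled GQDs for one query, scored by RAG answer quality
rewards = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1, 0.6, 0.8]
adv = group_relative_advantages(rewards)
# Above-average decompositions receive positive advantage, below-average negative
```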
One-click Training:
bash quick_start.sh
TAG employs a two-stage approach: (1) triplet extraction cold start via SFT, and (2) answer generation training with REINFORCE-based triplet alignment.
Stage 1 — Triplet Extraction Cold Start:
A strong teacher model (GPT-4o) first extracts high-quality relation triplets for each (sub-query, chunks) pair. The triplets encapsulate entity details, inter-entity relations, and implicit factual/logical dependencies. After manual quality verification, the target LLM is fine-tuned via SFT to produce concise triplets, using dedicated extraction tokens ⟨startextraction⟩ and ⟨endextraction⟩.
python TAG_Stage_1_train_lora_all_lr5e-5.py \
--warmup \
--model_path <qwen2.5-72b-instruct-path>
| Hyperparameter | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct |
| Teacher Model | GPT-4o |
| Learning Rate | 5e-5 |
| Batch Size | 1 |
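A sketch of how the dedicated extraction tokens can delimit triplets in model output (the token strings ⟨startextraction⟩ and ⟨endextraction⟩ come from this section; the serialization format and triplet text are illustrative assumptions):

```python
import re

START, END = "⟨startextraction⟩", "⟨endextraction⟩"

def wrap_triplets(triplets: list) -> str:
    """Serialize (head, relation, tail) triplets between the extraction tokens."""
    body = "; ".join(f"({h}, {r}, {t})" for h, r, t in triplets)
    return f"{START}{body}{END}"

def parse_triplets(text: str) -> list:
    """Recover triplets from an output containing an extraction span."""
    match = re.search(re.escape(START) + r"(.*?)" + re.escape(END), text, re.S)
    if not match:
        return []
    return [tuple(part.strip() for part in item.split(","))
            for item in re.findall(r"\(([^)]*)\)", match.group(1))]

out = wrap_triplets([("GraTAG", "uses", "GRPO"), ("TAG", "aligns", "triplets")])
```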
Stage 2 — Answer Generation Training and Triplet Alignment:
The model learns to generate answers with and without triplet augmentation. A three-layer MLP with ReLU computes per-token weights ω from concatenated hidden states. The REINFORCE algorithm selects the most beneficial triplet for each sample, with a length-aware bonus encouraging concise triplets. The total loss is:
L = L_ans + α · L_REINFORCE
python TAG_Stage_2_train.py \
--model_path <stage1-model-path> \
--lr 5e-7 \
--n_passes 3 \
--n_ahead 200 \
--original_loss_weight 0.5
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-7 |
| α (REINFORCE weight) | 0.5 |
| γ (length-aware bonus) | > 0 |
| MLP layers | 3 (ReLU) |
| N Passes | 3 |
| N Ahead (lookahead tokens) | 200 |
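The combined objective can be written out numerically with toy values (a sketch: L_ans stands in for the token-level answer loss, and the REINFORCE term uses the classic -(reward - baseline) · log π form with α = 0.5 as in the table above; the repository's exact formulation may differ):

```python
def tag_loss(answer_nll: float, triplet_logprob: float, reward: float,
             baseline: float, alpha: float = 0.5) -> float:
    """L = L_ans + alpha * L_REINFORCE,
    with L_REINFORCE = -(reward - baseline) * log pi(triplet)."""
    l_reinforce = -(reward - baseline) * triplet_logprob
    return answer_nll + alpha * l_reinforce

# Toy values: answer NLL 2.0; selected triplet log-prob -1.5;
# its reward 0.8 against a 0.5 baseline
loss = tag_loss(2.0, -1.5, 0.8, 0.5)
```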
Evaluation:
python pipeline_evaluation_new_exp.py
For detailed technical references, see the docs/ directory:
| Document | Description |
|---|---|
| Module Reference | Core pipeline modules — GQD, TAG, retrieval, timeline, context management |
| API Reference | Backend REST endpoints, streaming protocol, search modes, data models |
| Configuration | Algorithm service and backend configuration parameters |
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- NonCommercial — You may not use the material for commercial purposes.
For commercial licensing inquiries, please contact the authors.
This software is provided for academic and research purposes only. The authors make no warranties regarding the accuracy, completeness, or reliability of the software. Use of this software in any commercial product or service is strictly prohibited without prior written consent from the authors.
The evaluation benchmarks (SearchBench-1000, BrowseComp) and baseline results included in this repository are intended solely for research reproducibility. Any use of crawled data must comply with the respective platforms' terms of service.

