GraTAG: Production AI Search via Graph-Based Query Decomposition and Triplet-Aligned Generation with Rich Multimodal Representations
GraTAG is an end-to-end production-ready AI search engine framework that addresses key challenges in relevance, comprehensiveness, and presentation through three core innovations:
- Graph-Based Query Decomposition (GQD) — decomposes complex queries into atomic sub-queries represented as a directed acyclic graph (DAG), capturing parallel and joint dependencies for finer-grained reasoning. The GQD model is post-trained via SFT followed by GRPO alignment with RAG task performance.
- Triplet-Aligned Generation (TAG) — extracts relation triplets from retrieved documents and aligns them with the answer generation process to bridge missing logic across chunks, enhancing coherence and mitigating hallucination. TAG employs a cold-start triplet extraction stage followed by REINFORCE-based triplet alignment.
- Rich Multimodal Presentations — integrates timeline visualization and textual-visual choreography (image-text matching via the Hungarian algorithm) to reduce cognitive load and enhance information verification.
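The image-text matching step is a linear assignment problem. A minimal sketch with hypothetical similarity scores (in practice `scipy.optimize.linear_sum_assignment` implements the Hungarian algorithm; the brute-force search below gives the same optimum for small matrices and keeps the example dependency-free):

```python
from itertools import permutations

def best_assignment(sim):
    """Exhaustively find the image-to-paragraph assignment that maximizes
    total similarity (equivalent to the Hungarian algorithm's result
    for small score matrices)."""
    n = len(sim)  # sim[i][j]: similarity of image i to paragraph j
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best:
            best, best_perm = score, perm
    return list(best_perm), best

# Hypothetical image-paragraph similarity scores
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.1],
    [0.2, 0.4, 0.7],
]
assignment, total = best_assignment(sim)
# assignment[i] is the paragraph index matched to image i
```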
Evaluated on 1,000 recent real-world queries with over 243,000 expert ratings across 9 criteria, GraTAG outperforms eight existing systems in human expert assessments. Compared to the strongest baseline, GraTAG improves comprehensiveness by 10.8%, insightfulness by 7.9%, and the overall average score by 4.8%. On the public benchmark BrowseComp, GraTAG outperforms the best baseline by 17.3%.
- Architecture Overview
- Infrastructure Dependencies
- Deployment Guide
- Model Training
- Documentation
- License
- Disclaimer
GraTAG adopts a three-tier architecture. The services communicate via HTTP: Frontend → Backend API → Algorithm Service.
| Directory | Description |
|---|---|
| alg/ | Algorithm service — core AI pipeline implementing GQD, TAG, multi-source retrieval, and multimodal presentation (Flask + NetworkX + Transformers) |
| backend/ | Backend service — RESTful API layer handling user management, QA session persistence, and algorithm service orchestration (Flask + MongoEngine + JWT) |
| frontend/ | Frontend application — search interface with streaming answer display, timeline visualization, and document preview (Nuxt 3 + TypeScript + SCSS) |
| experiments/ | Evaluation benchmarks (SearchBench-1000, BrowseComp) and baseline answer collection via browser automation (Playwright + GPT-4o) |
| docs/ | Technical documentation — module reference, API reference, and configuration guide |
| assets/ | Static resources — images and figures used in documentation |
The algorithm service exposes two endpoints (/execute and /stream_execute) for synchronous and streaming invocations respectively. Key sub-directories under alg/src/ include pipeline/ (orchestration), modules/ (GQD, TAG, retrieval, timeline), model_training/ (GQD and TAG training scripts), and include/ (shared config and context management).
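A synchronous call to /execute can be sketched with the standard library (the `{"function": ..., "body": ...}` schema follows the curl verification example in the deployment guide; host and port are deployment-specific assumptions):

```python
import json
import urllib.request

def build_payload(function: str, **body) -> dict:
    """Assemble the request body expected by /execute and /stream_execute."""
    return {"function": function, "body": body}

def call_execute(host: str, payload: dict, port: int = 10051) -> dict:
    """Synchronously invoke the algorithm service's /execute endpoint."""
    req = urllib.request.Request(
        f"http://{host}:{port}/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload("recommend_query", query="test")
# result = call_execute("localhost", payload)  # requires a running service
```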
| Service | Purpose | Version |
|---|---|---|
| MongoDB | Application data storage (QA sessions, users, subscriptions) | 4.x+ |
| Elasticsearch | Context persistence, search indexing, and full-text retrieval | 7.10+ |
| Milvus | Vector similarity search for dense retrieval | 2.4+ |
| LLM Inference | Language model serving (vLLM / HuggingFace TGI compatible) | — |
| Nacos | Service discovery and registration (optional) | 1.x |
| OSS / MinIO | Object storage for documents and images | — |
Before deploying GraTAG, ensure the following infrastructure services are running and accessible:
| Service | Required | Default Port | Setup |
|---|---|---|---|
| MongoDB | Yes | 27017 | Stores user data, QA sessions, subscriptions. Create a database (e.g. gratag) with authentication. |
| Elasticsearch | Yes | 9200 | Stores QA context for multi-turn conversations and provides full-text retrieval. Version 7.10+ recommended. |
| Milvus | Optional | 19530 | Provides dense vector retrieval. Required for vector similarity search in the recall stage. Version 2.4+ recommended. |
| LLM Inference | Yes | — | vLLM or HuggingFace TGI compatible endpoint serving models such as Qwen2.5-72B-Instruct. |
| OSS / MinIO | Optional | 9000 (MinIO) | Object storage for uploaded documents and images. Required for document QA mode. |
| Nacos | Optional | 8848 | Service discovery and centralized configuration management. Can be replaced by local config files. |
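Connectivity to these services can be verified up front with a small preflight sketch (hostnames are placeholders; ports come from the table above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder hosts; replace with your deployment's endpoints
SERVICES = {
    "MongoDB": ("mongodb-host", 27017),
    "Elasticsearch": ("es-host", 9200),
    "Milvus": ("milvus-host", 19530),  # optional
}

def preflight(services: dict) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: port_open(h, p) for name, (h, p) in services.items()}
```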
1.1 Environment Setup
cd alg/src
conda create -n gratag python=3.9 -y
conda activate gratag
pip install -r requirements.txt
# Chinese NLP model (required for spaCy tokenization)
pip install zh_core_web_sm-3.8.0.tar.gz
1.2 Configuration
Edit include/config/common_config.py with your infrastructure endpoints:
CommonConfig = {
# LLM inference endpoints
"FSCHAT": {
"vllm_url": "http://<llm-host>:8000/v1",
"hf_url": "http://<llm-host>:8001"
},
# Elasticsearch (QA context storage)
"ES_QA": {
"url": "http://<es-host>:9200",
"index": "gratag_qa_context",
"auth": "<username>",
"passwd": "<password>"
},
# MongoDB
"MONGODB": {
"Host": "<mongodb-host>",
"Port": 27017,
"DB": "gratag",
"Username": "<username>",
"Password": "<password>",
"authDB": "admin"
},
# Milvus (vector search, optional)
"MILVUS": {
"host": "<milvus-host>",
"port": 19530,
"collection": "gratag_vectors"
},
# Reranking thresholds
"RERANK": {
"topk_es": 1000,
"topk_vec": 500,
"topk_rerank": 150
},
# External search API
"IAAR_DataBase": {
"url": "http://<search-api-host>/search"
}
}
Also configure include/config/query_recommend_config.py for query recommendation settings.
1.3 Launch
# Development
python run.py --host 0.0.0.0 --port 10051
# Production (with gunicorn)
gunicorn -w 4 -b 0.0.0.0:10051 --timeout 300 run:app
1.4 Docker Deployment
cd alg/src
docker build -t gratag-alg .
docker run -d \
--name gratag-alg \
-p 10051:10051 \
gratag-alg
The algorithm Dockerfile (alg/src/Dockerfile) is based on the iaar/ainews:v4.1 image with Python 3.9, and automatically installs dependencies and the Chinese spaCy model. The exposed port range is 10000–20000.
1.5 Verify
curl -X POST http://localhost:10051/execute \
-H "Content-Type: application/json" \
-d '{"function": "recommend_query", "body": {"query": "test"}}'
2.1 Environment Setup
cd backend
pip install -r requirements.txt
2.2 Configuration
The backend supports two configuration modes: local config file or Nacos centralized config.
Option A: Local Configuration
Copy Backend/config/config.ini to Backend/config/config_local.ini and fill in all fields:
[DEFAULT]
Host = 0.0.0.0
Port = 5000
LOG_DIR = ./logs
TOKEN_KEY = <your-jwt-secret-key>
ALGORITHM_URL = http://<alg-host>:10051
[MONGO]
Host = <mongodb-host>
Port = 27017
DB = gratag
Username = <username>
Password = <password>
authDB = admin
[ES]
url = http://<es-host>:9200
auth = <username>
passwd = <password>
search_index = gratag_search
[PROMETHEUS]
enable_flask = True
process_name = gratag-backend
[MINIO]
url = http://<minio-host>:9000
access_key = <access-key>
secret_key = <secret-key>
[OSS]
endpoint = <oss-endpoint>
access_key_id = <access-key-id>
access_key_secret = <access-key-secret>
bucket_name = <bucket-name>
img_bucket = <img-bucket>
oss_env = prod
Then set Nacos to use local mode in Backend/config/nacos_config.ini:
[NACOS]
REGISTRATION_SWITCH = false
LOCAL_CONFIG = true
Option B: Nacos Centralized Configuration
Edit Backend/config/nacos_config.ini:
[NACOS]
REGISTRATION_SWITCH = true
LOCAL_CONFIG = false
SERVER_ADDRESSES = <nacos-host>:8848
NAMESPACE = <namespace-id>
AK = <access-key>
SK = <secret-key>
DATA_ID = gratag-backend-config
GROUP = DEFAULT_GROUP
LOG_DIR = ./logs
SERVICE_NAME = gratag-backend
CLUSTER_NAME = DEFAULT
Push the configuration JSON to Nacos with the same structure as the config.ini sections (keys: default, mongo, ES, PROMETHEUS, MINIO, OSS, etc.).
2.3 Launch
cd Backend
# Development
python run.py
# Production (with gunicorn)
gunicorn -w 4 -b 0.0.0.0:5000 --timeout 300 "app:app"
When Nacos registration is enabled, the backend automatically registers itself and sends heartbeats every 3 seconds. The environment variable NACOS_HOST_IP must be set to the host's accessible IP.
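The heartbeat can be approximated with a sketch against Nacos's open REST API (the service name and group mirror the config above; the beat parameter names follow Nacos 1.x conventions and should be treated as an assumption, not the backend's actual code):

```python
import json
import os

def build_beat(service_name: str, ip: str, port: int,
               group: str = "DEFAULT_GROUP") -> dict:
    """Query parameters for Nacos 1.x's PUT /nacos/v1/ns/instance/beat
    (field names are an assumption based on the Nacos open API)."""
    beat = {"ip": ip, "port": port, "serviceName": service_name}
    return {"serviceName": service_name, "groupName": group,
            "beat": json.dumps(beat)}

params = build_beat("gratag-backend",
                    os.environ.get("NACOS_HOST_IP", "127.0.0.1"), 5000)

# A daemon thread would PUT these parameters to
#   http://<nacos-host>:8848/nacos/v1/ns/instance/beat
# every 3 seconds, matching the backend's heartbeat interval.
```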
2.4 Docker Deployment
Using python:3.9-slim base image (recommended for fresh builds):
cd backend
docker build -f Dockerfile1 -t gratag-backend .
docker run -d \
--name gratag-backend \
-p 5000:5000 \
-e NACOS_HOST_IP=<host-ip> \
-v $(pwd)/Backend/logs:/app/Backend/logs \
gratag-backend \
python run.py
This Dockerfile installs Java Runtime, LibreOffice (for document conversion), and additional dependencies including Azure Cognitive Services and the MinIO client.
Using pre-built base image:
cd backend
docker build -t gratag-backend .
docker run -d \
--name gratag-backend \
-p 5000:5000 \
-e NACOS_HOST_IP=<host-ip> \
gratag-backend \
python run.py
2.5 Verify
# Health check
curl http://localhost:5000/api/heartbeat
# Expected: {"status": "1"}
2.6 Key Backend Features
- JWT Authentication: 30-day access token expiry, blacklist-enabled, login status validation with 7-day rolling expiry.
- CORS: Fully open (*) for cross-origin requests.
- Prometheus Monitoring: Enabled via config; exposes a /metrics endpoint.
- Request Logging: All requests/responses logged to logs/response.log with user identity, timing, and full payloads.
- Admin Access Control: /admin/* routes require access_type != 'normal'.
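The token lifecycle (HS256 signing, 30-day expiry, blacklist revocation) can be sketched with the standard library; the claim names and secret below are illustrative, not the backend's actual schema:

```python
import base64, hashlib, hmac, json, time

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user_id: str, secret: str, days: int = 30) -> str:
    """Minimal HS256 JWT: header.payload.signature (illustrative only)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": user_id, "exp": int(time.time()) + days * 86400}).encode())
    sig = b64url(hmac.new(secret.encode(),
                          f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

BLACKLIST: set = set()  # revoked tokens

def is_valid(token: str, secret: str) -> bool:
    """Check signature, expiry, and the revocation blacklist."""
    if token in BLACKLIST:
        return False
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    expected = b64url(hmac.new(secret.encode(),
                               f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    return claims["exp"] > time.time()
```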
3.1 Environment Setup
cd frontend
# Uses pnpm-compatible .npmrc; npm/pnpm both work
npm install
3.2 Environment Configuration
The frontend uses environment files to configure the backend API endpoint:
Development (.env.dev):
VITE_API=https://<dev-api-gateway>/back/
VITE_ENV=sit
Production (.env.prod):
VITE_API=https://<prod-api-gateway>/back/
VITE_ENV=prod
3.3 Launch
# Development (with hot-reload)
npm run dev
# Production build
npm run build
# Production server (Nuxt 3 SSR)
npm run start
3.4 Docker Deployment
The project includes a .dockerignore for the frontend. A typical Dockerfile for the Nuxt 3 frontend:
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["node", ".output/server/index.mjs"]
cd frontend
docker build -t gratag-frontend .
docker run -d \
--name gratag-frontend \
-p 3000:3000 \
gratag-frontend
3.5 Reverse Proxy (Nginx Example)
In production, use a reverse proxy to unify frontend and backend under one domain:
server {
listen 80;
server_name your-domain.com;
location / {
proxy_pass http://localhost:3000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location /api/ {
proxy_pass http://localhost:5000/api/;
proxy_http_version 1.1;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# SSE streaming support
proxy_set_header Connection '';
proxy_buffering off;
proxy_cache off;
chunked_transfer_encoding on;
proxy_read_timeout 300s;
}
}
- Infrastructure services (MongoDB, Elasticsearch, Milvus, LLM)
- Algorithm service (alg/src/run.py)
- Backend service (backend/Backend/run.py)
- Frontend (frontend/)
| Variable | Service | Description |
|---|---|---|
| NACOS_HOST_IP | Backend | Host IP for Nacos service registration |
| VITE_API | Frontend | Backend API gateway URL |
| VITE_ENV | Frontend | Environment identifier (sit / prod) |
GraTAG maximizes parallelism (e.g., executing lateral sub-queries in GQD concurrently) and employs model fine-tuning and quantization to reduce latency.
| System | Latency (s) |
|---|---|
| GraTAG | 14.2 |
| Perplexity AI | 13.9 |
| Tiangong AI | 4.0 |
| Ernie Bot | 6.0 |
| KIMI | 2.8 |
| Metaso | 10.4 |
| ChatGLM | 7.9 |
| Baichuan | 6.1 |
| Tongyi | 10.4 |
GraTAG latency is measured on a cluster of 16 Muxi MXC500 GPUs (each delivering roughly 70% of the compute of an NVIDIA A800). Baselines are measured via their publicly available interfaces.
GQD adopts a two-stage training procedure: supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO).
Prerequisites:
cd alg/src/model_training/GQD
pip install -r requirements.txt
Stage 1 — Supervised Fine-Tuning:
Training data format (JSONL):
{"query": "user query", "decomposition": "{'is_complex': True, 'sub_queries': [...], 'parent_child': [...]}"}
python GQD_Stage_1_SFT.py \
--model_path Qwen/Qwen2.5-72B-Instruct \
--dataset_path ./data/sft_data.jsonl \
--output_dir ./outputs/stage1 \
--model_type qwen \
--lr 5e-5 \
--use_lora
| Hyperparameter | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct |
| Learning Rate | 5e-5 |
| Batch Size | 2–4 |
| Gradient Accumulation | 4–8 |
| Epochs | 3 |
| Max Sequence Length | 2048 |
| LoRA r / alpha | 16 / 32 |
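Since decompositions are DAGs, SFT training samples can be sanity-checked before training. A sketch (field names follow the JSONL format above; the Kahn's-algorithm check mirrors what NetworkX's `is_directed_acyclic_graph` would report):

```python
def is_dag(sub_queries: list, parent_child: list) -> bool:
    """Verify that a decomposition's parent_child edges form a DAG
    over the sub-queries (Kahn's topological-sort algorithm)."""
    n = len(sub_queries)
    indegree = [0] * n
    adjacency = [[] for _ in range(n)]
    for parent, child in parent_child:
        adjacency[parent].append(child)
        indegree[child] += 1
    queue = [i for i in range(n) if indegree[i] == 0]
    seen = 0
    while queue:
        node = queue.pop()
        seen += 1
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return seen == n  # every node popped exactly once iff acyclic

# Illustrative sample in the format above
sample = {
    "is_complex": True,
    "sub_queries": ["q0", "q1", "q2"],
    "parent_child": [(0, 1), (0, 2), (1, 2)],
}
```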
Stage 2 — GRPO Alignment:
Training data format (JSONL):
{"query": "user query", "answer": "reference answer"}
python GQD_Stage_2_GRPO.py \
--model_path ./outputs/stage1/final_model \
--dataset_path ./data/grpo_data.jsonl \
--output_dir ./outputs/stage2 \
--lr 5e-7 \
--K_samples 8
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-7 |
| K (sampled GQDs per query) | 8 |
| C (independent retrievals per GQD) | multiple |
| Beta KL (β) | 0.01 |
| Epsilon Clip (ε) | 0.2 |
| Temperature | 0.7 |
| Top-p | 0.9 |
| Evidence Cache Similarity | 0.95 |
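The group-relative part of GRPO can be illustrated in a few lines: K candidate decompositions are sampled per query, each scored by downstream RAG performance, and advantages are the within-group z-scores of those rewards (a generic sketch of standard GRPO, not the repository's training code; reward values are toy numbers):

```python
def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize K sampled rewards within their group to mean 0, unit std."""
    k = len(rewards)
    mean = sum(rewards) / k
    std = (sum((r - mean) ** 2 for r in rewards) / k) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# K = 8 sampled GQDs for one query, scored by RAG answer quality
rewards = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1, 0.6, 0.8]
adv = group_relative_advantages(rewards)
# Above-average decompositions receive positive advantage, below-average negative
```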
One-click Training:
bash quick_start.sh
TAG employs a two-stage approach: (1) triplet extraction cold start via SFT, and (2) answer generation training with REINFORCE-based triplet alignment.
Stage 1 — Triplet Extraction Cold Start:
A strong teacher model (GPT-4o) first extracts high-quality relation triplets for each (sub-query, chunks) pair. The triplets encapsulate entity details, inter-entity relations, and implicit factual/logical dependencies. After manual quality verification, the target LLM is fine-tuned via SFT to produce concise triplets, using dedicated extraction tokens ⟨startextraction⟩ and ⟨endextraction⟩.
python TAG_Stage_1_train_lora_all_lr5e-5.py \
--warmup \
--model_path <qwen2.5-72b-instruct-path>
| Hyperparameter | Value |
|---|---|
| Base Model | Qwen2.5-72B-Instruct |
| Teacher Model | GPT-4o |
| Learning Rate | 5e-5 |
| Batch Size | 1 |
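A sketch of how the dedicated extraction tokens can delimit triplets in model output (the token strings ⟨startextraction⟩ and ⟨endextraction⟩ come from this section; the serialization format and triplet text are illustrative assumptions):

```python
import re

START, END = "⟨startextraction⟩", "⟨endextraction⟩"

def wrap_triplets(triplets: list) -> str:
    """Serialize (head, relation, tail) triplets between the extraction tokens."""
    body = "; ".join(f"({h}, {r}, {t})" for h, r, t in triplets)
    return f"{START}{body}{END}"

def parse_triplets(text: str) -> list:
    """Recover triplets from an output containing an extraction span."""
    match = re.search(re.escape(START) + r"(.*?)" + re.escape(END), text, re.S)
    if not match:
        return []
    return [tuple(part.strip() for part in item.split(","))
            for item in re.findall(r"\(([^)]*)\)", match.group(1))]

out = wrap_triplets([("GraTAG", "uses", "GRPO"), ("TAG", "aligns", "triplets")])
```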
Stage 2 — Answer Generation Training and Triplet Alignment:
The model learns to generate answers with and without triplet augmentation. A three-layer MLP with ReLU computes per-token weights ω from concatenated hidden states. The REINFORCE algorithm selects the most beneficial triplet for each sample, with a length-aware bonus encouraging concise triplets. The total loss is:
L = L_ans + α · L_REINFORCE
python TAG_Stage_2_train.py \
--model_path <stage1-model-path> \
--lr 5e-7 \
--n_passes 3 \
--n_ahead 200 \
--original_loss_weight 0.5
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-7 |
| α (REINFORCE weight) | 0.5 |
| γ (length-aware bonus) | > 0 |
| MLP layers | 3 (ReLU) |
| N Passes | 3 |
| N Ahead (lookahead tokens) | 200 |
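The combined objective can be written out numerically with toy values (a sketch: L_ans stands in for the token-level answer loss, and the REINFORCE term uses the classic -(reward - baseline) · log π form with α = 0.5 as in the table above; the repository's exact formulation may differ):

```python
def tag_loss(answer_nll: float, triplet_logprob: float, reward: float,
             baseline: float, alpha: float = 0.5) -> float:
    """L = L_ans + alpha * L_REINFORCE,
    with L_REINFORCE = -(reward - baseline) * log pi(triplet)."""
    l_reinforce = -(reward - baseline) * triplet_logprob
    return answer_nll + alpha * l_reinforce

# Toy values: answer NLL 2.0; selected triplet log-prob -1.5;
# its reward 0.8 against a 0.5 baseline
loss = tag_loss(2.0, -1.5, 0.8, 0.5)
```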
Evaluation:
python pipeline_evaluation_new_exp.py
For detailed technical references, see the docs/ directory:
| Document | Description |
|---|---|
| Module Reference | Core pipeline modules — GQD, TAG, retrieval, timeline, context management |
| API Reference | Backend REST endpoints, streaming protocol, search modes, data models |
| Configuration | Algorithm service and backend configuration parameters |
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
- NonCommercial — You may not use the material for commercial purposes.
For commercial licensing inquiries, please contact the authors.
This software is provided for academic and research purposes only. The authors make no warranties regarding the accuracy, completeness, or reliability of the software. Use of this software in any commercial product or service is strictly prohibited without prior written consent from the authors.
The evaluation benchmarks (SearchBench-1000, BrowseComp) and baseline results included in this repository are intended solely for research reproducibility. Any use of crawled data must comply with the respective platforms' terms of service.

