GitHub - pixeltable/pixelbot: Multimodal AI agent, an interactive data studio with on-demand ML inference, media generation, and a database explorer

Open-source sandbox for exploring everything Pixeltable can do

Pixelbot wires up tables, views, computed columns, embedding indexes, UDFs, tool calling, similarity search, version control, and model orchestration into a single full-stack app — so we can stress-test Pixeltable and ship what we learn as cookbooks.

Features

Chat — Multimodal RAG agent

Semantic search across documents, images, video frames, and audio via .similarity() on embedding indexes. Tool calling with external APIs (NewsAPI, yfinance, DuckDuckGo). Inline image generation (Imagen 4.0 / DALL-E 3), video generation (Veo 3.0), and text-to-speech (OpenAI TTS with 6 voice options). Follow-up suggestions via Gemini structured output with response_schema. Personas with adjustable system prompts and LLM parameters. Persistent chat history and memory bank.

Prompt Lab — Multi-model experimentation

Run the same prompt against Claude, Gemini, Mistral, and GPT-4o in parallel via ThreadPoolExecutor. Editable model IDs — override presets or add custom models. Response time, word count, and character count metrics with "Fastest" highlight and normalized comparison bars. Every experiment stored in agents.prompt_experiments for replay.
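The fan-out described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `experiments.py`: the two model functions are hypothetical stand-ins for real provider calls, and the metric names are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for provider calls; each takes a prompt and returns text.
def fake_model_a(prompt: str) -> str:
    time.sleep(0.2)  # simulate a slower provider
    return f"A says: {prompt}"

def fake_model_b(prompt: str) -> str:
    time.sleep(0.01)  # simulate a faster provider
    return f"B answers {prompt} briefly"

def timed_call(name, fn, prompt):
    """Call one model and record response time, word count, and character count."""
    start = time.perf_counter()
    text = fn(prompt)
    elapsed = time.perf_counter() - start
    return {
        "model": name,
        "response": text,
        "seconds": elapsed,
        "words": len(text.split()),
        "chars": len(text),
    }

def run_experiment(prompt, models):
    """Send the same prompt to every model in parallel and flag the fastest."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(timed_call, name, fn, prompt) for name, fn in models.items()]
        results = [f.result() for f in futures]  # keeps submission order
    fastest = min(results, key=lambda r: r["seconds"])
    for r in results:
        r["fastest"] = r is fastest
    return results

results = run_experiment("hello world", {"model-a": fake_model_a, "model-b": fake_model_b})
for r in results:
    print(r["model"], f'{r["seconds"]:.3f}s', r["words"], r["chars"], r["fastest"])
```

Threads suit this workload because each call spends most of its time waiting on network I/O, so the total wall time is close to that of the slowest model rather than the sum of all of them.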

Studio — File explorer + data wrangler

Documents: Auto-summaries (Gemini structured JSON), sentence-level chunks
Images: PIL transforms with live preview, save or download
Videos: Keyframe extraction, clip creation, text overlay, scene detection, transcriptions
Audio: Transcriptions with sentence-level breakdown
CSV: Inline CRUD, infinite undo via table.revert(), version history via table.get_versions()
Detection & Segmentation: On-demand DETR (ResNet-50/101) with SVG bounding boxes, DETR Panoptic segmentation with color-coded regions, ViT classification with confidence bars
Search: Cross-modal semantic search via .similarity() on embedding indexes
Embedding map: Interactive 2D UMAP projection of text/visual embedding spaces
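Conceptually, a similarity search ranks rows by how close their embedding is to the query embedding. The sketch below shows that idea in plain Python with hand-made three-dimensional vectors. It is an illustration only: it is not Pixeltable's `.similarity()` API or its index implementation, and the row IDs and vectors are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, rows, k=2):
    """Return the k (score, id) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, r["embedding"]), r["id"]) for r in rows]
    scored.sort(reverse=True)
    return scored[:k]

# Made-up rows standing in for chunks, frames, and audio segments.
rows = [
    {"id": "doc-chunk", "embedding": [1.0, 0.0, 0.0]},
    {"id": "video-frame", "embedding": [0.6, 0.8, 0.0]},
    {"id": "audio-chunk", "embedding": [0.0, 0.0, 1.0]},
]

hits = top_k([1.0, 0.0, 0.0], rows, k=2)
for score, row_id in hits:
    print(f"{row_id}: {score:.2f}")
```

In the real app the embeddings come from the models listed under Pixeltable Coverage, and the index does the ranking; the sketch only shows why items from different modalities can be compared once they share an embedding space.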

Media Library — Gallery + AI editing

Gallery for generated images and videos. Save to collection triggers CLIP embedding, keyframe extraction, transcription, and RAG indexing automatically. Reve AI editing via reve.edit() (natural language instructions) and reve.remix() (creative blending) with side-by-side preview.

Developer — Export, API reference, SDK, MCP

Export: Download any table as JSON, CSV, or Parquet with row-limit control and live preview
API: Categorized endpoint browser with method badges and expandable curl examples
SDK: Python code snippets — connect, query, semantic search, export to Pandas, versioning
Connect: MCP server config for Claude/Cursor, direct Python access, REST API examples
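An export with row-limit control, as listed above, might look like the following standard-library sketch. The function name and the sample rows are hypothetical, and this is not the code behind `/api/export/`. Parquet is left out because it needs a third-party library.

```python
import csv
import io
import json

def export_rows(rows, fmt="json", limit=None):
    """Serialize a list of row dicts to JSON or CSV, optionally truncated to `limit` rows."""
    subset = rows if limit is None else rows[:limit]
    if fmt == "json":
        return json.dumps(subset, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        # Column order follows the first row; all rows are assumed to share the same keys.
        fieldnames = list(subset[0].keys()) if subset else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(subset)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

# Made-up rows standing in for a table's contents.
sample = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

print(export_rows(sample, fmt="csv", limit=2))
print(export_rows(sample, fmt="json", limit=1))
```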

Database — Catalog explorer

Tables and views grouped by type (Agent Pipeline, Documents, Images, Videos, Audio, Generation, Memory, Data Tables). Schema inspection with computed vs. insertable column badges. Paginated row browser with client-side search, row filter, and CSV download. Cross-table join panel (INNER/LEFT/CROSS) with table/column pickers and result preview.
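The three join modes offered by the join panel differ only in which row pairs they keep. The sketch below shows those semantics on lists of dicts. It is an illustration under assumptions, not Pixeltable's `table.join()` implementation: the function name, the `left.`/`right.` column prefixes, and the sample tables are all hypothetical.

```python
def _prefixed(row, prefix):
    """Prefix column names so columns from both tables can coexist in one row."""
    return {f"{prefix}.{k}": v for k, v in row.items()}

def join_rows(left, right, key=None, how="inner"):
    """Join two lists of row dicts with inner, left, or cross semantics."""
    if how not in ("inner", "left", "cross"):
        raise ValueError(f"unsupported join mode: {how}")
    right_cols = sorted({c for r in right for c in r})
    out = []
    for l in left:
        if how == "cross":
            matches = right  # every pairing, no key needed
        else:
            matches = [r for r in right if r[key] == l[key]]
        for r in matches:
            out.append({**_prefixed(l, "left"), **_prefixed(r, "right")})
        if how == "left" and not matches:
            # Keep the unmatched left row and fill the right side with None.
            out.append({**_prefixed(l, "left"), **{f"right.{c}": None for c in right_cols}})
    return out

# Made-up tables for illustration.
docs = [{"doc_id": 1, "title": "a"}, {"doc_id": 2, "title": "b"}]
summaries = [{"doc_id": 1, "summary": "s1"}, {"doc_id": 3, "summary": "s3"}]

inner = join_rows(docs, summaries, key="doc_id", how="inner")
left = join_rows(docs, summaries, key="doc_id", how="left")
cross = join_rows(docs, summaries, how="cross")
print(len(inner), len(left), len(cross))
```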

Architecture — Interactive diagram

React Flow diagram with 38 nodes and 40 edges in swim-lane layout. Click any node to highlight its connections. Covers the full data flow: document chunking, image CLIP, video dual pipeline, audio transcription, 11-step agent pipeline, generation, and feedback edges.

History & Memory

Searchable conversation history with workflow detail dialog and JSON export. Unified timeline across all timestamped Pixeltable tables. Memory bank with semantic search and manual entry.
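A unified timeline amounts to merging several timestamped row sets into one ordered stream. The sketch below is one way to do that with the standard library. The table names, the `timestamp` column name, and the ISO-8601 string format are assumptions for illustration, not the app's actual schema.

```python
import heapq

def unified_timeline(tables):
    """Merge rows from several tables into one list ordered by timestamp.

    `tables` maps a table name to a list of row dicts with a 'timestamp' key.
    """
    streams = [
        [(row["timestamp"], name, row) for row in sorted(rows, key=lambda r: r["timestamp"])]
        for name, rows in tables.items()
    ]
    # Merge the per-table sorted streams; compare on the timestamp only.
    return list(heapq.merge(*streams, key=lambda item: item[0]))

# Made-up rows; ISO-8601 strings in one format sort chronologically as text.
tables = {
    "chat_history": [
        {"timestamp": "2025-01-01T10:05:00", "text": "follow-up"},
        {"timestamp": "2025-01-01T10:00:00", "text": "first question"},
    ],
    "memory_bank": [
        {"timestamp": "2025-01-01T10:02:00", "text": "saved note"},
    ],
}

timeline = unified_timeline(tables)
for ts, name, row in timeline:
    print(ts, name, row["text"])
```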

Pixeltable Coverage

Every row maps to a Pixeltable feature exercised in this app:

| Feature | Usage | Docs |
| --- | --- | --- |
| Tables + multimodal types | `Document`, `Image`, `Video`, `Audio`, `Json` | Tables |
| Computed columns | 11-step agent pipeline, thumbnails, summarization | Computed Columns |
| Views + iterators | `DocumentSplitter`, `FrameIterator`, `AudioSplitter` | Iterators |
| Embedding indexes | E5-large-instruct, CLIP ViT-B/32 → `.similarity()` | Embedding Indexes |
| `@pxt.udf` | News API, financial data, context assembly | UDFs |
| `@pxt.query` | `search_documents`, `search_images`, `search_video_frames` | RAG |
| `pxt.tools()` + `invoke_tools()` | Agent tool selection + execution | Tool Calling |
| Agent memory | Chat history + memory bank with embedding search | Memory |
| LLM integrations | Anthropic, Google, OpenAI, Mistral | Integrations |
| Reve AI | `reve.edit()` / `reve.remix()` for image editing | Reve |
| PIL transforms | Resize, rotate, blur, sharpen, edge detect | PIL |
| Video UDFs | `extract_frame`, `clip`, `overlay_text`, `scene_detect_content` | Video |
| Document processing | Gemini structured-JSON summarization, chunking | Chunking |
| CSV / tabular data | Dynamic table creation, inline CRUD, type coercion | CSV Import |
| Object detection | On-demand DETR with bounding box overlay | Detection |
| Panoptic segmentation | DETR Panoptic with color-coded segment regions | Segmentation |
| Text-to-speech | OpenAI TTS computed column with 6 voice options | TTS |
| Cross-table joins | `table.join()` with inner/left/cross modes | Joins |
| Table versioning | `tbl.revert()`, `tbl.get_versions()` | Versioning |
| Structured output | Gemini `response_schema` + Pydantic models | Structured Output |
| Catalog introspection | `pxt.list_tables()`, `tbl.columns()`, `tbl.count()` | Tables |
| Data export | JSON, CSV, Parquet via `/api/export/` | Export |
| MCP | Config for Claude, Cursor, AI IDEs | MCP |

Getting Started

Prerequisites: Python 3.10+, Node.js 18+

Required: ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY

Optional: MISTRAL_API_KEY, REVE_API_KEY, NEWS_API_KEY

All providers are swappable. Pixeltable supports local runtimes and 20+ integrations.

```bash
# Install (start in the repository root)
cd backend && python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
cd ../frontend && npm install

# Configure: create backend/.env with your API keys

# Run (start in the repository root again)
cd backend && python setup_pixeltable.py   # first time only
python main.py                             # :8000, keeps running
# In a second terminal, from backend/:
cd ../frontend && npm run dev              # :5173 → proxies /api to :8000
```

Production: run `cd frontend && npm run build`, which writes the bundle to backend/static/; `python main.py` then serves the whole app at :8000.
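Before the first run, a quick check for the keys listed under Prerequisites can save a failed startup. The helper below is a hypothetical sketch, not part of the repository. It inspects the process environment only, so it assumes the variables are already exported in the shell; how the backend itself loads backend/.env is not shown here.

```python
import os

# Key names taken from the Required / Optional lists above.
REQUIRED = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"]
OPTIONAL = ["MISTRAL_API_KEY", "REVE_API_KEY", "NEWS_API_KEY"]

def check_keys(env):
    """Return (missing required keys, missing optional keys) for a mapping of env vars."""
    missing_required = [k for k in REQUIRED if not env.get(k)]
    missing_optional = [k for k in OPTIONAL if not env.get(k)]
    return missing_required, missing_optional

missing, skipped = check_keys(os.environ)
if missing:
    print("Missing required keys:", ", ".join(missing))
else:
    print("All required keys are set.")
if skipped:
    print("Optional keys not set:", ", ".join(skipped))
```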

Project Structure

```
backend/
├── main.py                 FastAPI app, CORS, static serving
├── config.py               model IDs, system prompts, LLM parameters
├── models.py               Pydantic request/response schemas
├── functions.py            @pxt.udf and @pxt.query definitions
├── setup_pixeltable.py     full schema (tables, views, columns, indexes)
└── routers/
    ├── chat.py             11-step agent workflow
    ├── studio.py           transforms, detection, segmentation, CSV, Reve, embeddings
    ├── images.py           Imagen/DALL-E/Veo generation, TTS
    ├── experiments.py      parallel multi-model prompt runs
    ├── export.py           JSON/CSV/Parquet for any table
    ├── database.py         catalog introspection, timeline, joins
    ├── files.py            upload, URL import
    ├── history.py          conversation detail, debug export
    ├── memory.py           memory bank CRUD
    └── personas.py         persona CRUD

frontend/src/
├── components/
│   ├── chat/               agent UI, personas, image/video/voice modes
│   ├── experiments/        prompt lab, model select, metrics
│   ├── studio/             file browser, transforms, CSV, detection, segmentation, embedding map
│   ├── developer/          export, API reference, SDK snippets, MCP config
│   ├── database/           catalog browser, search, filter, download, joins
│   ├── architecture/       React Flow diagram (38 nodes, swim lanes)
│   ├── images/             media library, Reve edit/remix
│   ├── history/            conversations, timeline
│   ├── memory/             memory bank
│   └── settings/           persona editor
├── lib/api.ts              typed fetch wrapper
└── types/index.ts          shared interfaces
```

Related Projects

| Project | Description |
| --- | --- |
| Pixeltable | The core library: declarative AI data infrastructure |
| Pixelagent | Lightweight agent framework with built-in memory |
| Pixelmemory | Persistent memory layer for AI apps |
| MCP Server | Model Context Protocol server for Claude, Cursor, AI IDEs |

Contributing

Rough edges are expected. If you find a Pixeltable feature that's missing or awkward, open an issue or PR.

License

Apache 2.0 — see LICENSE.
