PROJECT INTEL

Shahnameh AI Agent

A RAG agent over the 60,000-couplet Shahnameh — citation-grounded answers where frontier LLMs guess.

ACTIVE SINCE: 2025 to present
STATUS: ACTIVE

FIREPOWER: 7/10
ARMOR: 7/10
SPEED: 6/10
SPECIAL: 10/10

A retrieval-augmented agent over Ferdowsi's Shahnameh, the foundational epic of Persian literature. This is exactly where frontier models fail: training data is sparse, the domain demands real scholarship, and confident hallucination is worse than no answer. So every answer is grounded in the actual verses plus expert scholarly interpretation, with citations.

The ingestion pipeline is end-to-end, from raw audio to vector DB to grounded answers: scholarly audio is transcribed with Whisper (large-v3), the SRT transcripts are normalised into clean text, and that text is semantically chunked, embedded, and loaded into a Postgres/pgvector store. A custom MCP server, built on top of a collaborator's API, exposes verse retrieval and interpretation tools to any MCP-compatible LLM client.
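As a rough illustration of the SRT normalisation step, here is a minimal Python sketch. It is not the project's actual code: the `Segment` type and the `normalise_srt` function are hypothetical names, and the sketch assumes standard SubRip cues (an index line, a timing line, then one or more text lines). It keeps the timestamps so that a later chunk could still be traced back to its place in the audio.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    """One cleaned subtitle cue (hypothetical type, for illustration only)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


# Matches a SubRip timing line such as "00:01:02,500 --> 00:01:05,000".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def normalise_srt(srt: str) -> list[Segment]:
    """Turn raw SRT text into a list of timestamped plain-text segments."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match is None:
                continue  # index line or stray text before the timing line
            g = match.groups()
            # Everything after the timing line is cue text.
            text = " ".join(lines[i + 1:])
            text = re.sub(r"<[^>]+>", "", text)       # drop inline tags like <i>
            text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
            if text:
                segments.append(Segment(_seconds(*g[:4]), _seconds(*g[4:]), text))
            break
    return segments
```

The resulting segments would then feed the chunking and embedding stages; how those stages group segments is not shown here.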

BATTLE RECORD

Citation-grounded answers over a 60,000-couplet corpus
Audio → text → vectors: Whisper large-v3, SRT normalisation, semantic chunking, pgvector
Custom MCP server: verse retrieval and interpretation for any MCP client
Expert scholarly interpretation, not just raw verses
Solves what frontier LLMs can't: precise answers where training data is sparse

TECH

Python
Whisper large-v3
Postgres/pgvector
MCP
RAG
Semantic chunking
