PROJECT INTEL
Shahnameh AI Agent
A RAG agent over the 60,000-couplet Shahnameh — citation-grounded answers where frontier LLMs guess.
- ACTIVE SINCE: 2025–present
- STATUS: ACTIVE
- FIREPOWER: 7/10
- ARMOR: 7/10
- SPEED: 6/10
- SPECIAL: 10/10
A retrieval-augmented agent over Ferdowsi's Shahnameh, the foundational epic of Persian literature. This is exactly where frontier models fail: training data is sparse, the domain demands real scholarship, and confident hallucination is worse than no answer. Every answer is therefore grounded in the actual verses and in expert scholarly interpretation, with citations.
The ingestion pipeline runs end to end, from raw audio to vector DB to grounded answers. Scholarly audio is transcribed with Whisper large-v3; the resulting SRT output is normalised into clean text, semantically chunked, embedded, and loaded into a Postgres/pgvector store. A custom MCP server, built on top of a collaborator's API, exposes verse retrieval and interpretation tools to any MCP-compatible LLM client.
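The SRT normalisation step can be sketched as follows. This is an illustration only, and the exact rules used in this project are not stated. The sketch assumes normalisation means:

- dropping cue numbers,
- keeping start/end times (so a chunk could point back to a moment in the recording),
- stripping inline tags such as `<i>`,
- collapsing whitespace,
- skipping consecutive duplicate cues, which transcription sometimes produces.

`Segment`, `parse_srt` and `to_plain_text` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line, e.g. "00:01:02,250 --> 00:01:05,000".
TIMESTAMP = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list[Segment]:
    """Turn SRT cues into clean, timed text segments."""
    segments: list[Segment] = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMESTAMP.search(line)
            if not match:
                continue  # e.g. the cue number line
            g = match.groups()
            text = " ".join(lines[i + 1:])
            text = re.sub(r"<[^>]+>", "", text)      # strip inline tags like <i>
            text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
            # Skip empty cues and consecutive repeats of the same text.
            if text and (not segments or segments[-1].text != text):
                segments.append(Segment(_seconds(*g[:4]), _seconds(*g[4:]), text))
            break
    return segments


def to_plain_text(segments: list[Segment]) -> str:
    """Join segments into one clean transcript string for chunking."""
    return " ".join(seg.text for seg in segments)
```

Keeping the timings on each segment is a design choice. It lets an answer cite "lecture X at 01:02" as well as the verse itself.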
BATTLE RECORD
- Citation-grounded answers over a 60,000-couplet corpus
- Audio → text → vectors: Whisper large-v3, SRT normalisation, semantic chunking, pgvector
- Custom MCP server: verse retrieval and interpretation for any MCP client
- Expert scholarly interpretation, not just raw verses
- Solves what frontier LLMs can't: precise answers where training data is sparse
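Semantic chunking is usually done by embedding consecutive sentences and starting a new chunk wherever the similarity between neighbours drops, since a drop signals a topic shift. The sketch below shows that common variant; it is not necessarily the one used here. The embedding model is not named in the source, so `embed` is a caller-supplied placeholder. The 0.8 threshold and the 8-sentence cap are arbitrary illustrative values.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], embed: Callable[[str], Vector],
                    threshold: float = 0.8, max_sentences: int = 8) -> list[str]:
    """Group consecutive sentences; split at a topic shift or the size cap."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks: list[str] = []
    current = [sentences[0]]
    # Compare each sentence with the one immediately before it.
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        topic_shift = cosine(prev_vec, vec) < threshold
        if topic_shift or len(current) >= max_sentences:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

The size cap keeps chunks within the embedding model's context even when a lecture stays on one topic for a long stretch.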
TECH
- Python
- Whisper large-v3
- Postgres/pgvector
- MCP
- RAG
- Semantic chunking
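On the Postgres/pgvector side, retrieval usually comes down to one nearest-neighbour query. In the sketch below, the table and column names (`chunks`, `citation`, `body`, `embedding`) are hypothetical, since the project's schema is not given. The query uses pgvector's `<=>` cosine-distance operator, so `1 - distance` is cosine similarity. The vector is passed in pgvector's text format (`'[0.1,0.2,...]'`) with `%s` placeholders, the parameter style used by psycopg.

```python
from typing import Sequence

# Hypothetical schema: chunks(id bigserial, citation text, body text, embedding vector(N)).
# `<=>` is pgvector's cosine-distance operator; 1 - distance = cosine similarity.
TOP_K_SQL = """
SELECT citation, body, 1 - (embedding <=> %s::vector) AS similarity
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(vec: Sequence[float]) -> str:
    """Format a Python vector in pgvector's text input format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def top_k_params(query_embedding: Sequence[float], k: int = 5) -> tuple[str, str, int]:
    """Parameters for TOP_K_SQL: the query vector twice, then the row limit."""
    literal = to_vector_literal(query_embedding)
    return (literal, literal, k)
```

With a psycopg cursor this would be used as `cur.execute(TOP_K_SQL, top_k_params(query_vec))`. The returned `similarity` can then feed an abstention threshold like the one in the grounding step.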