Symposion: My LLM Council for Research-Driven Builds
I stitched together Temporal, Mattermost, and three CLI agents to turn a paper into a shipped repo.
The Problem: Research Is a Team Sport
Reading papers is easy. Turning them into real systems is not.
I wanted a pipeline that could take a research question, argue about it like a real team, produce a plan, scaffold a repo, and then peer review the output. Not in my head. Not in a chat log. In my infrastructure, with durable state and a visible audit trail.
Symposion: The LLM Council
Symposion is a multi-agent research council that coordinates Claude, Codex, and Gemini to debate, plan, build, and review. Temporal orchestrates the workflow. Mattermost shows the whole conversation. PostgreSQL + pgvector stores memory across sessions.
This is not a single-prompt toy. It is a workflow system with state, gates, and traceability.
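Memory is what makes sessions compound. As a rough illustration, here is a minimal sketch of what a pgvector-backed memory table and recall query could look like, assuming psycopg 3; the "memories" schema, column names, and embedding size are my guesses, not Symposion's actual layout.

import psycopg  # psycopg 3

# Hypothetical schema; the real table layout may differ.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memories (
    id bigserial PRIMARY KEY,
    topic text NOT NULL,
    content text NOT NULL,
    embedding vector(1536)
);
"""

def recall(conn: psycopg.Connection, query_embedding: list[float], k: int = 5) -> list[str]:
    # Nearest neighbors by cosine distance, via pgvector's <=> operator.
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM memories ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, k),
        )
        return [row[0] for row in cur.fetchall()]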
Architecture at a Glance
User Prompt + Paper URL
          |
          v
Temporal Workflow <----> PostgreSQL + pgvector (memory)
          |
          +-- Claude CLI (analysis/build)
          +-- Codex CLI (review)
          +-- Gemini CLI (review)
          |
          v
Mattermost (debate, planning, build, reviews)
Workflow Phases
- Initialization: Create topic + session, spin up Mattermost channels.
- Debate: Three agents argue for 1-3 rounds, then synthesize a consensus.
- Planning: Claude generates a structured implementation plan.
- Scaffold: Repo is created with README, SPEC, and Claude config.
- Build: Claude Code implements milestones (or runs async in tmux).
- Review: Codex and Gemini review in parallel until approved.
- Finalize: Summary + learnings are stored to memory.
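As a Temporal workflow, that whole sequence stays compact. Here is a minimal Python sketch of the phase flow using the temporalio SDK; every activity name (create_channels, run_debate_round, and so on) and the signal name are illustrative stand-ins, not Symposion's real identifiers.

import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class SymposionWorkflow:
    def __init__(self) -> None:
        self._review_requested = False

    @workflow.signal
    def signal_review(self) -> None:
        # Flipped by the trigger script's --signal-review.
        self._review_requested = True

    @workflow.run
    async def run(self, prompt: str, paper_url: str) -> str:
        opts = {"start_to_close_timeout": timedelta(minutes=10)}
        # Initialization: topic, session, Mattermost channels.
        await workflow.execute_activity("create_channels", args=[prompt], **opts)
        # Debate: up to three rounds, then a consensus synthesis.
        for round_no in range(3):
            await workflow.execute_activity("run_debate_round", args=[round_no], **opts)
        consensus = await workflow.execute_activity("synthesize_consensus", **opts)
        # Planning and scaffold.
        plan = await workflow.execute_activity("generate_plan", args=[consensus, paper_url], **opts)
        await workflow.execute_activity("scaffold_repo", args=[plan], **opts)
        # Build, then park until review is signaled.
        await workflow.execute_activity("build_milestones", args=[plan], **opts)
        await workflow.wait_condition(lambda: self._review_requested)
        # Review: Codex and Gemini run in parallel (a single pass here;
        # the real loop repeats until both approve).
        await asyncio.gather(
            workflow.execute_activity("codex_review", args=[plan], **opts),
            workflow.execute_activity("gemini_review", args=[plan], **opts),
        )
        # Finalize: store summary and learnings to memory.
        return await workflow.execute_activity("store_learnings", args=[plan], **opts)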
The Goodhart Guardrail
I added a simple Goodhart risk detector based on cross-agent agreement: high confidence combined with low agreement is the warning signal. When the council gets too confident while still disagreeing, the system flags the session and tells me to slow down.
This is the difference between “looks right” and “is right.”
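Concretely, the check can be as small as a confidence/agreement comparison. A toy version, with made-up thresholds and a Jaccard overlap of each agent's extracted claims standing in for the real agreement score:

from itertools import combinations
from statistics import mean

def goodhart_risk(verdicts: list[tuple[set[str], float]],
                  conf_threshold: float = 0.8,
                  agree_threshold: float = 0.5) -> bool:
    """Flag sessions where the council is confident but disagreeing.

    verdicts: one (claims, confidence) pair per agent, where claims is a
    set of normalized claim strings pulled from that agent's answer.
    """
    avg_conf = mean(conf for _, conf in verdicts)
    # Crude agreement score: mean pairwise Jaccard overlap of claims.
    overlaps = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for (a, _), (b, _) in combinations(verdicts, 2)
    ]
    return avg_conf >= conf_threshold and mean(overlaps) <= agree_threshold

High confidence plus low overlap trips the flag; everything else passes through quietly.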
Triggering a Session
./bin/symposion-trigger \
--topic "pi-attention" \
--prompt "Build a sparse attention kernel in Mojo" \
--paper "https://arxiv.org/abs/2511.10696" \
--wait
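Under the hood this is, roughly, a Temporal client starting a workflow. A sketch with the temporalio Python SDK, where the workflow id and task queue name are stand-ins, not the script's real values:

import asyncio
from temporalio.client import Client

async def main() -> None:
    client = await Client.connect("localhost:7233")
    handle = await client.start_workflow(
        "SymposionWorkflow",                      # workflow type, by name
        args=["Build a sparse attention kernel in Mojo",
              "https://arxiv.org/abs/2511.10696"],
        id="symposion-pi-attention",              # hypothetical workflow id
        task_queue="symposion",                   # hypothetical task queue
    )
    print(await handle.result())                  # what --wait blocks on

asyncio.run(main())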
When the scaffold is done, I can signal review with:
./bin/symposion-trigger --signal-review --workflow-id <id>
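That flag maps onto a plain Temporal signal. Roughly, with the signal name assumed to match the workflow sketch above:

import asyncio
from temporalio.client import Client

async def signal_review(workflow_id: str) -> None:
    client = await Client.connect("localhost:7233")
    # Wake the workflow parked on wait_condition in the review phase.
    await client.get_workflow_handle(workflow_id).signal("signal_review")

asyncio.run(signal_review("symposion-pi-attention"))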
What Works Today
- Durable orchestration with Temporal (no lost context)
- Mattermost channels for every phase
- Debate synthesis + consensus scoring
- Repo scaffolding with specs and Claude tooling
- Parallel peer review from two independent agents
What I Am Fixing Next
- Make the “human review” gate real instead of a stub (one possible shape is sketched after this list)
- Fail fast if the database is missing instead of pretending it is there
- Harden embeddings and memory writes with better error handling
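For the human gate, the Temporal-native shape would be to park the workflow on a signal with a timeout rather than auto-approving. A sketch of what I have in mind, not what is shipped:

import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class HumanReviewGate:
    def __init__(self) -> None:
        self._approved = False

    @workflow.signal
    def approve(self) -> None:
        self._approved = True

    @workflow.run
    async def run(self) -> bool:
        try:
            # Block until a human approves, or give up after 24 hours
            # instead of silently waving the build through.
            await workflow.wait_condition(lambda: self._approved,
                                          timeout=timedelta(hours=24))
        except asyncio.TimeoutError:
            return False
        return True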
Credits / Influences
- Andrej Karpathy’s llm-council: https://github.com/karpathy/llm-council
- Transformer injectivity paper: https://arxiv.org/html/2510.15511v3
Why This Matters
If you want to do serious AI work at home, you need more than a clever prompt. You need structure, memory, and a way to keep yourself honest. Symposion is my answer to that.
Next up: hardening the council and shipping the UI so the whole thing is visible at a glance.