Symposion: My LLM Council for Research-Driven Builds
I stitched together Temporal, Mattermost, and three CLI agents to turn a paper into a shipped repo.
The Problem: Research Is a Team Sport
Reading papers is easy. Turning them into real systems is not.
I wanted a pipeline that could take a research question, argue about it like a real team, produce a plan, scaffold a repo, and then peer review the output. Not in my head. Not in a chat log. In my infrastructure, with durable state and a visible audit trail.
Symposion: The LLM Council
Symposion is a multi-agent research council that coordinates Claude, Codex, and Gemini to debate, plan, build, and review. Temporal orchestrates the workflow. Mattermost shows the whole conversation. PostgreSQL + pgvector stores memory across sessions.
This is not a single-prompt toy. It is a workflow system with state, gates, and traceability.
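Memory is what makes sessions compound. As a rough illustration, here is a minimal sketch of what a pgvector-backed memory table and recall query could look like, assuming psycopg 3; the "memories" schema, column names, and embedding size are my guesses, not Symposion's actual layout.

import psycopg  # psycopg 3

# Hypothetical schema; the real table layout may differ.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memories (
    id bigserial PRIMARY KEY,
    topic text NOT NULL,
    content text NOT NULL,
    embedding vector(1536)
);
"""

def recall(conn: psycopg.Connection, query_embedding: list[float], k: int = 5) -> list[str]:
    # Nearest neighbors by cosine distance, via pgvector's <=> operator.
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM memories ORDER BY embedding <=> %s::vector LIMIT %s",
            (vec, k),
        )
        return [row[0] for row in cur.fetchall()]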
Architecture at a Glance
User Prompt + Paper URL
          |
          v
Temporal Workflow <----> PostgreSQL + pgvector (memory)
          |
          +-- Claude CLI (analysis/build)
          +-- Codex CLI (review)
          +-- Gemini CLI (review)
          |
          v
Mattermost (debate, planning, build, reviews)
Workflow Phases
- Initialization: Create topic + session, spin up Mattermost channels.
- Debate: Three agents argue for 1-3 rounds, then synthesize a consensus.
- Planning: Claude generates a structured implementation plan.
- Scaffold: Repo is created with README, SPEC, and Claude config.
- Build: Claude Code implements milestones (or runs async in tmux).
- Review: Codex and Gemini review in parallel until approved.
- Finalize: Summary + learnings are stored to memory.
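As a Temporal workflow, that whole sequence stays compact. Here is a minimal Python sketch of the phase flow using the temporalio SDK; every activity name (create_channels, run_debate_round, and so on) and the signal name are illustrative stand-ins, not Symposion's real identifiers.

import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class SymposionWorkflow:
    def __init__(self) -> None:
        self._review_requested = False

    @workflow.signal
    def signal_review(self) -> None:
        # Flipped by the trigger script's --signal-review.
        self._review_requested = True

    @workflow.run
    async def run(self, prompt: str, paper_url: str) -> str:
        opts = {"start_to_close_timeout": timedelta(minutes=10)}
        # Initialization: topic, session, Mattermost channels.
        await workflow.execute_activity("create_channels", args=[prompt], **opts)
        # Debate: up to three rounds, then a consensus synthesis.
        for round_no in range(3):
            await workflow.execute_activity("run_debate_round", args=[round_no], **opts)
        consensus = await workflow.execute_activity("synthesize_consensus", **opts)
        # Planning and scaffold.
        plan = await workflow.execute_activity("generate_plan", args=[consensus, paper_url], **opts)
        await workflow.execute_activity("scaffold_repo", args=[plan], **opts)
        # Build, then park until review is signaled.
        await workflow.execute_activity("build_milestones", args=[plan], **opts)
        await workflow.wait_condition(lambda: self._review_requested)
        # Review: Codex and Gemini run in parallel (a single pass here;
        # the real loop repeats until both approve).
        await asyncio.gather(
            workflow.execute_activity("codex_review", args=[plan], **opts),
            workflow.execute_activity("gemini_review", args=[plan], **opts),
        )
        # Finalize: store summary and learnings to memory.
        return await workflow.execute_activity("store_learnings", args=[plan], **opts)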
The Goodhart Guardrail
I added a simple Goodhart risk detector based on cross-agent agreement: high confidence combined with low agreement is the warning signal. When the council gets too confident while still disagreeing, the system flags the session and tells me to slow down.
This is the difference between “looks right” and “is right.”
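Concretely, the check can be as small as a confidence/agreement comparison. A toy version, with made-up thresholds and a Jaccard overlap of each agent's extracted claims standing in for the real agreement score:

from itertools import combinations
from statistics import mean

def goodhart_risk(verdicts: list[tuple[set[str], float]],
                  conf_threshold: float = 0.8,
                  agree_threshold: float = 0.5) -> bool:
    """Flag sessions where the council is confident but disagreeing.

    verdicts: one (claims, confidence) pair per agent, where claims is a
    set of normalized claim strings pulled from that agent's answer.
    """
    avg_conf = mean(conf for _, conf in verdicts)
    # Crude agreement score: mean pairwise Jaccard overlap of claims.
    overlaps = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for (a, _), (b, _) in combinations(verdicts, 2)
    ]
    return avg_conf >= conf_threshold and mean(overlaps) <= agree_threshold

High confidence plus low overlap trips the flag; everything else passes through quietly.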
Triggering a Session
./bin/symposion-trigger \
--topic "pi-attention" \
--prompt "Build a sparse attention kernel in Mojo" \
--paper "https://arxiv.org/abs/2511.10696" \
--wait
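Under the hood this is, roughly, a Temporal client starting a workflow. A sketch with the temporalio Python SDK, where the workflow id and task queue name are stand-ins, not the script's real values:

import asyncio
from temporalio.client import Client

async def main() -> None:
    client = await Client.connect("localhost:7233")
    handle = await client.start_workflow(
        "SymposionWorkflow",                      # workflow type, by name
        args=["Build a sparse attention kernel in Mojo",
              "https://arxiv.org/abs/2511.10696"],
        id="symposion-pi-attention",              # hypothetical workflow id
        task_queue="symposion",                   # hypothetical task queue
    )
    print(await handle.result())                  # what --wait blocks on

asyncio.run(main())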
When the scaffold is done, I can signal review with:
./bin/symposion-trigger --signal-review --workflow-id <id>
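That flag maps onto a plain Temporal signal. Roughly, with the signal name assumed to match the workflow sketch above:

import asyncio
from temporalio.client import Client

async def signal_review(workflow_id: str) -> None:
    client = await Client.connect("localhost:7233")
    # Wake the workflow parked on wait_condition in the review phase.
    await client.get_workflow_handle(workflow_id).signal("signal_review")

asyncio.run(signal_review("symposion-pi-attention"))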
What Works Today
- Durable orchestration with Temporal (no lost context)
- Mattermost channels for every phase
- Debate synthesis + consensus scoring
- Repo scaffolding with specs and Claude tooling
- Parallel peer review from two independent agents
What I Am Fixing Next
- Make the “human review” gate real instead of a stub (one possible shape is sketched after this list)
- Fail fast if the database is missing instead of pretending it is there
- Harden embeddings and memory writes with better error handling
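For the human gate, the Temporal-native shape would be to park the workflow on a signal with a timeout rather than auto-approving. A sketch of what I have in mind, not what is shipped:

import asyncio
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class HumanReviewGate:
    def __init__(self) -> None:
        self._approved = False

    @workflow.signal
    def approve(self) -> None:
        self._approved = True

    @workflow.run
    async def run(self) -> bool:
        try:
            # Block until a human approves, or give up after 24 hours
            # instead of silently waving the build through.
            await workflow.wait_condition(lambda: self._approved,
                                          timeout=timedelta(hours=24))
        except asyncio.TimeoutError:
            return False
        return True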
Credits / Influences
- Andrej Karpathy’s llm-council: https://github.com/karpathy/llm-council
- Transformer injectivity paper: https://arxiv.org/html/2510.15511v3
Why This Matters
If you want to do serious AI work at home, you need more than a clever prompt. You need structure, memory, and a way to keep yourself honest. Symposion is my answer to that.
Next up: hardening the council and shipping the UI so the whole thing is visible at a glance.