Symposion: My LLM Council for Research-Driven Builds
I stitched together Temporal, Mattermost, and three CLI agents to turn a paper into a shipped repo.
Homelab adventures, infrastructure deep-dives, and lessons learned building enterprise-grade systems on a budget
We fine-tuned a security agent to 100% skill differentiation in probing tests, but it collapsed to a single behavior in deployment. This gap led us to develop a trust diagnostic framework.
A follow-up to my retracted post on self-improving models. The fidelity metric improved 18% but actual performance dropped 11%. Here's what went wrong, what I learned about Goodhart's Law, and why reproducibility filtering might be the answer.
How I built custom Mojo kernels for AI research that outperform PyTorch on consumer hardware.
Responsible disclosure of a class of vulnerabilities that allow system prompt extraction from transformer hidden states.
How I achieved 100% token recovery from Mistral-7B hidden states and what it means for AI security.