Beyond Benchmark Gaming: Multi-Model Consensus for Genuinely Capable AI
Using agreement among frontier models (Claude, GPT-5, Gemini) as a training signal to build AI that's genuinely capable, not just benchmark-optimized
I stitched together Temporal, Mattermost, and three CLI agents to turn a paper into a shipped repo.
An overview of my ongoing research exploring transformer injectivity, self-improving models, and high-performance AI on an RTX 3090
How periodic sparse attention achieves O(n) complexity while maintaining model quality
A novel discovery: hidden state inversion quality predicts model capability, enabling self-improving systems without external feedback
A 7B model taught itself to generate better security commands using only its own understanding signals: no human labels, no external reward. Here's how, and why it matters.