Beyond Benchmark Gaming: Multi-Model Consensus for Genuinely Capable AI
Using agreement among frontier models (Claude, GPT-5, Gemini) as a training signal to build AI that's genuinely capable, not just benchmark-optimized
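The core idea can be sketched in a few lines: sample the same prompt across several frontier models, and reward a candidate answer by how many of them agree with it. This is a minimal illustration, not the post's actual pipeline; the exact-match agreement check and the stand-in answers are assumptions, and a real setup would call model APIs and use a semantic-equivalence judge rather than string comparison.

```python
def normalize(ans: str) -> str:
    """Crude normalization so trivially different phrasings compare equal."""
    return " ".join(ans.lower().split())

def consensus_reward(candidate: str, model_answers: list[str]) -> float:
    """Fraction of frontier-model answers that agree with the candidate.

    Agreement here is exact match after normalization; a production
    pipeline would use a semantic-equivalence check instead.
    """
    if not model_answers:
        return 0.0
    target = normalize(candidate)
    agree = sum(1 for a in model_answers if normalize(a) == target)
    return agree / len(model_answers)

# Hypothetical answers from three models to the same factual prompt.
answers = ["Paris", "paris", "Lyon"]
score = consensus_reward("Paris", answers)  # two of three models agree
```

The reward is deliberately a fraction rather than a binary vote, so partial agreement still carries signal during training.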
A follow-up to my retracted post on self-improving models. The fidelity metric improved 18% but actual performance dropped 11%. Here's what went wrong, what I learned about Goodhart's Law, and why reproducibility filtering might be the answer.
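Reproducibility filtering, as described above, can be sketched as: sample each prompt several times and keep only the prompts whose answers are self-consistent. Everything here is illustrative; the function name, the majority-vote criterion, and the threshold are my assumptions, with `sample` standing in for drawing one model completion.

```python
from collections import Counter
from typing import Callable

def reproducibility_filter(prompts: list[str],
                           sample: Callable[[str], str],
                           n_samples: int = 5,
                           threshold: float = 0.8) -> list[str]:
    """Keep prompts whose sampled answers are self-consistent.

    A prompt passes when its most common answer covers at least
    `threshold` of the `n_samples` draws; noisy prompts are dropped
    before they can feed an unreliable training signal.
    """
    kept = []
    for prompt in prompts:
        answers = [sample(prompt) for _ in range(n_samples)]
        top_count = Counter(answers).most_common(1)[0][1]
        if top_count / n_samples >= threshold:
            kept.append(prompt)
    return kept
```

Filtering on self-consistency targets the Goodhart failure directly: a metric that only moves on irreproducible outputs stops moving once those outputs are excluded from training.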