[ LLM ]

Discover insights, tutorials, and thoughts on technology, homelab, and development.

May 16, 2026 4 min read

What If a Model Could Remember What It Learned?

We're building an adaptive memory system for AI inference: memory that operates at the activation level, not the token level. Early Gemma-4-31B results show a 3.12 selectivity ratio for on-topic vs. adversarial recall. Provisional patent filed.

ai-research machine-learning llm

May 12, 2026 9 min read

I Was Wrong About All Three Dormant Models

Jane Street published the Dormant LLM Challenge answer key today. My March submission claimed all three triggers. The answer key disagrees on every model. Here's what I actually got wrong, the methodology error that drove it, and what I'd do differently with the benefit of hindsight.

ai-research machine-learning security

Mar 23, 2026 15 min read

Solving the Jane Street Dormant LLM Challenge: A Systematic Approach to Backdoor Discovery

How we solved all three backdoored DeepSeek V3 models using SVD weight analysis, persistent homology, multi-model deliberation, and 5,000+ indexed probes.

ai-research machine-learning security

Dec 28, 2025 5 min read

Beyond Benchmark Gaming: Multi-Model Consensus for Genuinely Capable AI

Using agreement among frontier models (Claude, GPT-5, Gemini) as a training signal to build AI that's genuinely capable, not just benchmark-optimized.

ai-training consensus goodhart

Dec 18, 2025 5 min read

I Was Wrong About Self-Improving Models: Here's What I Actually Found

A follow-up to my retracted post on self-improving models. The fidelity metric improved 18% but actual performance dropped 11%. Here's what went wrong, what I learned about Goodhart's Law, and why reproducibility filtering might be the answer.

self-improvement machine-learning goodharts-law

Dec 18, 2025 7 min read

When Models Talk the Talk but Don't Walk the Walk: A Journey into LLM Behavioral Consistency

We fine-tuned a security agent to 100% skill differentiation in probing tests, but it collapsed to a single behavior in deployment. This gap led us to develop a trust diagnostic framework.

llm fine-tuning behavioral-consistency