When Models Talk the Talk but Don't Walk the Walk: A Journey into LLM Behavioral Consistency
We fine-tuned a security agent to 100% skill differentiation in probing tests, but it collapsed to a single behavior in deployment. This gap led us to develop a trust diagnostic framework.