4 min read
SipIt: Extracting System Prompts from Transformer Hidden States
How I achieved 100% token recovery from Mistral-7B hidden states and what it means for AI security
transformers
security
interpretability
Responsible disclosure of a class of vulnerabilities that allow system prompt extraction from transformer hidden states