Creating an anomaly-based detection system for AI agents

As part of my effort to do a weekly blog post on LLM security or security in general, I invite you to read my newest one.

tl;dr:

While thinking about the Traveling Salesman Problem, I wondered whether the optimization techniques used to solve it could be applied to security analysis of the tool-invocation paths that LLM agents take.

Pro: could flag paths that begin with a read_email action and end with a delete_user action.

Con: would not flag generic read_email -> send_email paths, which could be just as malicious.
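
To make the pro and con above concrete, here is a minimal sketch of one way path scoring could work. It treats each tool-to-tool transition as a weighted edge, TSP-style, and flags a path when its total cost crosses a cutoff. The weights, the threshold, the lookup_user tool, and the function names are all made up for illustration; the post itself does not specify any of them.

```python
from typing import Dict, List, Tuple

# Hypothetical per-transition risk weights (assumed, not from the post).
EDGE_RISK: Dict[Tuple[str, str], float] = {
    ("read_email", "send_email"): 0.2,
    ("read_email", "lookup_user"): 0.3,
    ("lookup_user", "delete_user"): 0.9,
    ("read_email", "delete_user"): 1.0,
}
UNKNOWN_EDGE_RISK = 0.5  # assumed default for transitions with no assigned weight
THRESHOLD = 1.0          # assumed cutoff for flagging a path


def path_risk(path: List[str]) -> float:
    """Sum the risk of each consecutive tool-to-tool transition."""
    return sum(
        EDGE_RISK.get((src, dst), UNKNOWN_EDGE_RISK)
        for src, dst in zip(path, path[1:])
    )


def is_anomalous(path: List[str]) -> bool:
    """Flag a path whose total transition risk reaches the threshold."""
    return path_risk(path) >= THRESHOLD


if __name__ == "__main__":
    for p in (
        ["read_email", "lookup_user", "delete_user"],
        ["read_email", "send_email"],
    ):
        print(" -> ".join(p), "flagged" if is_anomalous(p) else "not flagged")
```

With these assumed weights, the path ending in delete_user is flagged while read_email -> send_email passes, which is exactly the blind spot described in the con: a scorer that only looks at tool names and transitions cannot tell a benign reply from data exfiltration.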

Just a thought, would love to hear some feedback!

submitted by /u/dvnci1452

from hacking: security in practice https://ift.tt/2WxljY3
