Imagine you are building an autonomous AI agent. You give it a terminal tool, a file-writing tool, and the ability to execute Python scripts. You ask it to "clean up the temporary files in the project directory."
The LLM processes the request, formulates a plan, and generates a terminal command. But due to a subtle parsing error or a hallucinated variable, it executes:
rm -rf / temp
Enter fullscreen mode
Exit fullscreen mode






