Key points:
* It uses an LLM integrated with Playwright, a library for driving headless web browsers, so the model can interact with websites automatically through function calling.
* It gives the LLM access to 7 documents on web hacking and adds planning capabilities through specific prompting. The authors do not disclose the exact methods, to prevent misuse.
GPT-4 achieves a 73.3% success rate on the tested vulnerabilities, which highlights the cybersecurity risks posed by advanced LLMs. The open-source models tested cannot yet perform these types of attacks (results in screenshot).
Congrats to the authors for their work!
Paper: LLM Agents can Autonomously Hack Websites (2402.06664)