Vigil is an open-source security scanner that detects prompt injections, jailbreaks, and other potential threats to Large Language Models (LLMs).

Prompt injection arises when an attacker successfully influences an LLM using specially designed inputs. This leads to the LLM unintentionally carrying out the objectives set by the attacker.

“I’ve been really excited about the possibilities of LLMs, but have also noticed the need for better security practices around the applications built around them and the data we give the applications access to. This project gave me a great chance to build something at the intersection of AI and cybersecurity. Hopefully it is providing other security researchers and developers a start in experimenting with existing LLM input and output safety measures, and even creating their own. More “whats possible” than anything I’d expect to be used directly in production,” Adam M. Swanda, the creator of Vigil, told Help Net Security.

Vigil LLM security scanner highlights

  • Modular and extensible design
  • Supports YARA (heuristics), vector DB similarity, a transformer model, prompt-response similarity
  • Custom scanners can be added with little code
  • Self-hosted or use OpenAI
  • Embedding datasets and YARA signatures provided
  • Vector DB can auto-update with detected prompts when a threshold of scanners match
  • Very configurable (enable/disable scanners, modify thresholds, use different embedding models, etc.)
  • Easily extensible by adding custom scanners, new YARA signatures, or updating the vector DB

Vigil is available for download on GitHub. This repository also provides the detection signatures and datasets needed to get started with self-hosting.

Swanda plans to continue developing Vigil in the near term. Specifically, he’s been working on an application designed to evaluate Vigil and its various scanners against custom datasets. This application assesses aspects such as false positives and other relevant metrics. Additionally, Swanda is exploring methods for detecting image-based prompt injections.

More open-source tools to consider:



Source link

By i53gf