AI Security Tools

This is a work-in-progress, curated list of AI security tools:

Model Testing

Products that examine or test models for security issues of various kinds.

HiddenLayer Model Scanner - Scan models for vulnerabilities and supply chain issues.
Plexiglass - A toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs).
PurpleLlama - Set of tools from Meta to assess and improve LLM security.
Garak - An LLM vulnerability scanner (source code available).
CalypsoAI Platform - Platform for testing and launching LLM applications securely.
Lakera Red - Automated safety and security assessments for your GenAI applications.
jailbreak-evaluation - Python package for language model jailbreak evaluation.
Patronus AI - Automated testing of models to detect PII, copyrighted materials, and sensitive information in models.
Adversa Red Teaming - Continuous AI red teaming for LLMs.
Advai - Automates the tasks of stress-testing, red-teaming, and evaluating your AI systems for critical failure.
Mindgard AI - Identifies and remediates risks across AI models, GenAI, LLMs along with AI-powered apps and chatbots.
Protect AI ModelScan - Scan models for serialization attacks (source code available).
Protect AI Guardian - Scan models for security issues or policy violations with auditing and reporting.
TextFooler - A model for natural language attacks on text classification and inference.
LLMFuzzer - Fuzzing framework for LLMs.
Prompt Security Fuzzer - A fuzzer to find prompt injection vulnerabilities.
OpenAttack - A Python-based textual adversarial attack toolkit.
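
Several of the scanners above look for serialization attacks, where a model file runs code when it is loaded. The sketch below illustrates the general idea for Python pickle files only: it walks the pickle opcode stream with the standard-library `pickletools` module, without loading the file, and reports opcodes that can import or call arbitrary objects. It is not how any listed product works; the `SUSPICIOUS_OPCODES` set, the `find_suspicious_opcodes` function, and the `CallsOnLoad` demo class are hypothetical names chosen for this example.

```python
import pickle
import pickletools

# Assumption for this sketch: these opcodes are treated as "suspicious" because
# they can import a global or invoke a callable while the pickle is loaded.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list:
    """Return the distinct suspicious opcode names found in a pickle stream.

    pickletools.genops only parses the stream; it never executes it.
    """
    found = []
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES and opcode.name not in found:
            found.append(opcode.name)
    return found


class CallsOnLoad:
    """Hypothetical demo object: unpickling it would call print()."""

    def __reduce__(self):
        return (print, ("this ran during unpickling",))


if __name__ == "__main__":
    benign = pickle.dumps({"weights": [1.0, 2.0]})
    risky = pickle.dumps(CallsOnLoad())
    print("benign:", find_suspicious_opcodes(benign))
    print("risky:", find_suspicious_opcodes(risky))
```

Many legitimate pickles also contain these opcodes, since any pickled class instance needs them, so a practical scanner would additionally have to check which modules and names are being imported. This sketch only flags that code-capable opcodes are present.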

Prompt Firewall and Redaction

Products that intercept prompts and responses and apply security or privacy rules to them. We've blended two categories here because some prompt firewalls only redact private data (and then re-identify it in the response), while others focus on identifying and blocking attacks such as prompt injection or on stopping data leaks. Many of the products in this category do all of the above, which is why the two have been combined.

Protect AI Rebuff - An LLM prompt injection detector.
Protect AI LLM Guard - Suite of tools to protect LLM applications by helping you detect, redact, and sanitize LLM prompts and responses.
HiddenLayer AI Detection and Response - Proactively defend against threats to your LLMs.
Robust Intelligence AI Firewall - Real-time protection, automatically configured to address the vulnerabilities of each model.
Vigil LLM - Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs.
Lakera Guard - Protection from prompt injections, data loss, and toxic content.
Arthur Shield - Built-in, real-time firewall protection against the biggest LLM risks.
Prompt Security - SDK and proxy for protection against common prompt attacks.
Private AI - Detect, anonymize, and replace PII with less than half the error rate of alternatives.
DynamoGuard - Identify and defend against any type of non-compliance as defined by your specific AI policies, and catch attacks.
Skyflow LLM Privacy Vault - Redacts PII from prompts flowing to LLMs.
Guardrails AI - Guardrails runs Input/Output Guards in your application that detect, quantify and mitigate the presence of specific types of risks.
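
The redact-then-re-identify flow mentioned in this category's introduction can be sketched in a few lines. This is a minimal illustration, not the behavior of any product listed above: it handles only email addresses, using a simplified regular expression, and the names `EMAIL_RE`, `redact`, and `reidentify` are hypothetical. Real products detect many more kinds of private data and use far more robust detection.

```python
import re

# Assumption for this sketch: a deliberately simple email pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(prompt: str):
    """Replace each email address with a placeholder token.

    Returns the redacted prompt and a token-to-original mapping that is kept
    locally and never sent to the model.
    """
    mapping = {}

    def _substitute(match):
        value = match.group(0)
        # Reuse the same token when the same address appears again.
        for token, original in mapping.items():
            if original == value:
                return token
        token = f"<EMAIL_{len(mapping) + 1}>"
        mapping[token] = value
        return token

    return EMAIL_RE.sub(_substitute, prompt), mapping


def reidentify(response: str, mapping: dict) -> str:
    """Restore the original values in the model's response."""
    for token, original in mapping.items():
        response = response.replace(token, original)
    return response


if __name__ == "__main__":
    safe_prompt, mapping = redact("Draft a reply to alice@example.com.")
    print(safe_prompt)  # this is what would be sent to the LLM
    print(reidentify("Sure, here is a reply to <EMAIL_1>.", mapping))
```

Keeping the mapping on the caller's side means the model only ever sees placeholder tokens, while the user still receives a response with the original values restored.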

AI Red Teaming Datasets

AttaQ Dataset - A red teaming dataset consisting of 1402 carefully crafted adversarial questions.

AI Red Teaming Guidance

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
