
Defining a process for the recently discussed practice of AI red-teaming

As many are now aware, the White House released its “Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence” this week. The goal of this Executive Order (EO) is to encourage the safe and responsible development and use of AI, grounded in the principle that “Artificial Intelligence must be safe and secure.” Over the next few weeks and months, the White House, the National Institute of Standards and Technology (NIST), the Department of Homeland Security (DHS), and other government institutions will be working diligently to further the guidance and directives laid out in the EO. In the meantime, there are several items in the EO that can be more clearly defined for the public.

The goal of this article is to examine the term and practice of “AI red-teaming,” mentioned throughout the EO and defined in section 3(d) as “a structured testing effort to find flaws and vulnerabilities in an AI system, often in a controlled environment and in collaboration with developers of AI.” The EO definition continues: “Artificial Intelligence red-teaming is most often performed by dedicated ‘red teams’ that adopt adversarial methods to identify flaws and vulnerabilities, such as harmful or discriminatory outputs from an AI system, unforeseen or undesirable system behaviors, limitations, or potential risks associated with the misuse of the system.”

What is AI red-teaming?

The concept of red-teaming in general is not new. Cybersecurity professionals have been using red-teaming for the last two decades as part of their standard practices for understanding vulnerabilities in an organization’s cyber infrastructure. These traditional cyber red teams typically have the following attributes [1]:

  • Security professionals who act as adversaries to overcome cybersecurity controls
  • Utilize all the available techniques to find weaknesses in people, processes, and technology to gain unauthorized access to assets
  • Make recommendations and plans on how to strengthen an organization’s security posture

In a complementary role to traditional cyber red teams, AI red teams have the following attributes:

  • AI security professionals with varying backgrounds (including traditional cybersecurity professionals, AI practitioners, adversarial ML experts, etc.) who act as adversaries to discover vulnerabilities in AI-enabled systems
  • Utilize all the available techniques to find weaknesses in people, processes, and technology to gain unauthorized access to AI-enabled systems
  • Make recommendations and plans on how to strengthen an organization’s AI security posture

While the attributes of both approaches appear similar, AI poses unique security vulnerabilities not covered by traditional cybersecurity, such as data poisoning, membership inference, and model evasion [2]. Therefore, the mission and execution of the AI red-teaming approach are also unique. We define the AI red-teaming process as a three-phase approach, described below.
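To make these AI-specific vulnerabilities more concrete, below is a minimal, illustrative sketch of a label-flipping data-poisoning attack. The scikit-learn model and synthetic dataset are stand-ins chosen for brevity, not a reference to any particular target system.

```python
# Illustrative label-flipping data-poisoning sketch (synthetic data, placeholder model).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: model trained on clean data
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean test accuracy:", clean_model.score(X_test, y_test))

# Poisoning: an adversary flips the labels of 20% of the training set
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(y_train), size=int(0.2 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

# Retrain on the tampered data and measure the degradation on clean test data
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("poisoned test accuracy:", poisoned_model.score(X_test, y_test))
```

The point of the sketch is that the weakness lives in the training data and the model itself rather than in network or host configuration, which is why traditional cybersecurity controls alone do not catch it.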

How does AI red-teaming work?

Phase 1

In the first phase, the focus is on standing up the AI red team: recruiting the right talent for the red-teaming exercise and building or acquiring the necessary tooling. Depending on the AI system being red-teamed, the members of the team may include traditional cybersecurity professionals, adversarial machine learning experts, operational and domain experts, and AI practitioners.

Phase 2

Once the team is stood up and the AI red-teaming mission has been identified, phase two, or the execution phase, of the AI red-teaming process begins. This may be broken up into five main steps:

  1. Analyze the target system to gather as much information as possible and as needed to perform the AI red-teaming exercise. This may include building threat models, performing information gathering on the system and mission, and utilizing openly available knowledge bases of known attacks, such as MITRE ATLAS [3].
  2. Identify and potentially access the target system and AI model or component of the system that will be attacked. In some cases, access to the system will be very difficult, so a “black-box” approach will be needed to carry out the attack, which might involve building a proxy system or model for the target system.
  3. Once the threat model, target system, and AI model have been identified and understood, develop the attack. For example, if the target system is a surveillance system and the threat model is to evade detection by the AI model performing face recognition, attack development will focus on face recognition evasion attacks (a minimal evasion-attack sketch follows this list).
  4. Once one or more attacks have been developed for the AI red team exercise, deploy and launch the attack on the target system. The type of deployment may vary widely depending on the target system and threat model.
  5. Perform impact analysis of the attack. This analysis will include performance metrics for the individual AI components that were affected, but it should also include higher-level metrics to understand the effect of the attack on the overall system and/or mission.
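As an example of what step 3 can look like in practice, below is a minimal sketch of a white-box evasion attack using the Fast Gradient Sign Method (FGSM). The PyTorch model, input image, and label in the usage comments are hypothetical placeholders; a real engagement would tailor the attack to the specific target system and threat model, possibly against a proxy model as described in step 2.

```python
# Minimal FGSM evasion sketch; the target model and inputs are hypothetical placeholders.
import torch
import torch.nn.functional as F

def fgsm_evasion(model, x, true_label, epsilon=0.03):
    """Fast Gradient Sign Method: perturb x to increase the model's loss on true_label."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), true_label)
    loss.backward()
    # Step in the direction that increases the loss, within an epsilon perturbation budget
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Hypothetical usage against a face-recognition-style classifier:
# model = load_target_or_proxy_model()                    # placeholder loader
# x_adv = fgsm_evasion(model, probe_image, torch.tensor([identity_id]))
# print(model(x_adv).argmax(dim=1))                       # did the predicted identity change?
```

For steps 4 and 5, the red team would feed such adversarial inputs into the deployed system and compare model-level metrics (for example, identification accuracy on clean versus adversarial probes) with mission-level metrics, such as how often a person of interest goes undetected.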

Phase 3

The final phase of the AI red-teaming process is the knowledge-sharing phase. In this phase, lessons learned and recommendations are shared with the development teams, blue teams, and any stakeholders involved in securing the AI systems of the organization or mission involved in the exercise. Additionally, results from the exercise might be shared with auditors, the broader AI security community, and incident-sharing mechanisms to further collective knowledge and understanding of AI security risks.

How do you get started?

Given the relative nascence of AI red-teaming, it may seem daunting to know where to start. Consider the following initial steps to get started with red-teaming your organization’s high-stakes AI systems:

  • Discover your organization’s AI systems in development, in deployment, and in the supply chain
  • Identify the use case that carries the most risk in the event of an adversarial security attack, as well as the key stakeholders responsible for maintaining the operations and security of the AI system(s)
  • Get leadership buy-in by showcasing the potential value-add/risk reduction of the red-teaming activities

For more information, connect with me and follow Cranium on LinkedIn for the latest in AI red-teaming!
