Navigating AI Risks with ATLAS
A look at MITRE's Adversarial Threat Landscape for Artificial-Intelligence Systems (ATLAS)
As AI experimentation, use and adoption continues to heat up, the industry is increasingly looking for guidance on how to enable secure AI adoption.
In previous articles we have covered some examples, such as:
You can also find my conversations on these topics below:
Navigating the AI Security Landscape w/ Rob Van Der Veer (OWASP AI Exchange Lead and Industry AI Security Expert)
Securing the Adoption of GenAI and LLM’s w/ Steve Wilson (OWASP LLM Top 10 Project Lead and author of “The Developers Playbook for LLM Security”)
In this article, we will take a look at MITRE’s Adversarial Threat Landscape for AI Systems (aka ATLAS). It’s a
“living knowledge base of adversary tactics and techniques based on real world attack observations and realistic demonstrations from AI Red Teams and Security Groups”.
Interested in sponsoring an issue of Resilient Cyber?
This includes reaching over 6,000 subscribers, ranging from Developers, Engineers, Architects, CISO’s/Security Leaders and Business Executives
Reach out below!
ATLAS is modeled after the widely popular MITRE ATT&CK, which I have covered in an article titled “MITRE ATT&CK v1 adds ICS matrix, sub-techniques for mobile threats”.
MITRE recommends using ATLAS for activities such as security analysis, AI development/implementation, threat assessments and red teaming and reporting attacks on AI-enabled systems.
MITRE has also rolled out companion resources, such as the “AI Risk Database” which can be used to report vulnerabilities and can be used to discover and report risks of public ML models and AI systems.
There is also the MITRE ATLAS Mitigations page, which discusses mitigations that can be used to address specific AI vulnerabilities, risks and malicious activities.
ATLAS Matrix
While the ATLAS Matrix continues to evolve with new threats, risks and vulnerabilities, we will take a look at each of the tactics below and some of the specific techniques within each tactic that organizations should be familiar with and cognizant of as they adopt AI-enabled systems or embed AI technologies into their products.
The ATLAS flows from left to right, and demonstrates an attack lifecycle from initial reconnaissance through ultimate impact.
So let’s walk through the attack lifecycle, taking a look at both the phases, tactics and specific techniques that may be involved.
Reconnaissance
This activity generally involves attackers looking to gather information about the system they are attacking. This is common in any attack lifecycle, and includes AI and ML systems as well.
Some of the reconnaissance activities will often involve searching for the victims publicly available research materials. This may include examples such as journals and conference proceedings, pre-print repositories, or technical blogs.
This can help aid attackers in understanding current research efforts, an organizations underlying architecture, tech stack, open source tooling, models and more. All of this can help inform future malicious activities.
Attackers may also look for publicly available adversarial vulnerability analysis. This can include information on vulnerabilities in models, services/providers, platforms and underlying technologies. This will help inform successful AI focused attacks, wether using known exploitation techniques, or creating new ones.
Additionally, attackers will often search the victims owned websites and repositories for both technical details and insights as well as identity related information that can be used to target employees, developers and engineers with techniques such as phishing or extortion.
Lastly, attackers may actively scan and probe the victims systems to gather information, and involves direct interaction with target systems. However, discerning this often benign interaction from normal system use may be problematic for victims.
Resource Development
After the attacker has conducted their reconnaissance, they will look to establish resources they can use for their malicious activities. This will include various activities such as creating and purchasing resources to support their activities or compromising and stealing existing resources which may offer both cost savings and make their activities opaque and hard to attribute.
We see this quite often with cloud infrastructure more recently, but historically activities such as botnets as well for DDoS type attacks.
This tactic in ATLAS involves 7 different techniques. For the sake of brevity we won’t cover them all but they include things such as:
Acquiring Public ML Artifacts
Obtaining/Developing Capabilities
Acquiring Infrastructure
Poisoning Data and Publishing Poisoned Datasets
Techniques in this tactic not only involve traditional resources but also looking to craft adversarial data, create proxy ML models, and publishing poisoned datasets publicly, similar to how we see attackers take advantage of open source ecosystem, with poisoning software packages.
Mitigations here involve activities such as using cryptographic checksums for ML artifacts.
Initial Access
Okay, so the attacker has done reconnaissance and began developing resources for their malicious activity. Now they will seek to gain initial access to the AI/ML system. These may involve networks, mobile devices or edge systems (or combination). The systems may also be local to the enterprise or be hosted in a cloud environment or managed service provider.
There are many ways attackers can establish an initial access to a system. Some examples ATLAS gives include:
ML Supply Chain Compromise
Valid Accounts
App Exploitation
LLM Prompt Injection
Phishing
Model Evasion
While some of these techniques are common in other cyber attacks, some are more novel for AI/ML. Such as compromising the ML supply chain through GPU hardware, data and ML software or even the model itself.
Model evasion is a technique that involves the attacker crafting “adversarial data”, which are inputs to the ML model that can cause their desired effect on a target model. It could include impacts such as a misclassification, missing detections, impacting energy consumption and more. Impacting the model can have a slew of adverse impacts depending on the use case (e.g. models aimed at helping developer secure code, identify vulnerabilities or assist with SOC and SIEM activities - in the cyber context).
LLM Prompt Injection has perhaps been one of the most discussed attack types against GenAI and LLM systems. It involves crafting malicious prompts to input to the LLM to get it to act in unintended ways. It could be ignoring instructions, providing sensitive outputs, generating harmful content or more. These malicious prompts can be provided directly or come from another data source that the LLM interacts with.
ML Model Access
A unique technique in attacking AI/ML systems is ML model access. Attackers are often seeking access to the ML model to gain information, develop attack techniques or input malicious data into the model for nefarious purposes. They can also get access to the model in various paths, such as the underlying hosting environment, via an API or by interacting directly with it.
The techniques involved in ML Model Access involve:
ML Model Inference API Access
ML-Enabled Product or Service
Physical Environment Access
Full ML Model Access
Physical environment access is fairly straightforward and would be common in countless other types of cyber attacks. The same can be said for API-based attacks, and can help with activities such as discovering related reconnaissance related to the ML model, or staging attacks such as crafting adversarial data or verifying an attacks success.
Increasingly, organizations are utilizing ML and AI through products and services, either directly through an AI provider, or by products and services that are integrating ML and AI into their product portfolio. Attackers may look to get access to the underlying ML model through these products and services, or even glean insights from log and metadata.
In some rare cases, such as in open source, the attacker may have full white-box access to the ML model. This lets them fully understand the architecture, parameters and class ontology.
Execution
Now we’re starting to have the rubber hit the road, as the attacker looks towards execution. This involves looking to run malicious code within the ML artifacts or software, either locally or on a remote system. It also aids broader activities, from moving laterally or stealing sensitive data.
There are three potential techniques involved in this tactic:
User Execution
Command and Scripting Interpreter
LLM Plugin Compromise
In summary execution may involve the user taking specific actions, such as executing unsafe code through techniques such as social engineering or attachments. They may also use commands and scripting to embed initial access payloads or help establish command and control. Most specifically to AI, attackers may look to leverage LLM plugins which help execute API calls or external system integration with other applications to facilitate their execution activities.
Persistance
Once an initial foothold has been established through execution, attackers are striving to establish persistence. This often occurs through ML artifacts and software and is aimed at helping the attacker keep access beyond system restarts or credential rotations that would eliminate their access.
The techniques cited for Persistence include:
Poison Training Data
Backdoor ML Model
LLM Prompt Injection
Persistence of course is a common activity in cyber attacks but the method in which the attacker establishes it for AI/ML systems can be unique. This may involve poisoning the datasets the ML model uses or its underlying training data and labels to embed vulnerabilities or inserting code which can be triggered later when needed, such as a backdoor.
Again, we have the technique of LLM Prompt Injection standing out, and with persistence may function by allowing the prompt injections impact to persist beyond the initial interaction session with the LLM.
Privilege Escalation
Gaining initial access and persistence are key but often the attacker may need to escalate their privilege to have their intended impact, whether it is full organizational compromise, impacting models, data, or exfiltration. Typically attackers take advantage of system weaknesses, misconfigurations and vulnerabilities to escalate their level of access.
The three techniques ATLAS identifies include:
LLM Prompt Injection
LLM Plugin Compromise
LLM Jailbreak
Given we have discussed the first two techniques several times already, we will focus on the LLM Jailbreak. An LLM Jailbreak includes using a prompt injection to get the LLM into a state that lets it freely respond to any user input, disregarding constraints, controls and guardrails the LLM system owner may have put in place.
Defense Evasion
Getting access to a system and persisting is great but detection could lead to eliminating access or impacting the attackers goals. This is why defensive evasion is key.
Similar to previous tactics, the techniques involved here include:
Evading ML Model
LLM Prompt Injection
LLM Jailbreak
This may aid in activities such as evading ML-based virus and malware detection or network scanning to ensure their activities are not discovered.
Credential Access
It should be no surprise to see credential access and compromise listed. While ATLAS lists account names and passwords, this should be expanded to any sort of credentials, including access tokens, API keys, GitHub Privileged Access Tokens and more, as credential compromise remains a leading attack vector and we see the rise of non-human identities (NHI)’s as well, due to API’s, Microservices, Cloud and the current digital landscape.
The only technique ATLAS listed under Credential Access is
Unsecured Credentials
They discuss insecurely stored credentials such as plaintext files, environment variables, and repositories.
Given how much of the current GenAI, LLM and broader AI landscape is reliant on cloud environments, especially due to scale and compute demands, credential access should be a key focus for both customers, and AI providers.
Discovery
Discovery is similar to reconnaissance but is occurring within your environment, rather than from the outside. The attacker has established access and persistence and is now looking to gain insights about the system, network and ML environment.
The four techniques listed include:
Discover ML model Ontology
Discover ML Model Family
Discover ML Artifacts
LLM Meta Prompt Extraction
Here attackers are looking to understand the ML model, its ontology, the family of model, how it responds to inputs and more to tailor their attacks accordingly. They also are looking to understand how an LLM handles instructions and its internal workings so it can be manipulated or forced to disclose sensitive data.
Collection
In this phase of the attack lifecycle within ATLAS the attacker is gathering ML artifacts and other information to aid in their goals. This often is a precursor to stealing the ML artifacts or using the collected information for next steps in their attacks. They are often collecting information from software repos, container and model registries and more.
The techniques identified are:
ML Artifact Collection
Data from Information Repositories
Data from Local Systems
ML Staging Attack
Now that information has been collected they start to stage the attack with knowledge of the target systems. They may be training proxy models, poisoning the target model or crafting adversarial data to feed into the target model.
The four techniques identified include:
Create Proxy ML Model
Backdoor ML Model
Verify Attack
Craft Adversarial Data
Proxy ML Models can be used to simulate attacks, and do so offline while the attackers hone their technique and desired outcomes. They can also use offline copies of target models to verify the success of an attack without raising the suspicion of the victim organization.
Exfiltration
After all the steps discussed, attackers are getting to what they really care about most often, exfiltration. This includes stealing ML artifacts or other information about the ML system. It may be IP, financial information, PHI or other sensitive data depending on the use case of the model and ML systems involved.
The techniques associated with exfiltration include:
Exfiltration via ML Inference API
Exfiltration via Cyber Means
LLM Meta Prompt Extraction
LLM Data Leakage
These all involve exfiltrating data, and it may occur via an API, traditional cyber methods (e.g. ATT&CK Exfiltration), or using prompts to get the LLM to leak sensitive data, such as private user data, proprietary organizational data, and training data, which may have personal information. This has been one of the leading concerns around LLM usage by security practitioners as organizations rapidly adopt LLM’s.
Impact
Unlike exfiltration, impact is striving to do that - have an impact, which may be interruptions, eroding confidence or even destroying ML systems and data. These may target availability (e.g. ransom for example) or integrity.
This tactic has 6 techniques, which include:
Evading ML Models
Denial of ML Service
Spamming ML Systems with Chaff Data
Eroding ML Model Integrity
Cost Harvesting
External Harms
While we have discussed some of the techniques as part of other tactics, there are some unique ones here related to Impact. For example, Denial of an ML service is looking to exhaust resources or flood systems with requests to degrade or shut down services. While most modern enterprise grade AI offerings are hosted in the cloud with elastic compute, they still can run into DDoS and resource exhaustion, as well as cost implications if not properly mitigated, impacting both the provider and the consumers.
Additionally, attackers may look to erode the ML models integrity instead, with adversarial data inputs that impact ML model consumer trust and cause the model provider or organization to fix system and performance issues to address integrity concerns.
Lastly, attackers may look to cause external harms, such as abusing the access they obtained to impact the victim system, resources and organization in ways such as related to financial and reputational harm, impact users or broader societal harm depending on the usage and implications of the ML system.
Closing Thoughts
The MITRE ATLAS represents an excellent resource for organizations looking to understand AI/ML risks, threats and potential vulnerabilities. It is aligned with an attack lifecycle which can be used to interrupt the “kill chain” of malicious activity and mitigation organizational risks. They also provide robust resources such as Case Studies, Sub-techniques, and Mitigations that organizations can use to enable secure AI adoption.
This resource is valuable to both AI/ML consumers, as well as providers and organizations integrating AI/ML into their products and services.
As we continue to experiment and explore with AI, having a robust taxonomy of Tactics, Techniques and Mitigations along with accompanying case studies of real-world incidents and research will be invaluable.
So dig in, and let the MITRE ATLAS be your guide to secure AI adoption and enablement!