AI Agents Can Autonomously Compromise Cloud Infrastructure With Minimal Human Oversight, Research Finds

New academic research demonstrates that AI agents equipped with common cloud security tools can autonomously identify, chain, and exploit misconfigurations in production-like cloud environments, achieving lateral movement, privilege escalation, and data exfiltration in multi-step attack sequences without human guidance. The findings have direct implications for red team methodologies, cloud security posture management, and the adversarial use of AI-assisted attack tooling.

#ai-security #red-team #cloud-security #autonomous-attack #llm #penetration-testing

Researchers from Carnegie Mellon University and ETH Zurich have published findings demonstrating that AI agents built on large language models (specifically Claude 3.7 and GPT-4.1, orchestrated with agentic frameworks and standard cloud security tooling) can autonomously execute multi-step cloud infrastructure attacks with success rates comparable to junior penetration testers, without human intervention at each decision point.

The research involved deploying an AI agent against a purpose-built AWS test environment containing realistic misconfigurations representing common enterprise cloud security gaps. The environment was not pre-simplified: it reflected the complexity of configurations observed in actual cloud security assessment engagements.

Key Findings

Attack chain completion without human guidance: The AI agent successfully completed 71% of multi-step attack chains against the test environment, including: identifying publicly exposed metadata service credentials, using them to pivot to an IAM role with excessive permissions, enumerating S3 buckets accessible to that role, downloading sensitive data, and establishing persistence via a Lambda function backdoor, all without human input after the initial objective was set.

Sub-90-minute attack completion: The median time from initial access to data exfiltration was 47 minutes. In comparison testing, a human junior penetration tester averaged 3.2 hours for the same attack chain against an equivalent environment.

Common misconfigurations exploited: The agent reliably exploited IMDSv1 (Instance Metadata Service version 1) credential exposure, overly permissive IAM role trust policies, unencrypted S3 buckets with misconfigured bucket policies, and unrestricted security group rules, all misconfigurations documented in the CIS AWS Foundations Benchmark as remediable with basic cloud security hygiene.
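For defenders, the first of these gaps is also the easiest to measure. The sketch below flags instances that still answer unauthenticated IMDSv1 requests; it assumes the MetadataOptions structures have already been fetched (for example from an EC2 DescribeInstances response), and the function names are illustrative:

```python
def allows_imdsv1(metadata_options: dict) -> bool:
    """True if the instance metadata service still accepts IMDSv1 requests,
    i.e. session tokens are optional rather than required."""
    # HttpTokens == "required" forces IMDSv2; "optional" leaves the
    # unauthenticated v1 credential-exposure path open.
    return (
        metadata_options.get("HttpEndpoint", "enabled") == "enabled"
        and metadata_options.get("HttpTokens", "optional") != "required"
    )

def audit_instances(instances: list) -> list:
    """Return the IDs of instances that should be moved to IMDSv2-only."""
    return [
        inst["InstanceId"]
        for inst in instances
        if allows_imdsv1(inst.get("MetadataOptions", {}))
    ]

# Field names mirror the shape of the EC2 DescribeInstances response.
fleet = [
    {"InstanceId": "i-hardened",
     "MetadataOptions": {"HttpEndpoint": "enabled", "HttpTokens": "required"}},
    {"InstanceId": "i-exposed",
     "MetadataOptions": {"HttpEndpoint": "enabled", "HttpTokens": "optional"}},
]
print(audit_instances(fleet))  # ['i-exposed']
```

This is essentially the check the relevant CIS benchmark control automates; the point is how small the configuration gap between "hardened" and "exposed" is.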

Failure modes: The agent performed poorly against well-hardened environments: it failed consistently when IMDSv2 was enforced, when IAM roles followed least-privilege patterns, and when GuardDuty alerting was active (the agent's repeated API calls triggered detections it was not designed to avoid).

Implications for Security Teams

Red team methodology must evolve: Organisations that benchmark their cloud security posture against what a human attacker can achieve in a given timeframe now face a different threat model. AI-assisted attacks lower the cost and skill threshold for sustained cloud reconnaissance and exploitation. Red teams should incorporate AI-augmented attack tooling into their assessment methodology to accurately represent the current threat landscape.

Misconfiguration remediation is the highest-ROI control: The research validates that the misconfigurations AI agents exploit most reliably are exactly those addressed by CIS AWS Foundations Benchmark Level 1 and 2 controls. Organisations that have not enforced IMDSv2, reviewed IAM trust policies, and restricted S3 bucket public access should treat this as an immediate priority; these are not theoretical weaknesses.
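The trust-policy review can be illustrated in the same spirit. The following is a minimal, hypothetical audit over already-retrieved trust policy documents, flagging anonymous or wildcard principals and unconditioned cross-account root principals; it is a sketch, not a complete least-privilege analysis:

```python
def overly_permissive(trust_policy: dict) -> list:
    """Return findings for a role trust policy that is broader than
    least privilege: wildcard principals, or cross-account root
    principals granted without any Condition block."""
    findings = []
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":  # shorthand for anyone
            findings.append("anonymous principal '*'")
            continue
        for ptype, values in principal.items():
            values = values if isinstance(values, list) else [values]
            for v in values:
                if v == "*":
                    findings.append(f"wildcard {ptype} principal")
                elif v.endswith(":root") and "Condition" not in stmt:
                    findings.append(f"unconditioned {ptype} root principal {v}")
    return findings

risky = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "sts:AssumeRole",
    }],
}
print(overly_permissive(risky))  # ['wildcard AWS principal']
```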

Detection capability matters more than prevention alone: The AI agent's failure when GuardDuty was active underlines that detective controls are effective against autonomous attack patterns. AI-driven attackers make more API calls, generate more log noise, and follow more predictable decision trees than human operators, making behaviour-based detection a viable countermeasure if organisations invest in it.

Adversarial use is already here: The research tested commercially available AI models. The same capability documented in this academic context is available to any threat actor with API access. The 47-minute attack chain completion demonstrates that cloud compromise via AI-augmented tooling is now within reach of moderately resourced attackers.

Recommended Actions

  • Enforce IMDSv2 on all EC2 instances: this single control eliminated credential theft from the metadata service in all test scenarios. Configure it via instance metadata options or a Service Control Policy.
  • Audit IAM role trust policies: review every role that can be assumed via sts:AssumeRole from EC2 instance profiles; remove wildcards and restrict trust to specific named services and accounts.
  • Enable AWS GuardDuty or an equivalent: the research confirms that active detection disrupts autonomous attack execution. If GuardDuty is not enabled, prioritise it.
  • Run the CIS AWS Foundations Benchmark: the misconfigurations AI agents exploit are documented controls; use AWS Security Hub or a dedicated CSPM tool to measure compliance and close gaps.
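The Service Control Policy route mentioned in the first bullet can look like the following, a pattern documented by AWS that denies launching any EC2 instance whose metadata options do not require session tokens:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireImdsV2",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": {
          "ec2:MetadataHttpTokens": "required"
        }
      }
    }
  ]
}
```

Applied at the organisation level, this prevents new IMDSv1-capable instances from being created; existing instances still need their metadata options updated directly.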
