Executive Summary: By 2026, the convergence of AI-driven automation and cyber-physical systems (CPS) will redefine penetration testing in critical infrastructure (CI). Traditional red teaming—labor-intensive, time-bound, and often reactive—is being superseded by AI-generated attack trees that simulate multi-stage, high-fidelity threat scenarios in real time. This article examines how large language models (LLMs) and generative AI are being integrated into penetration testing workflows for CI, including power grids, water systems, and transportation networks. We identify key advancements, risks, and operational implications, supported by data from 2024–2026 deployments in EU and U.S. energy sectors.
Traditional red teaming relies on expert-driven, scenario-based testing grounded in known Tactics, Techniques, and Procedures (TTPs). While effective, it is constrained by human bandwidth and cognitive biases. In 2026, attack trees are no longer hand-crafted by analysts—instead, they are generated by LLMs trained on historical incident data, the Common Weakness Enumeration (CWE) catalog, and live CI telemetry.
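As a concrete illustration, here is a minimal sketch of such a generation step, assuming an OpenAI-compatible chat API; the model name, prompt, and JSON schema are illustrative placeholders rather than artifacts of any deployment described in this article:

```python
import json

from openai import OpenAI  # any OpenAI-compatible client works here

client = OpenAI()  # assumes an API key or local endpoint is configured

SYSTEM_PROMPT = (
    "You are generating a defensive red-team attack tree for a "
    "simulated OT environment. Respond with JSON of the form "
    '{"action": str, "prior": float, "children": [...]}.'
)

def generate_tree(context: str) -> dict:
    """Single-shot tree generation from incident/CWE/telemetry context.

    A production pipeline would validate the returned schema and
    ground every node against the operator's asset inventory.
    """
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; real pilots used fine-tuned models
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": context},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

tree = generate_tree(
    "Incident summaries: unauthorized setpoint writes; "
    "CWE-306 (missing authentication); telemetry: anomalous PLC traffic"
)
```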
These AI models—often fine-tuned on domain-specific datasets from OT (Operational Technology) environments—produce hierarchical, probabilistic attack trees. Each node represents a step along a potential exploit path (e.g., "exploit PLC communication stack → gain controller access → alter setpoint → cause overpressure"). The tree evolves during execution via reinforcement learning (RL), pruning or expanding branches based on observed defensive responses.
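The resulting structure can be represented compactly. The sketch below uses hypothetical field names (a real system would carry far richer node metadata) to show probabilistic nodes whose estimates are updated during execution, with low-probability branches pruned as defenses respond:

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class AttackNode:
    action: str                    # e.g. "exploit PLC communication stack"
    prior: float                   # model-estimated success probability
    children: list[AttackNode] = field(default_factory=list)
    observed: float | None = None  # updated from live execution feedback

    def posterior(self) -> float:
        # Prefer evidence gathered during the engagement over the prior.
        return self.observed if self.observed is not None else self.prior

def path_probability(path: list[AttackNode]) -> float:
    # Probability that every stage of a multi-step path succeeds,
    # under the simplifying assumption that stages are independent.
    p = 1.0
    for node in path:
        p *= node.posterior()
    return p

def prune(node: AttackNode, threshold: float = 0.05) -> None:
    # Stand-in for the RL-driven pruning described above: drop branches
    # whose success estimate collapses once defenses react.
    node.children = [c for c in node.children if c.posterior() >= threshold]
    for child in node.children:
        prune(child, threshold)

# The four-stage example path from the text:
overpressure = AttackNode("cause overpressure", 0.6)
setpoint = AttackNode("alter setpoint", 0.7, [overpressure])
access = AttackNode("gain controller access", 0.5, [setpoint])
root = AttackNode("exploit PLC communication stack", 0.4, [access])
print(path_probability([root, access, setpoint, overpressure]))  # ~0.084
```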
In a 2025 pilot with a European transmission system operator, an LLM-generated attack tree identified three previously undocumented zero-day paths in a legacy SCADA system—paths that had evaded human red teams for years. The AI achieved a 92% success rate in simulated compromises while triggering alerts from the existing IDS in only 8% of attempts.
Modern CI environments cannot tolerate unsupervised AI-driven testing. AI penetration systems are therefore deployed within a safety-first architecture built on layered safeguards.
These safeguards have reduced unintended operational disruptions from 12% (early 2025) to <1% (Q1 2026), according to data from the U.S. Department of Energy’s OT Cyber Range.
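One such safeguard, execution gating, can be as simple as refusing any disruptive action outside a digital-twin environment. The following is a minimal sketch with hypothetical action names, not a production policy engine:

```python
from enum import Enum

class Env(Enum):
    DIGITAL_TWIN = "digital_twin"
    PRODUCTION = "production"

# Actions that could affect physical processes (illustrative set).
DISRUPTIVE = {"alter_setpoint", "write_plc_logic", "mass_scan"}

def gate(action: str, env: Env, human_approved: bool = False) -> bool:
    """Permit disruptive actions only in the digital twin, or in
    production with explicit human sign-off."""
    if action not in DISRUPTIVE:
        return True
    if env is Env.DIGITAL_TWIN:
        return True
    return human_approved

assert gate("enumerate_assets", Env.PRODUCTION)
assert not gate("alter_setpoint", Env.PRODUCTION)
assert gate("alter_setpoint", Env.PRODUCTION, human_approved=True)
```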
AI-generated attack trees are not exclusive to defenders. CERT reporting increasingly indicates that threat actors are reverse-engineering or fine-tuning open-source LLMs (e.g., via leaked fine-tunes or synthetic OT datasets) to produce novel attack vectors. In one documented case (Q4 2025), a ransomware group used a modified LLM to generate custom PLC payloads targeting Siemens S7-1200 systems, bypassing signature-based IPS.
Moreover, AI models can inherit biases from training data, potentially overlooking certain attack classes (e.g., insider threats or supply-chain compromises). To mitigate this, CI operators are implementing bias audits using adversarial red-teaming of the AI itself.
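A simple form of such an audit is a coverage check over the attack classes the model actually generates. The sketch below uses an illustrative taxonomy and threshold; real audits would map classes to a framework such as MITRE ATT&CK for ICS:

```python
from collections import Counter

# Illustrative class taxonomy (not a standard).
ATTACK_CLASSES = [
    "network_exploitation",
    "insider_threat",
    "supply_chain",
    "physical_access",
]

def coverage_audit(generated: list[str], min_share: float = 0.05) -> list[str]:
    # Flag classes the model rarely or never produces -- a crude
    # proxy for blind spots inherited from skewed training data.
    counts = Counter(generated)
    total = max(sum(counts.values()), 1)
    return [c for c in ATTACK_CLASSES if counts[c] / total < min_share]

# 100 generated nodes, none involving insiders or suppliers:
flagged = coverage_audit(
    ["network_exploitation"] * 80 + ["physical_access"] * 20
)
print(flagged)  # ['insider_threat', 'supply_chain']
```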
Regulators have recognized AI’s role in cybersecurity. The EU’s NIS2 Directive (2025 update) now requires "AI-assisted threat emulation" as part of annual penetration testing for essential services. Similarly, the U.S. CIRCIA mandates that CI operators document the use of AI in simulating advanced attack scenarios for compliance reporting.
This has spurred the creation of certified AI penetration frameworks, such as the OT.AI-RED standard developed by IEC and NIST in collaboration with Oracle-42 Intelligence, which specifies requirements for safe, auditable AI-driven testing of OT environments.
Responsible adoption of AI-driven red teaming therefore rests on the practices outlined above: safety-first deployment architectures, bias audits of the models themselves, and alignment with emerging regulatory frameworks.
The trajectory points toward autonomous penetration testing—systems that not only simulate attacks but also validate defenses and recommend mitigations in real time. By 2027, we anticipate the emergence of Adaptive Red-Blue Teams, where AI agents continuously red-team against blue-team defenses, with outcomes informing patch prioritization and architecture hardening.
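At its core, such a loop alternates generation, detection scoring, and learning. The sketch below uses stub classes with hypothetical interfaces to show the control flow only; neither side here is a real agent or detection stack:

```python
import random

class RedAgent:
    """Stub standing in for an LLM/RL agent that proposes attack steps."""
    TECHNIQUES = ["spoof_hmi", "alter_setpoint", "replay_modbus", "dump_firmware"]

    def generate(self, n: int = 5) -> list[str]:
        return random.choices(self.TECHNIQUES, k=n)

class BlueDefenses:
    def __init__(self, signatures: set[str]):
        self.signatures = signatures

    def detects(self, attack: str) -> bool:
        return attack in self.signatures

    def learn(self, attack: str) -> None:
        self.signatures.add(attack)

def red_blue_cycle(red: RedAgent, blue: BlueDefenses) -> list[str]:
    # Undetected attacks become patch-prioritization candidates; the
    # blue side then learns them, pushing the next cycle toward novelty.
    gaps = []
    for attack in red.generate():
        if not blue.detects(attack):
            gaps.append(attack)
            blue.learn(attack)
    return gaps

blue = BlueDefenses({"spoof_hmi"})
for cycle in range(3):
    print(cycle, red_blue_cycle(RedAgent(), blue))
```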
However, this future hinges on solving two critical challenges: explainability (regulators demand transparency into AI decisions) and robustness against adversarial manipulation (ensuring the AI itself cannot be compromised).
In 2026, AI-generated attack trees are transforming penetration testing from an episodic, expert-limited activity into a continuous, scalable, and adaptive process. For critical infrastructure, this shift offers unprecedented threat visibility—but only when paired with rigorous safety controls, regulatory alignment, and ethical oversight. The organizations that succeed will be those that treat AI not as a replacement for human judgment, but as a force multiplier for cyber resilience in an era of escalating digital-physical threats.
Definition: An AI-generated attack tree is a hierarchical model of potential attack paths created by a large language model or generative AI system. It simulates multi-stage cyber-physical compromises, evolving dynamically based on system responses and threat data.