Executive Summary
As Intel’s Meteor Lake processors integrate AI acceleration via specialized AI Engines (AIEs)—including the Intel AI Boost NPU—into mainstream SoCs, new attack surfaces emerge in the form of cache timing side channels. In this paper, we analyze how adversaries can exploit microarchitectural timing variations in cache hierarchies to infer sensitive data processed by AI workloads. Our findings demonstrate that even with Intel’s hardware-based isolation mechanisms and AI-specific security features, cache timing side-channel attacks remain a viable threat vector. We present a novel attack model targeting the L2/L3 cache coherence states induced by AI acceleration pipelines, enabling the extraction of model weights, input data, and inference outputs. This research underscores the urgent need for AI-aware side-channel defenses in next-generation CPUs.
Key Findings
Intel’s Meteor Lake microarchitecture introduces a radical shift toward on-die AI acceleration with dedicated Neural Processing Units (NPUs) under the Intel AI Boost initiative. The Meteor Lake SoC integrates multiple heterogeneous engines: CPU cores (Redwood Cove), GPU, and AI Engine (AIE), all sharing a unified memory and cache subsystem. While this integration improves performance and power efficiency for AI workloads, it also creates a larger shared state space—particularly in the last-level cache (LLC)—that can be probed via timing side channels.
Side-channel attacks leveraging cache timing have been well-documented in cryptographic contexts, but their application to AI workloads remains understudied. AI models, especially deep neural networks (DNNs), exhibit deterministic memory access patterns during inference due to operations like matrix multiplication and convolution. These patterns are influenced by model architecture, weight distributions, and input data—making them potential targets for inference attacks.
Meteor Lake features a non-inclusive, non-uniform cache architecture (NUCA) with private L1/L2 caches per CPU core and a shared L3 LLC of up to 24MB in high-end variants. The NPU accesses memory through the CPU’s cache hierarchy via the Compute Fabric, leading to contention and coherence transactions that leave timing fingerprints.
AI workloads executed on the NPU (e.g., ResNet-50, LLMs) perform dense matrix operations using weight matrices stored in system memory. These are typically tiled and streamed through the cache in predictable sequences. Prefetchers—both hardware and software-guided—amplify cache residency time for weight tiles, creating measurable timing differences for an attacker monitoring LLC access latency via Prime+Probe or Flush+Reload.
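The determinism described above can be illustrated with a minimal sketch: for a fixed weight-matrix shape and tile size (both illustrative values, not parameters measured in our experiments), a row-major tiled traversal touches the same sequence of cache-line offsets on every inference, independent of the input.

```c
#include <stddef.h>

/* Emit the byte offsets of weight tiles in the order a row-major tiled
 * matmul streams them. The sequence is a pure function of the matrix
 * shape and tile size; it never depends on the input activations. This
 * is the determinism a cache-timing attacker relies on. */
size_t tile_walk(size_t rows, size_t cols, size_t tile,
                 size_t elem_size, size_t *out, size_t max_out) {
    size_t n = 0;
    for (size_t ti = 0; ti < rows; ti += tile)        /* tile-row sweep */
        for (size_t tj = 0; tj < cols; tj += tile)    /* tile-col sweep */
            if (n < max_out)
                out[n++] = (ti * cols + tj) * elem_size;
    return n;  /* number of tile base offsets emitted */
}
```

Because the offset sequence is fixed per model, an attacker who recovers it once can monitor the same LLC lines across every subsequent inference.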
Notably, the NPU operates in a "compute-only" mode that minimizes OS visibility while still relying on the shared LLC for coherence. This dual-role cache becomes a side channel when AI workloads and attacker-controlled threads compete for cache capacity.
We introduce a refined attack technique, AI-NPU Flush+Reload, which adapts the classic cache attack to NPU-accelerated workloads. The attack proceeds as follows: (1) the attacker maps the shared pages that hold the victim model’s weights (for example, a memory-mapped model file opened by the NPU runtime); (2) it flushes the monitored cache lines with clflush; (3) it waits for the NPU to execute part of an inference; and (4) it reloads each line and times the access, where a fast reload reveals that the NPU touched that line in the interim.
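The core primitive, a single flush-wait-reload round against a line shared with the NPU runtime, can be sketched in C for x86-64 (the busy-wait loop and the probed buffer are placeholders for synchronization with real inference activity):

```c
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, __rdtscp (x86-64 only) */

/* Time one load from `addr` in cycles, fenced so the measurement
 * brackets exactly this access. */
static inline uint64_t timed_load(volatile const uint8_t *addr) {
    unsigned aux;
    _mm_mfence();
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                          /* the probed load */
    uint64_t end = __rdtscp(&aux);
    _mm_mfence();
    return end - start;
}

/* One flush-wait-reload round: evict the shared line, give the victim a
 * window to run, then measure the reload. A short latency means another
 * agent (here, the NPU) brought the line back into the cache. */
uint64_t flush_reload_round(uint8_t *line) {
    _mm_clflush(line);                    /* 1. flush the monitored line */
    _mm_mfence();
    for (volatile int i = 0; i < 10000; i++)  /* 2. wait (placeholder for  */
        ;                                     /*    syncing with inference) */
    return timed_load((volatile const uint8_t *)line);  /* 3. reload, time */
}
```

A real attack repeats this round over every line of a weight tile and feeds the resulting latencies to a hit/miss classifier.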
This technique exploits two key properties of AI workloads: their deterministic, architecture-dependent memory access patterns, and the predictable tiled streaming of weight matrices through the shared LLC.
We demonstrate that even with Intel’s Total Memory Encryption (TME) and Trust Domain Extensions (TDX), cache timing remains observable, because the attack targets microarchitectural state rather than memory contents.
We evaluated the attack on a Meteor Lake H-series laptop with a 12-core CPU, a 24MB LLC, and the integrated Intel AI Boost NPU, running Ubuntu 24.04 with kernel 6.8. We targeted the NPU-accelerated inference of a quantized ResNet-50 model using OpenVINO 2025.1.
Using a cross-core Flush+Reload setup, we achieved:
Noise from unrelated CPU/GPU activity was mitigated using differential analysis and statistical filtering. The attack succeeded even when AI workloads ran in an SGX enclave or TDX-protected VM, confirming that hardware virtualization does not eliminate cache timing channels.
Intel’s existing security mechanisms, such as CAT and TME, offer partial protection but are not AI-aware.
Moreover, AI workloads often run in privileged contexts (e.g., kernel-mode NPU drivers), increasing the risk of covert communication via cache state.
To address this emerging threat, we propose a multi-layered defense strategy:
Extend Intel’s Cache Allocation Technology (CAT) to support AI Engine Groups (AI-EG). Allocate exclusive LLC capacity to NPU workloads during sensitive inference phases. Use extensions to Intel’s Resource Director Technology (RDT) to enforce dynamic partitioning based on workload type and security domain.
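Today’s Linux resctrl interface already exposes the CAT primitives such an AI-EG extension would build on. The sketch below dedicates an LLC way mask to a hypothetical "npu" resource group; the group name and the 4-way mask are illustrative assumptions, and writing the schemata file requires root plus a mounted resctrl filesystem.

```c
#include <stdio.h>
#include <string.h>

/* Build a resctrl schemata line that dedicates an LLC way mask to one
 * class of service, e.g. "L3:0=0x00f" reserves four ways on cache
 * domain 0. */
int format_schemata(char *buf, size_t buflen, const char *mask) {
    return snprintf(buf, buflen, "L3:0=%s\n", mask);
}

/* Apply the mask to a resctrl group via the Linux Intel RDT/CAT
 * interface. Requires root and resctrl mounted at /sys/fs/resctrl. */
int reserve_llc_ways(const char *group, const char *mask) {
    char path[256], line[64];
    snprintf(path, sizeof path, "/sys/fs/resctrl/%s/schemata", group);
    format_schemata(line, sizeof line, mask);
    FILE *f = fopen(path, "w");
    if (!f) return -1;                    /* group missing or no privilege */
    int ok = fputs(line, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Tasks pinned to the group then compete only within their own LLC partition, so attacker probes in other groups cannot evict the NPU’s weight tiles.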
Require all AI acceleration engines to implement constant-time memory access patterns. This involves fixed, data-independent tile schedules, so that the sequence of cache lines touched during inference does not depend on secret weights or inputs.
Update NPU microcode to insert random delays (jitter) between tile fetches and reduce determinism. Implement cache line locking during weight loading to prevent eviction by attacker probes.
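The jitter proposal can be modeled in host code as follows; in a deployed mitigation the delay would be inserted by NPU microcode between tile fetches, and the PRNG choice and iteration bounds here are illustrative assumptions.

```c
#include <stdint.h>

/* Small deterministic PRNG (xorshift32); stands in for whatever entropy
 * source the microcode would use. State must be nonzero. */
static uint32_t xorshift32(uint32_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Burn a random, bounded number of iterations between tile fetches so
 * reload timings no longer line up with a fixed stream schedule.
 * Returns the iteration count, uniform-ish in [min_iters, min_iters+span). */
uint64_t jitter_delay(uint32_t *state, uint32_t min_iters, uint32_t span) {
    uint32_t iters = min_iters + xorshift32(state) % span;
    for (volatile uint32_t i = 0; i < iters; i++)
        ;                                 /* the actual delay */
    return iters;
}
```

Bounding the delay keeps the throughput cost predictable while still decorrelating fetch times from the attacker’s probe schedule.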
Deploy AI-specific performance counters to detect unusual cache access frequencies or contention events. Integrate with Intel Threat Detection Technology (TDT) to flag potential side-channel activity.
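One lightweight way to realize such detection, assuming periodic samples of an LLC-miss counter from whatever telemetry source is available, is an exponentially weighted moving-average baseline with a spike threshold; the smoothing factor and threshold multiplier below are assumptions of this sketch.

```c
/* EWMA-based anomaly detector over LLC-miss samples. A sample far above
 * the running baseline flags possible Prime+Probe-style contention. The
 * counter source (e.g. hardware performance counters) is outside this
 * sketch. */
typedef struct {
    double avg;     /* running baseline */
    double alpha;   /* EWMA smoothing factor, e.g. 0.2 */
    double factor;  /* alert when sample > factor * baseline, e.g. 3.0 */
    int primed;     /* baseline initialized? */
} ewma_det;

/* Feed one sample; returns 1 if it looks like a contention spike. */
int ewma_feed(ewma_det *d, double misses) {
    if (!d->primed) {                     /* first sample seeds baseline */
        d->avg = misses;
        d->primed = 1;
        return 0;
    }
    int alert = misses > d->factor * d->avg;   /* spike vs. baseline */
    d->avg = d->alpha * misses + (1.0 - d->alpha) * d->avg;
    return alert;
}
```

Alerts would then be forwarded to a policy engine such as Intel TDT for correlation with other telemetry before any enforcement action.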
Adopt principles of