Executive Summary
In April 2026, Oracle-42 Intelligence uncovered Operation Phantom Bloom, a strategic shift by the Chinese state-sponsored cyberespionage group APT10 (also tracked as Stone Panda or Red Apollo) to compromise AI development pipelines by injecting malicious code and poisoned datasets into public GitHub repositories. This operation represents a significant escalation in the weaponization of open-source AI ecosystems, aiming to infiltrate downstream AI models, including those used in critical infrastructure, defense, and enterprise decision-making systems. Unlike traditional supply-chain attacks that target software dependencies, Operation Phantom Bloom targets the foundational data layer of AI systems—training datasets—exploiting the trust placed in open-source contributions. Early indicators suggest that compromised repositories are being used to seed AI models with backdoored or adversarial samples, enabling long-term persistence and covert influence over model behavior.
Key Findings
- Poisoned datasets include ImageNet-22k, LAION-5B, and domain-specific collections (e.g., medical imaging, satellite imagery).
- Tooling observed in the operation overlaps with APT10's ChChes malware and RedLeaves backdoors, suggesting reuse of tactics, techniques, and procedures (TTPs) refined over a decade.

Background
As AI adoption accelerates across industries, the attack surface expands with it. Traditional cybersecurity models, designed for software supply chains, are ill-equipped to address threats targeting data integrity, the lifeblood of AI systems. Unlike software packages, which can be cryptographically signed and version-controlled, AI datasets are often massive, loosely curated, and frequently ingested from untrusted sources. This opacity creates ideal conditions for data poisoning attacks, where adversaries manipulate training data to induce misclassification, bias, or backdoor behavior in trained models.
APT10’s pivot reflects a broader strategic realignment within Chinese state cyber operations. Following increased scrutiny of its traditional cyberespionage and intellectual property theft operations, APT10 has shifted focus toward strategic technological dominance, particularly in AI, quantum computing, and biotechnology. By compromising AI training pipelines, APT10 aims not only to steal models but to embed long-term influence into systems that will shape global decision-making for years to come.
APT10 operators begin by identifying high-impact AI projects on GitHub with large, active contributor bases. They create fake contributor personas—often mimicking researchers from reputable institutions—and submit pull requests that appear legitimate. These contributions may include minor bug fixes or dataset updates, establishing credibility over time. In some cases, attackers compromise existing maintainer accounts via phishing or credential theft.
In parallel, APT10 exploits vulnerabilities in GitHub Actions workflows, such as insecure YAML configurations or unprotected secrets, to execute malicious CI/CD pipelines. These pipelines may inject adversarial samples during automated build processes or exfiltrate dataset metadata for later analysis.
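As a concrete defensive illustration, the sketch below (Python, using PyYAML) scans a repository's workflow files for one well-documented risky pattern of this kind: a pull_request_target trigger combined with a checkout of the pull request's own head, which runs untrusted code with access to repository secrets. The glob paths and string heuristics are illustrative assumptions, not a complete audit.

```python
"""Heuristic audit for one risky GitHub Actions pattern:
`pull_request_target` combined with checking out the PR head.
Paths and heuristics are illustrative, not exhaustive."""
import sys
from pathlib import Path

import yaml  # PyYAML


def risky_workflows(repo_root: str):
    findings = []
    for wf_path in Path(repo_root).glob(".github/workflows/*.y*ml"):
        wf = yaml.safe_load(wf_path.read_text()) or {}
        # PyYAML (YAML 1.1) parses the bare key `on:` as boolean True.
        triggers = wf.get("on", wf.get(True, {}))
        if isinstance(triggers, str):
            triggers = [triggers]
        if "pull_request_target" not in triggers:
            continue
        for job in (wf.get("jobs") or {}).values():
            for step in job.get("steps", []):
                ref = str((step.get("with") or {}).get("ref", ""))
                if "pull_request.head" in ref:
                    findings.append((wf_path, ref))
    return findings


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, ref in risky_workflows(root):
        print(f"[!] {path}: pull_request_target checks out untrusted ref {ref}")
```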
Once access is secured, attackers modify training datasets by:
- Injecting adversarial samples crafted to induce targeted misclassification;
- Mislabeling a small fraction of existing samples;
- Embedding trigger patterns that activate backdoor behavior in trained models.
These modifications are often subtle, ensuring that dataset statistics (e.g., mean, variance) remain plausible to automated validators. In one observed case, APT10 injected 1,200 adversarial samples into a medical imaging dataset used for tumor detection, with a success rate of 87% in inducing false negatives during model inference.
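The synthetic demonstration below illustrates why aggregate checks pass: perturbing 1,200 rows out of 100,000 with small zero-mean noise leaves the global mean and variance essentially unchanged. All numbers except the 1,200-sample count (taken from the observed case) are invented for the demo.

```python
"""Synthetic illustration: 1,200 subtly perturbed rows in a
100,000-sample dataset barely move aggregate statistics, so a
validator that only checks global mean/variance accepts both versions.
Dataset size, dimensionality, and noise scale are invented."""
import numpy as np

rng = np.random.default_rng(0)

clean = rng.normal(loc=0.45, scale=0.22, size=(100_000, 64))
poisoned = clean.copy()
idx = rng.choice(len(poisoned), size=1_200, replace=False)
poisoned[idx] += rng.normal(scale=0.03, size=(1_200, 64))  # subtle shift

for name, data in (("clean", clean), ("poisoned", poisoned)):
    print(f"{name:9s} mean={data.mean():.5f}  var={data.var():.5f}")
# Both lines agree to roughly four decimal places, well inside any
# realistic tolerance an automated validator would apply.
```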
The poisoned datasets are then distributed through GitHub releases, Docker images, or directly via pip or conda packages that depend on them. AI practitioners often treat datasets as immutable artifacts, leading to long-term propagation of contaminated data across multiple models and organizations.
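One low-cost countermeasure is to make that immutability assumption verifiable. The sketch below, assuming a hypothetical JSON manifest that maps artifact paths to pinned SHA-256 digests, refuses to proceed if any dataset artifact no longer matches the digest recorded at curation time.

```python
"""Minimal sketch of treating datasets as verifiable, not merely
immutable. The manifest format and file names are illustrative
assumptions, e.g. {"data/train.tar": "9f86d0..."}."""
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: str) -> None:
    manifest = json.loads(Path(manifest_path).read_text())
    for rel_path, pinned in manifest.items():
        actual = sha256_file(Path(rel_path))
        if actual != pinned:
            raise RuntimeError(f"{rel_path}: digest {actual} != pinned {pinned}")
    print(f"verified {len(manifest)} artifacts")


if __name__ == "__main__":
    verify_manifest("dataset.manifest.json")  # hypothetical manifest file
```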
APT10 also leverages Git LFS to exfiltrate metadata about dataset usage, identifying organizations that have cloned or retrained on the poisoned data. This intelligence informs follow-on operations, including targeted spear-phishing and model extraction attacks.
Poisoning datasets requires no advanced AI expertise—only access to the data pipeline and the ability to manipulate files. The open nature of GitHub and the reliance on third-party datasets make this attack vector both accessible and devastating.
Unlike software backdoors, which can be patched, poisoned training data persists across model retraining cycles. Even if a dataset is later cleaned, the poisoned version may already have influenced numerous downstream models, creating an inherited vulnerability that propagates through the AI supply chain.
Traditional security tools are blind to data integrity issues. Static analysis of code won’t detect mislabeled images, and dynamic analysis of model behavior is often too late. Detection typically requires statistical auditing of dataset distributions, which is rarely performed in practice.
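For illustration, one plausible form of such an audit is sketched below: compare each class's feature distribution in a freshly pulled dataset against a trusted earlier snapshot using a two-sample Kolmogorov-Smirnov test. The per-class feature arrays and the significance threshold are assumptions; the point is the shape of the check, not the specific statistic.

```python
"""Sketch of a per-class distribution audit: flag classes whose feature
distributions in a candidate dataset drift from a trusted snapshot.
Feature arrays are assumed to be shaped (n_samples, n_features)."""
import numpy as np
from scipy.stats import ks_2samp


def audit_per_class(reference: dict[int, np.ndarray],
                    candidate: dict[int, np.ndarray],
                    alpha: float = 1e-3) -> list[int]:
    """Return labels of classes whose distributions look implausible."""
    flagged = []
    for label, ref_feats in reference.items():
        cand_feats = candidate.get(label)
        if cand_feats is None:
            flagged.append(label)  # class disappeared entirely
            continue
        # Test every feature dimension; one implausible shift is enough
        # to send the whole class for manual review.
        for dim in range(ref_feats.shape[1]):
            result = ks_2samp(ref_feats[:, dim], cand_feats[:, dim])
            if result.pvalue < alpha:
                flagged.append(label)
                break
    return flagged
```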
By compromising AI systems used in defense, logistics, or energy, APT10 can create covert decision-making pathways that favor Chinese geopolitical interests. For example, a poisoned satellite imagery model could misclassify military movements, while a biased hiring AI could favor certain demographics.
Recommendations
- Adopt Data Version Control (DVC) or Delta Lake to maintain immutable records of dataset lineage and transformations.
- Run tools such as CleanLab or CleanVision to detect mislabeled or anomalous samples before training.
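As a minimal sketch of the CleanLab recommendation above (the input files features.npy and labels.npy are placeholders, and any classifier exposing predict_proba would do), the example below derives out-of-sample predicted probabilities via cross-validation and asks cleanlab to rank suspect labels:

```python
"""Minimal label-quality audit with cleanlab. Input arrays are
placeholder assumptions; the classifier choice is interchangeable."""
import numpy as np
from cleanlab.filter import find_label_issues
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

features = np.load("features.npy")   # hypothetical pre-extracted embeddings
labels = np.load("labels.npy")       # integer class labels

# Out-of-sample probabilities are required so the model cannot simply
# memorize the (possibly poisoned) labels it is being audited against.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000),
    features, labels, cv=5, method="predict_proba",
)

suspect_idx = find_label_issues(
    labels=labels, pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print(f"{len(suspect_idx)} samples flagged for manual review")
```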