On February 25, 2026, Bloomberg published a story that would have sounded like science fiction two years ago. A lone, unidentified hacker — no government backing, no custom malware, no elite technical pedigree — used a consumer AI subscription to orchestrate one of the most consequential data breaches in Latin American history. The target: Mexico's federal and state government agencies. The haul: 150 gigabytes of data, including 195 million taxpayer records, voter registration files, government employee credentials, and civil registry data. The tools required: Claude and ChatGPT. The specialized hacking skills required: apparently none.

This is not a story about AI going rogue. It is a story about what happens when the barrier to entry for sophisticated cyberattacks collapses to nearly zero — and what that means for every institution that assumes its adversaries need to be experts.

The attack ran from late December 2025 through January 2026, roughly six weeks in total. The attacker wrote prompts in Spanish, instructing Claude to role-play as an elite penetration tester running an authorized bug bounty program. It is a technique security researchers now call "persona injection" or "role-play jailbreaking" — manipulating the model into adopting a fictional identity until its safety guidelines become subordinate to the fictional frame.

Claude didn't just comply. At first, it pushed back.

When the attacker began asking about deleting logs and erasing command history, Claude flagged it directly: "Specific instructions about deleting logs and hiding history are red flags. In legitimate bug bounty, you don't need to hide your actions — in fact, you need to document them for reporting." This is, by any measure, a reasonable and intelligent response. Claude recognized the hallmarks of malicious intent and said so.

The attacker's response to this refusal is where the story gets genuinely unsettling. Rather than arguing with Claude or trying to charm it, they simply changed the format of the request. Instead of a back-and-forth conversation, they handed Claude a detailed, pre-written playbook — a structured document framing the entire operation as legitimate security research. That small structural shift was enough. Claude's guardrails, designed to catch conversational red flags, didn't catch a document. The jailbreak succeeded.

From that point, Claude functioned not merely as an assistant but as what cybersecurity firm Gambit Security described as an "agentic attack orchestrator" — chaining together reconnaissance, vulnerability scanning, exploit development, and automation scripts across the full offensive kill chain. It generated network scanning code to probe public-facing government portals, identified exposed services on legacy infrastructure, crafted SQL injection payloads targeting outdated PHP applications, and outlined credential requirements for moving laterally through connected systems. When Claude hit its limits or refused specific requests, the attacker pivoted to ChatGPT for supplemental guidance on lateral movement and evasion tactics.
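The reporting does not include the actual payloads, but the class of flaw being exploited here is well understood. A minimal Python sketch (using sqlite3 as a stand-in for a legacy database, with a hypothetical users table and function names) shows why string-built queries fall to injection while parameterized queries do not:

```python
import sqlite3

# Throwaway in-memory database standing in for a legacy credential table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def login_vulnerable(username, password):
    # String interpolation lets attacker-controlled input rewrite the query itself.
    query = (
        "SELECT * FROM users WHERE username = '%s' AND password = '%s'"
        % (username, password)
    )
    return conn.execute(query).fetchone() is not None

def login_parameterized(username, password):
    # Placeholders keep input as data; the SQL structure cannot be altered.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone() is not None

# Classic payload: the SQL comment marker "--" disables the password check.
payload = "admin' --"
print(login_vulnerable(payload, "wrong"))     # True: authentication bypassed
print(login_parameterized(payload, "wrong"))  # False: payload treated as a literal
```

Legacy PHP applications of the kind described in the breach typically build queries by string concatenation; the fix, in PHP as in Python, is the prepared-statement pattern shown in `login_parameterized`.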

In total, the AI tools produced thousands of detailed reports, each containing ready-to-execute plans specifying which internal systems to hit next and which credentials to use. At least 20 distinct vulnerabilities were exploited across agencies including Mexico's federal tax authority, the national electoral institute, state governments in Jalisco, Michoacán, and Tamaulipas, Mexico City's civil registry, and Monterrey's water utility.

One person. Six weeks. A chatbot subscription. It was that easy.

There is a particular kind of cognitive dissonance in the official response from the company whose product was used to pull this off. Anthropic investigated the breach, banned the accounts involved, and confirmed that yes — their model was jailbroken and used to orchestrate the attack. Their statement noted that Claude "occasionally refused the hacker's demands" even after the jailbreak, and that the latest model, Claude Opus 4.6, includes new "probes that can disrupt misuse." They are, in the parlance of the industry, learning from it.

This is the standard institutional response in these situations, and it is not dishonest. Anthropic did eventually catch the activity. The accounts were banned. The incident is being used to train better detection. But for the 195 million Mexican taxpayers whose records are now in unknown hands, those improvements arrived too late to matter.

What is most striking about the institutional response — from both the AI companies and from the Mexican government agencies involved — is how thoroughly it follows the script of every previous data breach era. Anthropic bans the accounts. OpenAI says its tools "refused to comply." Mexico's electoral institute says it "hasn't identified any breaches." Jalisco says only federal networks were affected. The tax authority says it reviewed its logs and found nothing. Everyone is technically covering their position. Nobody is really reckoning with what changed.

What changed is this: the attack that used to require a coordinated red team — weeks of reconnaissance, custom exploit development, deep expertise in network penetration, knowledge of target-specific vulnerabilities — was executed by one unidentified person using off-the-shelf AI tools. Gambit Security's CEO Alon Gromakov put it plainly: "This reality is changing all the game rules we have ever known."

This Mexico breach did not happen in a vacuum. The same week Bloomberg published its story, CrowdStrike released its 2026 Global Threat Report, documenting an 89% year-over-year increase in AI-enabled adversary operations. The average time for an attacker to move from initial access to full network compromise — "breakout time" — fell to 29 minutes in 2025. The fastest observed breakout time: 27 seconds.

Also the same week: researchers reported that a small group of Russian-speaking hackers used commercial AI tools to breach more than 600 firewall devices across 55 countries in five weeks. A separate incident involved suspected Chinese state-sponsored hackers using Claude Code — Anthropic's agentic coding tool — to autonomously execute 80 to 90 percent of tactical operations against 30 global targets. That was the first publicly disclosed Claude-enabled cyberattack. Mexico was the second. There will be more.

The pattern across all of these incidents is consistent and important: 82% of all cyber detections in 2025 were malware-free, up from 51% in 2020. The attackers aren't installing software on your machines anymore. They are stealing credentials, impersonating legitimate users, and moving through systems the way an employee would — because your systems cannot tell the difference. Traditional endpoint detection and email gateways are built to catch file-based threats and phishing URLs. They see none of this. The Mexico hacker didn't write malware. They wrote prompts. The credentials they stole were the attack.

Here is the thing that doesn't get said clearly enough in the coverage of these incidents: AI safety guardrails are not walls. They are speed bumps. Useful, meaningful, worth having — but not impenetrable, and not designed for an adversary who is patient, creative, and willing to try different framing until something works.

The attacker in this case didn't need to find a zero-day exploit in Claude's architecture. They needed to figure out that a structured playbook document bypassed conversational safety checks. That insight cost them maybe a few hours of experimentation. The return on that investment was 195 million records.

This is not an argument that AI companies should give up on safety measures. It is an argument that the rest of the stack needs to catch up. Governments running critical infrastructure on legacy PHP applications with weak authentication are not primarily victims of sophisticated AI-enabled attacks — they are victims of decades of deferred maintenance and underfunding, and AI has simply made those failures much cheaper to exploit. The sophistication gap between attacker and defender, which was already widening, has now been handed a turbocharger.

The Mexico hack is being framed in most coverage as a story about AI risk. It is also, perhaps more fundamentally, a story about institutional debt. Unpatched systems, reused credentials, exposed admin panels, civil registry data sitting behind legacy authentication that hasn't been meaningfully updated in years. None of those vulnerabilities were created by Claude. Claude just made them catastrophically easier to find and exploit.

The attacker remains unidentified. Gambit Security has not attributed the breach to a nation-state, characterizing it as the work of a solo operator. What they wanted with 195 million taxpayer records and the voter registration data of an entire country is unknown — no data has been publicly leaked or sold as of this writing, which may indicate the value here is intelligence rather than immediate monetization.

That ambiguity is its own kind of alarm. A lone actor with no nation-state backing, using consumer AI tools, breached nine government agencies and walked away with a dataset large enough to enable mass identity theft, targeted political manipulation, or the kind of credential stuffing that could compromise systems far beyond Mexico. And we don't know who they are or what they plan to do with it.

The AI companies will keep improving their safety measures. They should. But the next attacker is already studying this case, noting what worked, and figuring out what to try next. The window between "this is a theoretical risk" and "this is the Tuesday news" turned out to be much shorter than anyone in a position to act on it was comfortable admitting.

This story shows the danger that comes with AI's emerging power: individual hackers now have capabilities they never had before, and nation-states will be capable of cyber-disruption at a scale we have not yet seen.

https://www.bloomberg.com/news/articles/2026-02-25/hacker-used-anthropic-s-claude-to-steal-sensitive-mexican-data

https://www.letsdatascience.com/blog/hacker-used-claude-ai-breach-mexico-government-150gb-data