The Dark Side of Agentic AI: Are We Ready for What's Coming?

Have you ever spent 10 minutes on an internet questionnaire, only for it to put your results behind a paywall at the end? How annoying!! That was me yesterday, and it really pissed me off, because I spent a good 15 minutes on it only to get no answer. I went back to the homepage to see if there was any mention of a paywall or cost, and there wasn't one anywhere. Just a few days ago a friend of mine who read law told me how he has used situations like this to his advantage by referencing new law changes that target ghost pricing. In my annoyed state, I started thinking of ways I could get back at the company for deceptive marketing. I could report them to the FCA. I could blast their server with dodgy requests, or find their API and overwhelm it. Then I asked Claude how easy it would be to build such a pernicious script, and it got me thinking further. How could an agent be used to cause destruction?

The talking points around agentic AI, or agents, are always about achieving some kind of goal or creating something positive. A lot of the foundation models have strict guard rails that would require some jiggery-pokery to get them to behave badly, but open source models wouldn't be that difficult to misuse. Then I started thinking about the products I've worked on and how a self-managing system could be used to disrupt them or cause trouble. Are we ready for that kind of disruption? What about once state actors get involved? What about when agents are connected to drones and weapons? It's worth thinking about, and it's worth talking about right now.

I've mainly worked on web apps, so here are a few risks that I've jotted down:

Brute forcing a UI to gain information that is not available through an API
Watching page load times to work out which servers are brittle enough to attack
Agents trained to find CVEs in web apps and use tools to try to exploit the server
Agents that scan client artefacts hunting for keys or vulnerable code
Agents that hold reservations, such as tickets, to manipulate prices or to stop others from purchasing items. This could also be used on shopping baskets to confuse stock-taking algorithms. It could also be used by touts.
Reconnaissance or spying on people or places.
Bot farms using agents to spread disinformation. This is already happening, but I believe the volume will increase dramatically.

I got this far and then asked Claude to come up with some more, and its suggestions are probably more terrifying than mine. Consider this context as well: as we speak, there's an Anthropic model named Mythos that is being tested, as it may be too harmful to release to the general public.

Claude's answers:

Here are some additional risks to consider:

Cyber & Infrastructure

Agents that autonomously probe and map internal networks after an initial breach, dramatically accelerating lateral movement without human attacker involvement
Persistent agents that slowly exfiltrate data in small, noise-avoiding increments to evade anomaly detection systems
Agents that monitor patch release notes and automatically reverse-engineer the vulnerability being fixed, exploiting unpatched systems in the window before updates are applied
Coordinated multi-agent attacks where one agent distracts security systems while another performs the actual breach

Highly personalised phishing at scale — agents that scrape a target's public digital footprint to craft individually tailored lures, removing the "spray and pray" limitation of traditional phishing
Real-time voice or chat impersonation of known contacts or authority figures to manipulate victims (CEO fraud, family emergency scams) with dramatically lower cost and higher volume
Agents that build long-term fake relationships over weeks or months before executing a fraud — romance scams, fake recruiters, etc. — with superhuman patience and consistency

Financial & Economic

Algorithmic market manipulation below the threshold that triggers regulatory alerts, coordinated across many accounts
Agents that monitor legal filings, planning applications, and procurement portals to extract non-public signals for trading or competitive intelligence
Automated credential stuffing combined with account takeover, intelligently prioritising targets by likely account value

Physical World

Agents that book and cancel logistics or transport capacity (freight, vehicles) to create artificial scarcity or disrupt supply chains — an extension of your ticket-touting point
Coordinating drone swarms or physical robots for surveillance or disruption, with the agent handling real-time decision-making that previously required a skilled human operator

Institutional & Democratic

Agents that systematically flood regulatory comment periods, planning consultations, or public feedback processes with synthetic but superficially legitimate responses, drowning out genuine voices
Generating and submitting false legal documents, evidence, or FOIA requests at scale to clog court systems or extract information
Automated astroturfing of academic or policy discourse — generating fake citations, bogus preprints, or coordinated peer review manipulation

Defensive Evasion

Agents that study an organisation's published security blog or incident reports to learn exactly what that defender does detect, and tailor attacks to avoid those specific signatures
Agents that deliberately act slowly and inconsistently to mimic human behaviour patterns and evade bot-detection systems like CAPTCHA or behavioural analytics
