The Dark Side of Agentic AI: Are We Ready for What's Coming?

Have you ever spent 10 minutes on an internet questionnaire, only for it to put your results behind a paywall at the end? How annoying! That was me yesterday, and it really pissed me off, because I spent a good 15 minutes on it only to get no answer. I went back to the homepage to see if a paywall or cost was mentioned anywhere, and it wasn't. Just a few days ago, a friend of mine who read law told me how he has turned situations like this to his advantage by citing recent law changes that target ghost pricing. In my annoyed state, I started thinking of ways to get back at the company for deceptive marketing. I could report them to the FCA. I could blast their server with dodgy requests, or find their API and overwhelm it. Then I asked Claude how easy it would be to build such a pernicious script, and that got me thinking further: how could an agent be used to cause destruction?

The talking points around agentic AI, or agents, are always about achieving some kind of goal or creating something positive. A lot of the foundation models have strict guardrails that would require some jiggery-pokery to get them to behave badly, but getting open-source models to misbehave wouldn't be that difficult. Then I started thinking about the products I've worked on and how a self-managing system could be used to disrupt them or cause trouble. Are we ready for that kind of disruption? What about once state actors get involved? What about when agents are connected to drones and weapons? It's worth thinking about, and it's worth talking about right now.

I've mainly worked on web apps, so here are a few risks that I've jotted down:

  1. Brute-forcing a UI to extract information that isn't exposed through an API
  2. Watching how servers handle page loads to decide which ones are brittle enough to attack
  3. Agents trained to find CVEs in web apps and use tools to try to exploit the server
  4. Agents that scan client artefacts hunting for keys or vulnerable code
  5. Agents that hold reservations, such as tickets, to manipulate prices or to stop others from purchasing items. The same tactic could be used on shopping baskets to confuse stock-taking algorithms, and it could be used by touts.
  6. Reconnaissance on, or spying on, people or places
  7. Use by bot farms to spread disinformation. This is already happening, but I believe it will increase in volume dramatically.

I got this far and then asked Claude to come up with some more, and its suggestions are probably more terrifying than mine. Consider this context as well: as we speak, there's an Anthropic model named Mythos that is being tested because it may be too harmful to release to the general public.

Claude's answers:

Here are some additional risks to consider:

Cyber & Infrastructure

  • Agents that autonomously probe and map internal networks after an initial breach, dramatically accelerating lateral movement without human attacker involvement
  • Persistent agents that slowly exfiltrate data in small, noise-avoiding increments to evade anomaly detection systems
  • Agents that monitor patch release notes and automatically reverse-engineer the vulnerability being fixed, exploiting unpatched systems in the window before updates are applied
  • Coordinated multi-agent attacks where one agent distracts security systems while another performs the actual breach

Social Engineering & Fraud

  • Highly personalised phishing at scale — agents that scrape a target's public digital footprint to craft individually tailored lures, removing the "spray and pray" limitation of traditional phishing
  • Real-time voice or chat impersonation of known contacts or authority figures to manipulate victims (CEO fraud, family emergency scams) with dramatically lower cost and higher volume
  • Agents that build long-term fake relationships over weeks or months before executing a fraud — romance scams, fake recruiters, etc. — with superhuman patience and consistency

Financial & Economic

  • Algorithmic market manipulation below the threshold that triggers regulatory alerts, coordinated across many accounts
  • Agents that monitor legal filings, planning applications, and procurement portals to extract non-public signals for trading or competitive intelligence
  • Automated credential stuffing combined with account takeover, intelligently prioritising targets by likely account value

Physical World

  • Agents that book and cancel logistics or transport capacity (freight, vehicles) to create artificial scarcity or disrupt supply chains — an extension of your ticket-touting point
  • Coordinating drone swarms or physical robots for surveillance or disruption, with the agent handling real-time decision-making that previously required a skilled human operator

Institutional & Democratic

  • Agents that systematically flood regulatory comment periods, planning consultations, or public feedback processes with synthetic but superficially legitimate responses, drowning out genuine voices
  • Generating and submitting false legal documents, evidence, or FOIA requests at scale to clog court systems or extract information
  • Automated astroturfing of academic or policy discourse — generating fake citations, bogus preprints, or coordinated peer review manipulation

Defensive Evasion

  • Agents that study an organisation's published security blog or incident reports to learn exactly what that defender does detect, and tailor attacks to avoid those specific signatures
  • Agents that deliberately act slowly and inconsistently to mimic human behaviour patterns and evade bot-detection systems like CAPTCHA or behavioural analytics
