There's Something Odd About the Official Playwright MCP Demo

In the demo, the presenter prompts an MCP server to explore a movie-lookup web app she built and to generate a single test.

The agent starts by finding a search button and decides to check that it works. It picks a movie title: "star wars". Great taste.

When it searches for star wars, a completely different movie appears, with a different thumbnail and description. The presenter is under the impression that the MCP found an edge case she had missed and wrote tests to uncover that issue.

That's not what appears to have happened.

The MCP server did not notice that the wrong movie was shown; we can see that from its summary and the tests it creates. Nor did it ask the presenter to fix the incorrect file, which is still there.


The point is not that the presenter did a bad job or that Playwright MCP is a load of crap.

The presenter did a great job, and Playwright is a great tool.

The point is that working with AI often leads us astray. It moves far faster than we can think critically, and that's a problem if we want to truly assess quality.

130,000 people have watched the video, but I bet only a handful noticed that beyond the awe, the MCP created a pretty useless test.

It looks like the MCP server found an issue and acted on it, but it didn't. It was the human who noticed. Funny, that.

The MCP server then wrote a passing test, rather than a failing test — which would have been the better way to surface an issue and get a developer to fix it.


Speed is seductive. But quality requires pause.

