There's Something Odd About the Official Playwright MCP Demo
In the demo, the presenter prompts the Playwright MCP server to explore a movie-lookup web app she built and to generate a single test.
The agent starts by finding the search feature and deciding to check that it works. It picks a movie title: "star wars". Great taste.
When it searches for star wars, a completely different movie appears, with a different thumbnail and description. The presenter is under the impression that the MCP found an edge case, something she had missed, and wrote tests to uncover that issue.
That's not what appears to have happened.
The MCP server did not notice that the wrong movie was shown; we can see that from its summary and from the tests it creates. Nor did it ask the presenter to update the incorrect file, because the bad data is still there.
The point is not that the presenter did a bad job or that Playwright MCP is a load of crap.
The presenter did a great job, and Playwright is a great tool.
The point is that working with AI often leads us astray. It moves much quicker than we can think critically, and that's a problem if we want to truly assess quality.
130,000 people have watched the video, but I bet only a handful noticed that beyond the awe, the MCP created a pretty useless test.
It looks like the MCP server found an issue and acted on it, but it didn't. It was the human who noticed. Funny, that.
The MCP server then wrote a passing test rather than a failing one. A failing test would have been the better way to surface the issue and get a developer to fix it.
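To make that concrete, here is a minimal sketch of the kind of failing test that would have surfaced the bug. Everything in it is an assumption for illustration: the URL, the accessible names of the search controls, and the heading locator are hypothetical, not the demo app's real code.

```typescript
// Hypothetical Playwright test. The URL and locators below are assumptions,
// not taken from the demo app.
import { test, expect } from '@playwright/test';

test('searching for "star wars" shows the Star Wars movie', async ({ page }) => {
  await page.goto('http://localhost:3000'); // assumed local dev server

  // Assumed accessible names for the search box and button.
  await page.getByRole('textbox', { name: 'Search' }).fill('star wars');
  await page.getByRole('button', { name: 'Search' }).click();

  // This assertion encodes the INTENT (the right movie should appear),
  // so against the buggy data it fails and flags the problem, instead of
  // asserting whatever wrong movie happens to render today.
  await expect(page.getByRole('heading').first()).toHaveText(/star wars/i);
});
```

The difference is subtle but important: a test generated from observed behavior locks in the bug as "expected", while a test written from intent catches it.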
Speed is seductive. But quality requires pause.