I ran a small internal CTF for our team last month. Twelve challenges, expected solve time around six hours for a strong player. The first three fell in under ten minutes — not because the players were geniuses, but because they pasted the prompt into an LLM and waited.

This is not a rant about cheating. The same thing is happening in public CTFs, and it's exposing a real engineering problem: most CTF challenges were designed assuming the solver is a human reading a static artifact. Frontier models are extremely good at reading static artifacts. If you want challenges that still teach something in 2026, you have to design them differently.

Here's the debugging walkthrough I went through after watching my own event get eaten.

The root cause: challenges that are pure pattern recognition

Most "easy" and "medium" CTF problems share a shape. You get a file or an endpoint. You inspect it. You recognize a known scheme — XOR with a short key, a misuse of ECB mode, a path traversal, a weak JWT secret, a pickle deserialization. You apply the known counter and pull the flag.
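The "apply the known counter" step is easy to see in code. Below is a minimal Python sketch of the textbook break for the first scheme on that list, repeating-key XOR with a short key. It makes two assumptions: the flag format begins with a known prefix such as `flag{`, and the key is no longer than that prefix. The helper names `xor_bytes` and `recover_short_key`, the key `k3y!`, and the sample flag are all hypothetical and do not come from any real challenge.

```python
from itertools import cycle
from typing import Optional


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: byte i of data is XORed with key[i % len(key)]."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def recover_short_key(ciphertext: bytes, known_prefix: bytes) -> Optional[bytes]:
    """Known-plaintext attack on repeating-key XOR (sketch).

    Assumes the plaintext starts with known_prefix and the key is no longer
    than that prefix. XORing ciphertext and prefix yields the leading
    keystream bytes; each candidate key length is tried, shortest first.
    """
    keystream = bytes(c ^ p for c, p in zip(ciphertext, known_prefix))
    for key_len in range(1, len(keystream) + 1):
        key = keystream[:key_len]
        plaintext = xor_bytes(ciphertext, key)
        # Hypothetical flag check: right prefix, closing brace, printable ASCII.
        if (
            plaintext.startswith(known_prefix)
            and plaintext.endswith(b"}")
            and all(32 <= b < 127 for b in plaintext)
        ):
            return key
    return None


# Hypothetical challenge: a flag encrypted with a 4-byte key the solver never sees.
ciphertext = xor_bytes(b"flag{xor_is_not_encryption}", b"k3y!")
key = recover_short_key(ciphertext, b"flag{")
print(key, xor_bytes(ciphertext, key))  # b'k3y!' b'flag{xor_is_not_encryption}'
```

The attack needs no insight into the particular challenge. Once the scheme is recognized, the counter is a fixed recipe, and that lookup-and-apply work is exactly what a model does well.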