Agentic Test-Driven Development
My most valuable lesson from the last few months of autonomous AI software engineering on large, complex projects: a strict test-driven development approach. By now we are all happy to see test coverage in our repos skyrocketing, but most engineers still write tests as validation after the implementation. Why?
A while ago I was struggling to keep my AI agents focused on the task at hand instead of overcomplicating things, which is when TDD came to mind. Test-driven development (TDD) is best practice in the largest codebases; the only problem is that most engineers don’t want to do it, because (frankly) it’s boring and (for most of us) highly unsatisfying work. We’d rather build “real code” than write tests. Fair enough. But with virtually unlimited AI engineers at hand, with infinite patience and a particular strength in highly structured, repetitive tasks like writing tests, there is no reason not to embrace test writing and to put it at the beginning. All the arguments we used to bring against TDD no longer apply: increased code volume, maintenance overhead, longer time-to-production, overcomplication. All of these become irrelevant once AI agents are at the center of the workflow.
So I changed my development approach:
- Create specs of what you want to build.
- Have the AI agent create tests based on these specs (Phase “Red”).
- Instruct the AI agent to stop. Do a first round of (human) code review, checking test coverage and edge cases. Prompt back to apply changes wherever the specs are not fully covered by tests.
- Instruct the AI agent to proceed to Phase “Green”: work linearly, one test after another, with a zero-failed-tests policy. Instruct it not to change the previously agreed-on tests.
- Code review & iterate on the above.
- Finish
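The Red/Green split above can be sketched in miniature. Everything here is hypothetical for illustration (the spec, the `apply_discount` function, the thresholds); the point is the ordering: the tests encode the spec first, and the implementation is the minimal code that satisfies them.

```python
# Hypothetical spec: "Orders over 100 receive a 10% discount;
# orders at or below 100 are charged in full."

# Phase Green: the minimal implementation whose only job is to
# make the Phase-Red tests below pass, and nothing else.
def apply_discount(total: float) -> float:
    if total > 100:
        return round(total * 0.9, 2)
    return total

# Phase Red: tests written from the spec, before any implementation
# existed. Once reviewed, the agent is told not to change them.
def test_discount_applied_above_threshold():
    assert apply_discount(200.0) == 180.0

def test_no_discount_at_threshold():
    assert apply_discount(100.0) == 100.0

def test_no_discount_below_threshold():
    assert apply_discount(50.0) == 50.0
```

In the real workflow these tests live in their own file and the agent runs the full suite after every change; the zero-failed-tests policy means it may not move to the next test while any earlier one is red.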
Doing this, I consistently find:
- Little to no deviation from the task at hand, and no random unrelated “bug fixes”. The agents stay true to their objective function: making tests pass, and nothing else.
- Excellent test coverage, by design.
- Excellent “documentation” and “specification” of feature requirements, as tests are essentially the contract.
Challenges I often see:
- Sometimes agents go too deep in their testing (“should render red button in top right corner”). However, these cases are easily spotted in the first code review after Phase Red.
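To make that review call concrete, here is a hypothetical sketch (the `Button` class and its fields are invented for illustration) contrasting an over-deep test that pins presentation details with the behavior-level test I would keep:

```python
# Hypothetical widget, invented purely to illustrate the review decision.
class Button:
    def __init__(self, label: str, color: str = "gray", position: str = "toolbar"):
        self.label = label
        self.color = color
        self.position = position

    def click(self) -> str:
        return f"{self.label} clicked"

delete = Button("Delete", color="red", position="top-right")

# Too deep: pins styling details that churn with every redesign.
# This is the kind of test to cut in the post-Red review.
def test_delete_button_styling():
    assert delete.color == "red"
    assert delete.position == "top-right"

# Behavior-level: tests what the spec actually requires.
def test_delete_button_triggers_action():
    assert delete.click() == "Delete clicked"
```

The styling test is not wrong, it is just testing the wrong contract: it couples the suite to cosmetic decisions instead of the feature's observable behavior.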
Any similar experiences? Would love to hear.