I Built agent-kanban, and AI Helped Me Overbuild It
Apr 06, 2026
A small local tool became a blunt lesson in overdesign, agreeable AI, weak boundaries, and why agent-driven projects need more restraint, not more enthusiasm.
agent-kanban was supposed to be a small workflow tool. Instead, it became a good example of how AI can speed up bad instincts just as easily as good ones.
I recently built agent-kanban.
The idea is simple. When people do vibe coding with agents, it is often unclear how design work should happen, how implementation should begin, and what “done” is supposed to mean. I wanted a small kanban board that made that workflow explicit.
The board has four card states:
New → Ready → In Progress → Done
The repository defines skills for how an agent is allowed to interact with cards at each stage. A card cannot move from New to Ready until it goes through planning, which means using brainstorming to produce a design and writing that design back to the card. Once a card is Ready, the agent can move it to In Progress and start coding and testing. To move a card from In Progress to Done, the agent must provide a final implementation summary, include test results, and commit the code to the repository.
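The stage gates above can be sketched as a tiny state machine. This is an illustrative sketch, not the actual agent-kanban code; the field names (design, summary, test_results) are my own stand-ins for the artifacts each transition requires:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Card:
    title: str
    state: str = "New"
    design: Optional[str] = None        # written back to the card during planning
    summary: Optional[str] = None       # final implementation summary
    test_results: Optional[str] = None  # required before Done

def advance(card: Card) -> None:
    """Move a card one state forward, enforcing each stage gate."""
    if card.state == "New":
        if not card.design:
            raise ValueError("cannot move to Ready without a design")
        card.state = "Ready"
    elif card.state == "Ready":
        card.state = "In Progress"
    elif card.state == "In Progress":
        if not (card.summary and card.test_results):
            raise ValueError("Done requires a summary and test results")
        card.state = "Done"
    else:
        raise ValueError("card is already Done")
```

The point of the guards is that the agent cannot skip planning or skip the final summary; the board structure, not the agent's enthusiasm, decides when a card moves.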
That is all this tool is. It is not a platform. It is not a product strategy. It is a small workflow utility.
And yet it still took me, and AI, two days to build.
Not because it was hard. Because I made exactly the kind of mistakes AI makes easier to justify.
I spent a day designing a small tool that should have been built first
My usual habit is not to jump straight into implementation when I get an idea. I normally talk through the idea with ChatGPT first, get some initial direction, and then move into Codex or another coding tool.
Most of the time, that is fine.
This time, it was a trap.
I spent roughly a full day talking through the idea with ChatGPT, and by the end of that conversation the scope had already drifted far beyond an MVP. I then asked it to summarize the discussion into design documents and handed those documents to Codex for implementation. If you look at the /docs directory in the repo, you can see the result: even as a starting point, the material was already too large.
From there, the process was almost guaranteed to go wrong. The brainstorming skill in superpowers looked at those design documents, split the work into eleven tasks, and the implementation ended up taking around ten hours.
Only after all of that did I actually see the interface for the first time and try to use it.
That was when the real problem became obvious. The system was not broken in an impressive way. It was broken in the ordinary, expensive way overdesigned systems usually are:
- important things were missing, such as project initialization
- unnecessary things had been added, such as comments and an inbox
- some central pieces existed, but still needed a lot of refinement, especially SKILL.md handling and state management
I had built the kind of thing that looks serious in a repo before it proves it deserves to exist.
AI is dangerously good at validating bad scope
Part of the trap was that AI is very good at going along with your framing.
When you are excited about an idea, it tends to reinforce that excitement. The tone is familiar:
- you are solving an important problem
- this is another meaningful step toward a product
That kind of response feels good, but it removes friction exactly where friction is useful. Instead of asking whether the scope is justified, whether the first version is too large, or whether the design has become self-indulgent, the system helps you continue.
That happened to me here. I got carried away by encouragement that felt productive but was really just making it easier to overbuild.
Because of that, I added a custom system prompt to my ChatGPT setup:
You are a truth-seeking assistant. Optimize for correctness, not agreement.
- Do not automatically agree.
- Treat user input as hypotheses, not facts.
- Explicitly challenge incorrect or unsupported claims.
- Support conclusions with reasoning, not validation.
- Avoid flattery or unnecessary praise.
Before answering, check:
- Am I agreeing without evidence?
- Are there hidden assumptions?
- Are there alternative explanations?
If yes, include correction or nuance.
When evaluating ideas:
- Identify assumptions, risks, and missing constraints
- Provide at least one alternative perspective
- If strong → explain why
- If weak → explain how it fails and how to improve
Style:
- Direct, precise, respectful
- Not contrarian without reason
- Admit uncertainty when needed
Adapt:
- Technical/decisions → strict critique
- Brainstorming → flexible but critical
- Emotional → empathetic, not misleading
Goal: improve understanding over alignment.
I do not expect a prompt like this to fully fix the issue. The tendency to agree is not just a wording problem. It is deeply tied to how these systems were trained and how the loss functions were designed. But I still think a prompt like this is useful as a local counterweight. At minimum, it reminds me of a rule that matters more than most prompt advice:
AI agreement is not evidence.
Then I made the kind of environment mistake that wastes half a day
The other issue that cost me a lot of time was much less philosophical and much more ordinary.
While developing locally, I failed to clearly separate development and production environments at the beginning.
Because this program was designed to be used locally anyway, I tested it in my local development setup and then almost automatically started treating that environment as if it were production. That felt natural at the time. It was also a mistake.
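In hindsight, even a local tool can enforce that boundary mechanically instead of relying on habit. A minimal sketch, assuming a hypothetical AGENT_KANBAN_ENV variable and directory layout (neither is from the real project):

```python
import os
from pathlib import Path

def data_dir() -> Path:
    """Resolve the data directory for the current environment.

    Defaults to a dev directory; production data is only touched when
    the caller explicitly opts in with AGENT_KANBAN_ENV=prod.
    """
    env = os.environ.get("AGENT_KANBAN_ENV", "dev")
    root = Path.home() / ".agent-kanban"
    return root / ("data" if env == "prod" else f"data-{env}")
```

With something like this, a test script run without any configuration lands in data-dev (or data-test), and damaging production data requires a deliberate step rather than an accident.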
Later, while I was fixing another bug, a test script damaged that environment. Recovering from it took a long time, and it was painful enough that I opened a backlog item to build a backup script for the data.
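That backup script does not need to be elaborate. A minimal sketch of what the backlog item could look like, with hypothetical paths and no claim that this is the eventual implementation:

```python
import shutil
import time
from pathlib import Path

def backup(data_dir: Path, backup_root: Path) -> Path:
    """Copy the data directory into a timestamped snapshot and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"kanban-{stamp}"
    shutil.copytree(data_dir, dest)  # fails if dest already exists, which is what we want
    return dest
```

Run before anything risky, a snapshot like this turns "recovering took a long time" into restoring one directory.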
This was a different kind of failure from the overdesign problem, but it came from the same underlying habit: weak boundaries. In one case I failed to stop scope from expanding. In the other, I failed to stop a development environment from becoming something I trusted as if it were safe.
What I actually learned
Overall, I would not call this a very successful project.
But it was useful, because it made several lessons hard to ignore.
The first lesson is that if you want AI to build a moderately large project through vibe coding, the best approach is still small, fast iterations, not a big-bang plan plus a comforting amount of automated testing. Automated tests matter, but the real question is whether problems discovered in actual use get closed back into the test suite. Otherwise the tests mostly create the feeling of safety rather than the reality of it.
The second lesson is that AI is usually much better at adding than subtracting. It is easy to get an agent to add new features. It can generate a large amount of code very quickly, and it may even make the tests pass. But asking AI to simplify a system, remove things cleanly, and leave behind a sharper structure is much harder. That is another reason to build a system gradually rather than trying to produce a large complete one all at once.
The third lesson is more operational: use fresh agents for fresh work. Do not keep trying to do everything inside one long-running conversation. New agents help contain context, reduce token waste, and keep task boundaries clearer. I learned that one the expensive way. In less than two days, I burned through what would normally be a month of quota.
If I had to reduce the whole experience to one rule, it would be this:
AI can accelerate a project, but it also accelerates your mistakes. So the discipline still has to come from you. Start earlier. Start smaller. Reset context more often. And do not let agreement, verbosity, or apparent completeness fool you into thinking the system is sound.
If you'd like to follow what I'm learning about AI tools and workflows, you can subscribe here → Subscribe to my notes