The AI Told Me I Was the Problem (It Was Right)

Part 3 of 10: When Your Copilot Becomes Your Coach

A couple weeks into my agentic AI journey, I’d learned what didn’t work:

  • Generic user stories didn’t work
  • Elaborate orchestrations didn’t work
  • TDD automation didn’t work
  • A UI-only focus didn’t work

I was running out of things to blame.

Then Anthropic released an update to Claude. I opened a new conversation with fresh hope and made an unusual request:

“Be skeptical. Question my assumptions. Push back on me.”

What happened next was the most uncomfortable coaching session of my career—delivered by software.

When Your AI Calls You Out

I’d been teaching Agile for over a decade. I’d coached teams at major enterprises. I’d seen every variety of project dysfunction.

Turns out I’d been recreating all of them at my desk during lunch breaks.

Claude’s questions were surgical:

“What is your actual MVP for this app? Not ‘phase 1 features.’ What is the absolute minimum that would provide value to a real user?”

I started listing features.

“Those aren’t MVP criteria. Those are a roadmap disguised as a launch plan. Who is your user? Not ‘task management users.’ A specific person.”

I described a broad target market.

“You’re designing for everyone, which means you’re designing for no one. Why are you planning performance optimizations for an app that has zero users?”

Because… good architecture?

“You’re solving problems you don’t have. Why is deployment pushed to ‘later sprints’ in your plan?”

Because it was hard and I didn’t want to deal with it yet.

That one hurt.

The Waterfall in Agile Clothing

Here’s what Claude made me see:

I’d written my user stories using Agile vocabulary—epics, stories, acceptance criteria, story points. It looked like every properly formatted backlog I’d ever seen.

It was waterfall with better terminology.

I’d planned sprints 1-6 before writing a single line of code. I’d created dependencies between stories. I’d built a technical architecture for scaling problems I didn’t have.

I was doing everything I’d coached hundreds of teams NOT to do.

The difference was speed. Traditional waterfall takes months to fail. I’d compressed that timeline to weeks, burning through AI tokens instead of developer hours.

The Hard Questions I’d Been Avoiding

Claude pushed me to articulate what I’d been dancing around:

What was I actually building?

Not the features. Not the technology stack. What was the goal?

After an embarrassingly long pause, I admitted: I wanted a portfolio piece. Something I could show potential clients that proved I could build a complete application with agentic AI that real users would actually use.

Not a prototype. Not a demo. A real app.

Who was I building it for?

This one was harder. I’d been saying “productivity users” or “remote workers” or other generic categories.

The truth was more specific: I wanted to build something I would use. Both personally and in my coaching work.

Why was I avoiding the hard parts?

Deployment was “later.” Partly I wanted to avoid the costs. I’d automated deployments before, but this meant automating into an environment I’d used without ever becoming proficient in it. That changed, by the way. A great way to become an expert on walls is to bang your head against them.

I was treating this like a student project—something to demo, not something to ship.

If I wanted a portfolio piece that proved I could build production software, I needed to actually… build production software.

The Twelve-Week Reality Check

That conversation with Claude lasted several hours.

We built a 12-week plan. Not a 12-sprint plan—a realistic timeline with weeks, not idealized sprints. My work already looked a lot like Kanban, so I started thinking in those terms.

It turned out that 12 weeks was padded way too much. Once Claude and I got into a rhythm, we really started to rock.

The plan had three revolutionary (for me) elements:

1. Clear separation of duties

Claude’s job: Write code, run tests, do technical implementations.

My job: Make product decisions, validate the experience, test like a user, handle deployment.

We documented this explicitly. No more expecting Claude to tell me if a feature was valuable. That was my job.

2. Two-hour work sessions

Not sprints. Not “until we finish this story.” Two-hour blocks with explicit scope.

Claude would start each session by building a todo list from our roadmap. Items assigned to Claude or me. Expected outcomes. Validation steps (done by me). And critically: expected todos for the next session.

That last part was genius. It forced forward thinking without overplanning.
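To make that concrete, a session kickoff looked roughly like this. The specific items below are made up for illustration, not copied from a real session:

    Session plan (2 hours)
    Todos:
      1. Claude: implement the settings screen, with unit tests
      2. Me: decide which settings ship in v1 and which wait
      3. Claude: persist settings to local storage
    Expected outcome: settings survive a page reload in the demo UI
    Validation (me): change a setting, reload the page, confirm it stuck
    Expected todos next session: hook settings into the main feature, start the deployment checklist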

3. End-of-day code review

Every day ended with comprehensive code review. Not “did it work?” but “is this code we want to live with?”

Address technical debt as we went. No “we’ll refactor later.” Later never comes.

The App I’d Been Avoiding Building

With Claude pushing me to stop designing for imaginary users, I finally picked something specific:

A Pomodoro timer.

I know. Not exactly revolutionary. There are hundreds of Pomodoro apps.

But here’s why it worked:

I actually used the technique. Personally for writing and research. Professionally when coaching teams. I had opinions about what worked and what was annoying.

It was small enough to finish. Twelve weeks to a production-ready v1 was plausible.

It was large enough to matter. Building a real app with deployment, analytics, privacy compliance—those were portfolio-worthy skills.

The requirements were in my head. No research needed. No user interviews to schedule. I was the user.

It wasn’t exciting. It wasn’t going to change the world. But it was real.

The First Real Session

We started with a demo UI prototype.

Not the full app. Just the visual design and interaction patterns. Get the UX right before building the engine underneath.

That first session went shockingly well.

Claude built a clean interface. I tested it. We made design adjustments. We iterated quickly because we weren’t tangled up in backend complexity.

After three sessions, I had a demo UI I actually liked.

Then we hit backend issues, and I remembered why I’d been avoiding real applications.

But something was different this time. I had a plan. I had clear roles. I had two-hour boundaries that prevented Claude from driving into the ditch while I wasn’t watching.

Most importantly: I’d made hard choices about what I was building and why.

Lessons for Leaders (Written in Scar Tissue)

Lesson 1: Your team needs permission to be skeptical of you.

Claude pushed back on me because I explicitly asked it to. Your teams probably won’t—especially if you’re excited about agentic AI and they’re trying to figure it out.

Create space for them to question the plan. Better yet, make it mandatory. “What are we avoiding?” should be a standard retro question.

Lesson 2: Speed amplifies bad product decisions.

I could write user stories faster than ever. I could implement features in hours instead of days. That made it easy to build six months of roadmap before realizing I was building the wrong thing.

Your teams will do this too. They’ll confuse “building fast” with “building right.” If your product strategy is fuzzy, AI will turn that fuzz into expensive code.

Lesson 3: Constraints are clarity.

Two-hour sessions felt limiting at first. Why stop when we’re on a roll?

But those constraints forced me to be specific about outcomes. “Implement the timer” was too vague for two hours. “Implement timer start/stop with visual state changes and local storage persistence” was scoped work.

Your teams need similar constraints. Not because AI is slow (it’s not), but because humans need boundaries to stay focused.
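For the curious, here’s roughly what that scoped task amounts to in code. This is not the app’s actual implementation; it’s a minimal sketch assuming a browser-based TypeScript app, with invented names like PomodoroTimer and the pomodoro-state storage key:

    // Illustrative sketch only: timer start/stop with a visual state change
    // and localStorage persistence. Names are hypothetical, not the real app.
    type TimerState = "idle" | "running";

    class PomodoroTimer {
      private state: TimerState = "idle";
      private remainingMs = 25 * 60 * 1000; // one 25-minute pomodoro
      private intervalId?: number;

      constructor(private display: HTMLElement) {
        this.restore();
        this.render();
      }

      start(): void {
        if (this.state === "running") return;
        this.state = "running";
        this.intervalId = window.setInterval(() => this.tick(), 1000);
        this.persist();
        this.render();
      }

      stop(): void {
        if (this.state === "idle") return;
        this.state = "idle";
        window.clearInterval(this.intervalId);
        this.persist();
        this.render();
      }

      private tick(): void {
        this.remainingMs = Math.max(0, this.remainingMs - 1000);
        if (this.remainingMs === 0) this.stop();
        this.persist();
        this.render();
      }

      // Visual state change: swap a CSS class and update the countdown text.
      private render(): void {
        this.display.classList.toggle("running", this.state === "running");
        const totalSeconds = Math.floor(this.remainingMs / 1000);
        const minutes = Math.floor(totalSeconds / 60);
        const seconds = totalSeconds % 60;
        this.display.textContent = `${minutes}:${String(seconds).padStart(2, "0")}`;
      }

      // Persistence: survive a page reload mid-session.
      private persist(): void {
        localStorage.setItem(
          "pomodoro-state",
          JSON.stringify({ state: this.state, remainingMs: this.remainingMs })
        );
      }

      private restore(): void {
        const saved = localStorage.getItem("pomodoro-state");
        if (!saved) return;
        const { remainingMs } = JSON.parse(saved);
        this.remainingMs = remainingMs;
        // Always restore as idle; the user decides when to resume.
      }
    }

That’s the kind of unit that fits a two-hour block: small enough to finish and validate, big enough to be a real increment.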

What Happened Next

With a real plan and a real app, I spent the next several weeks making actual progress.

The demo UI came together in a few days. Then the app itself. Features worked. Tests passed.

But I was about to hit a new category of problem: how do you know when Claude’s code is actually good?

And more importantly: how do you build processes that help you catch problems before they become crises?

That’s where metrics come in.


This is part 3 of a 10-part series. Part 1 | Part 2 covered my first experiments and failures. Part 4 explores what happened when I tried to measure what “good” actually meant.

About the Author: I’m an Agile coach who learned the hard way that coaching principles apply to AI collaboration too. This series documents building a production app with Claude Code—the failures, the breakthroughs, and the lessons that might save you time and tokens.