The Difference Between Working and Understanding Why

Part 7 of 10: When Best Practices Met Real Practices

A book dropped that had a big impact on me.

“Agentic AI Designs” from Google. Fresh research on AI development patterns, team structures, and architectural approaches.

I’d been building my Pomodoro timer for a while. I had practices that worked. Session retrospectives that showed improvement. A system that felt sustainable.

But I also had a nagging question: was I doing this well, or just doing this differently than before?

Time to find out.

The Reading Assignment I Gave Myself

I did something that would have seemed impossible early on:

I had Claude read the entire book and make recommendations specific to my project.

Not generic advice. Not “here’s what the book says.” But: “Based on this book and your current practices, here’s what you should consider changing.”

I asked for the top 5 suggestions, why they were made, what improvements I might see in development, and what improvements users of my app might see.

As Claude sometimes does, it went outside the bounds of my ask and told me which of my existing practices already align with suggestions from the book. I get a strange feeling when I get suggestions to improve my process. I am excited about the possible upsides: improved efficiency, better quality, more confidence in changes. I also get a pit in my stomach: will this be worse than what I had? What if it breaks something? That’s why we’re Agile, though, right? Try something, and if it doesn’t serve you, toss it out.

The Retrospectives That Needed Structure

Google’s research emphasized measurement, but not just any measurement.

I’d been doing session retrospectives after each completed story. Useful, but I wasn’t categorizing what I learned.

The book pushed for what they called “effectiveness tracking”:

Clarity of requirements – Rated by both human and AI. Did we actually understand what we were building before we built it?

Rework – Not just cycles, but why. Was it unclear requirements? AI bugs? Human verification failures? Changed decisions?

Bugs found – I was tracking this, but not categorizing. Were they logic bugs? Integration bugs? Edge cases? UI issues?

Iterations to fix – The book suggested tracking it per bug type. Logic bugs should take fewer iterations than race conditions.
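To give that categorization a concrete shape, here’s a minimal sketch of what a per-story record could look like. The field names and enums are illustrative, mine rather than the book’s.

```python
from dataclasses import dataclass, field
from enum import Enum

class ReworkCause(Enum):
    UNCLEAR_REQUIREMENTS = "unclear requirements"
    AI_BUG = "AI bug"
    VERIFICATION_MISS = "human verification failure"
    CHANGED_DECISION = "changed decision"

class BugType(Enum):
    LOGIC = "logic"
    INTEGRATION = "integration"
    EDGE_CASE = "edge case"
    UI = "UI"

@dataclass
class Bug:
    bug_type: BugType
    iterations_to_fix: int   # tracked per bug type, as the book suggests

@dataclass
class SessionRetro:
    story: str
    clarity_human: int       # 1-5, rated by me
    clarity_ai: int          # 1-5, rated by Claude
    rework_causes: list[ReworkCause] = field(default_factory=list)
    bugs: list[Bug] = field(default_factory=list)
```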

I enhanced my retrospective process. The new version forced me to categorize, not just observe. I kept a history of all of our retros, and when I mine them for potential changes, the categorizations help immensely. We can see what was being worked on at the time and what other confounding issues may have been in play.

I very much enjoyed the retrospectives, but I did have to keep a close eye on Claude. He would sometimes make statements that rested on one-off issues, and too often he had high praise for me. Here is a power move for your Claude prompts: “Be skeptical.” When I tell Claude to be skeptical, he is often more honest about his own performance and mine.

The Code Review That Became a System

I’d been doing end-of-day code reviews. But they were ad hoc.

Sometimes I’d catch major issues. Sometimes I’d miss obvious problems because I was tired or rushed.

The book recommended structured review frameworks. Not checklists (I had those), but frameworks with escalating levels:

Level 1: Quick review (every commit)

  • Does it work?
  • Does it match requirements?
  • Are there obvious issues?

Level 2: Standards review (end of day)

  • Does it follow coding standards?
  • Is it maintainable?
  • What technical debt did we create?

Doing two different reviews is a great way to find gaps between the quick reference and the full standards. The more the standards review catches, the more likely I am to reconsider what belongs in the quick reference.
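If you wanted to encode the two levels so nothing gets skipped when you’re tired, a minimal sketch might look like this (the questions come from the lists above; the structure is illustrative).

```python
# Hypothetical checklist runner: walks the questions for one review level
# and collects whatever findings the reviewer types in.
REVIEW_LEVELS = {
    "quick": [        # Level 1: every commit
        "Does it work?",
        "Does it match requirements?",
        "Are there obvious issues?",
    ],
    "standards": [    # Level 2: end of day
        "Does it follow coding standards?",
        "Is it maintainable?",
        "What technical debt did we create?",
    ],
}

def run_review(level: str) -> list[str]:
    """Prompt through one level's checklist; a blank answer means 'no finding'."""
    findings = []
    for question in REVIEW_LEVELS[level]:
        answer = input(f"{question} Finding (blank if none): ").strip()
        if answer:
            findings.append(answer)
    return findings
```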

Painful. But necessary.

The Quick Reference That Became a Pattern

Remember my quick reference guides? The ones that reduced token usage by 90%?

Turns out Google had done something similar internally. They called it “tiered documentation.”

Load minimal context by default. Add detail only when needed. Track what’s missing to improve the tiers over time.

I’d accidentally implemented a Google best practice through token stinginess.
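In code terms, the tier idea is tiny. A minimal sketch, with hypothetical file names: load the quick reference by default, pull in the full standards only on demand, and log every pull as a gap to review later.

```python
from pathlib import Path

# Hypothetical file layout: a small quick reference loaded by default,
# and a full standards document loaded only when the quick reference falls short.
QUICK_REF = Path("docs/quick-reference.md")
FULL_STANDARDS = Path("docs/standards-full.md")
GAP_LOG = Path("docs/gap-log.md")

def load_context(need_detail: bool = False, topic: str = "") -> str:
    """Return minimal context by default; escalate and record the gap when needed."""
    context = QUICK_REF.read_text()
    if need_detail:
        # Every escalation is a signal the quick reference is missing something.
        with GAP_LOG.open("a") as log:
            log.write(f"- needed full standards for: {topic}\n")
        context += "\n" + FULL_STANDARDS.read_text()
    return context
```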

But the book suggested one addition I hadn’t considered: versioning.

My quick references were living documents. I’d update them based on gap tracking. But I had no version history, no changelog, no way to know what changed when.

What if a bug appeared in code approved weeks ago? Had my standards changed since then? Was it something I’d always prohibited but missed in review, or something I’d only recently added to standards?

I started versioning the quick references. Dating comprehensive reviews. Tracking which version of standards was in effect for each code commit.

Suddenly, debugging across time became possible. “This bug was introduced when X was approved, using quick reference v2.3, which didn’t include the pattern that would have prevented this.”
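The mechanics don’t need to be fancy. A minimal sketch of the ledger idea, assuming a `version:` line at the top of the quick reference and a plain JSON file mapping commits to the standards version in effect; the format is illustrative, not the book’s.

```python
import json
import subprocess
from pathlib import Path

QUICK_REF = Path("docs/quick-reference.md")
LEDGER = Path("docs/standards-ledger.json")   # hypothetical: commit -> standards version

def record_standards_version() -> None:
    """Append the current commit and the quick-reference version in effect to the ledger."""
    # Assumed convention: the first line of the quick reference is "version: v2.3".
    version = QUICK_REF.read_text().splitlines()[0].split(":", 1)[1].strip()
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
    ).stdout.strip()
    ledger = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
    ledger[commit] = version
    LEDGER.write_text(json.dumps(ledger, indent=2))
```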

That’s actionable learning.

The Session Planning That Needed Planning

I’d been doing two-hour sessions with todo lists Claude generated from our roadmap.

Worked great. But the book suggested something I’d been avoiding: meta-planning.

Not just “what will we do this session?” but “how will this session advance the larger goal?”

Each session should have:

  • Clear scope (I had this)
  • Expected outcomes (I had this)
  • Validation plan (I had this)
  • Connection to roadmap (I sort of had this)
  • Definition of success beyond “feature works” (I didn’t have this)

That last one hit hard.

I’d been measuring success as “feature implemented and verified.” But that’s not actually success. That’s completion.

Success is: “feature implemented, verified, user need met, no technical debt created, team understanding improved.”

Those last three—user need, technical debt, understanding—I wasn’t tracking.

I added them to session planning:

User need: Why are we building this? Not the feature, the need behind it.

Technical debt: What are we choosing not to solve now? Is that debt acceptable?

Understanding: What am I learning this session? What should I be learning?
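Put together, a session plan now carries something like the following fields. This is a sketch; the names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class SessionPlan:
    scope: str                      # clear scope
    expected_outcomes: list[str]
    validation_plan: str
    roadmap_connection: str         # which roadmap item this advances, and how
    # The additions:
    user_need: str                  # the need behind the feature, not the feature
    accepted_debt: list[str] = field(default_factory=list)   # what we're choosing not to solve now
    learning_objective: str = ""    # what I should understand better by the end
```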

As part of this change, I moved my workflow to center on GitHub Issues. The visual aspect helped me a lot when reprioritizing, and my project felt less cluttered with documentation. Making all the changes was a fairly hefty lift. It paid benefits, though.

But the work itself was more focused. And the retrospective was more honest.

The Deployment Process I’d Been Avoiding

My roadmap had “deployment” pushed later.

The book asked: why?

Good question. Because deployment is hard? Because I wanted to focus on features first? Because I could test locally and didn’t need production yet?

All true. All wrong.

Google’s research showed that teams who delayed deployment ended up building the wrong things. They optimized for problems that didn’t exist. They made architectural choices that didn’t match production constraints.

Deployment isn’t just about shipping. It’s about learning.

I rebuilt my roadmap. Deployment moved earlier. Not “full production” but “real deployment pipeline with real hosting and real users testing.”

I knew this in my head but didn’t live it in my heart. Deploying to prod is scary. Breaking prod is scarier. I had to trust my decades of experience and just push forward; that is what iterations and practice are for. Deploy sooner, break sooner, find problems sooner. Was it a pride issue, not wanting to see where I got things wrong? I usually had a project sponsor to do the pushing. In this environment, it was only me. But I believe in me, so I pushed forward.

Lessons for Leaders (From Someone Who Got Schooled by Research)

Lesson 1: Your practices probably work—but you don’t know why.

I had a system that improved my productivity. Retrospectives proved it.

But I couldn’t explain why it worked, which meant I couldn’t teach it, scale it, or adapt it when conditions changed.

Google’s research gave me vocabulary and frameworks. Now I could explain: “Tiered documentation reduces context without losing accessibility” instead of “I made some short reference docs.”

Your teams need this too. It’s the difference between “this worked for us” and “this should work for you.”

Lesson 2: What you don’t measure will hurt you.

I was measuring output (features, bugs, tokens). I wasn’t measuring understanding.

Did I actually learn how the code worked? Could I maintain it months from now? Was I building expertise or just watching Claude code?

Those questions matter. If AI development turns developers into code reviewers who never understand the code, you’re building a time bomb.

Lesson 3: Delay is often avoidance.

Pushing deployment to “later” felt pragmatic. It was actually fear.

Fear I’d built the wrong thing. Fear deployment would expose problems I didn’t want to face. Fear real users would hate what I’d made.

Your teams will do this too. They’ll push the hard stuff to “next sprint.” That’s not planning, it’s avoidance.

Put the hard stuff early and make it unavoidable.

The Practices That Survived Research Review

After rebuilding everything based on the book:

Kept:

  • Two-hour sessions (research supported this strongly)
  • Parallel work with role separation (key practice in Google’s findings)
  • Quick reference guides (validated as “tiered documentation”)
  • Metrics tracking (but enhanced significantly)

Changed:

  • Added structured review levels (quick/standards)
  • Implemented version control for standards and references
  • Enhanced session planning with success criteria
  • Moved deployment much earlier

Added:

  • Weekly comprehensive reviews with full standards
  • Gap tracking that feeds continuous improvement
  • Meta-planning for session goals beyond task completion
  • Explicit learning objectives for each session

The Confidence That Comes From Validation

Here’s what changed after reading the book:

Before: “I think this is working.”

After: “Research suggests this should work, my retrospectives confirm it’s working, and I understand why it works.”

That’s not hubris. That’s evidence-based practice.

I’d gone from intuition to measurement to research validation. I had a system I could explain, defend, and teach.

More importantly: I had a system I could evolve.

Because the book didn’t just validate what worked. It showed me what I was still missing.

And what I was missing was about to become critical.


This is part 7 of a 10-part series. Part 8 explores what happened when that system faced its first real test: shipping to actual users.

About the Author: I teach Agile practices and coach engineering teams. This series documents building with agentic AI—including the moment when research showed me I’d been lucky to stumble into some best practices and foolish to miss others.