The first time our platform took a stumble, it wasn’t a mystery as to why. We were a sales-led startup trying to act like an enterprise IT department. Sprints on the wall. Story points. Daily stand-ups at 9:00. And still, when a big customer hit us with real traffic, timeout graphs rose like a tide and we waded into the water with buckets of error logs.
We shipped fast. We also re-shipped, hot-fixed, apologized, and re-explained. We were living the contradiction a lot of B2B teams know: a fluid flow of work forced through rigid roles. Agile ceremonies on top; agency behavior underneath. The people closest to the work had the least permission to change how the work happened.
In late 2024, after a run of production incidents and a stack of rushed custom builds we weren’t even billing for, something gave. Our founder—smart, commercial, not a product person—stepped back. I was told: take the wheel. What followed is the point of this piece.
I’m going to describe a way of operating that looks like chaos from far away and feels like discipline up close. It’s a report from a small lab that got to try things most product teams can’t. I know many readers can’t flip their org overnight. My goal is to give you language, structure, and a few hard gates you can steal.
If you stick with me, I’ll show the numbers at the end.
The moment it clicked
Two things changed at once.
First, AI collapsed handoffs. Cursor writes good code fast; internal GPT condenses messy Slack threads and call notes into a clean SOW and a PRD with acceptance criteria (AC). The glue work—the parts that used to bounce between “that’s not my job” and “please schedule a meeting”—started to look like a bug we could delete.
Second, we stopped pretending jobs were fixed. The metaphor I use is soccer, but here’s the part people miss: everyone on the pitch knows how to kick, and they move. The starting positions are different—striker, fullback, keeper—but they overlap on purpose. Even the keeper runs out of the box sometimes. In our world:
- Engineers join customer calls.
- PMs demo.
- Solutions leads roadmap sessions when that’s the fastest path.
- Designers help write ACs.
- Account Management/Customer Success (AM/CS) is purely relational—the “politicians”—because all the admin is automated.
This is possible because we run every meaningful effort as a triad: PM + builder + technical customer-facing operator (usually Solutions; sometimes a very technical AM/CS). Remove one leg and you slide back to “agency mode.” Keep the three in the room and the work can flow without ceremony.
The improv jazz analogy and why we skip “tier 2”
Most org charts are built around the wrong musicians. They over-index on Tier 2—the seasoned players who can perform a set without sheet music but freeze when the band modulates on the fly. In an AI era that collapses handoffs and deletes admin, Tier 2 becomes the danger zone: experienced enough to defend the old charts, not fluent enough to improvise the new ones.
What we actually need is a barbell:
- Tier 1 — Readers (curious juniors). They’ve got the chart in front of them and real appetite. Give them repeatable work that still requires taste—marketing assets, micro-UX, content, QA via ACs—and let them hear the changes. They anchor tempo, ask blunt questions, and build muscle memory fast.
- Tier 3 — Improvisers (true pros). These are the players who can hear the room, call the key change, and trade fours without tipping into noise. They’re the ones who can comp, solo, and still keep time. In product terms: they can talk to a customer, rewrite the chart, ship the part, and deploy—without waiting for permission.
- Tier 2 — Memorisers (most professionals). Valuable in stable shows, they know the standards cold but don’t bend with the band. They maintain ceremony, not groove. In small, high-change teams, they slow the song.
The goal isn’t to eliminate Tier 1 or to “coach up” Tier 2 forever. It’s to staff for Tier 1 + Tier 3 and build a pipeline from 1 → 3:
- Delete the sheet music that isn’t music. Treat admin as a bug. Use GPT/HubSpot/scripts so juniors practice feel and phrasing, not form-filling.
- Put them onstage early. Tier-1s should sit in on customer calls, co-write ACs, run real-use tests, and post demo clips. Reps beat lectures.
- Triad every tune. PM + builder + technical “politician” (Solutions/AM) in the room; that’s your rhythm section. Without it, you drift back to wedding-band requests.
- Make one hard musical rule. Our acceptance criteria are the chord changes. If the parts don't pass on staging (builder + PM + one non-dev in a real-use run), it doesn’t ship. Freedom inside form.
- Promote on improvisation, not tenure. The moment a junior can hear changes and carry a chorus, hand them the mic. Titles follow chops, not years.
Agile’s manifesto was right about people and collaboration. The dissonance came when we told players to “be agile” while chaining them to rigid chairs and fixed parts. Be agile so you can work agile: hire readers and improvisers, automate the music stands away, and let the band vibe.
“Sounds like chaos.” Here’s the guardrail.
We replaced a stack of soft rituals with one hard gate:
No deployment unless 100% of Acceptance Criteria pass on staging, checked by the builder and the PM, and at least one non-developer runs a real-use test. No Friday deployments. Post a 60-second Loom validating the core flow. If the triad approves, we push. Then do a production smoke for consent, prizes, and data pushes.
Everything else is optional. The gate isn’t.
Why it works:
- Everyone owns quality. When designers and Solutions help write ACs and then use them, bugs move left.
- Real use, not toy QA. We build an actual experience on staging and try to break it the way a customer would. We QA for core flows. Non-blocking edge cases are a waste of time to look for.
- Visibility beats ceremony. The Loom in our #gtm channel pulls the right people in without a standing meeting.
The two lanes we actually run
We stopped pushing everything through a sprint. Instead:
1) Always-on product themes.
Enterprise B2B hides its pain in papercuts. A clunky color picker here, a drag-and-drop that slips, a token mismatch that makes the studio feel off. In ticket land, these never win. In reality, they add up to churn. We compost the small stuff into real initiatives—Design studio UX revamp, Onboarding revamp—and work them continuously.
2) Tactical revenue work (paid SOWs).
Custom asks don’t scare us anymore. We gate them. If it helps our ICP and will live in the product, we price by dev-day, write the SOW, lock ACs, get sign-off and payment before we write code, then ship. It isn’t slower; it’s cleaner. Our lead time for custom dev fell because we no longer had to rework buggy releases and we had tighter ACs (trust me, when money is on the table, customers will take a lot more time to properly scope out their requests).
Capacity this year shook out to about 40% SOW / 50% Core / 10% Bugs+Infra. You’ll have your own ratio. The point is to know it—and stop doing unpaid custom dev that quietly owns your roadmap.
What we refuse to build (and why)
Three gates, simple and sharp:
- Core fit or bust. If it won’t help our ICP’s core flows, we won't build it.
- Partner, don’t reinvent. If our tech partners already do it, we route there.
- Impact vs priority. True one-offs require a severe premium to justify dev time.
This year we walked away from Mexico’s largest convenience store chain. They wanted POS integrations, transaction routing to multiple tools (including a new loyalty platform which none of our customers use), likely US/MX hosting, language and time-zone overhead. The offer was lucrative. To do it right—and not derail our strategy—we’d have needed 3x the contract value. We referred them to partners and exited the RFP. It’s easier to say no when your team can actually say yes to the right work.
Coverage, not heroics
Here’s what it looks like on a normal day.
Real life example situation (22 Oct 2025):
A new enterprise retail customer wanted a cleaner form aesthetic for a prize flow—email only (no opt-in checkbox). Hiding the on-page opt-in broke their CRM push because the consent value disappears from the API payload if you delete the opt-in field in our software’s form. Our Solutions consultant was out, so was our CS. The triad collapsed into two: I jumped on the call with our product engineer. We clarified the use case, shipped a 30-minute workaround—hide the opt-in with our software’s custom CSS feature while preserving the consent flag in the payload. No incident. Customer unblocked. Real fix scheduled. Still GDPR compliant (defaulting all form submissions to “not subscribed” in the CRM push).
Coverage like this isn’t “hero culture.” It’s the point of overlap. People move to where the ball is because they can kick.
Selection over salvation
We did not train curiosity into people. We hired for it, and we built an environment where osmosis works.
The barbell that holds: pair experienced, technical, product-minded operators (Tier 3’s) with high-curiosity juniors (Tier 1’s) who devour tools and iterate. If curiosity isn’t there, no checklist will save you. If experience isn’t there, curiosity pulls it in—if seniors can translate the context.
My job changed. Less “owner of the Jira board.” More translator. Tailored PRDs and ACs that read the way the individual builder thinks. Demos that show the why, not just the what. A culture where people jump into a #gtm Loom because the work is interesting, not because a calendar invite told them to.
Below are tables showing the skill requirement shifts of three “roles” in our new product operating model. As you can see, we’ve heavily put a focus on breadth of skill as opposed to specialty (everyone kicks the ball).
“But we can’t do that here.”
Maybe not all of it. But you can steal the parts that hold the most weight:
- Install the hard gate. Make AC quality your blocker; put a non-dev in the test; ban Friday deployments.
- Staff your Tier 3 triad on anything critical. If you can’t, don’t start.
- Price your custom dev with SOW + AC + sign-off before dev.
- Merge papercuts into themes so you see the forest and validate their priority.
- Treat admin as a bug. If a human does it twice, a machine should do it next time.
This will feel risky in a strict hierarchy. It is risk management in a small team.
The receipts
We didn’t set out to “prove” a management theory. We were trying to survive, then to get good. Here’s what changed on the other side.
- Bugs reported within 5 days of push: effectively near 100% (‘24) → ~0% (‘25 YTD) for core flows.
- P1 incidents: 3 server related and 2 app related (Dec ’24–Mar ’25) → zero since April ’25 once we moved to GCP and locked in our new AC process.
- Bugs per month: 26.6 → 21.9 → 11.4 (’23 → ’24 → ’25 YTD).
- Custom dev lead time: ~20 days → ~12 days with SOW + AC gating and sign-off before dev.
- Retention: We retained 95% of our ARR this year (Oct ‘25 YTD).
- Revenue alignment: paid SOWs accounted for 15% of revenue this year. (last year: effectively £0 paid for custom dev).
- Average meeting load per individual: ~10 hrs/week → 1–2 hrs/week of structured meetings; we kept only a Monday “if needed” Roadmap Review and ship Looms in #gtm. Now we conduct as-needed working sessions through Slack huddles (no drop calls though - ask if free first).
*Total bugs reported in 2023: 319 - poor product principles, “agency” / feature factory operations.
*Total bugs reported in 2024: 263 (-17.6%) - brought in the Tier 3’s (product-minded, product taste, bias for $action).
*Total bugs reported in 2025: 111 (−57.8% - as of Oct ‘25 YTD) - shakeout of Tier 2’s, empowered Tier 3’s to improvise through deleting admin tasks (AI / automation) & loosening fixed job roles (everyone is product).
Sign off
This isn’t a victory lap. It’s a field note from a team that stopped acting like an org chart and started acting like an outfit. We got calmer by removing the parts that made us trip and by making sure every person on the field could kick the ball.
If you’re a founder or a head of product staring at a frustrated team, start small. Pick one effort. Put a Tier 3 triad on it (or find your Tier 3’s first). Write ACs that would embarrass you if they failed (vibe with a GPT for an hour on this - trust me). Price the custom dev work. Automate one piece of glue work. Post a Loom instead of calling a meeting.
Then do it again next week.