
Three stories about confidence, control, and learning what actually scales.
At the start of 2025, I was convinced we finally had the right shape.
Not a customer product. Not something anyone outside Welby Health could log into. An internal system, built specifically for how we operate, how we coordinate care, and how we make decisions under real clinical and operational pressure.
We call it MARKUS.
It gets marketed. It gets demoed. We talk about it openly because it represents how far we’re willing to push AI inside healthcare. But no one outside Welby ever touches it. MARKUS isn’t sold. It isn’t licensed. It’s how our teams work.
That distinction matters more than I realized at the beginning of the year.
I remember saying things like, “Once this is in place, execution gets easier.” I remember believing that certainty meant clarity.
It didn’t.
By the end of 2025, MARKUS existed in a very real way. Our teams rely on it daily. Decisions move faster. Hand-offs are cleaner. Work collapses instead of stacking. Patients benefit indirectly, even though they never interact with it.
But it didn’t come together because I was right. It came together because I was wrong in a few specific ways, and those mistakes forced changes I wouldn’t have chosen voluntarily.
I'd like to tell you what I learned. What I actually have are three stories about being wrong, and a growing suspicion that I'm still wrong about parts of all three.
Early in the year, I obsessed over capability.
Models. Tooling. Context. Latency. Real-time voice. How many internal workflows MARKUS could reach into. How impressive it looked in demos.
When something didn’t land internally, my instinct was technical. Better prompts. More context. Tighter orchestration.
What I was really avoiding was the harder truth.
The hard part wasn’t intelligence. It was trust.
Because MARKUS isn’t used by customers. It’s used by our own people. Clinical ops, product, engineering, client success, billing. People who already carry real responsibility and real consequences.
They didn’t need something clever. They needed something predictable.
Once we stopped optimizing MARKUS for how it looked in a demo and started optimizing it for how it behaved on a Tuesday afternoon when things were messy, everything changed. We removed capabilities. Narrowed scope. Forced explicit hand-offs. Built refusal paths before adding power.
On paper, it looked smaller. In practice, it became usable.
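If "refusal paths" sounds abstract, here is roughly the shape I mean. This is a hypothetical sketch, not MARKUS's actual code; the action names, the confidence threshold, and the escalation target are invented for illustration. The point is only the pattern: check scope and confidence before acting, and when the check fails, hand off to a named human instead of improvising.

```python
# Hypothetical illustration only -- not MARKUS's implementation.
# Pattern: act only inside a narrow, pre-approved scope; otherwise refuse
# and make the hand-off to a human explicit.
from dataclasses import dataclass

# Illustrative scope; a real system would define this far more carefully.
APPROVED_ACTIONS = {"summarize_visit_notes", "draft_billing_followup"}

@dataclass
class Outcome:
    handled: bool
    detail: str

def handle(action: str, confidence: float, escalate_to: str) -> Outcome:
    """Execute only approved actions above a confidence floor;
    anything else is refused and routed to a named owner."""
    if action not in APPROVED_ACTIONS:
        return Outcome(False, f"Out of scope; handed off to {escalate_to}.")
    if confidence < 0.8:  # illustrative threshold, not a tuned value
        return Outcome(False, f"Low confidence; handed off to {escalate_to}.")
    return Outcome(True, f"Executed {action}.")

print(handle("summarize_visit_notes", 0.95, "clinical-ops on-call"))
print(handle("adjust_medication", 0.99, "clinical-ops on-call"))
```

The useful part isn't the threshold. It's that every refusal names a person, so work that the system won't take on doesn't quietly fall on the floor.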
Here’s my confession: I don’t know if I could have skipped the overbuilding phase and learned this sooner. I’d like to believe that. Experience says otherwise.
As MARKUS spread internally, friction followed.
Not because people didn’t believe in it, but because it touched how different teams think about risk, ownership, and decision-making.
I treated disagreement as a coordination problem.
More meetings. Clearer narratives. Stronger alignment language.
What I was actually doing was compressing dissent.
The moment that stuck with me came from a quiet engineer who said, “I don’t think people disagree with the direction. I think they’re unsure whether it’s safe to say what feels wrong.”
That reframed everything.
MARKUS wasn’t just an internal system. It was a mirror. It surfaced assumptions we’d been carrying quietly. It forced conversations we’d been postponing.
Once disagreement became explicit, the system got messier and more accurate at the same time. Different teams needed MARKUS to behave differently, and pretending otherwise was the real risk.
Here’s the uncomfortable part: I still like alignment too much. I like momentum. I don’t always spot when silence is resistance. I just know the cost of ignoring it now.
This one took the longest to see.
Because MARKUS is internal and high-leverage, I felt responsible for everything it touched. Every edge case. Every decision. Every escalation.
I was everywhere. Reviewing flows. Jumping into threads. Responding instantly.
I told myself I was protecting quality. What I was actually doing was turning myself into a dependency.
The shift didn’t come from a failure. It came from absence.
Late in the year, I stepped back from parts of MARKUS decision-making. Not as an experiment. Mostly out of exhaustion.
The team didn’t stall. They didn’t lower the bar. In several cases, they made better calls than I would have.
That’s when it clicked.
MARKUS couldn’t be an operating system if it required an operator.
I still struggle with this. I still jump in too fast. But the system didn’t stabilize until it could operate without me as the final checkpoint.
That wasn’t a technical milestone. It was a leadership one.
If you had asked me in January 2025 what success looked like, I would have described a system.
If you ask me now, I’d describe a behavior change.
MARKUS matters not because it’s intelligent, but because it reshaped how our internal teams make decisions, where human judgment stays essential, and where work quietly disappears instead of accumulating.
We market it. We demo it. We’re proud of it. Not because anyone else can use it, but because it proves what’s possible when AI is built to serve real operations, not abstract users.
I’m still wrong about parts of this. I’ll discover those later, in hindsight, the same way I discovered these.
What I know is this: being willing to be wrong internally mattered more than any architectural decision we made.
If you’re building internal AI systems and it feels slower, messier, and more personal than you expected, you’re probably doing something meaningful.
I don’t have clean lessons. Just a system that pushed back, and a team that made it better by not letting me be right too easily.
That might be enough.
Or it might just be the story I’m telling myself now at the beginning of 2026.
Either way, it’s the most honest one I have.