January 4, 2026

Why I'm Building a Browser Extension to Make AI Agents Actually Reliable

Rey Columna
January 4, 2026
2 min read

Everyone at CES is talking about "agentic AI" this week. Nvidia's keynote today will showcase agents that can orchestrate tools, access data, and take actions autonomously. It sounds incredible.

Here's what the demos won't show you: the moment your agent tries to click a button that moved, fill a form that changed, or navigate a workflow that requires actual browser state.

I'm the CTO at Welby Health, where we build technology for Remote Patient Monitoring and Chronic Care Management. Over the past year, we've built Markus, an internal AI assistant that helps our nurses and clinical staff with everything from patient lookups to workflow automation. Markus connects to our EHR, BigQuery, Looker, HubSpot, and a dozen other systems via MCP servers.

Markus is genuinely useful. Until it needs to touch a browser.

The Browser Problem

Browser automation has been "solved" for 20 years, if you're running deterministic test scripts against your own application. But AI agents don't work that way. They need to:

  • Navigate applications they weren't explicitly programmed for
  • Handle dynamic content, auth flows, and session state
  • Recover gracefully when the UI doesn't match expectations
  • Work with the same context a human would have

Playwright and Puppeteer are phenomenal tools. I've built MCP servers around both. But they're designed for automation, not observation. They operate headlessly by default, can't easily share browser state with a user's actual session, and struggle with the "I need to see what the user is seeing" use case.

When a nurse asks Markus to help pull data from a vendor portal we don't have API access to, the current answer is: sorry, can't help.

Enter Markus Bridge

Markus Bridge is a Chrome extension that flips the model. Instead of the agent controlling a headless browser, the agent observes and interacts with the user's actual browser, with their permission and in their context.

The architecture is straightforward:

  1. The extension captures page state (DOM, screenshots, metadata) and exposes a messaging API
  2. An MCP server bridges the extension to Markus via WebSocket
  3. Markus can now "see" what the user sees and take actions in that context
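To make the three steps concrete, here is a minimal sketch of what the messages crossing that bridge could look like. Every name in it (PageSnapshot, BridgeRequest, BridgeResponse, PageDriver, handleRequest, makeFakePage) is hypothetical and chosen for illustration; none of it is the actual Markus Bridge API. The page is hidden behind a small interface so the handler can run without a browser, and the WebSocket transport is left out entirely.

```typescript
// Hypothetical message shapes for an extension <-> MCP server bridge.
// None of these names come from Markus Bridge itself.
interface PageSnapshot {
  url: string;
  title: string;
  html: string; // serialized DOM; a real snapshot might also carry a screenshot
}

type BridgeRequest =
  | { id: string; kind: "snapshot" }
  | { id: string; kind: "click"; selector: string }
  | { id: string; kind: "fill"; selector: string; value: string };

type BridgeResponse =
  | { id: string; ok: true; snapshot: PageSnapshot }
  | { id: string; ok: false; error: string };

// Abstracts the real page so the handler can be exercised without a browser.
interface PageDriver {
  snapshot(): PageSnapshot;
  click(selector: string): boolean; // false when no element matched
  fill(selector: string, value: string): boolean; // false when no element matched
}

// Handles one request. On success it always returns a fresh snapshot, so the
// agent sees the page state that resulted from its own action.
function handleRequest(req: BridgeRequest, page: PageDriver): BridgeResponse {
  const fail = (selector: string): BridgeResponse => ({
    id: req.id,
    ok: false,
    error: `No element matched ${selector}`,
  });
  switch (req.kind) {
    case "snapshot":
      break;
    case "click":
      if (!page.click(req.selector)) return fail(req.selector);
      break;
    case "fill":
      if (!page.fill(req.selector, req.value)) return fail(req.selector);
      break;
  }
  return { id: req.id, ok: true, snapshot: page.snapshot() };
}

// In-memory stand-in for a real page, used only to exercise the handler.
function makeFakePage(fields: Record<string, string>): PageDriver {
  return {
    snapshot: () => ({
      url: "https://portal.example/",
      title: "Fake portal",
      html: JSON.stringify(fields),
    }),
    click: (selector) => selector in fields,
    fill: (selector, value) => {
      if (!(selector in fields)) return false;
      fields[selector] = value;
      return true;
    },
  };
}

const fakePage = makeFakePage({ "#patient-id": "" });
const filled = handleRequest(
  { id: "1", kind: "fill", selector: "#patient-id", value: "12345" },
  fakePage,
);
const missing = handleRequest(
  { id: "2", kind: "click", selector: "#submit" },
  fakePage,
);
```

In this sketch the fill succeeds and comes back with an updated snapshot, while the click on a selector that does not exist comes back as an explicit error instead of a silent no-op. Returning a snapshot with every successful action is a design choice, not a requirement: it costs bandwidth but keeps the agent's view of the page from going stale.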

This means:

  • Auth is handled (user is already logged in)
  • Session state is real (no cookie/token juggling)
  • The agent sees exactly what the human sees (no "works on my machine")
  • Actions happen in context (user can watch, intervene, learn)

Why This Matters for Healthcare

Healthcare is full of systems that don't talk to each other. We use athenahealth as our EHR, plus separate portals for labs, pharmacies, and insurance verification, and the list goes on. Many of these have APIs. Many don't. Some have APIs that are so limited they might as well not exist.

Our nurses spend hours copying data between systems. An AI agent that can reliably assist with that, while the human stays in control, is genuinely valuable.

But "reliably" is the key word. Browser automation that works 80% of the time is worse than nothing. It creates false confidence, wastes time on failures, and erodes trust in the tool.
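Multi-step workflows are part of why the bar is so high. As a rough illustration with made-up numbers, assume every step of a workflow succeeds independently with the same probability (a simplification; real failures tend to cluster). The function name workflowSuccessRate is hypothetical.

```typescript
// Probability that every step of a workflow succeeds, assuming each step
// succeeds independently with the same probability (a simplifying assumption).
function workflowSuccessRate(perStepSuccess: number, steps: number): number {
  if (perStepSuccess < 0 || perStepSuccess > 1) {
    throw new RangeError("perStepSuccess must be between 0 and 1");
  }
  if (!Number.isInteger(steps) || steps < 0) {
    throw new RangeError("steps must be a non-negative integer");
  }
  return Math.pow(perStepSuccess, steps);
}

// Illustrative inputs only, not measurements from Markus Bridge.
const fiveStepsAt95 = workflowSuccessRate(0.95, 5); // about 0.77
const tenStepsAt95 = workflowSuccessRate(0.95, 10); // about 0.60
```

Under those assumptions, individual actions that each work 95% of the time leave a five-step workflow below the 80% mark, and a ten-step one near 60%.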

The Honest Part

Markus Bridge isn't done. I'm still working through:

  • Reliability at scale: Works great for one nurse, but what about 50?
  • Security model: How much browser access is appropriate? How do we audit it?
  • Failure recovery: When an action fails, how does the agent communicate that clearly?
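One possible shape for the security and failure-recovery questions, offered as a sketch and not as the design Markus Bridge has settled on: gate every action behind an origin allow-list, record every attempt in an audit log, and return a structured outcome that tells the user what happened and what to do next. All names here (ActionOutcome, AuditEntry, createGuardedBridge) are hypothetical, and the origins and user ID are placeholders.

```typescript
// Hypothetical shapes; not the actual Markus Bridge security model.
type ActionOutcome =
  | { status: "done" }
  | { status: "blocked"; reason: string } // policy refused the action
  | { status: "failed"; reason: string; suggestion: string };

interface AuditEntry {
  at: string; // ISO 8601 timestamp
  user: string;
  origin: string;
  action: string;
  status: ActionOutcome["status"];
}

function createGuardedBridge(allowedOrigins: string[], user: string) {
  const auditLog: AuditEntry[] = [];

  // Runs `perform` only when the page origin is on the allow-list, and
  // records every attempt, including refused ones.
  function run(
    pageOrigin: string,
    action: string,
    perform: () => boolean,
  ): ActionOutcome {
    let outcome: ActionOutcome;
    if (!allowedOrigins.includes(pageOrigin)) {
      outcome = {
        status: "blocked",
        reason: `${pageOrigin} is not on the allow-list`,
      };
    } else if (perform()) {
      outcome = { status: "done" };
    } else {
      outcome = {
        status: "failed",
        reason: `Could not ${action}`,
        suggestion: "Complete this step manually, then ask the agent to continue.",
      };
    }
    auditLog.push({
      at: new Date().toISOString(),
      user,
      origin: pageOrigin,
      action,
      status: outcome.status,
    });
    return outcome;
  }

  return { auditLog, run };
}

const bridge = createGuardedBridge(["https://portal.example"], "nurse-1");
const allowed = bridge.run("https://portal.example", "click Export", () => true);
const refused = bridge.run("https://other.example", "click Export", () => true);
const broken = bridge.run("https://portal.example", "click Export", () => false);
```

The point of the three-way outcome is that "blocked" and "failed" are different conversations with the user: one is a policy decision that someone can audit, the other is a UI mismatch the human can work around. How the allow-list is managed and where the log is stored are left open here.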

I'm sharing this now because the "agentic AI" conversation is dominated by people selling platforms and futures. There's less discussion about the actual engineering work required to make agents useful in constrained, high-stakes environments like healthcare.

The Takeaway

If you're building AI agents that need to interact with the real world, especially the messy world of healthcare IT, don't underestimate the browser problem. Headless automation is a starting point, not a solution.

The agents that actually get adopted will be the ones that work with users in their existing context, not the ones that promise to replace workflows entirely.

Happy to connect with others working on similar problems. What's been your experience with browser automation for AI agents?
