
Everyone at CES is talking about "agentic AI" this week. Nvidia's keynote today will showcase agents that can orchestrate tools, access data, and take actions autonomously. It sounds incredible.
Here's what the demos won't show you: the moment your agent tries to click a button that moved, fill a form that changed, or navigate a workflow that requires actual browser state.
I'm the CTO at Welby Health, where we build technology for Remote Patient Monitoring and Chronic Care Management. Over the past year, we've built Markus, an internal AI assistant that helps our nurses and clinical staff with everything from patient lookups to workflow automation. Markus connects to our EHR, BigQuery, Looker, HubSpot, and a dozen other systems via MCP servers.
Markus is genuinely useful. Until it needs to touch a browser.
Browser automation has been "solved" for 20 years, if you're running deterministic test scripts against your own application. But AI agents don't work that way. They need to handle pages they didn't write, layouts that change without notice, and workflows that depend on live browser state.
Playwright and Puppeteer are phenomenal tools. I've built MCP servers around both. But they're designed for automation, not observation. They operate headlessly by default, can't easily share browser state with a user's actual session, and struggle with the "I need to see what the user is seeing" use case.
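The brittleness is easy to demonstrate without any browser at all. The sketch below is illustrative only (the portal HTML and the `can_click` helper are hypothetical): a deterministic script keyed to a hard-coded selector works until the page changes, at which point it goes blind, even though a human looking at the page would adapt instantly.

```python
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Collects the ids of all <button> elements on a page."""

    def __init__(self):
        super().__init__()
        self.button_ids = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.button_ids.append(dict(attrs).get("id"))


def can_click(page_html: str, selector_id: str) -> bool:
    """A deterministic script 'clicks' only if its hard-coded id exists."""
    finder = ButtonFinder()
    finder.feed(page_html)
    return selector_id in finder.button_ids


# Version 1 of a hypothetical vendor portal: the script works.
v1 = '<form><button id="submit-claim">Submit</button></form>'
# Version 2 after a redesign: same button, new id. The script is blind.
v2 = '<form><button id="claim-submit-btn">Submit</button></form>'

print(can_click(v1, "submit-claim"))  # True
print(can_click(v2, "submit-claim"))  # False: the button "moved"
```

An agent that only drives a headless browser inherits exactly this failure mode; one that can observe the rendered page the user is on has a chance to recover.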
When a nurse asks Markus to help pull data from a vendor portal we don't have API access to, the current answer is: sorry, can't help.
Markus Bridge is a Chrome extension that flips the model. Instead of the agent controlling a headless browser, the agent observes and interacts with the user's actual browser, with their permission and in their context.
The architecture is straightforward: a Chrome extension runs in the user's browser and exposes what the user is seeing to the agent. This means the agent operates inside the user's authenticated session, sees the same pages the user does, and takes no action without the user's consent.
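As a sketch of what an extension-to-agent contract could look like (the message names and fields here are hypothetical, not Markus Bridge's actual protocol), one way to keep the human in control is to make the consent requirement part of the message type itself, so an action request can never be constructed silently:

```python
from dataclasses import dataclass
import json


@dataclass
class BridgeMessage:
    """One message from the agent to the browser extension.

    kind: "observe" reads the current tab; "act" changes it and
    therefore always requires explicit user confirmation.
    """

    kind: str                        # "observe" | "act"
    target: str                      # e.g. a CSS selector or "viewport"
    requires_confirmation: bool = False

    def to_json(self) -> str:
        return json.dumps(self.__dict__)


def make_action(target: str) -> BridgeMessage:
    # Actions are never silent: confirmation is forced on at construction.
    return BridgeMessage(kind="act", target=target, requires_confirmation=True)


snapshot = BridgeMessage(kind="observe", target="viewport")
click = make_action("#submit-claim")
print(click.to_json())
```

Encoding the permission rule in the constructor, rather than trusting each call site to remember it, is the kind of small design choice that matters in a clinical setting.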
Healthcare is full of systems that don't talk to each other. We have athenahealth for EHR, separate portals for labs, pharmacies, insurance verification, the list goes on. Many of these have APIs. Many don't. Some have APIs that are so limited they might as well not exist.
Our nurses spend hours copying data between systems. An AI agent that can reliably assist with that, while the human stays in control, is genuinely valuable.
But "reliably" is the key word. A browser automation that works 80% of the time is worse than nothing. It creates false confidence, wastes time on failures, and erodes trust in the tool.
Markus Bridge isn't done; I'm still working through the open problems.
I'm sharing this now because the "agentic AI" conversation is dominated by people selling platforms and futures. There's less discussion about the actual engineering work required to make agents useful in constrained, high-stakes environments like healthcare.
If you're building AI agents that need to interact with the real world, especially the messy world of healthcare IT, don't underestimate the browser problem. Headless automation is a starting point, not a solution.
The agents that actually get adopted will be the ones that work with users in their existing context, not the ones that promise to replace workflows entirely.
Happy to connect with others working on similar problems. What's been your experience with browser automation for AI agents?