Americans ask AI for health care. Hospitals think the answer is more chatbots. – Ars Technica

Do you trust AI chatbots for health advice? What about one in your patient portal?
With many Americans turning to large language models for health advice, health systems around the country are eyeing, and in some cases rolling out, their own branded chatbots in an attempt to harness this already popular tool and steer more people to their services. But the burgeoning trend raises immediate questions and concerns for the country's complicated and generally underperforming health care system.
Executives frame the new offerings as a convenience for patients, a way to meet people where they are and to promote digital equity. They also suggest their chatbots will be a safer alternative to the commercial versions people are using now.
“We are at an inflection point in healthcare,” according to Allon Bloch, CEO of clinical AI company K Health. “Demand is accelerating, and patients are already using AI to navigate their lives.”
K Health is working with partner Hartford HealthCare, in Connecticut, to roll out its PatientGPT chatbot to tens of thousands of its existing patients.
“The question isn’t whether AI will shape healthcare, it’s about how we do it in a safe, transparent way, inside a health system that connects to your medical records and your care team,” Bloch added. “PatientGPT represents that turning point.”
But some experts are wary of the rollouts, raising concerns about whether chatbots are ready for such branded debuts, whether monitoring will be sufficient, what liability will look like, and whether chatbots even address the care problems patients are actually raising.
While these risks and questions swirl, the benefits to patients remain hypothetical. “It’s a tempting idea,” Adam Rodman, a clinical reasoning researcher and internist at Beth Israel Deaconess Medical Center in Boston, told Stat News recently. But there isn’t yet an evidence base showing that integrating chatbots into health systems improves patient outcomes. “We’re not there yet,” he said.
To consider AI’s potential role, it’s useful to consider the wider context of US health care. America is one of the wealthiest countries in the world, but its health care system consistently and significantly underperforms compared with those of other high-income countries. Americans have lower life expectancy, more avoidable deaths, higher rates of maternal and infant deaths, and higher rates of obesity and chronic conditions, along with less access to care and worse health outcomes overall. The US is an outlier in not providing universal care, and a 2023 report found that nearly a third of Americans—more than 100 million people—don’t have a primary care provider.
Now artificial intelligence has entered this mix. Anyone with an Internet connection can access comforting, confident-sounding LLM-powered chatbots, and Americans are navigating in droves to these new tools to ask health and medical questions. A poll from KFF last month found 1 in 3 adults have used an AI chatbot for health information.
Among those who have used AI, 41 percent reported uploading personal medical information, like test results, to the tool. When asked about their “major” reasons for turning to AI, 19 percent said it was because they couldn’t afford care, and 18 percent cited not having a regular health care provider or not being able to get an appointment. Sixty-five percent, meanwhile, said they just wanted a quick answer. In the end, many said they didn’t follow up with a doctor after their AI consults, including 58 percent of those who asked about mental health and 42 percent of those who asked about physical health.
With so many Americans using AI to fill health care gaps, there are now mounting cautionary tales and horror stories. The examples highlight pitfalls both in what the LLMs are asked and in what information they’re hoovering up.
In February, a study in Nature Medicine involving nearly 1,300 participants tried to assess the medical accuracy of LLMs (specifically GPT-4o, Llama 3, and Command R+) in real-world interactions. When the researchers provided the LLMs with text of specific medical scenarios, the LLMs correctly identified the medical condition about 95 percent of the time and correctly identified the next steps—such as going to an emergency department—about 56 percent of the time. But when the participants used their own prompts to ask about the same medical scenarios, the LLMs helped correctly identify a medical condition only about a third of the time and steered participants to the appropriate next step just 43 percent of the time.
The study essentially shows that “people don’t know what they are supposed to be telling the model,” lead author Andrew Bean, an AI researcher at Oxford University, told NPR last month.
Senior author Adam Mahdi added: “The disconnect between benchmark scores and real-world performance should be a wake-up call for AI developers and regulators.”
Then there’s the concern about the quality of medical information LLMs may pull in. Just last week, Nature News reported that LLMs were chatting with users about “bixonimania,” a skin condition that was entirely made up by researchers in Sweden. The team had posted two fake studies on the condition online to see how easily medical misinformation would get taken up by AI tools. Too easily, was the answer. They have since taken the studies down.
Nevertheless, several health care systems are moving forward with their own chatbots. Hartford HealthCare and K Health’s PatientGPT was rolled out as a beta version to select patients last month, and the company is planning to expand the rollout to tens of thousands more this week, according to Stat.
Hartford posted a preprint (not peer-reviewed) study involving 75 participants suggesting that its iterative stress testing (a red-teaming approach) reduced the chatbot's failure rate over time, particularly in “high risk” scenarios. The testing dropped the failure rate in high-risk scenarios from 30 percent to 8.5 percent. But what that means for real-life settings is unclear—as is how bad the remaining 8.5 percent of failures might be.
According to Stat, PatientGPT works in two modes: a generic medical question-and-answer mode that may incorporate information about the patient; or a “medical intake” mode, in which a patient starts providing symptom information and the chatbot gets less chatty and starts going through clinical flow charts. After the AI agent collects enough information in the intake mode, it will provide a next step, including setting up a follow-up appointment with primary care or seeking urgent or emergency care. If the latter is recommended, the chatbot stops responding to further questions.
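The two-mode behavior Stat describes can be sketched as a simple state machine: open Q&A until symptom reporting begins, structured intake after that, and a hard stop once emergency care is recommended. The sketch below is purely illustrative—the class name, the trigger phrase, the red-flag list, and the three-symptom threshold are all assumptions, not K Health's actual implementation.

```python
# Toy model of a two-mode intake chatbot, as described in the article.
# All names, triggers, and thresholds here are hypothetical.

class IntakeBot:
    """Switches from general Q&A to structured intake, then locks
    after recommending emergency care (stops answering questions)."""

    ESCALATE = {"chest pain", "trouble breathing"}  # assumed red flags

    def __init__(self):
        self.mode = "qa"      # starts in generic Q&A mode
        self.symptoms = []
        self.locked = False   # set True after an emergency referral

    def handle(self, message: str) -> str:
        if self.locked:
            return "Please seek emergency care. This chat is closed."
        msg = message.lower()
        if self.mode == "qa" and "symptom:" in msg:
            self.mode = "intake"  # patient began reporting symptoms
        if self.mode == "intake":
            self.symptoms.append(msg.replace("symptom:", "").strip())
            if any(s in self.ESCALATE for s in self.symptoms):
                self.locked = True  # no further responses after this
                return "Go to the emergency department now."
            if len(self.symptoms) >= 3:
                return "Recommend a follow-up appointment with primary care."
            return "Tell me more about your symptoms."
        return "General answer (may use patient-record context)."
```

The one-way lock after an emergency recommendation mirrors the article's detail that the real chatbot stops responding to further questions once it advises urgent or emergency care.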
Hartford said it will continue to monitor the chatbot’s performance amid the larger rollout. During the pilot, Hartford monitored every interaction. But now the system will scale back to human reviews of just 20 interactions a day, with a separate AI agent monitoring the rest. It will also conduct batch studies of every 1,000 conversations.
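That tiered monitoring scheme—a small daily human sample, AI review of everything else, and periodic batch audits—amounts to a routing problem. A minimal sketch, using the quota and batch size from the article but with illustrative function and variable names:

```python
import random

def route_for_review(conversations, daily_human_quota=20, batch_size=1000):
    """Toy sketch of tiered monitoring: sample a small set for human
    review, send the rest to an AI monitor, and cut batch-audit groups.
    Quota (20/day) and batch size (1,000) come from the article; the
    routing logic itself is an assumption."""
    quota = min(daily_human_quota, len(conversations))
    human_reviewed = set(random.sample(range(len(conversations)), quota))
    ai_monitored = [c for i, c in enumerate(conversations)
                    if i not in human_reviewed]
    # Every conversation also lands in a batch for periodic audit.
    batches = [conversations[i:i + batch_size]
               for i in range(0, len(conversations), batch_size)]
    human = [conversations[i] for i in sorted(human_reviewed)]
    return human, ai_monitored, batches
```

Sampling by index rather than by value keeps the split correct even if two conversations happen to have identical content.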
“We’re on a mission to be the most consumer centric health system in the country,” Jeff Flaks, president and CEO of Hartford HealthCare, said last month. “So much of healthcare has traditionally been organized around the provider, but it’s clear we have to meet people where they are and where they desire to be met. With PatientGPT we are introducing a new tool that supports your health and provides access to a 24/7 care team, while protecting the human relationships at the heart of care.”
Beyond PatientGPT there’s Emmie, an AI chat assistant being released by Epic, the electronic health records behemoth behind MyChart. Several health systems are slowly rolling Emmie out to users through the online portal, including California-based Sutter Health and Indiana-based Reid Health.
In an executive address last year, Epic’s founder and CEO, Judy Faulkner, described Emmie as an assistant that can help patients prepare for appointments by drafting visit agendas and, afterward, help patients understand test results and answer follow-up questions, according to reporting by Becker’s Hospital Review.
Sutter Health’s FAQ on Emmie notes that the chatbot can “answer general health questions, and find or summarize information already visible in your chart—such as notes, results, past visits or messages.” But it emphasizes that it “doesn’t give personalized medical advice or make care decisions. Emmie is not intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of disease. Emmie is also not intended to replace, modify or be substituted for a physician’s professional clinical judgment.”
Right now, Emmie is only offered to a small subset of Sutter patients. Those patients are able to provide feedback on Emmie’s responses with simple thumbs-up or thumbs-down reactions.
Reid Health is following in Sutter’s footsteps as the second Emmie adopter. In an interview last week with Becker’s, Muhammad Siddiqui, CIO at Reid Health, noted that the system largely serves rural communities and that the company sees Emmie as a way to broaden access and help patients navigate care.
“Patients want clearer answers, easier access and more guidance between visits,” Siddiqui said. “If we can provide that inside the health system experience, in a way that is connected to trusted clinical workflows, that is a much better path than leaving people on their own with public tools that may or may not be accurate.”

