Hidden risks of seeking medical advice from LLMs
Last year, ChatGPT passed the US Medical Licensing Examination and was reported to be "more empathetic" than real doctors. ChatGPT currently has around 180 million users; if a mere 10% of them have asked ChatGPT a medical question, that is already a population twice the size of New York City using ChatGPT like a doctor. There is an ongoing explosion of medical chatbot startups building thin wrappers around ChatGPT to dole out medical advice. But ChatGPT is not a doctor, and using ChatGPT for medical advice is not only against OpenAI's Usage Policies, it can be dangerous.
In this article, I identify four key problems with using current general-purpose chatbots to answer patient-posed medical questions. I show examples of each problem using real conversations with ChatGPT. I also explain why building a chatbot that can safely answer patient-posed questions is entirely different from building a chatbot that can answer USMLE questions. Finally, I describe steps that everyone can take, from patients and entrepreneurs to doctors and companies like OpenAI, to make chatbots medically safer.
Notes
For readability I use the term "ChatGPT," but the article applies to all publicly available general-purpose large language models (LLMs), including ChatGPT, GPT-4, Llama2, Gemini, and others. A few LLMs specifically designed for medicine do exist, like Med-PaLM; this article is not about those models. I focus here on general-purpose chatbots because (a) they have the most users; (b) they are easy to access; and (c) many patients are already using them for medical advice.
In the chats with ChatGPT, I show verbatim quotes of ChatGPT's responses, with ellipses […] to indicate material that was left out for brevity. I never left out anything that would have changed my assessment of ChatGPT's response. For completeness, the full chat transcripts are provided in a Word document attached to the end of this article. The words "Patient:" and "ChatGPT:" are dialogue tags added afterwards for readability; they were not part of the prompts or responses.