As healthcare technology professionals, we are keenly aware of the potential of large language models (LLMs) and generative artificial intelligence (AI) to revolutionize how we deliver care and manage our workflows. LLMs are AI models trained on large datasets to process and produce text. Generative AI uses machine learning algorithms to produce novel, original content. There has been a flurry of activity in the development of LLMs and generative AI designed specifically for clinicians and patients, with the promise of increased efficiency, reduced clinician burden and burnout, unrivaled decision support, and improved patient engagement. The technology is advancing rapidly, and while applications of LLMs and generative AI in healthcare are still in their infancy, it is important to consider the limitations and potential risks associated with their use.
In this article, we present a framework for thinking about applications of LLMs and generative AI. The framework assists with evaluating use cases and assessing limitations and risks before considering adoption in clinical practice. We also discuss the spectrum of use cases for LLMs and generative AI, ranging from low-risk classification tasks to high-risk tasks such as diagnosis and treatment.
Machine learning-based predictive analytics have long been used in healthcare settings and have generally received significant scrutiny and evaluation prior to implementation. LLMs and generative AI require the same degree of scrutiny and validation given the risk of inaccuracies and hallucinations. However, it is far easier for an untrained novice to use generative AI to develop a “model” for solving a clinical problem. While predictive analytics have remained in the realm of data scientists and those with deep knowledge of the risks and benefits of how algorithms are developed, the same is not true for generative AI. As a result, it is important to maintain guardrails when implementing clinical use cases, including undertaking traditional validation exercises, developing a process to flag high-risk errors in a clinical setting, and always keeping a “human in the loop,” to mitigate the risks from these inaccuracies. Conversely, in the many non-clinical healthcare scenarios where LLMs and generative AI can be applied, there may be a higher tolerance for riskier models that can drive automation of work.
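As a minimal sketch of the “human in the loop” guardrail, the snippet below wraps a model call so that drafts mentioning high-risk clinical terms are flagged for clinician review rather than released automatically. The `generate_draft` function, the risk-term list, and the canned output are all hypothetical stand-ins for illustration, not a real clinical system.

```python
# Illustrative guardrail: route model output through a human-review gate
# before it reaches a clinical workflow. The model call is a stub.

HIGH_RISK_TERMS = {"dosage", "diagnosis", "contraindicated"}  # example triggers only

def generate_draft(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned text for illustration."""
    return "Suggested dosage: consult the formulary before prescribing."

def release_or_flag(prompt: str):
    """Return (draft, needs_review); flag drafts containing high-risk terms."""
    draft = generate_draft(prompt)
    needs_review = any(term in draft.lower() for term in HIGH_RISK_TERMS)
    return draft, needs_review

draft, needs_review = release_or_flag("Summarize the medication plan.")
if needs_review:
    print("Flagged for clinician review:", draft)
else:
    print("Released:", draft)
```

In a real deployment, the flagged path would feed a review queue rather than a print statement, and the trigger logic would be validated against clinical safety criteria rather than a keyword list.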
LLMs and generative AI can generally be trusted to analyze data presented to them, particularly for “reductive” tasks in which a large amount of text is reduced or transformed into a more concise, easily readable form (e.g., summarization, categorization, information extraction).
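One reason reductive tasks carry lower risk is that their outputs can be checked mechanically against the source text. The sketch below illustrates one such check for information extraction: every extracted span must appear verbatim in the input note. The note and the extracted spans are hard-coded stand-ins for model output, used purely for illustration.

```python
def spans_grounded(source: str, extracted: list) -> bool:
    """True only if every extracted span appears verbatim in the source text."""
    src = source.lower()
    return all(span.lower() in src for span in extracted)

note = "Patient reports intermittent chest pain; started lisinopril 10 mg daily."
extraction = ["chest pain", "lisinopril 10 mg"]    # stand-in for model output
hallucinated = ["chest pain", "metoprolol 50 mg"]  # second span not in the note

print(spans_grounded(note, extraction))    # → True
print(spans_grounded(note, hallucinated))  # → False
```

Verbatim matching is deliberately strict; a production pipeline would also need to handle paraphrase and negation, which is precisely why generative (non-reductive) outputs are harder to verify.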
However, these tools are less reliable when asked to generate new text or make inferences based on their underlying knowledge base. Although LLMs are pretrained on trillions of tokens, that corpus still represents only a subset of all human knowledge. The intent of pretraining is not to store facts with any fidelity, but rather to discover and encode probabilistic rules about human language. It is therefore high risk to rely on information that may (or may not) be encoded in an LLM, especially in healthcare, a domain in which veracity and verifiability are critical.
The following is a set of ways LLM and generative AI tools might be used for clinical use cases, ordered from low to high risk of inaccuracy. This LLM-specific framework should be considered alongside more traditional measures of risk and value when implementing such tools. Use cases with high value might justify higher risk, provided the safeguards and validation discussed above remain in place.
It is clear that over time, many healthcare tasks will be automated, and even more will be augmented by AI. However, with few exceptions, applications of LLMs and generative AI in healthcare today remain in the feasibility and safety evaluation phase. As healthcare technology professionals, it is important to understand the risks associated with these tools and to be aware of the evaluation that should be conducted before adoption in clinical practice.
In section three of this series, the team will cover how generative AI, specifically LLMs, aids in documentation.
The views and opinions expressed in this content or by commenters are those of the author and do not necessarily reflect the official policy or position of HIMSS or its affiliates.