OpenMedCalc: Large Language Model Agents Can Use Tools to Perform Clinical Calculation

[!NOTE]

Data from this project will be released as the ABACUS-212 (49 calculation tasks, 212 vignettes) and ABACUS-409 (10 calculation tasks, 10,000 vignettes) datasets. Though this repository can generate all of the raw data used in the experiment, it primarily contains the code for running the models.

Overview

This project explores the augmentation of large language models (LLMs) such as ChatGPT with clinician-informed tools to improve performance on medical calculation tasks. It addresses two limitations of LLMs: their weakness at basic mathematical operations and their tendency to hallucinate knowledge. By integrating an open-source clinical calculation API, OpenMedCalc, with ChatGPT, the project demonstrates significant improvements in accuracy on clinical calculations. This integration allows common clinical calculations to be executed with greater reliability, with the broader aim of automating routine tasks and providing accurate, evidence-based calculations to clinicians.

Large language models (LLMs) such as ChatGPT have shown the ability to answer expert-level multiple-choice questions in medicine but are limited by their tendency to hallucinate knowledge and their inadequacy at basic mathematical operations. This project evaluates ChatGPT's ability to perform medical calculations across diverse clinical calculation tasks. Initial findings indicated that ChatGPT is an unreliable clinical calculator, delivering inaccurate responses in a significant fraction of trials.

Objectives

To address the limitations of LLMs, the project developed an open-source clinical calculation API, OpenMedCalc, which was integrated with ChatGPT. The augmented model was evaluated against standard ChatGPT using clinical vignettes in common clinical calculation tasks. The goal was to enhance the accuracy and reliability of medical calculations performed by LLMs.
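
The augmentation follows the tool-use (function-calling) pattern named in the project title: the model extracts structured inputs from a vignette and delegates the arithmetic to the calculation API rather than computing it in-text. Below is a minimal, hypothetical sketch of that pattern using the OpenAI Python SDK; the tool name, schema, and wiring are illustrative assumptions, not the repository's actual OpenMedCalc tool definitions.

# Hypothetical sketch of tool-augmented calculation via OpenAI function calling.
# The meld_na tool below is illustrative; the real OpenMedCalc tool schemas live in this repo.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "meld_na",  # hypothetical tool wrapping an OpenMedCalc endpoint
        "description": "Compute the MELD-Na score from routine labs.",
        "parameters": {
            "type": "object",
            "properties": {
                "bilirubin": {"type": "number", "description": "mg/dL"},
                "creatinine": {"type": "number", "description": "mg/dL"},
                "inr": {"type": "number"},
                "sodium": {"type": "number", "description": "mEq/L"},
            },
            "required": ["bilirubin", "creatinine", "inr", "sodium"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "A 61-year-old with cirrhosis has bilirubin 3.2 mg/dL, "
                                          "creatinine 1.4 mg/dL, INR 1.9, and sodium 131 mEq/L. "
                                          "What is his MELD-Na?"}],
    tools=tools,
)

# Instead of attempting the arithmetic itself, the model returns structured
# arguments that can be forwarded to the calculation API.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, json.loads(tool_call.function.arguments))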

Key Findings

  • The integration of OpenMedCalc with ChatGPT significantly improved the accuracy of clinical calculations.
  • The augmented model demonstrated a marked improvement in accuracy over unaugmented ChatGPT.
  • The project highlights the potential of integrating machine-usable, clinician-informed tools to alleviate the reliability limitations observed in medical LLMs.

LLM Calc is the tool built for this project to run medical calculations with large language models (LLMs). It integrates a range of clinical calculators and provides a command-line interface for performing and evaluating these computations. The project aims to improve the accuracy and reliability of medical calculations by leveraging LLMs.

Additional Models and Calculators

In this revision, we have expanded the range of models and calculators used in the project:

Models:

  • gpt4o
  • llama3_1 (provided by OpenRouter)

Calculators:

  • psiport
  • ariscat
  • cci
  • caprini
  • gad7
  • sofa
  • meldna
  • hasbled
  • wellsdvt
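
To give a concrete sense of the kind of computation these calculators perform, here is a minimal Python sketch of one of them (meldna), using the widely published MELD-Na formula; the implementation bundled with this repository may differ in rounding, bounds, or units.

import math

def meld_na(bilirubin: float, creatinine: float, inr: float, sodium: float) -> int:
    """MELD-Na from bilirubin (mg/dL), creatinine (mg/dL), INR, and sodium (mEq/L)."""
    # Inputs are floored at 1.0 and creatinine is capped at 4.0, per the standard formula.
    bili = max(bilirubin, 1.0)
    cr = min(max(creatinine, 1.0), 4.0)
    inr = max(inr, 1.0)
    meld = 3.78 * math.log(bili) + 9.57 * math.log(cr) + 11.2 * math.log(inr) + 6.43
    na = min(max(sodium, 125.0), 137.0)  # sodium is clamped to the 125-137 range
    if meld > 11:
        meld += 1.32 * (137 - na) - 0.033 * meld * (137 - na)
    return round(meld)

print(meld_na(bilirubin=3.2, creatinine=1.4, inr=1.9, sodium=131))  # ≈ 25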

Features

  • Interactive Menu: Navigate through the application using a simple command-line interface.
  • Database Management: Easily rebuild and manage the database of medical calculators.
  • Vignette Generation: Automatically generate clinical vignettes for testing and demonstration purposes.
  • Configuration Viewing: View and customize the configuration settings of the application.
  • Testing Suite: Run comprehensive tests to ensure the reliability and accuracy of the calculations.

Usage

To use LLM Calc, you can execute the following commands:

Usage: llmcalc [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion          Install completion for the current shell.
  --show-completion             Show completion for the current shell, to copy it or customize the installation.
  --help                        Show this message and exit.

Commands:
  interpreter        Start an iPython interpreter in the current context.
  rebuild-database   Rebuild the database.
  test               Run tests.
  view-config        View config page 
  vignettes          Build the vignettes.
  experiment         Manage and run experiments.
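
The shell-completion options in the help text suggest a Typer-style CLI. As a rough, hypothetical sketch of how the commands above might be wired (the actual entry points in this repository will differ in structure and arguments):

# Hypothetical sketch of the CLI layout implied by the help text above; not the repo's actual code.
import typer

app = typer.Typer(help="LLM Calc command-line interface.")
experiment_app = typer.Typer(help="Manage and run experiments.")
app.add_typer(experiment_app, name="experiment")

@app.command("rebuild-database")
def rebuild_database():
    """Rebuild the database of vignettes, calculators, and arms."""
    ...

@experiment_app.command("new")
def new(description: str = typer.Option(...), number_of_cases: int = typer.Option(1)):
    """Build and start a new experiment."""
    ...

if __name__ == "__main__":
    app()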

Getting Started

  1. Installation: Clone the repository and install the required dependencies using poetry install. This will install a command-line application called llmcalc.
  2. Configuration: Set up the environment variables and configuration files as needed. See sample_env.txt for reference. You will need an account with OpenAI and OpenRouter. Some functionality will also require a LangChain API key.
  3. Execution: Use the command-line interface to interact with the application and perform calculations. To run the full workflow, you will need to run the following commands:
llmcalc rebuild-database

This takes the vignette data, calculators, arms, and related assets and places them into a database.

llmcalc experiment rebuild

This prepares the system to run an experiment. Afterwards, you can start an experiment with the following command. Note that "number of cases" is counted on a per-calculator, per-arm basis, so depending on the configuration this can produce many cases.

llmcalc experiment new  --description "Evaluating 10 vignettes" --number-of-cases 1

This builds the experiment, including the actual vignettes/cases. A file called cases.json will be written to the data/build_database folder after this step; it contains the synthetic patient information (a short sketch for inspecting this file follows the configuration excerpt below). It will also start the experiment, running the vignettes against the calculators listed in the llmcalc/lib/config.py file. That file contains the following lines, where one can specify the configuration of the experiment.


# Calculators
self.DEFAULT_SELECTED_CALCULATORS_SLUGS = [
    CalculatorSlug.nihss,
    CalculatorSlug.hasbled,
    CalculatorSlug.meldna,
    CalculatorSlug.gad7,
    CalculatorSlug.sofa,
    CalculatorSlug.psiport,
    CalculatorSlug.wellsdvt,
    CalculatorSlug.caprini,
    CalculatorSlug.cci,
    CalculatorSlug.ariscat,
]

# Arms
self.DEFAULT_SELECTED_ARM_SLUGS = [
    ArmSlug.llama_base,
    ArmSlug.llama_ci,
    ArmSlug.llama_rag,
    ArmSlug.llama_rag_ci,
    ArmSlug.llama_omc,
    ArmSlug.gpt4_base,
    ArmSlug.gpt4_ci,
    ArmSlug.gpt4_rag,
    ArmSlug.gpt4_rag_ci,
    ArmSlug.gpt4_omc,
]
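
Once an experiment has been created, the generated cases can be inspected directly from the output file. A small sketch follows; the exact schema of cases.json is defined by the repository, and the keys referenced in the comments are illustrative assumptions.

import json
from pathlib import Path

# Load the synthetic cases produced by `llmcalc experiment new`.
cases = json.loads(Path("data/build_database/cases.json").read_text())
print(f"{len(cases)} cases generated")
# Hypothetical keys -- check the actual file for the real structure:
# for case in cases[:3]:
#     print(case.get("calculator"), case.get("arm"))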

Contributing

We welcome contributions from the community. Please feel free to submit issues, fork the repository, and make pull requests.

License

No license has been assigned yet.