MSAR: Identify comorbidities associated with recurrent ED and in-patient visits

Published in Journal 1, 2022

Recommended citation: LLL. (2022). "Identify comorbidities associated with recurrent ED and in-patient visits." Journal 1. 1(3). https://arxiv.org/abs/2110.13769

Background

  • Recurrent patients (RP), also known as frequent flyers, high utilizers, or super users in hospitals, are a small group; however, they impose a disproportionately high utilization of resources.

  • Two major types of reasons reported from literature: 1. mental health or substance (drug and/or alcohol) abuse 2. Some chronic diseases such as AIDS

  • Social-behavioral interventions and outpatient care could be introduced to reduce future recurrent visits.

  • Challenges:

    • no standardized criteria on identify RP
    • the lack of tools on selecting top reasons associated with recurrent visits
    • data elements may not be easily accessible

Objectives: identify RP & rank top factors associated with recurrent visits

  1. identify high utilizers / recurrent patients
  2. interpretable method to identify top-X (from on avg. 7+ from past visits) comorbidities associated with their frequent visits
  3. achieve object 1 and 2 on easily accessible data elements

Observations: Confidence-Support trade-offs

Definition of confidence and suport associated with a set with a single or combination of comorbidities: \(Confidence:=P{\{RP|comorbidities\}};\) \(Support:= P{\{comorbidities\}}.\)

We have observed confidence-support trade-offs in this problem:

  • Some comorbidities reported by literature that are highly associated with recurrent visits have relatively low support/prevalence

  • Single comorbidity has low confidence to indicate recurrent visit behavior, we therefore use a combination of comorbidities to boost confidence.

  • Increasing the number of comorbidities n will increase confidence while decreasing support.

Limitations of existing ML approaches

  1. A conventional interpretable Association rules (AR) method picks the highest confidence comorbidities combination while its support >= a pre-defined minimum support threshold

    Pros: a statistical method, interpretable

    Main drawback: potentially eliminate those high-confidence but low support conditions

  2. It is challenging for conventional ML algorithms such as XGBoost to detect high confidence, low-support comorbidity combinations

  3. It is a hard problem from ML perspective:

    3.1 no ground truth labels on important factors;

    3.2 limited feature space. Only using comorbidity (derive from ICD codes) greatly increases the usage for the algorithm, as ICD codes are easily accessible in EHR, compared to relying on hard to access information like economic status, insurance status. However, the combination of comorbidities alone has very limited power to model recurrent patients accurately, limiting the useage of modern machine learning/deep learning method in this case.

Proposed algorithm: min.-similarity association rules (MSAR)

Main Algorithm innovations in MSAR:

  • formulate the problem mathematically by
    1. defining pairs of rules that are similar (exactly 1 comorbidity difference between pairs of triples of comorbidity combinations)
    2. defining similarity between similar rules and minimizes their similarity
    3. setting up an efficient convex quadratic programming to solve the optimization
  • To balance conf-supp trade-off, we use a weighted average of confidence and support

  • Weights of confidences and supports are learned from retrospective data via minimizing MSAR rule scores on pairs of similar rules (G in the similarity graph)

Main numeric results

Proposed MSAR algorithm is validated on large EMR dataset from some U.S. hospitals with about 400K unique inpatient and ED patients.

  1. MSAR balances confidence-support trade-offs compared to conventional AR which always take the highest confidence rule.

  2. MSAR successfully selects challenging high-confidence, low-support comorbidities relates to recurrent visits

    Frequency of comorbidities from the top $25\%$ of learned MSAR rules. Drug abuse, even with low prevalence compared to other comorbidities, has been successfully identified as being in $35\%$ of those top rules for recurrent visits.

    Compare with a popular XGBoost method, MSAR rules select high-confidence, low-support comorbidities better.

  3. The consistency of MSAR is validated through cross-validation (CV)

    3.1 consistency of learned weights is high.

    3.2 consistency of outputs is high, which is also better than XGBoost.

  4. Compare with a popular XGBoost method, MSAR rules distinguishes across comorbidities better.

    Note, XGBoost ranking is obtained by Shapley values.

    Overall zero weights rate:

    Examples of comorbidities with zero weights:

Main achievements of MSAR algorithm innovations

  • It balances confidence support trade-off
  • it consistently identifies high confidence comorbidities associated with recurrent visits but low support
  • it has a general usage in selecting combination of significant (Top-X) factors

Summary of MSAR compared to other algorithms

Tips for other potential applications

  1. If in need of dealing with continuous variables, a preprocessing of thresholding to binary will be needed.
  2. Training phase is computationally costly for large feature dimension. It is not recommended for applications where inputs are raw signals (in which cases neural networks and attention mechanism are widely used).

Extended reading

A good reference for extensive material on interpertable machine learning can be found by a book written by Christoph Molnar titled Interpretable Machine Learning.

Download MSAR paper

Download Summary