MSAR: Identify comorbidities associated with recurrent ED and in-patient visits
Published in Journal 1, 2022
Recommended citation: LLL. (2022). "Identify comorbidities associated with recurrent ED and in-patient visits." Journal 1. 1(3). https://arxiv.org/abs/2110.13769
Background
Recurrent patients (RP), also known as frequent flyers, high utilizers, or super users in hospitals, are a small group; however, they impose a disproportionately high utilization of resources.
Two major types of reasons reported from literature: 1. mental health or substance (drug and/or alcohol) abuse 2. Some chronic diseases such as AIDS
Social-behavioral interventions and outpatient care could be introduced to reduce future recurrent visits.
Challenges:
- no standardized criteria on identify RP
- the lack of tools on selecting top reasons associated with recurrent visits
- data elements may not be easily accessible
Objectives: identify RP & rank top factors associated with recurrent visits
- identify high utilizers / recurrent patients
- interpretable method to identify top-X (from on avg. 7+ from past visits) comorbidities associated with their frequent visits
- achieve object 1 and 2 on easily accessible data elements
Observations: Confidence-Support trade-offs
Definition of confidence and suport associated with a set with a single or combination of comorbidities: \(Confidence:=P{\{RP|comorbidities\}};\) \(Support:= P{\{comorbidities\}}.\)
We have observed confidence-support trade-offs in this problem:
- Some comorbidities reported by literature that are highly associated with recurrent visits have relatively low support/prevalence
Single comorbidity has low confidence to indicate recurrent visit behavior, we therefore use a combination of comorbidities to boost confidence.
Increasing the number of comorbidities n will increase confidence while decreasing support.
Limitations of existing ML approaches
A conventional interpretable Association rules (AR) method picks the highest confidence comorbidities combination while its support >= a pre-defined minimum support threshold
Pros: a statistical method, interpretable
Main drawback: potentially eliminate those high-confidence but low support conditions
It is challenging for conventional ML algorithms such as XGBoost to detect high confidence, low-support comorbidity combinations
It is a hard problem from ML perspective:
3.1 no ground truth labels on important factors;
3.2 limited feature space. Only using comorbidity (derive from ICD codes) greatly increases the usage for the algorithm, as ICD codes are easily accessible in EHR, compared to relying on hard to access information like economic status, insurance status. However, the combination of comorbidities alone has very limited power to model recurrent patients accurately, limiting the useage of modern machine learning/deep learning method in this case.
Proposed algorithm: min.-similarity association rules (MSAR)
Main Algorithm innovations in MSAR:
- formulate the problem mathematically by
- defining pairs of rules that are similar (exactly 1 comorbidity difference between pairs of triples of comorbidity combinations)
- defining similarity between similar rules and minimizes their similarity
- setting up an efficient convex quadratic programming to solve the optimization
To balance conf-supp trade-off, we use a weighted average of confidence and support
- Weights of confidences and supports are learned from retrospective data via minimizing MSAR rule scores on pairs of similar rules (G in the similarity graph)
Main numeric results
Proposed MSAR algorithm is validated on large EMR dataset from some U.S. hospitals with about 400K unique inpatient and ED patients.
MSAR balances confidence-support trade-offs compared to conventional AR which always take the highest confidence rule.
MSAR successfully selects challenging high-confidence, low-support comorbidities relates to recurrent visits
Frequency of comorbidities from the top $25\%$ of learned MSAR rules. Drug abuse, even with low prevalence compared to other comorbidities, has been successfully identified as being in $35\%$ of those top rules for recurrent visits.
Compare with a popular XGBoost method, MSAR rules select high-confidence, low-support comorbidities better.
The consistency of MSAR is validated through cross-validation (CV)
3.1 consistency of learned weights is high.
3.2 consistency of outputs is high, which is also better than XGBoost.
Compare with a popular XGBoost method, MSAR rules distinguishes across comorbidities better.
Note, XGBoost ranking is obtained by Shapley values.
Overall zero weights rate:
Examples of comorbidities with zero weights:
Main achievements of MSAR algorithm innovations
- It balances confidence support trade-off
- it consistently identifies high confidence comorbidities associated with recurrent visits but low support
- it has a general usage in selecting combination of significant (Top-X) factors
Summary of MSAR compared to other algorithms
Tips for other potential applications
- If in need of dealing with continuous variables, a preprocessing of thresholding to binary will be needed.
- Training phase is computationally costly for large feature dimension. It is not recommended for applications where inputs are raw signals (in which cases neural networks and attention mechanism are widely used).
Extended reading
A good reference for extensive material on interpertable machine learning can be found by a book written by Christoph Molnar titled Interpretable Machine Learning.