Using Self-Reported Data to Segment Older Adult Populations with Complex Care Needs

Background: Tailored care management requires effectively segmenting heterogeneous populations into actionable subgroups. Using patient-reported data may help identify groups with care needs not revealed in traditional clinical data. Methods: We conducted retrospective segmentation analyses of 9,617 Kaiser Permanente Colorado members aged 65 or older at risk for high utilization due to advanced illness and geriatric issues who had completed a Medicare Health Risk Assessment (HRA) between 2014 and 2017. We separately applied clustering methods and latent class analyses (LCA) to HRA variables to identify groups of individuals with actionable profiles that may inform care management. HRA variables reflected self-reported quality of life, mood, activities of daily living (ADL), urinary incontinence, falls, living situation, isolation, financial constraints, and advance directives. We described groups by demographic, utilization, and clinical characteristics. Results: Cluster analyses produced a 14-cluster solution and LCA produced an 8-class solution reflecting groups with identifiable care needs. Example groups included: frail individuals with memory impairment less likely to live independently, those with poor physical and mental well-being and ADL limitations, those with ADL limitations but good mental and physical well-being, and those with few health or other limitations differentiated by age, presence or absence of a documented advance directive, and tobacco use. Conclusions: Segmenting populations with complex care needs into meaningful subgroups can inform tailored care management. We found groups produced through cluster methods to be more intuitive, but both methods produced actionable information. Applying these methods to patient-reported data may make care more efficient and patient-centered.


Context
Population heterogeneity makes it difficult to design and implement effective care management interventions for individuals with complex care needs [1,2,3]. The most effective programs use multiple modalities to target specific needs or specific subpopulations based on age, diagnoses, or other characteristics [4,5]. However, there is no common or systematic approach to identifying relevant subpopulations.
To address this heterogeneity, the National Academy of Medicine (NAM) has proposed a 'starter' typology differentiating 6 groups characterized by age and medical, social, and behavioral needs [6]. Likewise, the American Diabetes Association provides a three-tiered model based on morbidity burden to guide treatment intensity for individuals with diabetes [7]. However, even these typologies are fairly broad. Further, within any delivery system, subgroups of individuals with complex needs may differ. The subpopulations within the Veterans Administration population are likely quite different from those within an urban safety net population, and both of those are likely different from groups within other delivery systems and settings. Delivery systems interested in developing patient-centered care management programs need to understand the characteristics of the subpopulations they serve.
Traditional electronic data such as diagnostic codes and laboratory values may not capture essential information on factors that drive care needs, including function, personal preferences, and social resources, that can only be reported by individuals themselves. Identifying and characterizing complex needs subpopulations requires patient-reported information to help match care delivery to personal needs. Although newer data from electronic health records (EHRs) such as symptom assessments and ICD-10 codes that capture functional status can improve our ability to identify complex needs subpopulations, subjective information can add a level of specificity unlikely to be captured with objective coding.
Using the Medicare Health Risk Assessment, we explored two data-driven methods to segment a heterogeneous population of older adults with potentially complex care needs into clinically meaningful subgroups using self-reported information. The primary purpose of the analysis was to demonstrate how segmentation methods could be applied to patient-reported data, not to generate evidence to inform a taxonomy of subpopulations of older adults. The goals of the segmentation process were 1) to demonstrate the ability to identify groups with unique needs that could inform development of specific care management programs, and 2) to compare the two analytic methods for application to large, diverse populations.

Population and Setting
The population consisted of Kaiser Permanente Colorado (KPCO) members age 65 and older as of 05/01/2016 who were classified as having advanced illness as of 07/28/2017. Individuals with advanced illness were defined as those with complex or multiple chronic conditions and geriatric syndromes who are likely to have frequent hospital care needs. Cohort members also must have completed at least one Medicare Health Risk Assessment (HRA) between 05/01/2014 and 04/30/2017. If more than one HRA was administered in this time frame, the latest one was used in analyses. KPCO is a nonprofit integrated delivery system in which most Medicare beneficiaries are enrolled in a Medicare Advantage plan. KPCO's Institutional Review Board reviewed the project protocol and determined that it did not meet criteria for human subjects research and could be reported as operational or quality improvement methods.

Data Sources and Variables
Input variables for the segmentation analyses were patient-reported variables drawn from the Medicare HRA, a component of the Medicare Annual Wellness Visit designed to identify patient-reported modifiable risk factors and health needs [8]. Required elements include self-assessment of health status, psychosocial risks, depression, behavioral risks, and Activities of Daily Living and Instrumental Activities of Daily Living [9]. Care delivery systems can add additional questions. As illustrated in Table 1, KPCO elected to add questions reflecting other domains. HRAs are completed at or prior to the visit and are addressed at the visit or as part of population care management. HRA responses are stored in extractable fields in the EHR. We dichotomized responses to the HRA questions based on whether the response was likely to prompt a clinical action. For example, if someone reported a fall within the preceding year, this might lead to a referral to physical therapy; or if someone reported a positive response to a depression screening question, that might prompt a referral to mental health services. On the theory that a missing response would not trigger action, we included missing responses as non-trigger responses.
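The dichotomization rule described above can be illustrated with a minimal Python sketch. It is not the code used in the project. The item names (`fall_past_year`, `dressing`) and the lookup table are hypothetical; the dressing responses and the "sum score of 3 or higher" threshold follow the entries shown in Table 1, and missing responses are coded as non-trigger, as stated in the text.

```python
from typing import Optional, Sequence

# Hypothetical rule table: item name -> responses assumed to prompt a clinical action.
TRIGGER_RESPONSES = {
    "fall_past_year": {"yes"},
    "dressing": {"need help or special equipment", "do myself with some difficulty"},
}

def dichotomize(item: str, response: Optional[str]) -> int:
    """Return 1 for a trigger response, 0 otherwise; missing counts as non-trigger."""
    if response is None or not response.strip():
        return 0
    return int(response.strip().lower() in TRIGGER_RESPONSES[item])

def sum_score_trigger(scores: Sequence[Optional[int]], threshold: int = 3) -> int:
    """Two-item screeners: trigger when the summed score reaches the threshold.

    Treating a partially missing screener as non-trigger is an assumption
    made for this sketch.
    """
    if any(s is None for s in scores):
        return 0
    return int(sum(scores) >= threshold)
```

For example, `dichotomize("fall_past_year", None)` yields 0, so an unanswered falls question is handled the same way as a "no".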
Iterative variable simplification is a key element of exploratory cluster analyses. Prior to running segmentation analyses, we examined all HRA items and removed those less likely to define actionable subpopulations based on clinical judgment (e.g., daily servings of fruits and vegetables), and those that were strongly associated with others (e.g., general health and physical health or difficulty dressing, toileting, bathing, and getting in and out of bed/chairs). After examining initial clusters, we removed additional inputs that did not contribute to defining clusters, such as those with low prevalence (e.g., poor quality of life or anger). Variables and domains that are included in the KPCO HRA but were eliminated during the variable reduction process include quality of life, physical health, anger, difficulty bathing, difficulty toileting, difficulty getting in and out of bed/chairs, difficulty managing money, difficulty with household activities, eating fewer than 2 meals/day, having enough money for food, and alcohol use. Table 1 lists all initially considered variables, noting which were included in the final segmentation analyses along with dichotomized 'trigger' responses.
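The two data-driven parts of this screening (low prevalence and strong association between inputs) could be sketched as follows. This is an illustration only: in the project, variable reduction also relied on clinical judgment and on inspection of initial clusters, and the thresholds `min_prev` and `max_phi` are assumptions chosen for the example, not values reported here.

```python
from math import sqrt

def prevalence(column):
    """Share of individuals with a trigger response (column is a list of 0/1)."""
    return sum(column) / len(column)

def phi(x, y):
    """Phi coefficient (Pearson correlation) between two 0/1 lists."""
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    nx, ny = sum(x), sum(y)
    denom = sqrt(nx * (n - nx) * ny * (n - ny))
    return (n * n11 - nx * ny) / denom if denom else 0.0

def reduce_variables(data, min_prev=0.05, max_phi=0.7):
    """Keep variables in order, skipping rare ones and ones strongly
    associated with a variable that has already been kept."""
    kept = []
    for name, column in data.items():
        if prevalence(column) < min_prev:
            continue
        if any(abs(phi(column, data[k])) > max_phi for k in kept):
            continue
        kept.append(name)
    return kept
```

Because the first variable of a correlated pair is the one retained, the order of `data` encodes which item the analyst prefers to keep.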
After segmentation based on patient-reported HRA variables, additional clinical, care delivery, and demographic variables were used to describe population segments. These variables were drawn from the HRA and from KPCO's Virtual Data Warehouse (VDW), a quality-controlled data repository including health care utilization, diagnoses, demographics, and enrollment. Utilization variables included emergency department (ED) visits, inpatient admissions, and observation admissions. We also described population segments by demographic (age, gender, education, marital status, independent living) and clinical (Quan Elixhauser score [10], cancer history) characteristics. These variables were selected for their potential to explain the clusters derived from the patient-reported data; they would likely have been less useful for segmenting this particular population had they been included as input variables. For example, adding ED utilization as an input variable could result in clusters of individuals with higher and lower ED use, but might not capture the difference between ED use for pain vs. ED use for falls.
Variable selection (both input variables and descriptive variables) for clustering methods is highly dependent on the population, available data, and planned application of the results. Because clustering is a method for exploring data and populations rather than generating evidence, variable selection is highly iterative and can be revised as more or less actionable clusters are identified.

Analytic Approach
We used both cluster and latent class analyses to identify relatively homogeneous subgroups within the heterogeneous cohort of older adults. Input variables listed in Table 1 were used in these analyses.

Cluster analysis
Cluster analysis refers to classification methods used for discovering groups or "clusters" of highly similar entities within data sets, so that observations within one group are as similar to each other as possible and as dissimilar as possible to observations in all other groups. Due to the large size of our data set, we used a combined hierarchical and partitive method of generating clusters. We first used k-means clustering (PROC FASTCLUS) to generate a large number of primary clusters and saved the centroids; then we used hierarchical clustering (Ward's method) on these centroids to determine the recommended number of clusters based on the Cubic Clustering Criterion (CCC) and the pseudo-F statistic (PSF). The recommended number of clusters was then specified as seeds in the k-means clustering to group all the observations. The cluster analysis was implemented using SAS™ software version 9.4, SAS Institute, Cary, NC, USA.
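The combined approach can be sketched in plain Python to show how the stages fit together. This is a simplified illustration, not a reimplementation of the SAS procedures: the deterministic farthest-first seeding is an assumption standing in for the seed selection in PROC FASTCLUS, the function names are hypothetical, and the final number of clusters `k` is passed in directly rather than chosen from the CCC and PSF.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def farthest_first(points, m):
    """Assumed seeding rule: start at the first point, then repeatedly
    add the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < m:
        seeds.append(max(points, key=lambda p: min(dist2(p, s) for s in seeds)))
    return seeds

def kmeans(points, seeds, iters=50):
    """Lloyd's k-means started from the given seeds; returns (centroids, labels)."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels

def ward_merge(centroids, sizes, k):
    """Merge size-weighted centroids by Ward's criterion until k groups remain."""
    groups = [(list(c), n) for c, n in zip(centroids, sizes) if n > 0]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                (ci, ni), (cj, nj) = groups[i], groups[j]
                cost = ni * nj / (ni + nj) * dist2(ci, cj)  # rise in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = groups[i], groups[j]
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        groups = [g for idx, g in enumerate(groups) if idx not in (i, j)] + [merged]
    return [c for c, _ in groups]

def two_stage(points, n_primary, k):
    """Stage 1: many primary clusters. Stage 2: Ward on their centroids.
    Stage 3: k-means on all observations, seeded with the merged centroids."""
    primary, labels = kmeans(points, farthest_first(points, n_primary))
    sizes = [labels.count(j) for j in range(n_primary)]
    return kmeans(points, ward_merge(primary, sizes, k))
```

Running the hierarchical step on a few hundred centroids rather than on every observation is what keeps the pairwise comparisons in `ward_merge` tractable for a large cohort.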

Latent class analysis (LCA)
Latent class analysis is a method to identify underlying latent (unobserved) classes (LC) of people using individual-level observed variables. Each LC represents a subgroup of individuals characterized by a pattern of responses on a set of categorical input variables. LCA was implemented using the poLCA package in R with the same dichotomous variables as in the cluster analysis [11]. We examined multiple LCA models with 1- to 10-class solutions, and the best-fitting model was selected based on the smallest Bayesian information criterion (BIC) and clinical interpretability. The chosen algorithm uses a finite mixture model and finds maximum likelihood estimates of model parameters with expectation-maximization and Newton-Raphson methods [11].
Finally, we evaluated the results of the cluster and LCA analyses. We expected that different segmentation methods would yield different groups and different numbers of groups and could potentially lead to different interpretations [12]. The goal of both methods in this context was to identify clinically or operationally meaningful population segments, and to see whether the two methods identified common subgroups.
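To make the model-selection step concrete, the following Python sketch fits a latent class model for dichotomous items by expectation-maximization and computes the BIC. It is a teaching illustration under simplifying assumptions, not the poLCA implementation: it uses EM only (no Newton-Raphson step), a single random start, a fixed number of iterations, and hypothetical function names. The parameter count for K classes and J binary items is taken as (K - 1) + K x J.

```python
import math
import random

def class_joint(row, weights, probs):
    """Joint probability of a response pattern with each latent class."""
    return [w * math.prod(p if x else 1 - p for x, p in zip(row, pk))
            for w, pk in zip(weights, probs)]

def log_likelihood(data, weights, probs):
    return sum(math.log(sum(class_joint(row, weights, probs))) for row in data)

def fit_lca(data, n_classes, n_iter=200, seed=0):
    """EM for binary items; returns (class weights, item probabilities, log-likelihood)."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class membership for every individual
        post = []
        for row in data:
            joint = class_joint(row, weights, probs)
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: update class sizes and item-response probabilities
        for k in range(n_classes):
            nk = max(sum(r[k] for r in post), 1e-12)
            weights[k] = nk / n
            probs[k] = [min(max(sum(r[k] * row[j] for r, row in zip(post, data)) / nk,
                                1e-6), 1 - 1e-6)  # keep probabilities off 0 and 1
                        for j in range(n_items)]
    return weights, probs, log_likelihood(data, weights, probs)

def bic(loglik, n_obs, n_classes, n_items):
    """BIC = -2 log L + (number of free parameters) * ln(n)."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2 * loglik + n_params * math.log(n_obs)
```

In practice one would fit each candidate number of classes from several random starts and compare BIC values alongside clinical interpretability, as described above.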

Findings
Of 20,316 older adults classified as having advanced illness and potential complex care needs, 9,617 completed at least one HRA during the project period and comprised the analytic cohort for the segmentation analyses. HRA completers were marginally older and healthier than non-completers and had a longer enrollment duration (data not shown). Characteristics of the analytic cohort are provided in the first column of Table 2.
In the cluster analysis, we selected a 14-cluster solution as one that was manageable, corresponded to peaks in the CCC and PSF values, and seemed likely to segment the cohort into clinically actionable subgroups. Table 2 provides characteristics of the overall analytic cohort and selected illustrative clusters; the full 14-cluster solution is presented in Appendix A. In this application, the analysis identified smaller clusters of individuals who reported poor physical and/or emotional health with or without functional limitations (Clusters A and D), as well as a larger cluster of individuals reporting better health status (Cluster B). The analysis also identified clusters characterized by discrete health needs such as reported problems with pain, balance and walking, and inactivity (Cluster C), and by pain and poor sleep quality without inactivity (Cluster E).
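For readers unfamiliar with the pseudo-F statistic, a minimal Python sketch of the usual Calinski-Harabasz form is given here: between-cluster sum of squares divided by (k - 1), over within-cluster sum of squares divided by (n - k). The function name is hypothetical, and the CCC is not reproduced. Larger values indicate better-separated clusters, so candidate solutions are compared by looking for peaks across values of k.

```python
def pseudo_f(points, labels):
    """Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k)).

    Raises ZeroDivisionError if every cluster is perfectly tight
    (within SS of zero) or if there is only one cluster.
    """
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    grand = [sum(col) / n for col in zip(*points)]
    within = between = 0.0
    for lab in clusters:
        members = [p for p, l in zip(points, labels) if l == lab]
        center = [sum(col) / len(members) for col in zip(*members)]
        within += sum(sum((x - m) ** 2 for x, m in zip(p, center)) for p in members)
        between += len(members) * sum((m - g) ** 2 for m, g in zip(center, grand))
    return (between / (k - 1)) / (within / (n - k))
```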
Descriptions of clusters by morbidity and utilization during the project period generally reflected the self-reported data, with Cluster A characterized by high morbidity burden and hospital utilization, Cluster D by higher morbidity burden and emergency service use, and Cluster B by lower morbidity and utilization. Cluster C, in which HRA responses highlighted pain and inactivity, was also characterized by higher hospital utilization.
The LCA produced an 8-class solution reflecting subgroups with different patterns of variable combinations (Figure 1, Table 3). In these illustrative subgroups, class 2 demonstrates consistently low probabilities of trigger responses to the patient-reported variables, indicating a class with lower morbidity and higher function, while class 6 demonstrates higher probabilities of trigger responses, indicating a subgroup of higher morbidity and lower functioning. As with the cluster analyses, the LCA revealed a large subgroup of relatively lower morbidity and several smaller subgroups of individuals reporting either global or specific health concerns.
In some cases, clustering analysis and LCA seemed to identify common subgroups. For instance, 91 percent of cluster D was also in class 7; both the cluster and latent class were characterized by poor physical and mental health. In other cases, individuals in a given cluster had approximately equal representation in two or more latent classes; cluster E (pain and poor sleep quality) was associated with both latent class 1 and latent class 5, neither of which had high probabilities of pain or poor sleep quality.
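The overlap between the two solutions is a cross-tabulation of each individual's cluster and latent class assignments, expressed as row percentages. A minimal Python sketch, with a hypothetical function name and made-up labels in the usage note:

```python
from collections import Counter

def row_percentages(cluster_labels, class_labels):
    """Percent of each cluster's members falling in each latent class."""
    pair_counts = Counter(zip(cluster_labels, class_labels))
    cluster_sizes = Counter(cluster_labels)
    return {(cluster, lc): 100.0 * n / cluster_sizes[cluster]
            for (cluster, lc), n in pair_counts.items()}
```

With toy inputs such as ten members of cluster "D", nine of them assigned to class 7, `row_percentages` reports 90.0 for the pair ("D", 7); a cluster split evenly across two classes would show 50.0 for each.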

Major Themes
The NAM describes the first key requirement in caring for high needs patients as segmenting patients based on factors that drive health care [6]. This application of cluster and latent class analyses illustrates that both methods can be used to segment heterogeneous populations into clinically meaningful subgroups. Further, when these methods are applied to systematically collected patient-reported data, they may produce subgroups that better capture subjective care needs. Subjective information can supplement traditional administrative data on utilization and diagnoses to identify subgroups that are clinically actionable and can better inform clinical care management than traditional administrative data alone. Some segments elicited through LCA appeared to have similar characteristics to those created through clusters. Although both methods can be used to segment heterogeneous clinical populations, cluster analyses create subgroups characterized by more 'all or nothing' categories than LCA and may be more clinically interpretable and exhaustive in finding groups. Alternatively, LCA identified primary subpopulations using fewer groups, and groups could be easily represented graphically (Figure 3). Both methods require iterative interpretation to develop a meaningful set of subgroups, and neither method will produce subgroups that are all actionable. Both methods may also perform differently in different data sets. Other approaches such as predictive modeling may also be useful for segmenting complex populations using large clinical data sets [13]. Additional comparisons between these and other methods may help delineate which perform best in specific settings [12].
The Medicare HRA is designed to help clinicians address patient-reported risks for preventable adverse outcomes. Although the HRA is most commonly applied at the point of care, if data are systematically collected, representative, and stored in extractable formats, they can be used to inform program development, population health, and outcomes research [14]. Although content collected through patient-reported outcomes may duplicate content obtainable through more traditional clinical data such as ICD codes, ICD codes alone are unlikely to capture subjective responses to questions about, for example, pain, loneliness, and independent activities of daily living. In this project, HRA data revealed meaningful subgroups that might not have been obvious from other electronic clinical data and could inform specific clinical interventions. Important differentiators included function, falls, perceived health status, emotional well-being, pain, and presence or absence of an advance directive. Two large subgroups comprised relatively healthy individuals who could benefit from watchful waiting and routine preventive care plus (for one group) life care planning. Much smaller subgroups could be targeted for more intensive and tailored care management. The size of these subgroups can inform resource allocation within delivery systems. Utilization and cost of care are often primary concerns for patients with complex care needs, their clinicians, and delivery systems. However, using utilization as a target criterion for care management can miss patients who report high needs but may not (yet) be using significant resources. For example, all our clusters had relatively equal proportions of patients enrolled in an internal utilization-based care management program. This suggests that utilization does not identify all individuals with care needs and that using patient-reported data may be able to identify individuals at risk prior to incurring higher costs of care.
1 Bold shading indicates the cluster has a proportion of the input variable that is greater than the 95% CI for the population average and the 1st or 2nd highest proportion of all clusters. Italic shading indicates the cluster has a proportion of the input variable that is less than the 95% CI for the population average and the 1st or 2nd lowest proportion of all clusters. 2 Row percentages apply to this row only. Row percentages do not add to 100% because these are selected example clusters. All other percentages in the table reflect column proportions.

Limitations
This project was designed to apply exploratory segmentation methods to systematically collected, patient-reported data. It was not designed to generate evidence on caring for specific subgroups. The analytic cohort was neither representative of the KPCO Medicare population nor of Medicare beneficiaries elsewhere, but rather reflected a convenience sample for whom HRA data were available. Therefore, the specific subgroups illustrate differences within the analytic cohort, but are not themselves generalizable. The characteristics of subgroups may not apply to other populations. In addition, the KPCO HRA is not a comprehensive assessment of all complex needs, although it addresses essential domains that predict care needs and quality of life.

Conclusions
The value of segmentation methods depends on the quality and representativeness of the input data.
Using patient-reported data to inform population-level care design and delivery will require a cultural and resource shift towards prioritizing patient-reported data collection and use. Segmentation methods can be used alone or combined with predictive models to identify clinically actionable subgroups and inform care for heterogeneous populations with substantial and varied care needs [15].

Table 1: Potential Cluster/LCA Inputs: HRA Items and Trigger Responses.

(1) score of 3 or higher: Not at all (0), Several days (1), More than half the days (2), Nearly every day
Sum score of <3: Not at all (0), Several days (1), More than half the days (2), Nearly every day

GAD-2*
Feeling anxious, nervous or on edge
Not being able to stop or control worrying
Trigger response: Sum score of 3 or higher: Not at all (0), Several days (1), More than half the days (2), Nearly every day
Non-trigger response: Sum score of <3: Not at all (0), Several days (1), More than half the days (2), Nearly every day

Angry
In the past 7 days, how often did you feel angry?

* Because of a health or physical problem, do you have any difficulty with dressing without help or special equipment?
Trigger response: Need help or special equipment, Do myself with some difficulty
Non-trigger response: Do myself with no difficulty

Toilet
Because of a health or physical problem, do you have any difficulty with using the toilet without help or special equipment?
Trigger response: Need help or special equipment, Do myself with some difficulty
Non-trigger response: Do myself with no difficulty
Bayliss et al: Using Self-Reported Data to Segment Older Adult Populations with Complex Care Needs. Art. 12.