Measuring obesity prevalence across geographic areas should account for environmental and socioeconomic factors that contribute to spatial autocorrelation, the dependency of estimates across neighboring areas, to mitigate bias in the measures and the risk of type I errors in hypothesis testing. Dependency among observations across geographic areas violates statistical independence assumptions and may result in biased estimates. Empirical Bayes (EB) estimators reduce the variability of estimates with spatial autocorrelation, which limits the overall mean squared error and controls for sample bias.

Using the Colorado Body Mass Index (BMI) Monitoring System, we modeled the spatial autocorrelation of adult (≥ 18 years old) obesity (BMI ≥ 30 kg/m^{2}) measurements using patient-level electronic health record data from encounters between January 1, 2009, and December 31, 2011. Obesity prevalence was estimated for Denver County census tracts with ≥ 10 observations during the study period. We calculated the Moran’s I statistic to test for spatial autocorrelation across census tracts, and mapped crude and EB obesity prevalence across geographic areas.

In Denver County, there were 143 census tracts with 10 or more observations, representing a total of 97,710 adults with a valid BMI. The crude obesity prevalence for adults in Denver County was 29.8 percent (95% CI 28.4–31.1%) and ranged from 12.8 to 45.2 percent across individual census tracts. EB obesity prevalence was 30.2 percent (95% CI 28.9–31.5%) and ranged from 15.3 to 44.3 percent across census tracts. Statistical tests using the Moran’s I statistic suggest adult obesity prevalence in Denver County was distributed in a non-random pattern. Clusters of EB obesity estimates were highly significant (alpha=0.05) in neighboring census tracts. Concentrations of obesity estimates were primarily in the west and north in Denver County.

Statistical tests reveal that adult obesity prevalence exhibits spatial autocorrelation in Denver County at the census tract level. EB estimates for obesity prevalence can be used to control for spatial autocorrelation between neighboring census tracts and may produce less biased estimates of obesity prevalence.

Adult obesity prevalence, defined as the total number of individuals 18 years of age or older with a body mass index (BMI) of greater than or equal to 30 kg/m^{2} among the overall adult population at risk, remains an important public health metric. The Centers for Disease Control and Prevention (CDC) estimate obesity prevalence in the United States to be 35.1 percent for adults 20 years old and older from 2010-2012 [

When examined at the census tract level, particularly across census tracts of varying population size, obesity prevalence can be distributed in a non-random pattern. Such clustering of observations within census tracts can induce correlation and impede the reliability of statistical tests, increasing the risk of type I error [

The primary objective of this paper is to use Empirical Bayes (EB) estimates to reduce the amount of spatial autocorrelation in obesity prevalence estimates with varying sample sizes across geographic areas. We will show that EB estimates can limit the overall mean squared error across geographies where obesity prevalence is measured. We will compare crude obesity prevalence estimates to EB estimates across geographic areas. Finally, we will discuss the strengths and limitations of EB estimates for measuring obesity prevalence across census tracts.

We estimated adult crude and EB obesity prevalence estimates using the Colorado BMI Monitoring System, an electronic health record (EHR) based network comprised of multiple healthcare providers with patients residing in Denver County, Colorado [

Objectively measured heights and weights obtained during routine care were extracted from the EHRs of each individual site, along with other clinical and demographic characteristics including age, race, ethnicity and gender, geocoded location based on residence address, and insurance coverage at the time of the encounter. Encounters were de-duplicated within each site, and those without measures of height and weight were removed. Data were securely transferred to the Colorado Department of Public Health and Environment (CDPHE), then combined across sites. CDPHE geocoded home addresses to the census tract level and removed addresses from the data. CDPHE applied the CDC BMI SAS^{®} macro [

Unlike traditional Bayesian estimates, for EB estimates (also known as Stein estimation, penalized estimation and random-coefficient or ridge regression), [

The prevalence of obesity for a given geographic area is defined as the number of obese individuals in the area divided by the total population at risk of obesity in the same area. Estimated this way, prevalence has unstable variance across geographic areas: the variance of the estimate depends inversely on the population at risk, so as the population decreases, the variance of the obesity prevalence estimate increases. Geographic areas with smaller sample sizes therefore have larger variance than areas with larger sample sizes. EB estimates use “prior” information, taken from the global mean prevalence estimate across all census tracts, to reduce the variability of the tract-level prevalence estimates around that global mean.
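The shrinkage described above can be sketched in a few lines. This is a minimal illustration that assumes a common method-of-moments form of the global (spatially naive) EB smoother, in which each tract's crude prevalence is pulled toward the global mean with a weight that grows with the tract's sample size. The function name `eb_smooth`, the specific variance formula, and the tract counts are assumptions made for illustration; they are not necessarily the exact estimator or data used in this analysis.

```python
import numpy as np


def eb_smooth(events, population):
    """Shrink crude area-level prevalences toward the global mean.

    Assumed method-of-moments form, for illustration only.
    """
    events = np.asarray(events, dtype=float)
    population = np.asarray(population, dtype=float)
    crude = events / population
    global_mean = events.sum() / population.sum()
    # Population-weighted spread of crude prevalences around the global mean
    weighted_var = np.sum(population * (crude - global_mean) ** 2) / population.sum()
    # Remove the expected sampling noise; floor the result at zero
    between_var = max(weighted_var - global_mean / population.mean(), 0.0)
    # Weight is close to 1 for large tracts and close to 0 for small tracts
    weight = between_var / (between_var + global_mean / population)
    return weight * crude + (1.0 - weight) * global_mean


# Hypothetical tracts (not study data): obese adults and adults with a valid BMI
obese = np.array([1, 45, 150, 400])
adults = np.array([10, 100, 500, 1000])

crude = obese / adults
global_mean = obese.sum() / adults.sum()
smoothed = eb_smooth(obese, adults)

for n, c, s in zip(adults.tolist(), crude.tolist(), smoothed.tolist()):
    print(f"n={n:5d}  crude={c:.3f}  EB={s:.3f}")
```

In this hypothetical example the tract with 10 adults is moved most of the way to the global mean, while the tract with 1,000 adults stays close to its crude value, which is the behavior the paragraph above describes.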

EB estimates reduce variability using the inverse function of variance [

For this analysis, we used the adult patient population from the Colorado BMI Monitoring System with a most recent valid BMI measure between January 1, 2009, and December 31, 2011, and a geocode based on residence address in Denver County, Colorado [^{2} in each census tract. We defined coverage as the number of adults in a given census tract with a valid BMI from the Colorado BMI Monitoring System divided by the estimated total number of adults in the census tract from the United States Census 2010 population estimates. We calculated the crude obesity prevalence for each census tract by dividing the total number of obese adults by the total number of adults with a valid BMI in each census tract. Obesity prevalence was estimated for Denver County census tracts with ≥ 10 observations during the study period [

We calculated the EB estimate of the obesity prevalence across census tracts in Denver County. We used a spatially naïve EB estimate, which shrinks extreme census tract values toward the global mean estimate and thereby reduces their variability. We employed the Queen’s contiguity matrix [

We compared the crude obesity prevalence to the EB obesity prevalence graphically, and statistically using a one-sample t-test. We generated maps of crude and EB obesity prevalence estimates for Denver County. We calculated the Moran’s I statistic [

Data aggregation of Colorado BMI Monitoring System data was performed using SAS^{®} 9.2. Geocoded addresses were created using Tele Atlas, U.S. Census, Environmental Systems Research Institute (ESRI) (Pop2010 fields) and Bowes Centrus^{®} Desktop v6.01, utilizing the TomTom^{©} address point database. Coverage and obesity estimates, statistical tests and maps were calculated and generated using GeoDa™ 1.4.6.

Table ^{-2}) in the 2009–2011 study period. Coverage of the BMI Monitoring System population in Denver County census tracts ranged from 3.7 percent to 60.2 percent.

Summary of BMI Monitoring System for Denver County, CO, Adult Population 2009-2011

MEASURE | DENVER COUNTY |
---|---|
Colorado BMI Monitoring System Population with valid BMI, >= 18 years old | 97,710 |
U.S. Census 2010 Population Estimates | 471,392 |
Estimated Coverage* | 0.2073 |
Range of Coverage Across Individual Census Tracts | (0.0373, 0.6021) |
Total Obese (BMI >= 30 kg/m^{2}) | 31,275 |

*Coverage defined as Colorado BMI Monitoring System adult population with valid BMI divided by U.S. Census 2010 Population >=18 years old
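As a quick arithmetic check of the table above, the coverage figure can be reproduced from the two population counts. The variable names below are illustrative. The pooled ratio of obese adults to adults with a valid BMI is computed only for comparison; it is not a quantity reported in this paper and need not equal the tract-level mean prevalence, which presumably averages prevalence across census tracts rather than pooling individuals.

```python
# Counts taken from the summary table for Denver County, 2009-2011
bmi_adults = 97_710       # adults with a valid BMI in the monitoring system
census_adults = 471_392   # U.S. Census 2010 population estimate
total_obese = 31_275      # adults with BMI >= 30 kg/m^2

# Coverage: monitored adults divided by the census adult population
coverage = bmi_adults / census_adults

# Pooled ratio across all monitored adults (shown for comparison only)
pooled_prevalence = total_obese / bmi_adults

print(f"coverage          = {coverage:.4f}")           # 0.2073
print(f"pooled prevalence = {pooled_prevalence:.4f}")  # 0.3201
```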

Table

Adult Obesity Prevalence Estimates for Denver County, CO, 2009–2011

STATISTIC | CRUDE OBESITY PREVALENCE (%) | EB OBESITY PREVALENCE (%) | DIFFERENCE BETWEEN MEANS (TWO MEANS, ONE-SAMPLE T-TEST) |
---|---|---|---|
Mean (se) | 29.8** (0.09) | 30.2** (0.08) | -.00046*** (0.0003) |
95% CI | (28.4, 31.2) | (29.0, 31.5) | (-0.0051, -0.0041) |
Range | (12.8, 45.2) | (15.3, 44.3) | |
Moran’s I Statistic | 0.7142*** | 0.7307*** | |

*, **, *** denote significance at the 90%, 95%, and 99% confidence levels, respectively

Figure

Difference (Empirical Bayes – Crude) Obesity Prevalence Estimates by Census-Tract level BMI Monitoring Population in Denver County, CO, 2009–2011

Maps of obesity prevalence by census tract in Denver County are shown in Figure

Denver County Obesity Prevalence Estimates by Census Tract, 2009–2011

Data points of the Moran’s I statistic for individual census tracts for crude and EB obesity prevalence estimates and spatial lag (average across neighboring census tracts) of crude and EB obesity prevalence estimates for Denver County are plotted linearly in the Moran scatterplots in Figure

Moran Scatterplot of I-Statistic Comparisons of Obesity Prevalence Estimates
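For readers unfamiliar with the quantities plotted in a Moran scatterplot, the following minimal sketch computes a spatial lag and the global Moran's I from a row-standardized contiguity matrix. The four-area chain adjacency, the prevalence values, and the function names are hypothetical and chosen only for illustration; they are not the Denver County tracts or the software routine used in this study.

```python
import numpy as np


def row_standardize(adjacency):
    """Scale each row of a 0/1 contiguity matrix to sum to one."""
    w = np.asarray(adjacency, dtype=float)
    return w / w.sum(axis=1, keepdims=True)


def spatial_lag(values, weights):
    """Average of each area's neighbors (the y-axis of a Moran scatterplot)."""
    return weights @ np.asarray(values, dtype=float)


def morans_i(values, weights):
    """Global Moran's I for the given values and spatial weights."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    return (x.size / weights.sum()) * (z @ weights @ z) / (z @ z)


# Hypothetical layout: four areas in a row, each touching only its adjacent areas
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w = row_standardize(adjacency)

clustered = [0.40, 0.38, 0.20, 0.18]      # similar values sit next to each other
alternating = [0.40, 0.18, 0.38, 0.20]    # high and low values alternate

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(f"clustered:   I = {i_clustered:.3f}")
print(f"alternating: I = {i_alternating:.3f}")
```

With row-standardized weights, Moran's I equals the slope of the least-squares line through the scatterplot of spatial lag against the mean-centered values, so a positive statistic corresponds to an upward-sloping Moran scatterplot.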

This paper presented the use of EB estimation to reduce spatial autocorrelation in obesity prevalence estimates across small geographies with different sample sizes. We estimated adult crude and EB obesity prevalence in Denver County, Colorado using EHR-derived BMI data from the Colorado BMI Monitoring System. We compared and quantified the differences in crude and EB estimates, and showed that EB estimates can limit the errors in the residual estimates across geographies where obesity prevalence is measured.

The crude adult obesity prevalence estimate derived from the Colorado BMI Monitoring System for Denver County was 29.8 percent; the EB obesity prevalence estimate was 30.2 percent. The difference between the two obesity prevalence estimates was statistically significant, revealing that EB obesity prevalence for adults was non-random in Denver County at the census tract level. Clusters of EB obesity estimates were highly significant (alpha ≤ 0.05) in neighboring census tracts of high obesity prevalence. The Moran’s I statistic for the EB obesity prevalence estimate showed that a high degree of spatial autocorrelation exists within Denver County, quantifying the degree to which obesity prevalence in neighboring census tracts was correlated. The results suggest that autocorrelation of obesity prevalence at the census tract level exists and should be accounted for to limit bias in calculated obesity estimates.

While comparisons of estimates derived from the BRFSS and Colorado BMI Monitoring System cannot be made directly due to sample size and data collection methods, assessing the reasonableness of obesity prevalence estimates derived from the Colorado BMI Monitoring System with an established alternative is important to validate this novel approach. Estimates of obesity prevalence from the 2009–2010 BRFSS adult obesity estimates for Denver County (19.6%; 95% CI [16.8–22.4]), [

Additional analyses can be conducted to further identify spatial autocorrelation within the BMI Monitoring System. EB estimates help identify if prevalence estimates across geographic areas contain spatial autocorrelation and whether estimates are distributed at random or in a non-random pattern. Demographic-specific obesity prevalence estimates and associated Moran’s I statistics can be compared to determine which particular demographic strata (e.g., age groups, gender, race and ethnicity) may be contributing more or less to spatial autocorrelation across census tracts for a given geographic area. Socioeconomic status (SES) and environmental data can be modeled at the census tract level to further determine the extent of autocorrelation due to these additional variables. Several studies have found SES and environmental exposures to explain large portions of variation in obesity prevalence across census tracts [

The Moran’s I statistic can be employed in studies as a useful tool for estimating correlations across census tracts. If neighboring census tracts are highly correlated but this correlation is not accounted for, obesity prevalence estimates may be incorrect. Policy decisions and community-level interventions may then be based on inaccurate estimates, which can in turn hinder the impact of such public health efforts. Public health entities and community policy makers can use EB estimation and the Moran’s I statistic to infer variability of obesity prevalence, as well as SES and environmental exposures within clusters of high or low obesity prevalence, that may be correlated with obesity.

There are several limitations to the use of EB estimates to calculate obesity prevalence, and to the Colorado BMI Monitoring System for measuring obesity prevalence over a large population and geographic area. The BMI Monitoring System does not employ a patient-master index for de-duplicating patients across data-contributing providers. Rather, patients having insurance coverage for another data-contributing site at the time of most-recent height and weight measure were reallocated to that site.

Weighting obesity prevalence estimates at the census tract level by the global mean obesity prevalence makes the EB estimate difficult to understand and not necessarily accessible to the public, a key aim for the Colorado BMI Monitoring System. Conversely, public consumers of obesity prevalence estimates may be more interested in the accuracy of the estimates (i.e., the accuracy of obesity prevalence estimates relative to the “true” obesity prevalence) and less interested in their derivation. EB estimates do provide an estimate of obesity prevalence with reduced spatial autocorrelation for modeling the association between obesity, SES, and environmental risk factors across neighboring geographic areas, reducing bias in estimates and interpretation of correlations [

Other measures of spatial autocorrelation were not considered in this study, including autoregressive parameter specification or simultaneous autoregressive (SAR) modeling, [

EB estimates of obesity prevalence can reduce bias of estimates across geographic areas with different sample sizes using the data within the sample to generate prior estimates, providing estimates of obesity prevalence that are less prone to bias due to spatial autocorrelation. EB estimates can help researchers discern whether prevalence estimates are distributed across geographies in non-random patterns. EHR data can provide a rich source of information to measure disease prevalence across populations and geographies. Additional analyses of demographic, SES and environmental data can further define spatial variance and autocorrelation across census tracts.