Examining the Influence of Socioeconomic Factors on Premature Death Rates Across Racial Groups
A statistical investigation into county-level predictors of health disparities, utilizing Gradient Boosted Trees and Linear Regression to identify key social determinants of health.
This project examines how socioeconomic factors influence premature death rates across U.S. counties and racial groups. Using the County Health Rankings dataset, it quantifies how strongly systemic factors are associated with premature mortality at the county level.
The analytical framework employed statistical modeling in R to evaluate relationships between mortality and various Social Determinants of Health. Based on exploratory data analysis, two distinct modeling approaches were compared:
- Multiple Linear Regression: Used to establish baseline associations and effect sizes.
- Gradient Boosted Trees: Leveraged to capture non-linear relationships and high-dimensional interactions between variables.
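To make the comparison concrete, here is a minimal sketch of how a linear model and a boosted tree model can be scored against each other with k-fold cross-validation. It is not the project's code: the project's models were fit in R, while this illustration uses plain Python with no external libraries. The data are synthetic, the three predictors are hypothetical stand-ins, the boosted model uses depth-1 trees (stumps) for brevity, and all function names (`fit_ols`, `fit_stump`, `fit_gbt`, `cv_rmse`) are assumptions introduced here.

```python
import math
import random

def fit_ols(X, y):
    """Fit multiple linear regression by solving the normal equations."""
    rows = [[1.0] + list(r) for r in X]            # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return lambda r: beta[0] + sum(b * v for b, v in zip(beta[1:], r))

def fit_stump(X, resid):
    """Find the one-feature threshold split that minimizes squared error."""
    n, total, best = len(X), sum(resid), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += resid[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue                           # cannot split between ties
            # Maximizing this gain is equivalent to minimizing squared error.
            gain = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2,
                        left_sum / k, (total - left_sum) / (n - k))
    _, j, thr, left, right = best
    return lambda r: left if r[j] < thr else right

def fit_gbt(X, y, rounds=100, lr=0.1):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [t - p for t, p in zip(y, pred)]   # negative gradient
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [p + lr * stump(r) for p, r in zip(pred, X)]
    return lambda r: base + lr * sum(s(r) for s in stumps)

def cv_rmse(fit, X, y, k=5):
    """Average held-out RMSE over k interleaved folds."""
    scores = []
    for f in range(k):
        held_out = set(range(f, len(X), k))
        X_train = [r for i, r in enumerate(X) if i not in held_out]
        y_train = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(X_train, y_train)
        sq_err = [(model(X[i]) - y[i]) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / k

# Synthetic stand-in data (NOT County Health Rankings values): three scaled
# predictors, one of which affects the outcome non-linearly.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [2 * a - 2 * c + 8 * abs(b - 0.5) + random.gauss(0, 0.2) for a, b, c in X]

lin_rmse = cv_rmse(fit_ols, X, y)
gbt_rmse = cv_rmse(fit_gbt, X, y)
print(f"linear regression CV RMSE: {lin_rmse:.2f}")
print(f"boosted stumps CV RMSE:    {gbt_rmse:.2f}")
```

Because the second synthetic predictor enters through an absolute value, a straight-line fit cannot represent its effect, while the boosted model can approximate it with a sequence of threshold splits. Comparing held-out error rather than training error keeps the more flexible model from being rewarded for overfitting.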
Key Findings
Higher income inequality and unemployment are associated with more years of life lost to premature death, while higher high school completion rates are associated with fewer.
Multiple linear regression identified the American Indian and Alaska Native (AIAN) population percentage as the most important variable, likely because the model is sensitive to this group's small population percentages across the U.S.
Cross-validation showed that the gradient boosted tree model captured relationships in the data that multiple linear regression did not.
This research was conducted as part of the Carnegie Mellon Summer Undergraduate Research Experience (SURE) program in partnership with UnitedHealth Group.