Tasnim Rida

Examining the Influence of Socioeconomic Factors on Premature Death Rates Across Racial Groups

A statistical investigation of county-level predictors of health disparities, using gradient boosted trees and multiple linear regression to identify key social determinants of health.
Published August 1, 2024

How do social determinants of health shape premature death rates across racial groups in the United States?
Research Overview

This project examines how socioeconomic factors influence premature death rates across U.S. counties and racial groups. Using the County Health Rankings dataset, it quantifies the relationship between systemic factors and premature mortality at the county level.

[Figure: Premature deaths by county in the United States]
[Figure: Relationship between socioeconomic factors and premature deaths]
Methodology

The analysis used statistical models in R to evaluate relationships between premature mortality and social determinants of health. Following exploratory data analysis, two modeling approaches were compared:

  • Multiple Linear Regression: Used to establish baseline associations and effect sizes.
  • Gradient Boosted Trees: Used to capture non-linear relationships and higher-order interactions among variables.
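
The two approaches above can be sketched in a few lines. The original analysis was done in R; what follows is an illustrative Python sketch using only the standard library, not the project's code. The data are synthetic, and the predictor names (`unemp`, `ineq`), the threshold effect, and every number are assumptions made up for the example. The boosted model uses depth-1 trees (stumps) and squared-error loss, the simplest form of gradient boosting.

```python
import random

def fit_linear(X, y):
    """Multiple linear regression via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return lambda x: beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_stump(X, resid):
    """Best single-split regression tree (depth 1) under squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(r[j] for r in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [e for r, e in zip(X, resid) if r[j] <= thr]
            right = [e for r, e in zip(X, resid) if r[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((e - lm) ** 2 for e in left)
                   + sum((e - rm) ** 2 for e in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    _, j, thr, lm, rm = best
    return lambda x: lm if x[j] <= thr else rm

def fit_boosted(X, y, rounds=40, lr=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [t - p for t, p in zip(y, pred)]  # negative gradient of squared error
        stump = fit_stump(X, resid)
        stumps.append(stump)
        pred = [p + lr * stump(r) for p, r in zip(pred, X)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

def rmse(model, X, y):
    return (sum((t - model(r)) ** 2 for r, t in zip(X, y)) / len(y)) ** 0.5

# Synthetic "counties": an assumed threshold effect that a straight line cannot follow.
rng = random.Random(42)
X = [(rng.uniform(2, 12), rng.uniform(3, 7)) for _ in range(120)]
y = [6000 + 300 * ineq + (2500 if unemp > 8 else 0) + rng.gauss(0, 200)
     for unemp, ineq in X]

linear_rmse = rmse(fit_linear(X, y), X, y)
boosted_rmse = rmse(fit_boosted(X, y), X, y)
print(f"linear RMSE:  {linear_rmse:.0f}")
print(f"boosted RMSE: {boosted_rmse:.0f}")
```

Both errors here are measured on the training data, which favors the more flexible model. A held-out comparison such as cross-validation is the fairer test.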
Key Findings
Socioeconomic Impact

Counties with higher income inequality and unemployment tend to have higher premature death rates, while counties with higher high school completion rates tend to have lower ones.

Variable Importance

Multiple linear regression ranked the American Indian and Alaska Native (AIAN) population percentage as the most important variable, likely because the model is sensitive to this group's small population percentages across the U.S.
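
One possible reading of this sensitivity is that a predictor with a narrow range can receive a large raw coefficient even when its overall effect is similar to that of a wider-ranging predictor. This is offered as an assumption, not as the project's own diagnosis. The Python sketch below (the original work was in R) uses synthetic data with two hypothetical predictors, `wide` and `narrow`, built so that each moves the outcome by the same amount across its range. Scaling each coefficient by its predictor's standard deviation puts the two on a comparable footing.

```python
import random

rng = random.Random(0)
n = 500
# Synthetic predictors (made-up ranges): one wide-ranging, one tightly clustered.
wide = [rng.uniform(0, 40) for _ in range(n)]
narrow = [rng.uniform(0, 2) for _ in range(n)]
# By construction, each predictor shifts the outcome by 40 units across its range.
y = [1.0 * w + 20.0 * s + rng.gauss(0, 2) for w, s in zip(wide, narrow)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

# Two-predictor least squares solved from centered (co)variances.
s11, s22, s12 = cov(wide, wide), cov(narrow, narrow), cov(wide, narrow)
s1y, s2y = cov(wide, y), cov(narrow, y)
det = s11 * s22 - s12 ** 2
b_wide = (s22 * s1y - s12 * s2y) / det
b_narrow = (s11 * s2y - s12 * s1y) / det

# Scale each coefficient by its predictor's standard deviation.
std_wide = b_wide * s11 ** 0.5
std_narrow = b_narrow * s22 ** 0.5

print(f"raw coefficients:    wide={b_wide:.2f}, narrow={b_narrow:.2f}")
print(f"scaled coefficients: wide={std_wide:.2f}, narrow={std_narrow:.2f}")
```

On raw coefficients the narrow predictor looks far more important. After scaling, the two are close, so an importance ranking can depend on which scale is used.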

Model Comparison

Cross-validation showed that the gradient boosted trees model captured relationships in the data that multiple linear regression did not.
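
The kind of comparison described here can be illustrated with a minimal k-fold cross-validation sketch. It is written in Python with the standard library only, whereas the original analysis was in R. The data are synthetic with a single made-up predictor. To keep the example short, a piecewise-constant binned-mean model stands in for the tree ensemble, so `fit_binned_means` is a hypothetical stand-in and not gradient boosting itself.

```python
import random

def k_fold_rmse(fit, xs, ys, k=5, seed=0):
    """Average held-out RMSE of the model returned by `fit` across k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(ys[i] - model(xs[i])) ** 2 for i in held_out]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(scores) / k

def fit_line(xs, ys):
    """Simple least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_binned_means(xs, ys, bins=8):
    """Stand-in non-linear learner: mean of y within equal-width bins of x."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins

    def bin_of(x):
        # Clamp so held-out points outside the training range use an edge bin.
        return min(bins - 1, max(0, int((x - lo) / width)))

    overall = sum(ys) / len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bin_of(x), []).append(y)
    means = {b: sum(v) / len(v) for b, v in groups.items()}
    return lambda x: means.get(bin_of(x), overall)

# Synthetic data with a curved relationship (an assumption for illustration).
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [(x - 5) ** 2 + rng.gauss(0, 1) for x in xs]

line_cv = k_fold_rmse(fit_line, xs, ys)
binned_cv = k_fold_rmse(fit_binned_means, xs, ys)
print(f"linear CV RMSE:    {line_cv:.2f}")
print(f"piecewise CV RMSE: {binned_cv:.2f}")
```

Because every score comes from rows the model did not see during fitting, a lower cross-validated error for the flexible model suggests it is capturing real structure and not just memorizing the training data.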


This research was conducted as part of the Carnegie Mellon Summer Undergraduate Research Experience (SURE) program in partnership with UnitedHealth Group.