[CDC Disease Data Preparation](flow_zone:puo7jcP) retrieves and transforms data published by the Centers for Disease Control and Prevention (CDC) in the [PLACES](https://www.cdc.gov/places/index.html) database.
This dataset contains model-based census tract-level estimates in four categories: chronic disease-related health outcomes, preventive services use, health risk behaviors, and overall health status. These estimates can be used to identify emerging health problems and to help develop and carry out effective, targeted public health prevention activities. The [CDC metadata](dataset:cdc_me) file describes the categories for each tract, along with the relevant FIPS, state, and county codes.

![cdcprep.png](gFCuJPReUlte)

 - The Python scripts [compute_CDC_disease](recipe:compute_CDC_disease) and [compute_cdc_disease_county](recipe:compute_cdc_disease_county) access the tract-level and county-level datasets, respectively, through the [Socrata Open Data API (SODA)](https://dev.socrata.com/foundry/chronicdata.cdc.gov/cwsq-ngmh). This solution uses the data released in 2022, which covers records from 2020. The original values are the percentages for the disease values listed under ```Health Reason``` in the [cdc_disease dataset](dataset:cdc_di).
 
 - The [Pivot](recipe:compute_cdc_disease_by_Short_Text) and [Distinct](recipe:compute_cdc_disease_health_outcomes) recipes keep only the health outcomes that we use for data analytics.
 - To generate the percentile values that show the relative disease prevalence across the U.S., the Python recipe [compute_cdc_disease_percentage_percentile_tract](recipe:compute_cdc_disease_percentage_percentile_tract) uses the [```pandas.DataFrame.rank```](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html) function to rank the percentage values and generate quantiles with four significant digits.
 
 - The percentage and percentile disease prevalence for each chronic disease across U.S. tracts is shown in [cdc_disease_health_outcomes_percentiles](dataset:cdc_disease_health_outcomes_percentiles).
 
 
- The recipes [compute_cdc_disease_county_filtered](recipe:compute_cdc_disease_county_filtered), [compute_cdc_healthoutcome_percent_county](recipe:compute_cdc_healthoutcome_percent_county), [compute_cdc_county_by_disease](recipe:compute_cdc_county_by_disease), [compute_cdc_disease_percentage_county](recipe:compute_cdc_disease_percentage_county), and [compute_cdc_disease_percentage_percentile_county](recipe:compute_cdc_disease_percentage_percentile_county) apply the same processing steps to generate percentile values at the county level, which are used by the [social determinants of health webapp](web_app:wRmlvsW).

- The dataset [cdc_disease_prepared](dataset:cdc_disease_prepared) contains the same disease percentages as [cdc_disease_health_outcomes_percentiles](dataset:cdc_disease_health_outcomes_percentiles), but in a [long format](https://knowledge.dataiku.com/latest/courses/visual-recipes/pivot/pivot-values.html) that is used for graphs and visualizations later in this project.
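
The percentile and long-format steps described above can be illustrated with a minimal pandas sketch. This is not the code of the project recipes: the column names (`tract_fips`, `Diabetes`, `Obesity`), the sample values, and the helper names `add_percentiles` and `to_long` are hypothetical, and rounding to four decimal places is an assumption about how the recipe limits precision.

```python
import pandas as pd


def add_percentiles(df, value_cols):
    """Return a copy of df with a '<col>_percentile' column per value column."""
    out = df.copy()
    for col in value_cols:
        # rank(pct=True) converts ranks to a 0-1 scale; tied values
        # share their average rank (the pandas default).
        # Rounding to 4 decimals is an assumption, not the recipe's rule.
        out[f"{col}_percentile"] = out[col].rank(pct=True).round(4)
    return out


def to_long(df, id_col, value_cols):
    """Reshape wide disease columns into one row per (id, health outcome)."""
    return df.melt(
        id_vars=[id_col],
        value_vars=value_cols,
        var_name="health_outcome",
        value_name="percentage",
    )


# Hypothetical wide table: one row per tract, one column per health outcome.
tracts = pd.DataFrame(
    {
        "tract_fips": ["01001020100", "01001020200", "01001020300", "01001020400"],
        "Diabetes": [10.0, 12.5, 9.1, 12.5],
        "Obesity": [30.2, 35.0, 28.4, 33.1],
    }
)

diseases = ["Diabetes", "Obesity"]
ranked = add_percentiles(tracts, diseases)
long_df = to_long(tracts, "tract_fips", diseases)

print(ranked)
print(long_df)
```

In the wide table each health outcome has its own percentage and percentile column, while the long table has one row per tract and outcome, which is usually the more convenient shape for charts.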



 
 
 

 
 