Background
Statistics Canada has initiated a series of linkages between long-form Census of Population data and health outcome data. These linked data lack risk factor information. This study assesses the feasibility of using statistical modelling techniques to assign smoking status to census respondents.

Data and methods
The 2000/2001 Canadian Community Health Survey (CCHS) was used to develop age- and sex-specific models that predict smoking status from variables available on the 1991 Census. The 2002/2003 CCHS was used to validate the modelled variable. Data from the 2002/2003 CCHS linked to the Hospital Morbidity Database (2001/2002 to 2004/2005) were used to compare modelled with self-reported smoking status in analyses of smoking-related hospitalization.

Results
For the current daily smoker models, income, education, marital status, dwelling ownership and region of birth were significant predictors. For the never smoker models, marital status, dwelling ownership, Aboriginal identity and region of birth were significant predictors. Modelled current daily smoker status was associated with increased odds of smoking-related hospitalization relative to never smokers, even after adjustment for covariates.

Interpretation
This study demonstrates that statistical modelling techniques can feasibly be used to assign smoking status to census data, provided socio-economic and identity information is available.
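The workflow described above can be illustrated with a minimal sketch. The abstract does not name the model family, so this sketch assumes logistic regression fitted separately for each age-sex stratum. It then assumes a hypothetical 0.5 probability threshold for assigning status, ROC-curve validation (AUC) against self-reported status, and a crude 2x2 odds ratio for hospitalization. All function names are hypothetical. The study itself adjusted for covariates, which the crude ratio here does not do. No study data or results are reproduced.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Weights = List[float]  # index 0 is the intercept


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w: Weights, x: Sequence[float]) -> float:
    """Intercept plus weighted sum of covariates."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Weights:
    """Fit an unpenalized logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi  # d(log-loss)/d(score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Weights, X: Iterable[Sequence[float]]) -> List[float]:
    """Predicted probability of the outcome (e.g. current daily smoker)."""
    return [sigmoid(linear_score(w, xi)) for xi in X]


def fit_by_stratum(
    records: Iterable[Tuple[Hashable, Sequence[float], int]]
) -> Dict[Hashable, Weights]:
    """Fit one model per stratum key, e.g. an (age group, sex) pair."""
    grouped = defaultdict(lambda: ([], []))
    for key, x, label in records:
        grouped[key][0].append(x)
        grouped[key][1].append(label)
    return {key: fit_logistic(X, y) for key, (X, y) in grouped.items()}


def assign_status(p: float, threshold: float = 0.5) -> int:
    """Hypothetical assignment rule: 1 if p >= threshold, else 0."""
    return 1 if p >= threshold else 0


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: probability that a random positive
    outscores a random negative, counting ties as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude (unadjusted) odds ratio from a 2x2 table:
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)
```

In this sketch, models fitted on survey respondents in each stratum would be applied with `predict_proba` to census respondents in the same stratum. `auc` would then compare the predictions with self-reported status in a separate validation survey, and `odds_ratio` could summarize a table of modelled status by hospitalization. The pairwise AUC costs O(positives × negatives), which is acceptable for illustration. A rank-based computation scales better for large samples.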

Additional Metadata
Keywords Health surveys, Hospitalization, ROC curve, Socioeconomic factors, Statistical models
Journal Health Reports
Citation
Sanmartin, C. (Claudia), Finès, P. (Philippe), Khan, S. (Saeeda), Peters, P., Tjepkema, M. (Michael), Bernier, J. (Julie), & Burnett, R. (Rick). (2013). Modelling risk factor information for linked census data: The case of smoking. Health Reports, 24(6), 9–15.