The Candrive/Ozcandrive project is a long-term study, now entering its sixth year, focused on improving the safety of older drivers. The study includes 256 older drivers in the Ottawa area and is an example of a longitudinal study that generates big data in the form of sensor information recorded from the participants' vehicles. This paper uses the Candrive data and proposes solutions that would enable differential privacy, including a theoretical open-access model for the data that uses k-anonymity techniques for any combination of seven parameters that have identifiable attributes. The data are captured by an in-vehicle sensor that records Global Positioning System (GPS) and On-Board Diagnostics II (OBDII) data for every second that the vehicle is operating. The resulting dataset includes hundreds to thousands of hours of data for each of the study vehicles. The paper discusses methods to address the challenge of turning a large dataset of GPS and other raw sensor samples into data that is ready to analyze. It shows automated methods to detect and correct issues in the individual data samples, along with the tools needed to adapt the raw sensor data into formats that can be easily processed. The paper provides solutions to ensure k-anonymity-based privacy of the study participants' identities for the seven parameters, including the location of a participant's home, whether it is revealed through vehicle location information or through a combination of the sensor information. The paper presents mechanisms to augment the captured sensor data through fusion with external data resources, which adds information such as weather, road information from mapping sources, and day/night status to the dataset. The paper also presents the performance and applicability of analyzing the resulting dataset within a cloud computing architecture.
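
To make the k-anonymity idea above concrete, the following is a minimal sketch and not the method used in the paper. It assumes a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and it uses rounding of GPS coordinates as one simple form of generalization. The field names (`lat`, `lon`, `age_band`), the sample records, and the helper functions are all hypothetical; the abstract does not name the seven parameters.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(counts.values()) if counts else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group(records, quasi_identifiers) >= k


def coarsen_location(record, decimals):
    """Return a copy of the record with latitude and longitude rounded,
    a simple generalization step (hypothetical, for illustration only)."""
    coarse = dict(record)
    coarse["lat"] = round(record["lat"], decimals)
    coarse["lon"] = round(record["lon"], decimals)
    return coarse


# Hypothetical records; the values are invented for this example.
records = [
    {"lat": 45.4215, "lon": -75.6972, "age_band": "70-74"},
    {"lat": 45.4291, "lon": -75.6903, "age_band": "70-74"},
    {"lat": 45.3876, "lon": -75.6960, "age_band": "70-74"},
    {"lat": 45.4012, "lon": -75.7105, "age_band": "70-74"},
]
quasi_identifiers = ["lat", "lon", "age_band"]

# Generalize the locations by keeping one decimal place.
coarse_records = [coarsen_location(r, 1) for r in records]

print(is_k_anonymous(records, quasi_identifiers, 2))
print(is_k_anonymous(coarse_records, quasi_identifiers, 4))
```

In this sketch the precise coordinates make every record unique, while the coarsened coordinates place all four records in one group. A real release would have to balance the loss of spatial detail against the value of k that is required.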

Additional Metadata
Keywords: data analytics, Differential Privacy, driving, Global Positioning System (GPS), k-Anonymity
Persistent URL: dx.doi.org/10.1109/BigDataCongress.2015.93
Conference: 4th IEEE International Congress on Big Data, BigData Congress 2015
Citation
Wallace, B. (Bruce), Goubran, R., Knoefel, F. (Frank), Marshall, S. (Shawn), Porter, M. (Michelle), Harlow, M. (Madelaine), & Puli, A. (Akshay). (2015). Automation of the Validation, Anonymization, and Augmentation of Big Data from a Multi-year Driving Study. In Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015 (pp. 608–614). doi:10.1109/BigDataCongress.2015.93