Moving objects such as people, animals, and vehicles generate large amounts of spatiotemporal data through location-capture technologies and mobile devices. This data must be processed, visualized, and analyzed to transform raw trajectories into useful knowledge. In this study, we build a system that delivers a set of traffic insights and recommendations by applying two techniques: clustering and sequential pattern mining. The system has three stages. The first stage preprocesses the dataset and samples it into 168 subsets. The second stage applies two clustering techniques, hierarchical density-based spatial clustering (HDBSCAN) and Random Swap clustering (RS), and we compare them in terms of processing time and cluster quality. In this comparative analysis, the Silhouette coefficient shows that RS outperforms HDBSCAN in cluster quality. Moreover, the analysis shows that RS outperforms K-means in reducing the mean squared error (MSE). We then use a Google Maps approach to label the traffic districts and apply sequential pattern mining to extract the flow of taxi trips; the system detects 146 sequential patterns in different areas of the city. In the last stage, we visualize the traffic clusters generated by the RS algorithm, together with heatmaps of taxi trips per weekday and hour of the day in the city of Porto. The system can be integrated with current traffic control applications to provide useful guidelines for taxi drivers, passengers, and transportation authorities.
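
For readers unfamiliar with Random Swap, the following is a minimal sketch of the general procedure as it is usually described: replace one randomly chosen centroid with a randomly chosen data point, refine with a few k-means iterations, and keep the trial only if the MSE decreases. It is not the authors' implementation. The function names, the 2-D point representation, and the default parameters (`swaps`, `kmeans_iters`, `seed`) are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def mse(points, centroids, labels):
    """Mean squared distance from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)


def kmeans_steps(points, centroids, iters):
    """Run a fixed number of k-means iterations; return centroids and labels."""
    centroids = list(centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if its cluster is empty
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assign(points, centroids)


def random_swap(points, k, swaps=200, kmeans_iters=2, seed=0):
    """Random Swap sketch (parameter names and defaults are hypothetical)."""
    rng = random.Random(seed)
    # Start from k random data points, refined by a few k-means iterations.
    centroids, labels = kmeans_steps(points, rng.sample(points, k), kmeans_iters)
    best = mse(points, centroids, labels)
    for _ in range(swaps):
        trial = list(centroids)
        # Swap: move one random centroid onto a random data point.
        trial[rng.randrange(k)] = rng.choice(points)
        trial, trial_labels = kmeans_steps(points, trial, kmeans_iters)
        trial_mse = mse(points, trial, trial_labels)
        if trial_mse < best:  # accept the swap only if it lowers the MSE
            centroids, labels, best = trial, trial_labels, trial_mse
    return centroids, labels, best
```

Because a swap is accepted only when it lowers the MSE, the result of this sketch is never worse than the k-means solution it starts from. The comparison with K-means reported above measures the same kind of improvement.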

HDBSCAN, Random Swap, Sequential pattern mining
Journal of Big Data
School of Information Technology

Ibrahim, R. (Rami), & Shafiq, M.O. (2019). Detecting taxi movements using Random Swap clustering and sequential pattern mining. Journal of Big Data, 6(1). doi:10.1186/s40537-019-0203-6