Thakali, L., Fu, L., & Chen, T. (2016). Model Based versus Data-driven Approach for Road Safety Analysis: Does More Data Help? Transportation Research Record: Journal of the Transportation Research Board, No. 2601.
Crash data for road safety analysis and modeling are growing steadily in size and completeness owing to recent advances in information technology. This increased availability of large datasets has generated renewed interest in data-driven nonparametric approaches as an alternative to traditional parametric models for crash risk prediction. This paper investigates how the relative performance of these two approaches changes as the amount of crash data grows. We compare one popular technique from each approach: negative binomial (NB) models for the parametric approach and kernel regression (KR) for the nonparametric approach. Using two large crash datasets, we examine the performance of both methods as a function of the amount of training data. Through a rigorous bootstrap validation process, we find that the two approaches exhibit strikingly different patterns, especially in their sensitivity to data size. KR outperforms the model-based NB approach in predictive performance, and its advantage increases noticeably as the data available for calibration grow. With the arrival of the Big Data era, and given the added benefits of enabling automated road safety analysis and improved responsiveness to the latest safety issues, nonparametric techniques (especially modern machine learning approaches) could be included among the important tools for road safety studies.
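To make the nonparametric side of the comparison concrete, the following is a minimal sketch of kernel regression evaluated at increasing training sizes with bootstrap resampling. It is not the paper's implementation: the Gaussian kernel, the fixed bandwidth of 0.2, the single covariate, the synthetic Poisson counts, and the function names (`kernel_regression`, `true_rate`, `make_data`, `bootstrap_mae`) are all assumptions chosen for illustration. The NB baseline is omitted because fitting it would normally rely on a GLM library.

```python
import math
import random


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-kernel-weighted mean of y_train."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        # Query is too far from every training point; fall back to the global mean.
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def poisson_sample(rng, mean):
    """Draw a Poisson count with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def true_rate(x):
    # Hypothetical nonlinear crash rate over one covariate (an assumption, not from the paper).
    return 2.0 + 1.5 * math.sin(3.0 * x)


def make_data(rng, n):
    """Synthetic (covariate, crash count) pairs standing in for a real crash dataset."""
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    ys = [poisson_sample(rng, true_rate(x)) for x in xs]
    return xs, ys


def bootstrap_mae(rng, xs, ys, x_test, targets, n_train, bandwidth, n_boot=10):
    """Mean absolute error against targets, averaged over bootstrap resamples of size n_train."""
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(xs)) for _ in range(n_train)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        preds = [kernel_regression(bx, by, x, bandwidth) for x in x_test]
        errors.append(sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets))
    return sum(errors) / len(errors)


rng = random.Random(0)
pool_x, pool_y = make_data(rng, 2000)
test_x = [0.05 + 0.1 * i for i in range(30)]
test_target = [true_rate(x) for x in test_x]
for n_train in (20, 200, 2000):
    mae = bootstrap_mae(rng, pool_x, pool_y, test_x, test_target, n_train, bandwidth=0.2)
    print(n_train, round(mae, 3))
```

Because the synthetic rate is known here, the error is measured against it directly; with real crash data the comparison would instead use held-out observed counts, and the bandwidth would typically be tuned rather than fixed.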