Document Type: Original Article
Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran
Department of Industrial Engineering, Sharif University of Technology
This paper proposes a data mining approach for predicting the duration (travel time) of town trips in New York City. First, two novel approaches, one mathematical and one statistical, are proposed for grouping categorical variables with a very large number of levels. Both approaches operate on a cost matrix generated by repeated post-hoc tests for different pairs. Next, a random forest model is constructed to predict the trip type, short or long. Finally, for each trip type and for each of the mathematical and statistical approaches, a separate artificial neural network (ANN) is developed to predict trip duration. According to the results, the mathematical approach is more accurate than the statistical approach. The proposed methods are also compared with other methods from the literature, and the results show that they outperform all of them. For short trips, the RMSE of the mathematical and statistical approaches is 4.23 and 4.27 minutes, respectively; for long trips, the corresponding value is 9.5 minutes. In addition, a modified version of the nearest neighborhood approach, called modified nearest neighborhood (MNN), is proposed for predicting trip duration. This model also yields accurate predictions, with an RMSE of 4.45 minutes.
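The RMSE values above are reported separately for short and long trips. As a minimal sketch of how such per-type scores could be computed, the following Python snippet splits trips by duration and evaluates each group on its own. The function names, the 20-minute threshold, and the toy durations are assumptions made for illustration; they are not taken from the paper, which predicts the trip type with a random forest rather than a fixed cutoff.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs (here: minutes)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_trip_type(actual, predicted, threshold_min):
    """Score short and long trips separately.

    Assumption: a trip is 'short' if its actual duration is at most
    threshold_min minutes. This fixed cutoff is a hypothetical stand-in
    for the trip-type classifier described in the abstract.
    """
    groups = {"short": ([], []), "long": ([], [])}
    for a, p in zip(actual, predicted):
        key = "short" if a <= threshold_min else "long"
        groups[key][0].append(a)
        groups[key][1].append(p)
    # Skip a group entirely if no trips fall into it.
    return {k: rmse(a, p) for k, (a, p) in groups.items() if a}


# Toy durations in minutes, made up for illustration only.
actual = [5.0, 8.0, 12.0, 35.0, 50.0]
predicted = [6.0, 7.0, 14.0, 30.0, 55.0]
scores = rmse_by_trip_type(actual, predicted, threshold_min=20.0)
print(scores)
```

Reporting the two groups separately keeps the larger absolute errors typical of long trips from masking the accuracy achieved on short trips.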