Ⅰ. Introduction
Traffic analyses, such as traffic signal optimization and network-wide travel time prediction, cannot be solved using simple regression models. In addition, field experiments and implementations are often severely limited by safety issues and tremendous costs. Microscopic traffic simulation has been used to describe traffic flow characteristics and address these issues based on pre-defined driving behavior models embedded in such tools (e.g., car-following, lane-changing, and gap-acceptance models). These tools can represent realistic traffic situations, including both normal and abnormal traffic conditions, if their parameters are properly calibrated. Hence, calibration of the models is a key procedure for obtaining realistic and reliable results from a microscopic traffic simulation analysis.
A variety of calibration approaches have been developed using parametric/nonparametric statistical models or machine learning algorithms (Kim et al., 2005; Eriksson, 1999; Kesting and Treiber, 2008; Van Hinsbergen et al., 2009; Vasconcelos, 2014; Treiber and Kesting, 2013; Park and Qi, 2005; Park et al., 2006; Cunto and Saccomanno, 2008), although no standard has emerged. A genetic algorithm (GA) is a search algorithm that emulates biological evolutionary theory. A GA searches solutions at multiple locations in a parameter space based on three probabilistic rules, namely selection, crossover, and mutation, over generations (or iterations). Thus, the main advantage of a GA is its low likelihood of getting stuck in local optima while providing high computational efficiency (Kim et al., 2005; Howe and Bozdogan, 2010). This tool has become useful for finding near-optimal values of parameters embedded in a traffic simulation model (Kesting and Treiber, 2008; Vasconcelos, 2014; Treiber and Kesting, 2013; Park and Qi, 2005; Park et al., 2006; Cunto and Saccomanno, 2008; Cheu et al., 1998). Although GAs have provided reliable solutions for calibration, improving their computational efficiency in traffic simulation studies has received little attention. Traffic simulation studies generally require calibrating a large number of model parameters, and each simulation run takes longer than deriving an analytic solution. That is, a large number of parameters, each with multi-level values, leads to an excessive amount of calibration time (e.g., 5 parameters with 5 values each yield 3,125 (= 5^5) possible combinations).
This paper proposes a hybrid calibration procedure in which response surface methodology (RSM) with a central composite design (CCD) is combined with a GA to improve the efficiency of the calibration process. Using field-collected vehicle trajectory data, the procedure is applied to calibrate the parameters of the Gipps car-following model so as to fit the spacing distribution between the lead and following vehicles.
Ⅱ. Background on Methodology
1. Gipps Car-Following Model
Car-following models are among the most important models embedded in microscopic simulation tools. Most of them belong to the collision-avoidance class, which aims to maintain a safe following distance behind the lead vehicle. The Gipps car-following model is one of the most widely used models in this class and is best known as the core car-following model of the Aimsun microscopic simulation tool. It consists of two components (Gipps, 1981): an acceleration model and a deceleration model, corresponding to the empirical formulations defined by Equations (1) and (2). These models calculate the speed of each vehicle at a given time t on the basis of its speed at the previous time step.

v_n^a(t+τ) = v_n(t) + 2.5 a_n τ (1 − v_n(t)/V_n) √(0.025 + v_n(t)/V_n)    (1)

v_n^b(t+τ) = b_n τ + √{ b_n² τ² − b_n [ 2(x_{n−1}(t) − S_{n−1} − x_n(t)) − v_n(t) τ − v_{n−1}(t)² / b̂ ] }    (2)
where,
- τ : driver's reaction time
- n, n−1 : follower and leader, respectively
- x : position (space traveled) of the vehicle
- v(t) : vehicle's speed at time t
- V_n, a_n : follower's desired speed and maximum acceleration, respectively
- b_n, b̂ : most severe braking that the follower wishes to undertake, and the follower's estimate of the leader's most severe braking capability, respectively
- S_{n−1} : leader's effective length, which is the leading vehicle's length plus the follower's desired inter-vehicle spacing at a stop
The speed of vehicle n at time t + τ is given by the minimum of Equations (1) and (2):

v_n(t+τ) = min{ v_n^a(t+τ), v_n^b(t+τ) }    (3)
Then, the position of vehicle n in the current lane is updated by substituting this speed into the movement equation:

x_n(t+τ) = x_n(t) + v_n(t+τ) τ    (4)
The Gipps model includes five parameters used to compute the acceleration and deceleration of a following vehicle. <Table 1> shows the model parameters to be calibrated.
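For illustration, the speed update of Equations (1)–(3) can be sketched in Python. The symbols follow the definitions above; the sign convention here assumes b_n and b̂ are negative, and the function and argument names are illustrative, not taken from Aimsun or any simulation tool.

```python
import math

def gipps_speed(v_n, v_lead, gap, V_n, a_n, b_n, b_hat, tau):
    """One Gipps speed update: follower speed at time t + tau.

    v_n, v_lead : current speeds of follower and leader (m/s)
    gap         : x_{n-1}(t) - S_{n-1} - x_n(t), leader position minus
                  effective length minus follower position (m)
    V_n, a_n    : follower's desired speed and maximum acceleration
    b_n, b_hat  : follower's most severe braking (negative) and the
                  estimate of the leader's braking capability (negative)
    tau         : driver's reaction time (s)
    """
    # Eq. (1): free acceleration towards the desired speed
    v_acc = v_n + 2.5 * a_n * tau * (1.0 - v_n / V_n) * math.sqrt(0.025 + v_n / V_n)
    # Eq. (2): highest speed that still allows safe braking behind the leader
    term = b_n ** 2 * tau ** 2 - b_n * (2.0 * gap - v_n * tau - v_lead ** 2 / b_hat)
    v_dec = b_n * tau + math.sqrt(max(term, 0.0))
    # Eq. (3): the binding constraint wins; speed cannot be negative
    return max(min(v_acc, v_dec), 0.0)
```

With a large gap the free-acceleration term of Equation (1) binds; as the gap shrinks, the safe-braking speed of Equation (2) takes over and the follower decelerates.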
2. Kernel Density Estimation and Genetic Algorithm for Model Calibration
Kernel density estimation (KDE) is a nonparametric method for estimating the probability density of a traffic flow variable, such as density or speed, without assuming a particular distribution of the data. In this study, KDE is used to calculate the probability densities of data points obtained from the simulation.
Since the simulation task in this study assumes that no disaggregated-level attribute is available, the simulation model should be calibrated so that it reproduces a distribution of the aggregated attributes similar to the one observed in the collected data. In this sense, the objective of the optimization task is to maximize the likelihood of the simulation results given the observed data. For each time point t, the probability density f̂(ŝ_t(θ)) of a value ŝ_t(θ) predicted by the simulation can be calculated using the KDE fitted to the observations. The likelihood is then defined as the joint probability of all predicted data points given a parameter vector θ, as in Equation (5):

L(θ) = ∏_t f̂(ŝ_t(θ))    (5)

A log transformation makes the calculation easier, as shown in Equation (6); in practice, the negative sum of the log densities is minimized:

log L(θ) = Σ_t log f̂(ŝ_t(θ))    (6)
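A minimal numpy sketch of this objective is given below. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, and the function name are assumptions for illustration; the paper does not specify the kernel or bandwidth.

```python
import numpy as np

def kde_nll(observed, simulated):
    """Negative sum of log kernel densities (Eq. (6)) of simulated
    spacings, evaluated under a Gaussian KDE fitted to the observed
    spacings. Kernel and bandwidth choices are illustrative."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    n = observed.size
    bw = 1.06 * observed.std() * n ** (-1 / 5)          # Silverman's rule of thumb
    z = (simulated[:, None] - observed[None, :]) / bw   # pairwise standardized distances
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2.0 * np.pi))
    return float(-np.sum(np.log(np.maximum(dens, 1e-300))))  # guard against log(0)
```

A simulated spacing series that resembles the observations yields a smaller negative log-sum than one that does not, which is exactly the quantity the GA minimizes.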
A genetic algorithm (GA) is a heuristic search technique for both constrained and unconstrained optimization problems. It generates solutions by mimicking the natural selection process. The algorithm continuously evolves a population of candidate solutions toward an optimum over generations through three main operators: selection, crossover, and mutation. In each generation, a new solution is generated from its "parents" in the previous generation. The GA has been widely used to calibrate simulation models through an automated procedure. In this study, the GA solver of the Matlab Optimization Toolbox was used to find the best solution for the Gipps model.
3. Response Surface Methodology with Central Composite Design
In general, a first-order model from a full- or fractional-factorial design is of limited use for searching for an optimum, since it can only identify linear relationships between the explanatory variables and the response variable. Box and Wilson (1951) devised response surface methodology (RSM), which uses a second-order model to investigate or optimize processes sequentially. The second-order model, expressed by Equation (7), contains two-factor interaction and pure quadratic terms in addition to the intercept and linear main-effect terms of a first-order model:

y = β₀ + Σ_i β_i x_i + Σ_i Σ_{j>i} β_ij x_i x_j + Σ_i β_ii x_i² + ε    (7)
A central composite design (CCD) is the most frequently used second-order design for RSM. The design consists of three types of experimental runs: a two-level factorial design, center points, and axial points. A fractional factorial is generally used in a CCD for five or more factors (Mee, 2009). The center-point runs allow the pure quadratic effect to be captured.
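Such a design can be constructed directly. The sketch below builds the coded 36-run, 5-factor design under two illustrative assumptions not stated in the paper: a rotatable axial distance α = (16)^(1/4) = 2 and the generator E = ABCD for the half fraction.

```python
import itertools
import numpy as np

def ccd_half_fraction_5():
    """36-run central composite design for 5 factors, in coded units:
    a 2^(5-1) fractional factorial (generator E = ABCD, assumed),
    2 * 5 axial points at +/- alpha, and 10 center points."""
    k = 5
    # 16 corner points of the half fraction: factor E aliased with ABCD
    corners = []
    for a, b, c, d in itertools.product((-1, 1), repeat=4):
        corners.append([a, b, c, d, a * b * c * d])
    corners = np.array(corners, dtype=float)
    # rotatable axial distance: (number of factorial runs)^(1/4) = 2.0
    alpha = len(corners) ** 0.25
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = alpha
        axial[2 * i + 1, i] = -alpha
    center = np.zeros((10, k))          # 10 replicated center points
    return np.vstack([corners, axial, center])
```

Converting these coded levels to natural units (the factor ranges of <Table 2>) is a linear rescaling per factor.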
Ⅲ. Proposed Calibration Procedure
The overall procedure of the proposed calibration framework is shown in <Fig. 1>. The major characteristic of this approach is the use of a CCD to estimate a quadratic response model and of a dual GA built on the response-modeling result.
1. Central Composite Design for Response Surface Model Estimation
First of all, a CCD was used to build a second-order model for the response variable. The five parameters in <Table 1> were considered as factors, and a half fraction of the full factorial was used. This 5-factor CCD is a 36-run design, comprising 16 fractional-factorial runs, 10 axial points, and 10 center points. <Table 2> shows both the coded and natural units of the design. The range of each factor in the design was selected on the basis of the authors' experience with the car-following model. These values define the initial search space for the proposed calibration framework. For each experimental run, the Gipps car-following model returns the negative sum of log-likelihood for the spacing distribution, presented in column Y of <Table 2> as the response variable.
The estimation result of the quadratic model is shown in <Table 3>. Although many estimates are statistically insignificant, all terms were retained for two main reasons. First, the five factors must remain in the model because they are the parameters to be calibrated in the Gipps car-following model. Second, the objective of this calibration framework is to predict the response variable as accurately as possible, so simplifying the model would not be beneficial in that sense.
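Fitting the second-order model of Equation (7) amounts to ordinary least squares on an expanded design matrix. The sketch below shows this with numpy; the function names are illustrative, and the paper does not state which regression routine was used.

```python
import itertools
import numpy as np

def quadratic_design_matrix(X):
    """Expand an n-by-k factor matrix into the terms of Eq. (7):
    intercept, linear, two-factor interaction, and pure quadratic columns."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)          # for k = 5: 1 + 5 + 10 + 5 = 21 columns

def fit_response_surface(X, y):
    """Least-squares estimate of the second-order model coefficients."""
    A = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta
```

With 36 runs and 21 coefficients, the 5-factor CCD provides enough degrees of freedom to estimate the full second-order model.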
In the canonical analysis of the RSM curvature, if all eigenvalues are positive, the fitted surface has a unique stationary point, which is its global minimum. In this case, the global minimum of 2808.9 was found at X1=24.36, X2=1.01, X3=2.50, X4=2.51, and X5=10.02. However, instead of using this unique stationary point directly as the optimum, the GA was used to generate a population near the optimum, since the fitted surface is unbounded when both positive and negative eigenvalues are present.
2. 1st Genetic Algorithm
Once the second-order model for the negative sum of log-likelihood has been fitted, this regression model is plugged into the GA as its fitness function. The basic idea of running the GA with the regression model is to find a near-optimum and start the 2nd GA with a relatively good initial population (i.e., good parents). Thus, a better-fitted regression model brings the initial population closer to the optimal parameter sets and improves the fitness value at the start of the 2nd GA.
3. 2nd Genetic Algorithm
The major difference of the 2nd GA lies in its initial population, which seeds the algorithm. Traditionally, initial populations are generated using pseudo-random numbers; in the proposed framework, however, the 2nd GA uses the final population of the 1st GA as its initial population. The advantage of this approach is that the GA starts from near-optimum, relatively better populations instead of pseudo-random points.
A Dual Genetic Algorithm using Central Composite Design
1 : CCD_X = 36-run combinations of 5 parameters using central composite design (36-by-5 matrix)
2 : Y = Response variable (the negative sum of log-likelihood)
3 : ModelGipps = Gipps car-following model (input = 5 parameters, output = the negative sum of log-likelihood)
4 : X1 = The local minimum from the 1st genetic algorithm
5 : P1 = The final population from the 1st genetic algorithm
6 : X2 = The local minimum from the 2nd genetic algorithm
7 : P2 = The final population from the 2nd genetic algorithm
8 : CCD_X ← Construct CCD and convert to natural units
9 : for i = 1 to size(CCD_X)
10 :     Yi = ModelGipps(x = CCD_Xi)
11 : end for
12 : Model1 = Fit a second-order (quadratic) regression model (x = CCD_X, y = Y)
13 : [X1, P1] = GA[Fitness function = Model1; Initial population = CCD_X]
14 : [X2, P2] = GA[Fitness function = ModelGipps; Initial population = P1]
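The dual-GA idea can be sketched end-to-end in Python with a minimal real-coded GA. The paper used Matlab's GA solver; the operators below (tournament-free elitist sort, blend crossover, Gaussian mutation) and all names are illustrative assumptions, with a cheap surrogate standing in for Model1 and the expensive true objective for ModelGipps.

```python
import numpy as np

def ga_minimize(f, bounds, init_pop, generations=50, rng=None):
    """Minimal elitist real-coded GA: sort by fitness, blend-crossover
    parents from the better half, add Gaussian mutation.
    Returns (best_point, final_population)."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    pop = np.clip(np.asarray(init_pop, dtype=float), lo, hi)
    for _ in range(generations):
        fit = np.array([f(p) for p in pop])
        pop = pop[np.argsort(fit)]                      # best first
        children = [pop[0].copy()]                      # keep the elite unchanged
        while len(children) < len(pop):
            i, j = rng.integers(0, len(pop) // 2, size=2)   # parents from better half
            w = rng.random()
            child = w * pop[i] + (1.0 - w) * pop[j]         # blend crossover
            child += rng.normal(0.0, 0.05 * (hi - lo))      # Gaussian mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
    fit = np.array([f(p) for p in pop])
    return pop[np.argmin(fit)], pop

def dual_ga(surrogate, true_objective, bounds, pop_size=20, rng=None):
    """1st GA on the cheap surrogate (line 13 of the pseudo code),
    2nd GA on the true objective seeded with the 1st GA's final
    population (line 14)."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    start = rng.uniform(lo, hi, (pop_size, len(bounds)))
    _, seeded = ga_minimize(surrogate, bounds, start, rng=rng)
    best, _ = ga_minimize(true_objective, bounds, seeded, rng=rng)
    return best
```

Because the surrogate is cheap, the 1st GA costs almost nothing, while the 2nd GA begins from a population already concentrated near the optimum, which is the efficiency gain the framework targets.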
As a number of options in Matlab's GA tool affect the resulting solutions, most of them were left at their default settings, with a few changes (e.g., population size = 20, maximum number of iterations = 2,000), in order to consistently compare the results of the conventional GA and the proposed approach.
4. Simulation Set-up
The pseudo code of the proposed framework is given above; it was run as Matlab scripts. Note that the conventional GA for model calibration skips code lines 8-13 and uses an initial population generated from pseudo-random numbers.
Ⅳ. Case Scenarios
The main contribution of the proposed calibration approach is to increase the efficiency of a calibration process. Therefore, the performance of the proposed approach, named CCD-GA, was compared with that of the conventional GA approach in this study.
In order to calibrate a car-following model, ground truth data are required to validate the parameter estimates. Two case scenarios were established with different ground truth data: a trajectory simulated using the Gipps model, and trajectories sampled from the Next Generation Simulation (NGSIM) datasets. In the first scenario, the purpose of using simulated data as ground truth is to avoid the accuracy issue of the selected car-following model, so that the characteristics of the proposed calibration approach can be explored more clearly. In contrast, the real NGSIM data provide a more realistic verification of the proposed approach.
1. Scenario 1 Using Simulated Vehicle Trajectory
Data Collection for the Lead Vehicle
In order to generate the following vehicle's behavior using the car-following model, the lead vehicle's driving trajectory data are required. To obtain real vehicle trajectory data, the latitude/longitude and the corresponding speed of a vehicle were collected every second using a GPS data logger while driving. The data collection was carried out during off-peak hours of a weekday on an arterial in an urban area; the data therefore include comprehensive driving behaviors in terms of speed and acceleration. A 300-second segment was selected as the lead vehicle's driving behavior, including stop-and-go conditions and acceleration and deceleration of varying magnitude.
Data Generation for the Following Vehicle
Given the lead vehicle's data, the following vehicle's driving attributes were generated using the Gipps car-following model with the parameter values from Vasconcelos et al. (2014) (see <Table 4>). These parameter values were assumed to be correct for this study. Since the purpose of this study is to propose a new calibration approach and verify its characteristics, this assumption avoids the need to judge the accuracy of the Gipps car-following model itself.
The time-space diagram of the two vehicles is shown in <Fig. 2>. The spacing between the two vehicles, i.e., the distance between the front bumpers of the successive vehicles, was measured every second, and its kernel density distribution was estimated as shown in <Fig. 2>. Finally, the negative log sum of the distribution was calculated and used in the GA as the statistic to be optimized over the iterations.
2. Scenario 2 Using Real Vehicle Trajectory
NGSIM US-101 Datasets
The United States Department of Transportation (US DOT) Federal Highway Administration (FHWA) launched the NGSIM program in 2005 to develop microscopic traffic behavior models. As part of this program, the US 101 trajectory dataset (FHWA, 2016) was collected on southbound US 101, the Hollywood Freeway in California. Every individual vehicle's trajectory was collected over the 640-meter (2,100-foot) study area during the morning peak hours.
The trajectories of two successive vehicles on the left-most lane were sampled because they show repeated deceleration and acceleration patterns. <Fig. 3> illustrates the time-space diagram and the kernel density distribution of the spacing between the two vehicles.
Ⅴ. Analysis Results
1. Minimization of the Negative Log-Sum in GA
A total of 2,000 generations were searched in the GA procedure to find the optimal set of parameter estimates. Although the negative log sums do not appear to have fully converged after 2,000 generations, the difference between the two approaches can be observed from the beginning of the iterations. <Fig. 4(a)> and <Fig. 4(b)> display the optimization performance of both approaches for Scenarios 1 and 2, respectively. The traditional method, a single GA, starts above a log sum of 3,000 for both scenarios, whereas the proposed approach, the dual GA with CCD, starts at around 2,500, a value the single-GA approach reaches only after the first three iterations. This is strong evidence that the CCD-GA method starts its search from an area closer to the optimum than the single GA does, implying that CCD-GA can reduce the number of iterations required from the beginning of the calibration process.
2. Comparison of the Follower’s Driving Behaviors
<Table 5> shows the optimal parameter values from the CCD-GA approach for the Gipps model with respect to the ground truth data. Although the negative log sums of both approaches appeared to converge after 2,000 iterations, their solutions varied within the search space, particularly for the maximum acceleration (a_n), the follower's estimate of the lead vehicle's maximum deceleration rate (b̂), and the effective length (S_{n−1}). If there were strong evidence that a solution should not be accepted, one could return to the experimental design step and modify the parameter ranges.
Interestingly, the two reproduced trajectories of the following vehicle in Scenario 1 almost overlap the ground truth data in the time-space diagram in <Fig. 5>, despite the variation in parameter values between the two methods. One potential cause is that the lead vehicle's trajectory is quite smooth except for the complete stop between about 20 and 35 seconds, so there is little variation in the follower's reaction. On the other hand, the solution parameters of the CCD-GA approach reproduced the lead vehicle's stop more accurately than those of the GA; as a result, the kernel density distribution of the proposed method is more similar to the ground truth than that of the GA.
The reproduced trajectories for Scenario 2 are shown in <Fig. 6>. Both calibration approaches generated trajectories that overlap each other, despite their different optimization performance at the beginning of the iterations shown in <Fig. 4(b)>. However, these trajectories do not overlap the ground truth. This result stems from the fact that the Gipps model lacks the flexibility and sensitivity to reproduce the irregular spacing pattern of such repetitive accelerating and decelerating traffic flow conditions, as shown in <Fig. 6>.
Ⅵ. Conclusions and Discussions
Calibration of a car-following model is an essential part of microscopic traffic simulation analyses, and the GA has frequently been used for it. Despite the high performance of the GA, it starts from a randomly assigned initial population.
In this study, a dual GA combined with a CCD was proposed to improve the efficiency of the calibration process. Using the CCD, the GA can start its search from an area closer to the optimum, thereby skipping initial iterations of a simulation run. The higher efficiency of the proposed method was verified, using both simulated and real vehicle trajectory data, through comparison with the single-GA approach that is conventional in the traffic simulation field.
The proposed approach is beneficial in the following cases. First, when a single simulation run takes a very long time (e.g., for a physically large-scale network analysis), reducing the number of simulation runs yields higher efficiency. Second, when one needs to find a near-optimum within a relatively short operational time to support fast decision making, as in real-time traffic analyses, this framework can reach the goal more quickly than the conventional approach.
Many other options in the proposed method should be investigated in future studies to extend the improvement shown in this paper. Running the GA twice brings additional computational complexity compared with a single GA; although the impact was not significant in the case study, improving the first GA may be necessary for simulation analyses in real-time traffic operations. Adding more experimental runs to the CCD, or using a full-factorial CCD, could potentially produce a better initial population. Changing GA parameters is another possibility, including changing the search boundary dynamically over generations and using different selection methods. Applying more complex car-following models may show a more evident improvement over the conventional GA approach. Lastly, other optimization methods should be tested as substitutes for the second GA and compared with the proposed method; neural-network-based models are known to perform well in nonlinear optimization problems.