Ⅰ. Introduction
When confronted with traffic congestion, drivers are eager to know the traffic conditions on their routes, and road managers seek ways to operate the road network efficiently. To this end, real-time travel-time (TT) information, which allows drivers to divert to less congested routes or adjust their trip schedules, is becoming an essential component of modern traffic systems. Conventionally, vehicle detectors that gather traffic volume, speed, and occupancy time data have been widely used to produce TTs. However, as communication technologies advance and the market penetration of relevant wireless devices increases, probe-based systems that use wireless communications, such as global positioning systems combined with cellular communication, dedicated short-range communications (DSRC), Bluetooth, and Wi-Fi, are attracting more interest worldwide. Compared to vehicle detectors, probe-based systems not only collect link TTs directly but also impose a smaller monetary burden on road agencies.
However, probe-based systems have two main issues that need to be carefully addressed: outliers and the time lag inherent in probe TTs. The main sources of outliers include parking activities, exiting/entering maneuvers on the route, and illegal driving on shoulder lanes during congestion. As for the time lag, unlike vehicle-detector-based systems, which can estimate TTs using current spot speeds from multiple detectors, a probe-based system can acquire a TT only when the probe completes its trip on the route. In other words, probe TTs are past TTs from the perspective of drivers who receive the TT information at the start point. Unless these two issues are treated properly, probe-based TTs might not be as effective as expected.
Numerous studies have been conducted to deal with the outlier problem in probe TTs. The TransGuide algorithm, developed by the Southwest Research Institute (1998), uses a simple validity window. Clark et al. (2002) proposed a statistical method for cleaning outlying observations in probe TT data gathered from license number plates. Dion and Rakha (2006) proposed an algorithm that estimates link TTs under the assumption that TTs follow a log-normal distribution. Ma and Koutsopoulos (2010) suggested a median filter approach that uses the median, instead of the mean, as the measure of central tendency. Boxel et al. (2011) devised a method that filters out probe TTs lying outside a predefined confidence interval determined from the Greenshields traffic flow model. Soroush and Bruce (2014) developed an adaptive outlier detection algorithm that uses historical and current data gathered on signalized arterials to determine a validity window. The author (2016) also studied how to censor outliers contaminating probe TTs on signalized rural highways. However, all of these methods are suitable only for aggregated data, so a new method is needed for a probe system in which TT information is generated for each individual probe vehicle.
To mitigate the above-mentioned time-lag problem in probe-based TT systems, various prediction models have been developed (Lint, 2005; Hinsbergen, 2009). Previous studies have mainly employed two categories of models: time series models, such as ARIMA and the Kalman filter (KF), and pattern matching techniques, such as neural networks, support vector regression, and k-nearest neighbor (Billings, 2006; You, 2000; Robinson, 2005; Bajwa, 2005; Myung, 2011). However, although many researchers are still attempting to develop more robust techniques, prediction techniques are generally known to be effective where day-to-day TT patterns fluctuate regularly. That is, if TT patterns are extremely irregular, no prediction technique can be expected to perform well; rather, prediction might significantly increase errors in the real-time TT information provided. In extreme cases, drivers file complaints about inaccurate TT information. Hence, many agencies that operate TT information systems with high TT variation tend to be reluctant to employ prediction techniques.
In this regard, this paper suggests a scheme that uses individual probe TTs for real-time TT information to reduce the time lag, especially during congestion. Because of low probe sampling, direct use of raw individual probe TTs causes significant fluctuations, resulting in high error in TT information. To alleviate these fluctuations, KF, moving average, and Loess techniques were applied to smooth the individual TTs, and their performances were compared to select the optimal technique. Since no outlier treatment method suitable for individual probe-based TT information was found, an outlier removal algorithm applicable to individual probe TTs was also developed. During free-flow conditions, 5-min aggregated data showed fewer errors because of the relatively stable TT patterns; such aggregation has conventionally been adopted by many agencies worldwide to smooth short-term fluctuation in individual TTs while maintaining the real-time effect of the TT information provided. Therefore, a hybrid method that uses individual probe TTs under congested conditions and 5-min aggregated data under uncongested conditions is proposed. The proposed method was evaluated with real-world probe TT data gathered on a multi-lane highway with high fluctuation in TT patterns, and its superiority over two benchmark methods (individual probe TTs and 5-min aggregated TTs) is discussed.
Ⅱ. Study Site
A multi-lane highway in the vicinity of Seoul, South Korea, shown in <Fig. 1>, was selected to apply the proposed methodology. The four-lane roadway section spans approximately 3 km in mountainous terrain with one intersection and one interchange, making it a typical multi-lane highway according to the Korea Highway Capacity Manual (KHCM). At both ends of the section, 5.8-GHz DSRC scanners are installed that collect the unique identifiers and passage times of vehicles equipped with DSRC devices. In Korea, 5.8-GHz DSRC is used for the electronic toll collection system. As shown in <Fig. 2> and <Fig. 3>, 5-min aggregated probe TT patterns fluctuate significantly, with correlation coefficients of 0.07–0.53 (<Table 1>).
In the mountainous terrain, phantom jams, that is, the spontaneous formation of traffic queues even though the traffic volume does not exceed the capacity of the roadway section (Wismans, 2015), are regarded as a critical cause of congestion on the stretch. They frequently arise from sudden braking by aggressive drivers behind slow-moving vehicles, such as trucks with heavy loads. Consequently, neither the onset time nor the duration of congestion can be predicted, and congestion does not recur consistently at the same time of day. Due to these high fluctuations, no prediction technique that can reduce the time lag in probe TT information has been proven effective (Lim, 2013). Hence, current 5-min aggregated data are being used to generate real-time TT information. As a result, real-time TT information contains large errors of more than 100% during peak hours, mainly due to the aforementioned time-lag phenomenon, which should be mitigated.
Recently, a question about TT information at the study site has arisen: could individual probe data be used, instead of 5-min aggregated data, to provide real-time TT information? Could this decrease the time lag, especially on the short roadway section (3-min driving distance during free flow) on which this study is based? To answer these research questions, this study developed a new scheme that uses individual probe TTs for real-time TT information. After comparing the accuracy of TT information from individual probe and 5-min aggregated data, a hybrid model that resulted in the lowest error was chosen for the new scheme. The contribution of this study lies in the fact that no previous studies have been found that use individual probe TTs to produce real-time TT information.
Ⅲ. Outlier Removal
As stated above, the previous outlier removal techniques reviewed for this study are applicable only to aggregated data. That is, a validity window for current probe TTs is determined from data aggregated (e.g., over 5 min) in the previous or current interval. If no aggregation interval exists, these techniques cannot be applied. Hence, a new outlier removal algorithm needs to be developed for a TT information system based on individual probe observations.
In this study, a confidence interval concept based on the central limit theorem is employed to determine the validity window of individual probe TTs. Here, as the TTs of the study site follow a log-normal distribution rather than a normal distribution (see <Table 2> and <Fig. 4>), a logarithm was applied to the probe TTs. For the normality test, the Kolmogorov-Smirnov statistic at 5% significance level was adopted due to its conservative characteristic of having the lowest possibility of falsely rejecting a correct fit (Jang, 2012). To justify utilizing the z-score, the previous 30 valid TTs were used to calculate the mean and standard deviation. Here, the z-score representing the confidence interval can be optimized with specific data of interest.
where T_AB(t) = travel time from point A to B at time t; t_AB(t) = ln(TT) from point A to B at time t experienced by vehicle m; x̄ = average of the previous 30 valid ln(TTs); z = z-score at a confidence level [for this study, 99.9% (z = 3) was used]; and σ = standard deviation of the previous 30 valid ln(TTs).
The developed algorithm is relatively simple compared to previous ones, but its performance proved satisfactory when it was applied to three-day block data collected on January 14–16, 2013. <Fig. 5> shows that most of the apparently aberrant TTs lie outside the validity window determined by Equation 2. The DSRC scanners installed at the study site generated a substantial number of outliers due to U-turns, exit/entry maneuvers, and intermediate stops during the trip. Sometimes, a car is observed in one direction and reobserved in the opposite direction some time later, usually on the way back. In this case, the first observation generates a correct TT, but the later one generates an abnormally long TT. This situation occurs frequently because the DSRC scanners installed on the experiment route cannot detect the direction of travel.
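The validity-window check described in this section can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: it assumes the window has the symmetric form x̄ ± zσ on ln(TT), uses the sample standard deviation, and accepts every observation during a warm-up phase until 30 valid values are available. The function name `make_outlier_filter` and the warm-up rule are hypothetical.

```python
import math
from collections import deque
from statistics import mean, stdev


def make_outlier_filter(z=3.0, window=30):
    """Return a function that accepts or rejects one probe TT at a time.

    Assumption: a TT is valid if ln(TT) lies within mean +/- z * stdev
    of the previous `window` valid ln(TT) values.
    """
    valid_logs = deque(maxlen=window)  # most recent valid ln(TT) values

    def is_valid(tt_seconds):
        log_tt = math.log(tt_seconds)
        if len(valid_logs) < window:
            # Warm-up (assumption): accept until the window is full.
            valid_logs.append(log_tt)
            return True
        centre = mean(valid_logs)
        spread = stdev(valid_logs)  # sample standard deviation (assumption)
        ok = abs(log_tt - centre) <= z * spread
        if ok:
            # Only valid observations update the window.
            valid_logs.append(log_tt)
        return ok

    return is_valid


if __name__ == "__main__":
    check = make_outlier_filter()
    for i in range(30):
        check(170 + (i % 7) * 5)   # fill the window with plausible TTs (s)
    print(check(1800))  # a round trip observed as one very long TT
    print(check(185))   # an ordinary TT
```

Because rejected observations never enter the window, a single abnormally long TT does not widen the window for the probes that follow.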
Ⅳ. Travel-Time Estimation Technique
1. Exploring Techniques for Smoothing Individual Travel Times
Due to the high variability in individual probe TTs, their direct use for providing real-time TT information causes large errors. This high variation can be attributed to variability in the vehicle mix and in drivers’ characteristics, among other things. To mitigate the high variability, it was necessary to apply smoothing techniques to the raw probe TTs. In this study, three widely recognized smoothing techniques (Kalman filter, moving average, and Loess) were explored, and the technique that exhibited the lowest error was chosen for use in generating individual probe-based TT information.
Kalman Filter
Initially developed by Kalman in 1960, the Kalman filter has been broadly applied to smooth and predict variables observed in a time sequence (Kim, 2010). In this study, the KF algorithm (see below) constructed by Chien (2003) for use in processing probe TT data was adopted. It should be noted that this algorithm was used for smoothing probe TTs in this study, although it has been used for predicting TTs in other studies (Chien, 2003; Jang, 2013).
- Step 1. Initialization: set t = 0 and initialize the state estimate and the error covariance P(0).
- Step 2. Extrapolation: state estimate extrapolation and error covariance extrapolation.
- Step 3. Kalman gain calculation.
- Step 4. Parameter update: state estimate update and error covariance update.
- Step 5. Next iteration: let t = t + 1 and return to step 2.
where t = time; x(t) = travel time at t; x̂(t) = smoothed travel time at t; P(t) = covariance of estimation error at t; φ(t) = transition parameter; Q(t) = variance of noise term; K(t) = Kalman gain; R(t) = variance of measurement error at t; and z(t) = observed travel time at t.
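The five steps can be illustrated with a textbook scalar Kalman filter. The sketch below is an assumption-laden illustration rather than the exact algorithm of the cited work: it uses the standard scalar extrapolation, gain, and update formulas, treats φ, Q, and R as constants, and the default values of `phi`, `q`, `r`, and `p0` are placeholders chosen for the example, not values reported in this study.

```python
def kalman_smooth(observations, phi=1.0, q=1.0, r=4.0, x0=None, p0=1.0):
    """Smooth observed travel times z(t) with a scalar Kalman filter.

    phi, q, r, and p0 are illustrative constants (assumptions); in practice
    they would be estimated from data.
    """
    if not observations:
        return []
    # Step 1: initialization of the state estimate and error covariance.
    x_hat = observations[0] if x0 is None else x0
    p = p0
    smoothed = []
    for z in observations:
        # Step 2: extrapolate the state estimate and error covariance.
        x_prior = phi * x_hat
        p_prior = phi * p * phi + q
        # Step 3: Kalman gain.
        k = p_prior / (p_prior + r)
        # Step 4: update the state estimate and error covariance.
        x_hat = x_prior + k * (z - x_prior)
        p = (1.0 - k) * p_prior
        smoothed.append(x_hat)
        # Step 5: the loop advances to the next observation.
    return smoothed


if __name__ == "__main__":
    # A sudden jump from about 180 s to about 400 s, as at congestion onset.
    raw = [180.0, 175.0, 190.0, 400.0, 420.0, 410.0]
    print([round(v, 1) for v in kalman_smooth(raw)])
```

A larger ratio of `q` to `r` increases the gain, so the smoothed value follows the latest observation more closely; a smaller ratio smooths more heavily.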
Moving Average
A moving average (MA) filter smooths data by replacing each data point with the average of the neighboring data points defined within the span. This process is equivalent to low-pass filtering with the response of the smoothing given by the difference equation (3).
where ys(i) = the smoothed value for the i-th data point; N = the number of neighboring data points on either side of ys(i); and 2N+1 = the span for calculating the moving average.
The two fundamental and widely used MAs include a simple moving average, which is the simple average of values over a predefined span, and an exponential moving average, which places greater weight on more recent observations. MAs are most commonly applied to identify the fundamental trends and to resist any forms of unexpected short-term variations.
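As a minimal sketch of the simple moving average with span 2N+1, the following function averages each point with its N neighbors on either side. How the ends of the series are handled is not specified in the text; shrinking the span symmetrically near the ends is an assumption made here, and the function name `moving_average` is hypothetical.

```python
def moving_average(values, n=2):
    """Centered simple moving average with span 2n + 1.

    Assumption: near the ends of the series the span shrinks symmetrically
    so that the window stays centered on the point being smoothed.
    """
    smoothed = []
    last = len(values) - 1
    for i in range(len(values)):
        half = min(n, i, last - i)          # neighbors available on both sides
        window = values[i - half:i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    print(moving_average([0, 0, 9, 0, 0], n=1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

Note that a centered window needs observations on both sides of each point, so in a real-time setting only a trailing window over past observations would be available.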
Loess
The name of ‘Loess (or Lowess)’ is derived from the term ‘locally weighted scatter plot smooth’ and refers to the use of locally weighted linear regression to smooth observations. The smoothing procedure is performed locally in that each smoothed value is determined by neighboring values lying within the predefined span. The procedure is weighted in that a regression weight function, shown in equation (4), is defined for the data points contained within the span.
where wi = weight function for xi; x = predictor value associated with the response value to be smoothed; xi = neighbors of x as defined by the span; and d(x) = distance along the x-axis from x to the most distant predictor values within the span.
The greatest advantage of the Loess procedure is that it does not require the specification of a function to fit a model to all the data in the sample. Instead, the analyst only has to provide a smoothing parameter value and the degree of the local polynomial. In addition, the Loess procedure is very flexible, making it ideal for modeling complex processes for which no theoretical models exist. These two advantages, combined with the simplicity of the method, make the Loess procedure one of the most attractive of the modern regression methods for applications that fit the general framework of least-squares regression but have a complex deterministic structure (NIST, 2017).
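The local weighting and fitting can be sketched in a few lines. This sketch assumes that the weight function is the usual tricube function, w_i = (1 − |(x − x_i)/d(x)|³)³, which is consistent with the symbol definitions given for equation (4), and that a first-degree (linear) local polynomial is fitted. The function names and the default span are hypothetical.

```python
def tricube_weights(x, neighbours):
    """Tricube regression weights for the neighbors of x (assumed form)."""
    d = max(abs(x - xi) for xi in neighbours)  # distance to farthest neighbor
    if d == 0:
        return [1.0 for _ in neighbours]
    return [(1.0 - abs((x - xi) / d) ** 3) ** 3 for xi in neighbours]


def loess_point(xs, ys, x, span=5):
    """Smoothed value at x from a weighted linear fit to the nearest points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:span]
    nx = [xs[i] for i in nearest]
    ny = [ys[i] for i in nearest]
    w = tricube_weights(x, nx)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, nx)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, ny)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, nx))
    if sxx == 0:
        return my  # degenerate case: fall back to the weighted mean
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, nx, ny))
    slope = sxy / sxx
    return my + slope * (x - mx)


if __name__ == "__main__":
    times = list(range(10))
    tts = [180, 182, 179, 185, 260, 184, 181, 183, 180, 182]
    print(round(loess_point(times, tts, 4), 1))
```

With the tricube function, the point being smoothed receives the largest weight and the most distant neighbor within the span receives zero weight.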
Selection of optimal smoothing technique
The performance of the three techniques was compared using probe TTs gathered on the study section. Values smoothed by each technique from arrival time-based probe TTs were evaluated against baseline data obtained from departure time-based probe TTs at the next time interval. As shown in <Table 3> and <Fig. 6>, KF exhibited the best performance in all cases, though the differences were not statistically verified; KF was therefore selected as the smoothing technique for generating individual probe-based TT information. The KF’s high performance is presumably attributable to its ability to update variables with a high weight placed on the latest observation, which helps reduce the time lag between arrival time-based and departure time-based TTs. In contrast, the MA and Loess techniques put less weight on the latest value because of their autoregressive nature.
2. Analysis of Travel Times from Individual Probe and Five-Minute Aggregation
TT information from each scheme (individual probe TTs smoothed by KF and 5-min aggregated TTs) was evaluated using baseline TTs obtained from probes that passed the start point while the TT information from that scheme was being provided. The baseline TTs for the two schemes were therefore different from each other. For the smoothed individual probe TTs, baseline TTs were gathered until the next probe arrived at the end point, normally after 10–30 s. For 5-min aggregated TTs, baseline TTs were generated by aggregating the TTs of probes that passed the start point during the 5 min following the generation of the 5-min aggregated data. In short, the TTs experienced by the probes that would receive the TT information from each scheme were used as baseline data. This baseline acquisition method is considered reasonable because the real-time TT information produced by each scheme is meant to be provided to the vehicles that pass the start point immediately after the information is produced. <Fig. 7> plots the errors of real-time TT information from smoothed individual probes and 5-min aggregated TTs for the three-day block data.
It is notable that, for the data used in this study, the errors in the 5-min aggregated data are significantly higher than those in the data from the individual probes during congestion, whereas the opposite holds under uncongested conditions. The traffic management center (TMC) categorizes the traffic flow conditions on the experiment roadway section into two categories, congestion and non-congestion, using the classification methodology presented in the KHCM. According to the KHCM, the level of service (LOS) for a multi-lane highway is determined by the travel speed, which can be measured directly using DSRC scanners. LOS A through E are regarded as uncongested conditions, while LOS F (travel speed less than 42 km/h) is considered to be congested.
As <Table 4> shows, the differences in the percentage errors in TT information from individual probes and 5-min aggregated data on three consecutive days proved to be significant: the t-statistics were considerably higher than the critical value of 1.96 at the 5% significance level, and the p-values were considerably lower than 0.05. The difference in errors is probably attributable to the fact that, while the time lag of 5-min aggregated TTs increased errors under congested conditions, the smoothing effect of 5-min aggregation helped decrease errors under uncongested conditions. Taking these findings into account, a new scheme that can reduce errors in real-time TT information was developed.
3. Hybrid Model for Estimating Travel Times
The fundamental logic behind the proposed scheme is to maximize the benefit and minimize the detriment of each method for generating TT information. That is, under normal (or uncongested) conditions, 5-min aggregated data are used; under congested conditions, individual probe data smoothed by KF are used directly for real-time TT information. The proposed algorithm, combined with the outlier removal method, is represented in <Fig. 8>. The method for generating TT information is selected by comparing the speed from the individual probes with a threshold value. The threshold value for the test site was set at 42 km/h, which corresponds to LOS F according to the KHCM.
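The switching logic can be sketched as below. This is a simplified illustration under stated assumptions, not the full algorithm of <Fig. 8>: the section length is taken as the approximate 3 km given for the study site, the speed is derived from the latest valid probe TT (the text does not say whether the raw or the smoothed TT drives the decision), and all function and parameter names are hypothetical.

```python
SECTION_LENGTH_KM = 3.0       # approximate section length (from the study site)
CONGESTION_SPEED_KMH = 42.0   # LOS F threshold quoted from the KHCM


def probe_speed_kmh(tt_seconds, length_km=SECTION_LENGTH_KM):
    """Average travel speed implied by one probe travel time."""
    return length_km / (tt_seconds / 3600.0)


def hybrid_travel_time(latest_probe_tt, kf_smoothed_tt, five_min_tt):
    """Choose the TT to publish.

    Assumption: congestion is judged from the latest valid probe TT.
    """
    if probe_speed_kmh(latest_probe_tt) < CONGESTION_SPEED_KMH:
        return kf_smoothed_tt   # congested: individual probe TT smoothed by KF
    return five_min_tt          # uncongested: 5-min aggregated TT


if __name__ == "__main__":
    # 400 s over 3 km is 27 km/h, so the KF-smoothed value is published.
    print(hybrid_travel_time(400.0, 390.0, 300.0))
    # 180 s over 3 km is 60 km/h, so the 5-min aggregate is published.
    print(hybrid_travel_time(180.0, 185.0, 182.0))
```

For a 3 km section, the 42 km/h threshold corresponds to a travel time of roughly 257 s, so the same rule could equally be expressed as a TT threshold.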
Ⅴ. Assessment of the Proposed Scheme
The proposed hybrid scheme and the two benchmark methods (5-min aggregated TTs and individual probe TTs smoothed by KF) were evaluated using two evaluation indices: mean absolute percentage error (MAPE) and root relative square error (RRSE). The MAPE, given by Equation 5, is officially specified for use as an evaluation index for traffic data in Korea, according to the Intelligent Transport Systems performance evaluation guidelines (MOLIT, 2010). The RRSE is given by Equation 6. The proposed hybrid scheme yielded the lowest errors, in terms of both percentage error (MAPE) and relative error (RRSE), for the three-day block data (see <Table 5>). The error reductions ranged from 0.8 to 1.9 percentage points, which corresponds to a 9–18% improvement compared to the current state of practice of 5-min aggregation.
where MAPE = mean absolute percentage error; RRSE = root relative square error; n = number of samples; x(t) = baseline travel time; and x̂(t) = travel time information produced by each scheme.
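For reference, the two indices can be computed as in the sketch below. It assumes the commonly used definitions: MAPE as the mean of |x(t) − x̂(t)| / x(t) expressed in percent, and RRSE as the square root of the squared error of the estimates divided by the squared deviation of the baseline from its own mean, reported here as a ratio. Whether the study scales RRSE to percent is not stated, so that choice is an assumption.

```python
import math


def mape(baseline, estimate):
    """Mean absolute percentage error, in percent (assumed standard form)."""
    n = len(baseline)
    return 100.0 / n * sum(abs(x - e) / x for x, e in zip(baseline, estimate))


def rrse(baseline, estimate):
    """Root relative square error as a ratio (assumed standard form)."""
    mean_x = sum(baseline) / len(baseline)
    squared_error = sum((e - x) ** 2 for x, e in zip(baseline, estimate))
    squared_spread = sum((x - mean_x) ** 2 for x in baseline)
    return math.sqrt(squared_error / squared_spread)


if __name__ == "__main__":
    baseline_tt = [100.0, 200.0, 300.0]   # TTs experienced by drivers (s)
    provided_tt = [110.0, 180.0, 300.0]   # TT information provided (s)
    print(round(mape(baseline_tt, provided_tt), 2))  # 6.67
    print(round(rrse(baseline_tt, provided_tt), 3))  # 0.158
```

An RRSE below 1 means the scheme's estimates are closer to the baseline than a constant prediction equal to the baseline mean would be.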
To verify the improvement statistically, paired t-tests were performed. As shown in <Table 6>, the differences in errors proved to be significant, with t-statistics higher than the critical value of 1.96 at the 5% significance level and corresponding p-values lower than 0.05. Since the proposed scheme exploits the advantages of both benchmark methods, this result seems reasonable. <Fig. 9> shows a comparison of the percentage errors of real-time TT information from individual probe TTs smoothed by KF, 5-min aggregated data, and the proposed (hybrid) scheme.
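The paired t-statistic itself is straightforward to compute. The sketch below assumes that the percentage errors of two schemes have already been matched into pairs (how the study paired them is not described here) and uses made-up numbers purely for illustration; the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t-statistic of the mean paired difference between two error series."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error


if __name__ == "__main__":
    # Illustrative percentage errors only, not data from this study.
    benchmark = [10.0, 12.0, 9.0, 11.0, 13.0]
    proposed = [8.0, 9.0, 8.0, 9.0, 10.0]
    print(round(paired_t_statistic(benchmark, proposed), 2))  # 5.88
```

With large samples the statistic is compared with the normal critical value of 1.96 at the 5% level; for small samples such as this toy example, the t-distribution critical value for n − 1 degrees of freedom would apply instead.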
Ⅵ. Conclusions and Future Studies
As probe-based TT systems become popular, techniques for reducing the time lag are gaining more interest in traffic management systems. To decrease the time lag, various TT prediction techniques have been applied. However, they cannot be effectively applied to TT systems with high TT fluctuations. Simple 5-min aggregated data are therefore used in many TT systems with high TT variability, and they cause substantial errors in real-time TT information. Recently, a research question has arisen: could individual probe TTs be used as real-time TT information? Could this reduce the time lag in TT information compared to the use of 5-min aggregated data, which is the current state of practice? In this study, a thorough investigation of the TT information errors from the two types of TT information generation schemes (individual probe and 5-min aggregated data) was conducted. Subsequently, a hybrid method that selectively uses individual probe and 5-min aggregated data according to a judgment logic was proposed. To exploit individual probe TTs as real-time TT information, a new outlier treatment technique was also developed in consideration of the TT distribution characteristics of the study site, and it showed satisfactory performance. To smooth individual probe TTs, the KF technique was applied, which led to better performance than the other techniques, MA and Loess.
The proposed method was evaluated with real-world data, and the real-time TT information errors diminished by 9–18% in comparison with the benchmark method of using 5-min aggregated data. The improvement was also proven to be significant by t-statistics. The findings of this study can be applied in practice to real-world systems that are compelled to use simple 5-min aggregated data because their high TT variability prevents the effective application of any kind of prediction technique. The next step of the research would be to extend the data collection period and to transfer the proposed scheme to other sites to verify its robustness. Also, other schemes that use various data from optional gathering points and times need to be considered if circumstances permit. Lastly, given the recent advances in prediction techniques, further studies on robust methods that forecast TTs satisfactorily under irregular conditions are required.