Ⅰ. INTRODUCTION
Tunnels are considered among the road facilities most vulnerable to traffic accidents. Moreover, traffic accidents in a tunnel can have terrible consequences because of the limited shoulder area coupled with the enclosed space. In 1999, an accident in the Mont Blanc tunnel, which links France and Italy, took 35 lives after a truck caught fire following a crash with other vehicles (EC, 2001). Since that tragic accident, tunnel-incident detection systems have attracted substantial interest worldwide. Approximately 600 vehicle crashes occur in tunnels annually in South Korea, and the accident rate is showing an upward trend, mainly because of the increasing number of tunnels (Kim and Lee, 2004). In Korea, the installation of a tunnel-traffic management system is recommended in roadway tunnels longer than 1 km, as described in <Fig. 1>. In the system, an accident is first detected by vehicle detection systems and then verified using closed-circuit television (CCTV). Once the accident is confirmed, strategies to prevent secondary accidents are carried out using variable message signs (VMS) and lane control systems (LCS). All the devices are connected by optical cable and controlled automatically from a remote center.
Traditionally, incident detection algorithms based on traffic data collected by inductive loop detectors have been used to detect traffic incidents in tunnels automatically (Balke, 1993; Castro et al., 2012; Browne et al., 2005). Among them, the California, APID, and McMaster algorithms, which were developed in the United States and Canada, are widely known. These algorithms identify aberrant traffic flow patterns that arise in the aftermath of an incident, using predefined parameters to detect the abnormality. Unfortunately, calibrating those parameters is considered a cumbersome task (Abdulhai and Ritchie, 1999). Even when the parameters have been calibrated properly for a specific situation, their lack of spatiotemporal transferability hinders the broad adoption of these algorithms. One survey showed that only 12.5% of tunnel management centers in the United States use a fully operational incident detection algorithm because of this deficiency (Willian et al., 2007).
Video image-based incident detection systems have become popular worldwide since the mid-2000s (Prevedouros et al., 2006). Such a system can automatically detect various incidents in tunnels, such as fires, crashes, debris, stopped vehicles, pedestrians, and wrong-way driving (Fahrtash, 2012). In general, a video incident detection system has the merit that the incident and its severity can be verified promptly because the scene is directly visible. On the other hand, its detection capability is influenced significantly by challenging conditions, such as sun glare, changing illumination, and dust. Furthermore, incidents are easily obscured by other vehicles because of the limited installation height (approx. 4 m) in a tunnel, which can delay detection.
To resolve the shortcomings of previous practices, an incident detection system based on acoustic signals was developed in this study. Incidents such as crashes and skids are normally accompanied by distinct sounds, and sound diffusion is limited in a tunnel. Therefore, the system can detect incidents instantaneously, even if tall vehicles obscure the incident site. The developed system comprises three parts: an algorithm, an acoustic signal collector, and a server system. An acoustic signal-processing algorithm based on nonnegative tensor factorization (NTF) and a hidden Markov model (HMM) was proposed to identify incident-related acoustic signals, such as those from crashes and skids. An aesthetically designed acoustic signal collector was developed to gather sounds in the tunnel. The collector also digitizes the gathered analogue sounds and transmits them to a server system in the tunnel-traffic management center. The server system was established to run the proposed acoustic signal-processing algorithm. An operator in the management center also controls the equipment (VMS and/or LCS) that provides incident information to drivers and controls traffic.
Ⅱ. SYSTEM COMPONENTS
1. Acoustic Signal Processing Algorithm
As described in <Fig. 2>, the suggested acoustic signal-processing algorithm is based on NTF and an HMM. The algorithm identifies incident-related sounds among the other sounds produced by moving vehicles. The algorithm, which was initially developed by Jeon et al. (2017), uses the channel gains produced by the NTF method to detect acoustic signals generated by incident events. An HMM-based likelihood ratio test is then conducted to verify the detected events. The NTF method, which was proposed initially by Shashua and Hazan, has been applied widely to discriminate event sounds from a range of acoustic sources (FitzGerald et al., 2005). The HMM is a ubiquitous tool for modeling time series and is generally applied to represent probability distributions over sequences of acoustic observations. Because of its robustness in pattern recognition, it has been applied in many fields, such as speech, biology, and computer vision (Ghahramani, 2001).
The algorithm, installed in a server system, starts the entire procedure once it receives acoustic signals from an acoustic signal collector installed in a tunnel. First, the acoustic signals are processed by the short-time Fourier transform (STFT) and a Mel filter bank, which consists of a series of overlapping triangular filters defined by their center frequencies, to obtain the Mel-spectral magnitude. Next, the NTF algorithm (Equations 1–4) is executed. It enhances the signal-to-noise ratio (SNR) by discriminating the incident-related acoustic signals from other signals previously accumulated in the basis tensor database, and then makes an initial identification of incidents using a predefined min-to-max threshold value determined by the channel gain of the acoustic signals.
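The front-end step described above (STFT followed by a triangular Mel filter bank) can be sketched as follows. This is a minimal illustration under assumed settings, not the authors' implementation: the FFT size, hop length, number of filters, Hann window, and the common 2595·log10(1 + f/700) Mel formula are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the paper does not state which variant is used).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Overlapping triangular filters defined by their center frequencies."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points equally spaced on the Mel scale: each filter uses
    # three consecutive points as its left edge, center, and right edge.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectral_magnitude(signal, sample_rate, n_fft=1024, hop=512, n_filters=40):
    """Return an (n_frames, n_filters) array of Mel-spectral magnitudes."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # STFT magnitude, (n_frames, n_bins)
    return magnitude @ mel_filter_bank(n_filters, n_fft, sample_rate).T
```

For a pure 1 kHz tone, the largest output should fall in a filter whose passband contains 1 kHz, which is a quick sanity check on the filter-bank construction.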
Update:
Pre-trained:
where = channel gain matrix at the ith frame and hth iteration, k = kth FFT, m = mth frame, = trained NTF base matrix at the ith frame, = activation matrix at the ith frame and hth iteration, Yi = original signal of the ith tensor, = NTF-based estimated signal of the ith tensor at the hth iteration, = base matrix of incident sound at the ith frame, and = base matrix of the background sound at the ith frame.
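To illustrate the idea of separating incident sound from background sound with pre-trained bases, the sketch below uses plain single-channel nonnegative matrix factorization (NMF) with the standard multiplicative update for the Kullback-Leibler divergence. It is a simplification and not the NTF of Equations 1 to 4: there is no channel gain term, the bases are held fixed while only the activations are updated, and the function name, iteration count, and soft-mask reconstruction are assumptions made for illustration.

```python
import numpy as np


def separate_incident(Y, B_incident, B_background, n_iter=100, eps=1e-12):
    """Estimate the incident part of a nonnegative spectrogram Y (bands x frames).

    B_incident and B_background are pre-trained nonnegative bases (bands x components).
    Only the activation matrix A is updated; the bases stay fixed.
    """
    B = np.hstack([B_incident, B_background])      # combined pre-trained basis
    rng = np.random.default_rng(0)
    A = rng.random((B.shape[1], Y.shape[1])) + eps  # nonnegative initial activations
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        Y_hat = B @ A + eps                         # current estimate of Y
        # Multiplicative KL-divergence update; keeps A nonnegative.
        A *= (B.T @ (Y / Y_hat)) / (B.T @ ones + eps)
    r = B_incident.shape[1]
    incident_part = B_incident @ A[:r]
    background_part = B_background @ A[r:]
    # Soft mask: share of each time-frequency cell explained by the incident bases.
    mask = incident_part / (incident_part + background_part + eps)
    return mask * Y, A
```

Fixing the bases reflects the "pre-trained" role described in the text; a re-training step, as in the uncontrolled evaluation, would additionally update the basis columns from labeled recordings.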
<Fig. 3> depicts the noise-attenuating procedure of the NTF algorithm. The SNRs of the acoustic signals collected at distances of 10 and 50 m were improved by 14 and 17 dB on average, respectively, according to Equation 5. This suggests that the proposed algorithm can be applied effectively to incident sounds in a tunnel, in which echoes from the wall are the main source of noise.
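$$\Delta\mathrm{SNR} = 10\log_{10}\frac{\hat{P}_s}{\hat{P}_n} - 10\log_{10}\frac{P_s}{P_n} \qquad (5)$$
where $\hat{P}_s$ and $\hat{P}_n$ = signal powers of the signal and noise estimated by NTF, and $P_s$ and $P_n$ = signal powers of the original signal and noise.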
The proposed algorithm identifies incidents in two steps. The NTF initially identifies an incident-associated acoustic signal, and the signal is then verified using the HMM-based likelihood ratio test. Compared to previous algorithms (Gemmeke et al., 2013; Valenzise et al., 2007; Clavel et al., 2005; Foggia et al., 2015; Lee et al., 2004; Vacher et al., 2004; Fabaoui et al., 2008), the developed algorithm has an advantage: it performs a second, redundant verification using the HMM to reduce the false alarm rate without harming the detection rate. This redundant verification scheme has not been considered previously.
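The two-step logic can be sketched as a threshold gate followed by an HMM likelihood ratio test. The sketch below is only an illustration under several assumptions: it uses discrete-observation HMMs scored with the forward algorithm, whereas the paper does not specify the observation model; `gain_score` is a hypothetical scalar standing in for the NTF channel-gain statistic; and both thresholds are placeholders.

```python
import numpy as np


def log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM (forward algorithm).

    start: (S,) initial state probabilities; trans: (S, S) transition matrix;
    emit: (S, V) emission probabilities. All entries are assumed strictly positive.
    """
    log_alpha = np.log(start) + np.log(emit[:, obs[0]])
    for o in obs[1:]:
        m = log_alpha.max()  # subtract the maximum for numerical stability
        log_alpha = m + np.log(np.exp(log_alpha - m) @ trans) + np.log(emit[:, o])
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())


def detect_incident(gain_score, obs, incident_hmm, background_hmm,
                    gain_threshold, llr_threshold=0.0):
    """Step 1: gate on the gain score. Step 2: verify with a likelihood ratio test."""
    if gain_score < gain_threshold:
        return False  # no candidate event, so the HMM test is skipped
    llr = log_likelihood(obs, *incident_hmm) - log_likelihood(obs, *background_hmm)
    return bool(llr > llr_threshold)
```

Because an alarm is raised only when both steps agree, a loud but non-incident sound that passes the first gate can still be rejected by the second test, which is how the redundant verification lowers the false alarm rate.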
2. Acoustic Signal Collector
The acoustic signal collector digitizes the gathered analogue acoustic signals and transmits them to a center server system, on which the acoustic signal-processing algorithm runs. As shown in <Fig. 4>, the aesthetically designed collector comprises two microphones, a video camera, and the necessary interfaces. The microphones, attached to both sides of the collector, gather analogue sounds. Each microphone has a signal-to-noise ratio of 65 dB, a sensitivity of -40 dB, and a dynamic range of 30 to 120 dB. An incident identified by the acoustic signal-processing algorithm is verified with the video camera, which has pan/tilt/zoom functions. The associated interfaces, including Ethernet, a speaker, a temperature sensor, and USB, provide the necessary functions, such as transmitting the digitized acoustic signals, alerting drivers on the road ahead to verified accidents, observing environmental conditions, and debugging any possible errors of the collector.
The firmware of the collector was programmed in C on an ARM Cortex-M7. The firmware process is composed of two threads: audio and network. The audio thread receives acoustic signals through the microphones and accumulates them in SDRAM. The network thread, operating like a TCP/IP server, transmits the stored acoustic signals to the center server system whenever the server requests them using a predefined protocol. Hence, the TCP/IP server remains on standby with a socket open until the center server requests the accumulated acoustic signal data.
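The interaction between the two threads can be modeled on a desktop machine as follows. This is a toy sketch in Python rather than the C firmware: the request message `GET_AUDIO`, the class and function names, and the bounded in-memory buffer standing in for the SDRAM are all hypothetical, since the actual protocol is not described.

```python
import collections
import socket
import threading

REQUEST = b"GET_AUDIO\n"  # hypothetical request message; the real protocol is not given


class CollectorSketch:
    """Toy model of the two firmware threads (audio and network)."""

    def __init__(self, host="127.0.0.1", port=0, capacity=1024):
        # Bounded buffer standing in for the SDRAM; oldest chunks are dropped when full.
        self._buffer = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()
        # Open the listening socket up front so the server is on standby.
        self._server = socket.create_server((host, port))
        self.address = self._server.getsockname()

    def audio_thread(self, chunks):
        """Accumulate digitized audio chunks (bytes) as they arrive."""
        for chunk in chunks:
            with self._lock:
                self._buffer.append(chunk)

    def network_thread(self, n_requests=1):
        """Wait for the center server and send the accumulated audio on request."""
        for _ in range(n_requests):
            conn, _addr = self._server.accept()
            with conn:
                if conn.recv(len(REQUEST)) == REQUEST:
                    with self._lock:
                        payload = b"".join(self._buffer)
                        self._buffer.clear()
                    conn.sendall(payload)

    def close(self):
        self._server.close()


def request_audio(address):
    """Center-server side: ask the collector for its accumulated audio."""
    with socket.create_connection(address) as sock:
        sock.sendall(REQUEST)
        received = bytearray()
        while True:
            data = sock.recv(4096)
            if not data:  # collector closed the connection: transfer complete
                break
            received.extend(data)
    return bytes(received)
```

The lock around the buffer is the point of the design: the audio thread keeps appending while the network thread drains, so the two never touch the buffer at the same moment.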
3. Server System
The server system (<Fig. 5>) installed in a tunnel traffic management center consists of an acoustic signal analysis server and a traffic management server. The acoustic signal analysis server receives the digitized acoustic signals from the acoustic signal collector at 84 ms intervals and runs the acoustic signal-processing algorithm described in the previous section to detect incidents. The 84 ms interval was chosen because the algorithm performed satisfactorily at this value in terms of the detection rate, false alarm rate, and detection time, which is also consistent with previous studies (Harlow and Wang, 2002). Once an incident is detected and verified, the information is transmitted to the traffic management server, which controls other relevant devices, including the LCS, VMS, and CCTV.
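Reading the 84 ms figure as the length of each block of audio handed to the algorithm (an interpretation, not something the text states explicitly), the block size in samples follows directly from the audio sample rate. The sketch below assumes a hypothetical 16 kHz sample rate, which the paper does not give, and a hypothetical function name.

```python
import numpy as np


def split_into_blocks(samples, sample_rate, block_ms=84):
    """Cut a 1-D sample array into consecutive blocks of block_ms milliseconds.

    Trailing samples that do not fill a whole block are dropped.
    """
    block_len = int(round(sample_rate * block_ms / 1000.0))
    n_blocks = len(samples) // block_len
    return samples[: n_blocks * block_len].reshape(n_blocks, block_len)
```

At 16 kHz, one 84 ms block holds 1344 samples, so one second of audio yields 11 complete blocks.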
Ⅲ. PERFORMANCE EVALUATION
The performance of the developed system was evaluated in two phases: in a controlled and in an uncontrolled roadway tunnel environment. For the evaluation in a controlled environment, an unused roadway tunnel and recorded incident sounds were prepared. For the uncontrolled test, six months of real-world incident data from a tunnel were used. Three widely used evaluation indices for incident-detection systems, namely the detection rate (DR), false-alarm rate (FAR), and mean time to detection (MTTD), as expressed in Equations 6 to 8, were used to evaluate the performance.
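$$\mathrm{DR} = \frac{\mathrm{TCD}}{\mathrm{TDS}} \times 100\,(\%) \qquad (6)$$
$$\mathrm{FAR} = \frac{\mathrm{TFD}}{\mathrm{TIO}} \times 100\,(\%) \qquad (7)$$
$$\mathrm{MTTD} = \frac{1}{\mathrm{TIO}} \sum_{i=1}^{\mathrm{TIO}} \left(\mathrm{DST}_i - \mathrm{IST}_i\right) \qquad (8)$$
where DR = detection rate, TCD = total number of correct incidents detected, TDS = total number of incidents in the data set, FAR = false-alarm rate, TFD = total number of false incidents detected, TIO = total number of incidents observed in the test data set, MTTD = mean time to detection, DST = detected incident start time, and IST = actual incident start time.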
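The three indices can be computed directly from the counts and timestamps defined above. The sketch below assumes the usual forms implied by those definitions: DR and FAR as percentages of TDS and TIO, and MTTD as the average of DST minus IST over the incidents. The function names are hypothetical.

```python
def detection_rate(tcd, tds):
    """DR in percent: correctly detected incidents over incidents in the data set."""
    return 100.0 * tcd / tds


def false_alarm_rate(tfd, tio):
    """FAR in percent: false detections over incidents observed in the test data set."""
    return 100.0 * tfd / tio


def mean_time_to_detection(detected_starts, actual_starts):
    """MTTD in seconds: average of (DST - IST) over the paired incidents."""
    delays = [dst - ist for dst, ist in zip(detected_starts, actual_starts)]
    return sum(delays) / len(delays)
```

For example, 190 correct detections out of 200 incidents gives a DR of 95%.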
1. Evaluation in a Controlled Environment
An old tunnel in Daejeon, Korea was selected for the controlled evaluation. The tunnel was closed to regular traffic because of a rehabilitation project. Hence, a controlled test with recorded incident sounds was possible with permission from the responsible agency. In accordance with the Korean guidelines that stipulate the spacing for a tunnel-incident detection system, the acoustic signal collectors were installed at spacings of 10, 30, and 50 m, as shown in <Fig. 6>. The acoustic sources for the performance test consisted of real-world recordings of 200 crash and 37 skid sounds obtained from broadly recognized organizations, including the Euro New Car Assessment Program and the Insurance Institute for Highway Safety of the United States. The sounds were played through a speaker at sound pressure levels (SPLs) similar to those of real sounds. According to one study (Neale, 2008), the SPLs of vehicle crashes and skids range from 110 to 130 dB and from 90 to 100 dB, respectively. The experiment was conducted for four hours on October 18, 2017 under controlled roadway tunnel conditions.
Under the controlled environment, the developed system exhibited DRs of 95, 92, and 80%; FARs of 2.6, 3.0, and 3.6%; and MTTDs of 1.2, 1.3, and 1.4 s for distances of 10, 30, and 50 m, respectively (<Table 1>). The DR differed considerably with distance: performance decreased as the sound collectors were located farther from the sound source, possibly because of the decrease in sound magnitude. By contrast, the other two indices showed no notable differences. The small variation in FAR is noteworthy because, according to a survey (Hancocks, 2011), the reluctance of road agencies to deploy incident detection systems is normally attributed to high FARs. <Fig. 7> presents a three-dimensional graph of the performance at each distance compared to the perfect score (DR of 100%, FAR of 0%, and MTTD of 0 s).
2. Evaluation in an Uncontrolled Environment
A roadway tunnel with a high frequency of incidents was selected for the uncontrolled evaluation. The 755 m long tunnel, named Gwang-Ahm, is located in the vicinity of Seoul and has four lanes in each direction. Acoustic sensors were attached temporarily to extinguishers in the tunnel at 100 m spacing. For ease of installation, the sensors were specially designed and included no cameras. Many cameras had already been installed in the tunnel, so no further cameras were necessary to verify the incidents that occurred. The acoustic data, collected at 84 ms intervals, were transmitted to a server system on which the developed algorithm had been installed, as shown in <Fig. 8>. A fiber-optic cable was used for real-time data transmission, and an LTE-based wireless communication dongle was used to back up the incident event data (sound and video images) identified by the algorithm for further analysis.
When the system was initially installed at the site, all parameters of the algorithm were set to the same values as in the controlled environment. To optimize the algorithm parameters, two months of acoustic data, including 10 skids and 1 crash, were obtained. The collected acoustic signals were classified into two groups: incident and non-incident sounds. Using the archived data, the non-negative basis model was re-trained to adapt to the characteristics of the sounds (incident and non-incident) in the tunnel (see <Fig. 9>). <Table 2> lists the difference in performance before and after optimizing the parameters. The FAR was reduced significantly after optimization. The main causes of false alarms at initial installation were horns, sirens, and braking sounds from large vehicles, and these were resolved by optimizing the parameters. As shown in <Table 3>, however, false alarms caused by loud noises from extremely fast-moving sports cars could not be resolved because of the small sample size. Therefore, the false alarms remaining after optimization were caused solely by the extremely loud sounds of fast-moving vehicles.
Ⅳ. DISCUSSION
Compared to existing technologies, the results of the field evaluations were considered satisfactory. As shown in <Table 4>, the incident detection time is the most encouraging performance element of the suggested system; it is superior to that of the existing technologies, and rapid detection has been emphasized as a means of preventing traffic pileups (Ozbay and Kachroo, 1999). A truck driver who escaped from the Mont Blanc tunnel accident reportedly said, “Everything was ablaze in half a minute. I ran for my life. Behind me, all hell broke loose. In a few minutes, the tunnel was like an oven.” The developed system also has operational advantages: it can be calibrated easily compared to the benchmark systems, and it is not constrained by the tunnel environment in terms of installation height or occlusion by tall vehicles.
Although the developed system has some merits over the existing systems, it is not a magic bullet for identifying tunnel incidents. <Table 4> shows that it can only detect tunnel incidents associated with sounds, e.g., crashes and skids. Therefore, it is desirable to deploy the developed system in combination with traditional systems as a complete solution for identifying tunnel incidents.
Ⅴ. CONCLUSIONS
As tunnel construction technologies, such as tunnel boring machines, advance, more agencies are opting to construct roadway tunnels rather than harm the environment by building roads over mountains. Recently, underground roads have been actively built and planned in the Seoul metropolitan region in Korea. Therefore, securing traffic safety in this infrastructure is becoming increasingly essential. The instantaneous detection of incidents is extremely important for preventing secondary accidents and minimizing the rescue time. Conventional practices, such as traffic detector data-based and video image-based algorithms, have shown limitations from the perspective of immediacy. To resolve this shortcoming, a tunnel-incident detection algorithm using the acoustic signals from crashes and skids was proposed and evaluated thoroughly in real-world situations using abundant data obtained over an eight-month period.
The developed system broadly comprises three elements: an acoustic signal-processing algorithm, an acoustic signal collector, and a center server system. The acoustic signal-processing algorithm was developed using nonnegative tensor factorization (NTF) and a hidden Markov model (HMM). The NTF algorithm was verified to increase the signal-to-noise ratio (SNR) of crash sounds by up to 27 dB, as tested using the tunnel incident sounds. This SNR enhancement is generally considered critical for improving the incident detection capability of any sound-based system. The aesthetically designed acoustic signal collector observes, digitizes, and transmits a range of acoustic signals to the center server system. The collector also allows the type of incident to be verified through a built-in video camera equipped with pan/tilt/zoom functions. The center server system identifies incidents from the acoustic signals transmitted at 84 ms intervals. Once an incident is detected by the algorithm and verified using a video camera, appropriate mitigation strategies, such as displaying the incident information on the variable message signs and dispatching rescue personnel, can be implemented. The real-world evaluations of the proposed system under controlled and uncontrolled environment conditions yielded encouraging results, with a DR of 80–95%, FAR of 2.6–3.6%, and MTTD of no more than 1.4 s for the controlled condition, and a DR of 94%, FAR of 7%, and MTTD of 1.8 s for the uncontrolled condition. This performance is superior to that of the existing technologies in terms of the detection time, which is critical to preventing the adverse effects of an incident.
The suggested sound-processing algorithm could be improved further by applying a more sophisticated pattern-matching algorithm, such as a recurrent neural network, once a large number of incident sounds becomes available. Other incident-associated acoustic signals, such as those from flat tires and bangs, could be considered to widen the capability to identify incidents in the tunnel. Only two sensors (microphones) were employed in this study, as is typical for a sound-based event detection system (Jeon et al., 2017). Applying more sensors might improve the performance, and this is planned for subsequent research.