Ⅰ. Introduction
Bicycle sharing schemes attract attention as a sustainable alternative for personal mobility in many large cities worldwide, offering convenience and reducing travel time. Thus, cities worldwide are introducing shared bicycle services to promote a healthy society while solving traffic-related problems. Since its introduction, policymakers have gradually increased the supply of shared bicycles, and demand has increased in line with it. However, shared bicycle system (SBS) operators are facing considerable bicycle imbalance problems at rental stations due to inefficiencies in the approaches used to predict shared bicycle demand. Researchers have been investigating approaches for estimating real-time demand for each rental station to help relocate surplus bicycles from low-demand stations to those with higher demand using suitable techniques and considering environmental factors known to affect the use of shared bicycles.
The literature has identified that precipitation is a key factor to consider when estimating shared bicycle demand. However, it is challenging to predict the shared bicycle demand in microscopic spatiotemporal ranges on a rainy day because precipitation, which significantly affects demand for shared bicycle use and is perceived differently by each user, fluctuates over time. Especially, the influence of precipitation on the use of shared bicycles varies depending on how users perceive it. For instance, shared bicycle users exhibit characteristics of changing their usage behavior over time, and the frequency of shared bicycle demand fluctuates depending on weather factors(Sears et al., 2012). Other studies have also shown that the precipitation negatively correlates with the shared bicycle demand(Corcoran et al., 2014;Gebhart and Noland, 2014). Therefore, incorporating precipitation information and the perceptions of individual users into prediction models could help improve prediction accuracy. This can help to reduce the imbalance experienced at rental stations and lessen the inefficiencies in the long-term operation of the SBS.
Several studies have reflected the importance of precipitation in shared bicycle demand prediction models. Some of them categorized the days into rainy days, cloudy days, and snowy days, based on the amount of precipitation(Gebhart and Noland, 2014;Saneinejad et al., 2012;Chen et al., 2016;Hulot et al., 2018). However, this approach is inefficient as it does not provide an appropriate threshold for distinguishing between different categories. Other studies have predicted shared bicycle demand based on hourly precipitation(Kim, 2018;Sathishkumar and Cho, 2020;Gao and Lee, 2019) and daily precipitation information(Corcoran et al., 2014;Hyland et al., 2018;Sun et al., 2018). It was identified that various methods exist for incorporating precipitation to predict shared bicycle demand. Despite the importance of accurately predicting shared bicycle demand and the significance of precipitation information in that process, there has been no study on how to consider precipitation elaborately in modeling shared bicycle demand. Because the influence of precipitation on shared bicycle demand depends on how users perceive precipitation(Gebhart and Noland, 2014), it would be worthwhile to incorporate time-based and user-specific cognitive characteristics regarding precipitation into shared bicycle demand prediction models.
The purpose of this research is to determine the best approach for incorporating precipitation into improving the accuracy of hourly shared bicycle rental and return predictions by station, thereby establishing a time-based and precipitation-reflecting alternative that considers users’ cognitive characteristics related to precipitation. We construct hourly shared bicycle rental and return prediction models for each alternative using the Random Forest and Long Short-Term Memory (LSTM) Ensemble methods. The best alternative for predicting the demand for shared bicycles based on the comprehensive performance comparison is derived and defined as the optimal way to reflect precipitation. This study contributes to addressing bicycle imbalance problems across rental stations by capturing users' perceptions of precipitation, as shown in shared bicycle demand prediction models. In particular, this study enables a more accurate prediction of hourly rental and return demand for each rental station on rainy days. This could help SBS operators determine the proper number of bicycles needed at any rental station during rainy days.
Ⅱ. Literature Review
1. Summary
This section reviews research on predicting the demand for shared bicycles. The necessity of selecting an appropriate analysis unit was emphasized in studies focused on predicting the demand for shared bicycle use. Most previous studies analyzing hourly rental and return demand for shared bicycles have suggested that it is more accurate to predict demand for each rental station than to predict demand by dividing the spatial area into clusters(Rudloff and Lackner, 2014;Lin et al., 2018). Since the shared bicycle demand by rental stations exhibits temporal and spatial correlation, they discovered that two rental stations not on the same road section can have similar usage demand patterns. Additionally, if rental stations are clustered solely by spatial factors, those with different usage patterns can be grouped together, which reduces the reliability of the prediction model. Some research has shown that the rental and return of shared bicycles are highly dependent on spatial and temporal effects(Faghih-Imani and Eluru, 2016). Therefore, we analyze the hourly rental and return for each rental station.
Shared bicycle use has been found to be influenced by the spatial characteristics of rental stations, and some studies have identified spatial factors that affect shared bicycle demand. Studies that explore crucial spatial factors regarding shared bicycle demands have been actively conducted. In particular, it has been identified that the shared bicycle demand was affected by socioeconomic factors, including population and employee densities(El-Assi et al., 2017). Additionally, employment, population, bars, restaurants, and distance to a central location were significant in explaining the demand for shared bicycles(Guidon et al., 2020).
In several studies on predicting shared bicycle demand, weather factors were selected as major influencing factors, in addition to spatial characteristics. Previous studies have attempted to predict the real-time demand for shared bicycles by reflecting weather factors such as temperature and precipitation. From these studies, it was found that a negative correlation exists between precipitation and demand for shared bicycles(Eren and Uz, 2020;Saneinejad et al., 2012;Sathishkumar and Cho, 2020). The tendency is stronger at rental stations near subway stations(Gebhart and Noland, 2014). This is because the influence of precipitation on shared bicycle use varies depending on the type of shared bicycle user. Additionally, the influence of precipitation varies according to the location of the shared bicycle rental station (Kim, 2018). Interestingly, it was found that weather factors had different influences during the various time periods within a day. This finding suggests that it is necessary to consider the spatiotemporal characteristics of rental stations when taking weather factors into account. In this study, we considered only temperature and precipitation in the prediction model to exclude the interaction effects between precipitation and other influencing factors, such as public bicycle use.
Shared bicycle travel behavior varies significantly depending on the purpose of use(Zhang et al., 2016). For example, tourists with a one-day pass tend to use bicycles to travel between attraction spots, whereas subscribers tend to use them for commuting between work, school, and home. Users for leisure travel long distances over a long time, while commuters travel relatively short distances. In other words, it is necessary to consider the travel purposes of shared bicycle users when investigating the effects of weather factors or spatial characteristics on the decision to rent shared bicycles.
Numerous prediction methods for shared bicycle demand have been proposed. Accurately predicting demand in microscopic spatiotemporal ranges is crucial because it enables the identification of the appropriate scale of SBS and the development of a relocation strategy. Recently, as the scale of shared bicycle projects has expanded, the volume of microscopic spatiotemporal data has increased, and machine learning techniques that can handle such data effectively are frequently employed. In particular, ensemble techniques such as Random Forest have been found to achieve high prediction performance(Pan et al., 2019). Deep learning techniques, such as Long Short-Term Memory (LSTM) or Back Propagation Neural Network (BPNN), which incorporate multiple layers, are also useful in predicting shared bicycle demand(Pan et al., 2019;Xu et al., 2018;Gao and Lee, 2019). In addition, support vector machines and Poisson regression analysis have been employed(Sachdeva and Sarvanan, 2017).
2. Critique of the State-of-the-Art
Previous studies have incorporated precipitation into models to predict hourly rentals and returns for each rental station. Since the effect of precipitation on shared bicycle use varies according to characteristics such as rental stations, user types, or the way users perceive precipitation, it is essential to accurately reflect precipitation. Despite the importance of precipitation in predicting demand for shared bicycles, many studies have reflected the precipitation data they collected “as is”. There was no study that accurately predicted real-time shared bicycle demand based on precipitation. Moreover, there was no study that captured how shared bicycle users' perception regarding precipitation affects their usage decisions, therefore requiring further studies.
Reflecting precipitation is helpful for accurately predicting hourly rental and return for each rental station. It is also important to select methodologies that can account for exogenous factors, including precipitation, and prevent overfitting to the training set. In this study, we used Random Forest and an LSTM ensemble method that combines four predictors. Random Forest mitigates the overfitting problem through the bagging method. The LSTM ensemble also addresses this issue by combining four predictors, each of which includes two LSTM layers and two fully connected layers. Additionally, these two methods take into account exogenous factors when predicting hourly rental and return for each rental station. Therefore, these two methods were employed to construct the predictive models in this study. The findings of this study can be utilized to operate the SBS more efficiently. Additionally, the cognitive characteristics of shared bicycle users regarding precipitation can be investigated through the results of this study, which provide insight into how shared bicycle users perceive precipitation when deciding to rent bicycles.
Ⅲ. Analysis Framework
1. Shared Bicycle Imbalance Problem and the Study Area
Seoul, the administrative capital of South Korea, first introduced shared bicycles known as “Ttareungyi” in 2015 to reduce traffic congestion resulting from overdependence on private vehicles for short trips and to increase accessibility to public transit stations. However, the bicycle imbalance problem has become a major issue for SBS operators. The problem is predominantly prevalent in five districts in Seoul: Jongno-gu, Jung-gu, Yongsan-gu, Seongbuk-gu, and Dongdaemun-gu. Some stations have an excess supply of bicycles, while others have few, despite high demand for shared bicycles. Furthermore, at the stations in these districts, the number of relocation requests by users was also the largest compared to the number of rentals. For this reason, the abovementioned districts were selected as the spatial study scope for analysis.
The five districts have an estimated population of 1.36 million people. The shared bicycle usage records in these districts, comprising approximately 2.3 million transactions, were obtained from the 263 shared bicycle rental stations within the spatial scope. The bicycle imbalance problem is most severe on weekdays. In particular, on weekdays, the imbalance between rental and return is most severe during commuting hours, from 8:00 a.m. to 10:00 a.m. and from 5:00 p.m. to 8:00 p.m. During this time window, there exist rental stations that have too many bicycles. In the case of Seongbuk-gu, there was a record of having 82 shared bicycles at a specific rental station during rush hours, while other stations had no available shared bicycles.
The SBS operators have employed several approaches to address this issue, including identifying stations with excess bicycles and relocating them to stations with fewer bicycles using a truck. However, the algorithms used for relocation do not adequately consider precipitation, resulting in inefficiencies in system operation during rainy days. Research shows that shared bicycle use tends to decrease when it rains(Sears et al., 2012;Corcoran et al., 2014;Gebhart and Noland, 2014;Hyland et al., 2018;Kim, 2018;Sun et al., 2018;Sathishkumar and Cho, 2020). In the case of “Ttareungyi”, the average hourly rental per station is 0.26 when it rains, whereas it is 1.32 when it does not rain; that is, rentals in rainy conditions are roughly one-fifth of those in dry conditions. This phenomenon is more pronounced at rental stations near subway stations, but its variation over time is notable(Kim, 2018;Eren and Uz, 2020).
2. Data Description
Real shared bicycle rental and return data, spanning from January 1 to December 31, 2019, were provided by Seoul-si. It was necessary to determine an appropriate spatiotemporal analysis unit and build a dataset accordingly for precise analysis. It is reasonable to use rental stations as spatial units to predict the demand for shared bicycles. Meanwhile, long-term temporal units, such as days or weeks, show almost identical numbers of rentals and returns, making it difficult to clearly identify the influence of precipitation on the demand for shared bicycles. Conversely, when the temporal unit is too short, such as minutes, rentals and returns of shared bicycles hardly occur at some stations. Therefore, hours were used as the temporal unit. The hourly rental and return counts for each rental station in each district were aggregated from the original rental records. Hourly temperature and precipitation data were collected from the Korea Meteorological Administration to incorporate weather factors in predicting shared bicycle demand. The hourly temperature and precipitation data for the district to which each rental station belongs were combined with the hourly rental and return counts for that station. Only the weather factors are considered to minimize the effect of interactions between other influential factors.
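The hourly aggregation described above can be illustrated with a minimal sketch. The record layout (station identifiers and ISO-8601 timestamps) and the station names are assumptions made for illustration only; the actual schema of the Seoul data is not given here.

```python
from collections import Counter
from datetime import datetime


def floor_to_hour(timestamp):
    """Parse an ISO-8601 timestamp and truncate it to the start of its hour."""
    return datetime.fromisoformat(timestamp).replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(records):
    """Count rentals and returns per (station, hour).

    Each record is a hypothetical tuple:
    (rental_station, rental_time, return_station, return_time).
    """
    rentals, returns = Counter(), Counter()
    for rent_station, rent_time, ret_station, ret_time in records:
        rentals[(rent_station, floor_to_hour(rent_time))] += 1
        returns[(ret_station, floor_to_hour(ret_time))] += 1
    return rentals, returns


# Toy trip records (made-up values, not taken from the study data).
trips = [
    ("ST-101", "2019-07-01T08:05:00", "ST-205", "2019-07-01T08:25:00"),
    ("ST-101", "2019-07-01T08:40:00", "ST-205", "2019-07-01T09:02:00"),
    ("ST-205", "2019-07-01T18:10:00", "ST-101", "2019-07-01T18:35:00"),
]
rentals, returns = aggregate_hourly(trips)
```

A rental and its return are counted separately because they may fall in different hours and at different stations, which is why the study models rentals and returns as two dependent variables.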
Descriptive statistics are analyzed to identify the characteristics of each influential factor, including temporal and weather factors. Generally, the number of rentals exceeds the number of returns, because cases of loss, breakdown, repair, or relocation are encoded as “rentals”. For each rental station, the average number of rentals per hour was 1.27, and the average number of returns per hour was 1.24. The temporal factors revealed that shared bicycle demand patterns are significantly different between weekdays and weekends. The average number of rentals per hour by a rental station on weekdays was 1.34, and the average number of returns per hour on weekends was 1.1. Rainfall and snowfall are significant factors contributing to reduced demand for shared bicycles. The average number of rainy days in Seoul in 2019 was 81.45, accounting for 22% of the entire year. The descriptive statistics of the dependent and independent variables are presented in <Table 1>.
<Table 1>
Variable Description
Category | Description | Mean | Std | Type |
---|---|---|---|---|
Dependent variable | Number of rentals per hour of rental station (bicycles/hour) | 1.27 | 2.12 | Continuous |
Dependent variable | Number of returns per hour of return station (bicycles/hour) | 1.24 | 2.07 | Continuous |
Independent variable (spatial factor) | District | - | - | Categorical |
Independent variable (spatial factor) | Station | - | - | Categorical |
Independent variable (temporal factor) | Month (m) | 6.53 | 3.45 | Integer |
Independent variable (temporal factor) | Day (d) | 15.72 | 8.80 | Integer |
Independent variable (temporal factor) | Time (h) | 12.50 | 6.92 | Integer |
Independent variable (temporal factor) | Weekday or Weekend | - | - | Categorical |
Independent variable (temporal factor) | Day of week | - | - | Categorical |
Independent variable (weather factor) | Temperature (℃) | 13.60 | 10.50 | Continuous |
Independent variable (weather factor) | Precipitation (mm/h) | 0.11 | 0.91 | Continuous |
Ⅳ. Methodology
1. Workflow
The workflow of this study comprises five steps. First, the base dataset was created by aggregating the hourly rental records of shared bicycles by individual rental stations. Second, precipitation-reflecting alternatives, which reflect users' cognitive characteristics regarding precipitation, were established. These alternatives employed the two ways in which the Korea Meteorological Administration expresses precipitation: the numerical expression and the categorical or ordinal expression. Third, for each alternative, the Random Forest and the LSTM Ensemble methods were applied to construct models that predict hourly rentals and returns for each rental station. The root mean squared error (RMSE) was calculated as a prediction performance measure for each alternative, behavior (rental / return), and model, and then averaged by alternative. Fourth, the alternative with the best prediction performance was identified and finalized as the most effective method for predicting shared bicycle demand based on precipitation information. Finally, this study concludes by determining the consistency of the best model to reflect precipitation in terms of user behavior and usage of the shared bicycles.
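As a minimal sketch of the performance measure in the third step, the RMSE for one behavior and its average over rental and return could be computed as follows. The function names and the toy values are illustrative assumptions, not the study's code or data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_rmse(rental_rmse, return_rmse):
    """Average the rental and return RMSE of one alternative."""
    return (rental_rmse + return_rmse) / 2


# Toy hourly counts for a single station (made-up values).
actual_rentals = [0, 2, 1, 3]
predicted_rentals = [1, 2, 0, 1]
rental_error = rmse(actual_rentals, predicted_rentals)  # sqrt((1 + 0 + 1 + 4) / 4)
```

Because the errors are squared before averaging, a few badly predicted peak hours weigh more heavily than many small errors in quiet hours.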
2. Random Forest
Decision Tree is a widely employed machine learning model for prediction. However, a single Decision Tree tends to overfit the training data, and this problem is effectively mitigated by combining multiple trees. Random Forest is an ensemble technique that trains multiple Decision Trees through a bagging process. Bagging is a method of aggregating the outputs of models trained on individual bootstrapped samples(Breiman, 1996).
When each tree is trained, a subset of the influential factors is randomly selected. This mitigates overfitting and results in appropriate consideration of all influential factors(Breiman, 2001;Liaw and Wiener, 2002). Random Forest regression then averages the values predicted by the individual Decision Trees. The Random Forest technique employed in this study to predict the shared bicycle demand of each rental station is illustrated in <Fig. 1>. Each rental station's hourly rentals and returns are considered output variables, while temporal and weather factors are considered input variables. Thus, this model predicts hourly rentals and returns for each rental station based on temporal factors, precipitation, and temperature at a point in time. K samples are generated through bootstrapping after dividing the total dataset into a training set and a test set. A Decision Tree is trained on each sample using randomly selected influential factors. Then, the future demand for shared bicycles is predicted through bagging.
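The bagging idea can be sketched with the standard library alone. The example below is a deliberately simplified stand-in, not the study's model: each "tree" is a one-split regression stump, so selecting a random feature subset per tree coincides with selecting it per split. The feature layout (hour, temperature, precipitation), the toy values, and all function names are assumptions.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree that may only use the given features."""
    best = None  # (sum of squared errors, feature, threshold, left mean, right mean)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        fallback = mean(targets)
        return lambda row: fallback
    _, f, threshold, lm, rm = best
    return lambda row: lm if row[f] <= threshold else rm


def fit_bagged_stumps(rows, targets, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feature_ids = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature_ids))
    return lambda row: mean(tree(row) for tree in trees)


# Toy data: (hour, temperature in deg C, precipitation in mm/h) -> hourly rentals.
rows = [(8, 20.0, 0.0), (8, 18.0, 5.0), (14, 25.0, 0.0),
        (14, 22.0, 3.0), (18, 21.0, 0.0), (18, 19.0, 8.0)]
targets = [3.0, 0.0, 2.0, 1.0, 4.0, 0.0]
predict = fit_bagged_stumps(rows, targets)
dry_hour_prediction = predict((8, 20.0, 0.0))
```

A practical model would grow deeper trees and tune the number of trees and features; the sketch only shows how bootstrapping, random feature selection, and averaging fit together.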
3. Long Short-Term Memory
LSTM is a type of Recurrent Neural Network that retains information from both long-term and short-term contexts. A standard Recurrent Neural Network suffers from the vanishing gradient problem, whereby the gradient with respect to early time steps becomes very small as the number of time steps increases(Hochreiter, 1991). LSTM addresses this with a layer that includes a cell state, which stores information from previous time points, and input and output gates that handle new information(Hochreiter and Schmidhuber, 1997). The LSTM layer can also discard part of the information accumulated up to the previous time point through the “forget gate”(Gers et al., 2000). Therefore, the LSTM layer is far less affected by the vanishing gradient problem and achieves excellent performance, even for long-term data (Hochreiter and Schmidhuber, 1997).
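For reference, the gates described above are commonly written as follows. This is the standard textbook formulation of an LSTM cell with a forget gate, and the notation is an assumption introduced here: $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $U$, $b$ the learned weights and biases.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{align}
```

The additive update of $c_t$ is what lets gradients pass through many time steps without shrinking as quickly as in a plain Recurrent Neural Network.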
The LSTM technique employed to predict hourly rentals and returns for each shared bicycle rental station is illustrated in <Fig. 2>. The LSTM model captures both the endogenous and exogenous factors influencing shared bicycle demand, as the hourly rentals and returns for each rental station are influenced by usage at the previous timestamp and are also affected by weather conditions. The hourly rentals and returns for each rental station at a specific timestamp were selected as the output variable. Temporal and weather factors are included as input variables. Also, this study designs LSTM layers in the form of a one-way and many-to-one sequence model. The model is trained using the training set to predict a point in the future based on 24-hour data from the past that includes the present.
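The many-to-one input structure can be sketched as a sliding window over the hourly series. The sketch assumes that each hourly row is a feature vector whose first entry is the demand, and that the target is the demand one hour after the window ends; both choices, and the function name, are illustrative assumptions.

```python
def make_windows(series, lookback=24, horizon=1):
    """Build many-to-one samples from an hourly series.

    series: list of per-hour feature vectors; entry 0 of each vector is the demand.
    Each input covers `lookback` consecutive hours ending at the present hour,
    and the target is the demand `horizon` hours after the present hour.
    """
    inputs, targets = [], []
    for end in range(lookback, len(series) - horizon + 1):
        inputs.append(series[end - lookback:end])      # past hours including the present
        targets.append(series[end + horizon - 1][0])   # demand at the future hour
    return inputs, targets


# Toy series of 30 hours: [demand, precipitation], with demand equal to the hour index.
toy_series = [[float(hour), 0.0] for hour in range(30)]
windows, future_demand = make_windows(toy_series)
```

With 30 hourly rows and a 24-hour lookback, the sketch yields six samples; the first window covers hours 0 to 23 and its target is the demand at hour 24.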
It is necessary to build a robust model to determine the superiority of prediction performance among various precipitation-reflecting alternatives. Therefore, we proposed the architecture illustrated in <Fig. 3>. It has an ensemble structure that combines four predictors. Each predictor contains two LSTM layers and two fully connected layers. As each predictor is trained on the training set and produces an output, the future demand is predicted by averaging the prediction values across all predictors.
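The averaging step of the ensemble can be sketched independently of any deep learning library. In the sketch below, the four predictors are plain functions standing in for trained LSTM-based predictors; they are hypothetical placeholders, and only the averaging logic corresponds to the description above.

```python
def ensemble_predict(predictors, window):
    """Average the outputs of several predictors for one input window."""
    outputs = [predict(window) for predict in predictors]
    return sum(outputs) / len(outputs)


# Hypothetical stand-ins for four trained predictors (not LSTMs).
stand_in_predictors = [
    lambda w: w[-1],           # repeat the most recent demand
    lambda w: sum(w) / len(w), # mean demand over the window
    lambda w: w[0],            # oldest demand in the window
    lambda w: max(w),          # peak demand in the window
]
recent_demand = [1.0, 2.0, 3.0]
ensemble_output = ensemble_predict(stand_in_predictors, recent_demand)
```

Averaging several independently trained predictors reduces the variance that comes from random initialization, which supports a fairer comparison between precipitation-reflecting alternatives.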
4. Experimental Setup
To find the best method of reflecting precipitation that improves the performance of the shared bicycle demand prediction model, it is necessary to set realistic precipitation-reflecting alternatives that consider how users perceive and react to the precipitation. There are two considerations when setting up alternatives. First, we need to reflect precipitation over the next several hours, as shared bicycle users decide whether to rent shared bicycles considering near-future precipitation. Because users are affected not only by current precipitation but also by the potential for future precipitation, it is essential to determine the time period to be referenced. Second, it is necessary to determine how to define precipitation intensity and establish a threshold, as shared bicycle users are more affected by precipitation intensity than by the amount of precipitation. Thus, it also matters how the precipitation is ‘expressed’: quantitatively on a numerical scale or qualitatively on an ordinal scale. To sum up, establishing alternatives with the proper time period to be referenced and the way of expressing precipitation is key for accurately predicting shared bicycle demand.
Most of Seoul’s shared bicycle users have been getting precipitation information through the news, the internet, and smartphones. Therefore, the decision to rent public bicycles is inevitably affected by the precipitation forecast method of the Korea Meteorological Administration. The Korea Meteorological Administration defines precipitation as the total amount of water in the form of rain, snow, and hail falling from the sky. Precipitation is typically aggregated by different units according to their specific purpose. First, daily precipitation (mm/d) is mainly used for recording. If the daily precipitation is 0.1 mm or more, it is recorded as a rainy day. Second, six-hour precipitation (mm/6h) is used for forecasting. A day is divided into four time windows: dawn (12 AM - 6 AM), morning (6 AM - 12 PM), afternoon (12 PM - 6 PM), and night (6 PM - 12 AM). Third, the hourly precipitation (mm/h) is typically used for recording and forecasting.
The Korea Meteorological Administration provides precipitation in both numerical and ordinal scales. The numerical scale refers to hourly precipitation (mm/h) and daily precipitation (mm/d). The ordinal scale defines precipitation intensity by classifying it into several ordinal categories. If the hourly precipitation is 1-3 mm, it is referred to as light rain. People can still be found outside during this condition because they do not mind getting their clothes wet. If the hourly precipitation is 3-15 mm, it is referred to as medium rain. The falling rain is discerned with the eyes. If the hourly precipitation is 15-30 mm, it is referred to as heavy rain. In this situation, an umbrella and a raincoat become useless. Lastly, if the hourly precipitation is 30 mm or more, it is referred to as intense rain that can be critical to people's lives.
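The ordinal expression can be sketched as a threshold lookup on hourly precipitation. Two points are assumptions rather than statements from the text: each class includes its lower bound (so exactly 3 mm/h counts as medium rain), and values below 1 mm/h are mapped to a "not rainy" class, because the text does not say how such hours are classified.

```python
from bisect import bisect_right

# Ordinal classes in increasing order of intensity.
CLASS_NAMES = ["not rainy", "light", "medium", "heavy", "intense"]
# Lower bounds (mm/h) of the light, medium, heavy, and intense classes.
LOWER_BOUNDS_MM_PER_H = [1.0, 3.0, 15.0, 30.0]


def intensity_class(precip_mm_per_h):
    """Return the ordinal class index (0 to 4) for an hourly precipitation value.

    Assumption: lower bounds are inclusive and values below 1 mm/h are class 0.
    """
    return bisect_right(LOWER_BOUNDS_MM_PER_H, precip_mm_per_h)


labels = [CLASS_NAMES[intensity_class(p)] for p in (0.0, 2.0, 3.0, 20.0, 45.0)]
```

Encoding the classes as increasing integers preserves their order, which is what distinguishes the ordinal expression from a purely categorical one.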
<Table 2> and <Table 3> present the 15 precipitation-reflecting alternatives established in this study and the descriptive statistics of the precipitation variable for each alternative, respectively. Alternative 0 refers to a baseline scenario in which precipitation is not reflected in the shared bicycle demand prediction and was created to identify the superiority of precipitation-reflecting alternatives. Alternatives 1 and 2 reflect current timestamp’s precipitation (mm/h) in numerical and ordinal formats, respectively. Alternatives 3 and 4 reflect daily precipitation (mm/d) in numerical and ordinal formats, respectively. Alternative 5 reflects daily precipitation in a categorical format, with dummy variables encoded as either rainy or non-rainy days based on a threshold of 0.1 mm. Alternatives 6 and 7 reflect the six-hour precipitation from the current timestamp in numerical and ordinal formats, respectively. Alternatives 8 and 9 reflect two-hour precipitation from the current timestamp, which is the sum of precipitation at the current timestamp and the next hour, in numerical and ordinal formats, respectively. Alternatives 10 and 11 reflect the three-hour precipitation from the current timestamp, which is the sum of the precipitation at that specific timestamp and in the next two hours, in numerical and ordinal formats, respectively. Alternatives 8, 9, 10, and 11 investigate whether the possibility of future precipitation influences the rental decisions of shared bicycle users, in addition to the current status. Alternative 12 reflects the precipitation at the current timestamp and in the next hour, where the statuses of the current and next timestamps are treated as separate numerical variables. Alternative 13 reflects the precipitation at the current timestamp and in the next two hours, where the status of the current and next timestamps are treated as separate numerical variables. 
Alternatives 12 and 13 examine whether the current precipitation amount and the possibility of future precipitation affect users’ decisions differently. Alternatives 14 and 15 reflect the sum of precipitation at the current timestamp and the previous hour in numerical and ordinal formats, respectively. These investigate whether past precipitation influences the rental decisions of shared bicycle users, in addition to the current status. For each alternative, a real-time shared bicycle demand prediction model is constructed using the Random Forest and LSTM techniques, and the average RMSE is then calculated to compare the prediction performances and identify the most effective way to incorporate precipitation information into the model.
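The numerical alternatives can be sketched as window sums over an hourly precipitation series. The function names, the dictionary keys, and the handling of windows that run past the ends of the series (they are truncated) are assumptions; the window definitions follow the descriptions of the alternatives above.

```python
def window_sum(precip, t, start_offset, end_offset):
    """Sum hourly precipitation over hours t+start_offset .. t+end_offset (inclusive).

    Hours outside the series are ignored (an assumption for edge handling).
    """
    lo = max(t + start_offset, 0)
    hi = min(t + end_offset, len(precip) - 1)
    return sum(precip[lo:hi + 1])


def numerical_alternatives(precip, t):
    """Numerical precipitation variables at hour t for several alternatives."""
    return {
        "alt1_current": precip[t],
        "alt6_next_6h": window_sum(precip, t, 0, 5),     # current hour + next five hours
        "alt8_next_2h": window_sum(precip, t, 0, 1),     # current hour + next hour
        "alt10_next_3h": window_sum(precip, t, 0, 2),    # current hour + next two hours
        "alt12_current_next": (precip[t], window_sum(precip, t, 1, 1)),  # separate variables
        "alt13_current_next": (precip[t], window_sum(precip, t, 1, 2)),  # separate variables
        "alt14_prev_2h": window_sum(precip, t, -1, 0),   # previous hour + current hour
    }


# Toy hourly precipitation in mm/h (made-up values).
hourly_precip = [0.0, 0.5, 2.0, 4.0, 0.0, 0.0, 1.0, 0.0]
features = numerical_alternatives(hourly_precip, t=2)
```

The difference between Alternatives 8 and 12 is visible in the output: the former collapses the current and next hour into one sum, while the latter keeps them as two variables so that a model can weight them differently.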
<Table 2>
Analytical Alternatives Categorized by Precipitation Reflection
Alt. | Description of how precipitation was reflected | Expression | Note |
---|---|---|---|
0 | Not Considered | - | |
1 | Numerical-format One-hour Precipitation | Quantitative | |
2 | Ordinal-format One-hour Precipitation | Qualitative | |
3 | Numerical-format Daily Precipitation | Quantitative | |
4 | Ordinal-format Daily Precipitation | Qualitative | |
5 | Whether Daily Precipitation Exists (threshold as 0.1 mm/day) | Qualitative | Rainy / Not rainy |
6 | Numerical-format Next Six-hour Precipitation | Quantitative | |
7 | Ordinal-format Next Six-hour Precipitation | Qualitative | |
8 | Numerical-format Next Two-hour precipitation | Quantitative | |
9 | Ordinal-format Next Two-hour precipitation | Qualitative | |
10 | Numerical-format Next Three-hour precipitation | Quantitative | |
11 | Ordinal-format Next Three-hour precipitation | Qualitative | |
12 | Numerical-format Next One-hour Precipitation as Separate Variables | Quantitative | Current / Future |
13 | Numerical-format Next Three-hour Precipitation as Separate Variables | Quantitative | Current / Future |
14 | Numerical-format Previous Two-hour Precipitation | Quantitative | |
15 | Ordinal-format Previous Two-hour Precipitation | Qualitative | |
Note: The time intervals of 'Next' and 'Previous' include the current timestamp.
<Table 3>
Descriptive Statistics of Precipitation Values
Note: For Alternatives 12 and 13, the variables for the current and next timestamps are denoted as 'Current' and 'Next', respectively. The last five columns give the proportion by intensity class (%). For Alternative 5, which distinguishes only rainy and non-rainy days, 22.3 is the proportion of rainy days.
Alt. | Type | Mean | Std. | Not rainy | Light | Medium | Heavy | Intense |
---|---|---|---|---|---|---|---|---|
1 | Numerical | 0.105 | 0.921 | - | - | - | - | - |
2 | Ordinal | - | - | 95.39 | 3.80 | 0.72 | 0.06 | 0.01 |
3 | Numerical | 2.534 | 8.885 | - | - | - | - | - |
4 | Ordinal | - | - | 77.69 | 22.12 | 0.18 | 0 | 0 |
5 | Categorical | - | - | 77.69 | 22.3 (rainy) | - | - | - |
6 | Numerical | 0.634 | 3.557 | - | - | - | - | - |
7 | Ordinal | - | - | 89.38 | 9.80 | 0.81 | 0 | 0 |
8 | Numerical | 0.211 | 1.353 | - | - | - | - | - |
9 | Ordinal | - | - | 92.27 | 6.88 | 0.81 | 0.02 | 0 |
10 | Numerical | 0.319 | 1.911 | - | - | - | - | - |
11 | Ordinal | - | - | 91.12 | 8.09 | 0.77 | 0.01 | 0 |
12(Current) | Numerical | 0.105 | 0.912 | - | - | - | - | - |
12(Next) | Numerical | 0.105 | 0.912 | - | - | - | - | - |
13(Current) | Numerical | 0.105 | 0.912 | - | - | - | - | - |
13(Next) | Numerical | 0.211 | 1.353 | - | - | - | - | - |
14 | Numerical | 0.211 | 1.579 | - | - | - | - | - |
15 | Ordinal | - | - | 92.27 | 6.90 | 0.79 | 0.02 | 0 |
V. Results and Discussions
The Random Forest and LSTM models were constructed for the sixteen alternatives, namely the baseline (Alternative 0) and the fifteen precipitation-reflecting alternatives. The dataset contains information on district, rental stations, month, day, hour, day of the week, weekdays and weekends, the number of rentals and returns, temperature, and precipitation. Training and test sets were created by dividing the original dataset into two sets, with a ratio of 7:3. For each alternative, an optimal demand prediction model was constructed through hyperparameter tuning using the training set, and its performance was evaluated based on the mean RMSE of the test set. The RMSE of each precipitation-reflecting alternative was compared to that of the alternative that did not consider precipitation, and the percentage improvement was calculated.
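The comparison against the baseline can be sketched as a relative reduction in mean RMSE, followed by a ranking. The function names and the RMSE values below are made up for illustration and are not the study's results.

```python
def improvement_pct(baseline_rmse, alternative_rmse):
    """Percentage reduction in mean RMSE relative to the baseline alternative."""
    return (baseline_rmse - alternative_rmse) / baseline_rmse * 100.0


def rank_alternatives(mean_rmse_by_alt):
    """Rank alternatives from lowest (rank 1) to highest mean RMSE."""
    ordered = sorted(mean_rmse_by_alt, key=mean_rmse_by_alt.get)
    return {alt: rank for rank, alt in enumerate(ordered, start=1)}


# Illustrative mean RMSE values (made up): a baseline and two alternatives.
toy_mean_rmse = {"baseline": 2.0, "alt_a": 1.8, "alt_b": 1.9}
toy_improvement = {
    alt: improvement_pct(toy_mean_rmse["baseline"], value)
    for alt, value in toy_mean_rmse.items()
    if alt != "baseline"
}
toy_ranks = rank_alternatives(toy_mean_rmse)
```

When this calculation is applied to RMSE values that were already rounded to three decimals, the resulting percentages can differ slightly in the second decimal from percentages computed on unrounded values.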
The performance of the Random Forest models for each alternative is presented in <Table 4>. The mean RMSEs range from 1.436 (Alternative 8) to 1.554 (Alternative 0), indicating low prediction errors. The mean RMSE of Alternative 8 is 1.436, showing the best predictive performance. This is 7.61% lower than that of Alternative 0, which does not consider precipitation in its modeling structure. This indicates that considering the precipitation of the current and next timestamps with a single variable is the most effective in predicting shared bicycle demand at the station level when Random Forest is employed.
<Table 4>
The Random Forest Analysis Results
Alt. | Rental RMSE | Return RMSE | Mean RMSE | Rank | Improvement |
---|---|---|---|---|---|
0 | 1.637 | 1.470 | 1.554 | 16 | - |
1 | 1.500 | 1.411 | 1.455 | 9 | 6.34% |
2 | 1.500 | 1.408 | 1.454 | 8 | 6.45% |
3 | 1.539 | 1.451 | 1.495 | 12 | 3.81% |
4 | 1.588 | 1.456 | 1.522 | 15 | 2.07% |
5 | 1.563 | 1.451 | 1.507 | 13 | 3.02% |
6 | 1.494 | 1.378 | 1.436 | 2 | 7.60% |
7 | 1.654 | 1.379 | 1.517 | 14 | 2.41% |
8 | 1.489 | 1.383 | 1.436 | 1 | 7.61% |
9 | 1.489 | 1.387 | 1.438 | 3 | 7.45% |
10 | 1.494 | 1.389 | 1.441 | 5 | 7.27% |
11 | 1.490 | 1.387 | 1.439 | 4 | 7.43% |
12 | 1.521 | 1.368 | 1.444 | 6 | 7.08% |
13 | 1.519 | 1.374 | 1.447 | 7 | 6.09% |
14 | 1.534 | 1.421 | 1.477 | 10 | 4.94% |
15 | 1.532 | 1.425 | 1.479 | 11 | 4.86% |
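As a rough illustration of representing the precipitation of the current and next timestamps with a single variable, the sketch below sums each hour's precipitation with that of the following hour, in line with the description of the best alternative given in the Conclusion. The function name `two_hour_precip`, the handling of the final hour, and the sample values are assumptions made for illustration; the study's actual feature construction may differ.

```python
def two_hour_precip(hourly_mm):
    """Sum each hour's precipitation with the next hour's.

    The last hour has no successor, so it is returned as None
    (an assumption; other conventions are possible).
    """
    combined = []
    for i, current in enumerate(hourly_mm):
        if i + 1 < len(hourly_mm):
            combined.append(current + hourly_mm[i + 1])
        else:
            combined.append(None)
    return combined

# Hypothetical hourly precipitation in mm (not data from the study).
hourly = [0.0, 0.5, 2.0, 0.0]
print(two_hour_precip(hourly))
```

In this sketch, rows whose combined value is None would have to be dropped or imputed before model training.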
The performance of the LSTM models for each alternative is presented in <Table 5>. All mean RMSEs fall below 2 (between 1.569 and 1.627), indicating low prediction errors. Alternatives 8 and 10 have mean RMSEs of 1.569 and 1.570, respectively, which are nearly identical and the best among the alternatives. These are 3.56% and 3.50% lower, respectively, than that of Alternative 0, which does not consider precipitation in its modeling structure. This indicates that representing the precipitation of the current and next timestamps with a single variable is the most effective way to predict shared bicycle demand at the station level when LSTM is employed.
<Table 5>
The LSTM Analysis Results
Alt. | Rental RMSE | Return RMSE | Mean RMSE | Rank | Improvement |
---|---|---|---|---|---|
0 | 1.623 | 1.631 | 1.627 | 15 | 0% |
1 | 1.658 | 1.558 | 1.608 | 11 | 1.16% |
2 | 1.630 | 1.567 | 1.598 | 8 | 1.75% |
3 | 1.636 | 1.585 | 1.610 | 12 | 1.01% |
4 | 1.685 | 1.522 | 1.604 | 9 | 1.42% |
5 | 1.633 | 1.543 | 1.588 | 6 | 2.38% |
6 | 1.597 | 1.566 | 1.581 | 4 | 2.80% |
7 | 1.676 | 1.564 | 1.620 | 14 | 0.42% |
8 | 1.633 | 1.507 | 1.569 | 1 | 3.56% |
9 | 1.642 | 1.527 | 1.585 | 5 | 2.59% |
10 | 1.609 | 1.531 | 1.570 | 2 | 3.50% |
11 | 1.609 | 1.539 | 1.574 | 3 | 3.26% |
12 | 1.636 | 1.618 | 1.627 | 16 | -0.02% |
13 | 1.646 | 1.546 | 1.596 | 7 | 1.89% |
14 | 1.662 | 1.559 | 1.611 | 13 | 1.01% |
15 | 1.641 | 1.567 | 1.604 | 10 | 1.42% |
To identify the best precipitation-reflecting alternative, the mean RMSEs of each alternative were averaged across the two models. The results are shown in <Table 6> and <Fig. 4>. Alternative 8 showed the highest predictive performance, with an overall mean RMSE of 1.503. This value is 5.53% lower than that of Alternative 0, which did not reflect precipitation. Thus, the alternative that considers the next two-hour precipitation in numerical format performs best. Alternatives that consider the next three-hour precipitation follow, with generally strong predictive performance.
<Table 6>
Mean RMSE by Models and Alternatives
Alt. | Mean RMSE (Random Forest) | Mean RMSE (LSTM) | Mean RMSE (Overall) | Rank | Improvement |
---|---|---|---|---|---|
0 | 1.554 | 1.627 | 1.591 | 16 | - |
1 | 1.455 | 1.608 | 1.532 | 8 | 3.71% |
2 | 1.454 | 1.598 | 1.526 | 7 | 4.06% |
3 | 1.495 | 1.610 | 1.553 | 13 | 2.39% |
4 | 1.522 | 1.604 | 1.563 | 14 | 1.73% |
5 | 1.507 | 1.588 | 1.548 | 12 | 2.70% |
6 | 1.436 | 1.581 | 1.509 | 4 | 5.16% |
7 | 1.517 | 1.620 | 1.568 | 15 | 1.38% |
8 | 1.436 | 1.569 | 1.503 | 1 | 5.53% |
9 | 1.438 | 1.585 | 1.511 | 5 | 4.97% |
10 | 1.441 | 1.570 | 1.506 | 2 | 6.29% |
11 | 1.439 | 1.574 | 1.506 | 3 | 5.28% |
12 | 1.444 | 1.627 | 1.536 | 9 | 3.46% |
13 | 1.447 | 1.596 | 1.521 | 6 | 4.34% |
14 | 1.477 | 1.611 | 1.544 | 11 | 2.92% |
15 | 1.479 | 1.604 | 1.541 | 10 | 3.08% |
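The averaging and ranking step can be reproduced from the rounded per-model values with a short script. This is a sketch for checking the arithmetic, not the study's code; the variable names are hypothetical. The inputs are the mean RMSEs listed in <Table 4> and <Table 5>, with the Random Forest value for Alternative 10 taken from <Table 4> (1.441).

```python
# Mean RMSEs per alternative (list index = alternative number),
# copied from the Random Forest and LSTM result tables.
rf = [1.554, 1.455, 1.454, 1.495, 1.522, 1.507, 1.436, 1.517,
      1.436, 1.438, 1.441, 1.439, 1.444, 1.447, 1.477, 1.479]
lstm = [1.627, 1.608, 1.598, 1.610, 1.604, 1.588, 1.581, 1.620,
        1.569, 1.585, 1.570, 1.574, 1.627, 1.596, 1.611, 1.604]

# Average the two models for each alternative.
overall = [(r + l) / 2 for r, l in zip(rf, lstm)]

# Order alternatives from lowest (best) to highest overall mean RMSE.
ranking = sorted(range(len(overall)), key=lambda alt: overall[alt])

baseline = overall[0]  # Alternative 0: no precipitation
for rank, alt in enumerate(ranking[:3], start=1):
    gain = (baseline - overall[alt]) / baseline * 100
    print(f"rank {rank}: Alternative {alt}, "
          f"overall {overall[alt]:.4f}, gain {gain:.2f}%")
```

Working from rounded inputs, the ordering matches the ranks in <Table 6>, with Alternative 8 first; the improvement percentages may differ slightly in the second decimal from those reported.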
Overall, alternatives that consider daily precipitation perform among the worst: Alternatives 3, 4, and 5 rank 13th, 14th, and 12th, respectively, among the sixteen alternatives. The performance of alternatives that consider observed past precipitation is also generally low, as Alternatives 14 and 15 rank 11th and 10th, respectively. These results indicate that a prediction model for hourly rentals and returns that considers previous or long-term future precipitation performs relatively poorly compared with one that considers short-term future precipitation. Consequently, the results suggest that considering the next two or three hours’ precipitation improves the prediction of hourly shared bicycle demand on rainy days.
It is necessary to examine whether the best way of reflecting precipitation is consistent with real-world conditions, such as the psychology and behavior of shared bicycle users. This is crucial because travel time, travel distance, and the choice of departure and arrival stations vary with the purpose of shared bicycle use, and these decisions are ultimately influenced by the amount of precipitation.
As shown in <Table 7>, which describes the characteristics of shared bicycle users in 2019, the average usage time and travel distance of one-day pass users were higher than those of regular pass users, whereas their share of rentals was lower. As shown in <Table 8>, which describes the trip purposes of shared bicycle users in 2019, regular pass users primarily used shared bicycles for commuting to workplaces and schools, whereas one-day pass users mainly used them for leisure and hobbies. It can be inferred that regular pass users primarily rent bicycles for short-distance travel to access nearby trunk transportation, while one-day pass users primarily rent bicycles for relatively long periods. Taken together, regular pass users would be relatively less affected by the possibility of precipitation than one-day pass users, as most of them would still ride in light rain. Considering that the daily usage time limit for rented shared bicycles in Seoul is 2 hours and that the average usage time of one-day pass users is approximately 64 minutes, one-day pass users would likely be affected especially by precipitation over the next 1 to 2 hours.
<Table 7>
Daily Characteristics of Shared Bicycle Users in 2019
Pass | Duration | Membership | Ratio of rental amount (%) | Mean of rental amount | Mean usage time (min) | Mean travel distance (km) |
---|---|---|---|---|---|---|
Regular pass | One-hour | Member | 54.23 | 1.49 | 20.24 | 4.38 |
Regular pass | Two-hour | Member | 24.98 | 1.48 | 43.2 | 7.36 |
One-day pass | One-hour | Member | 13.72 | 1.22 | 34.35 | 5.94 |
One-day pass | One-hour | Non-Member | 1.50 | 1.15 | 41.21 | 6.17 |
One-day pass | Two-hour | Member | 3.85 | 1.18 | 87.97 | 11.77 |
One-day pass | Two-hour | Non-Member | 0.46 | 1.12 | 92.74 | 11.41 |
Etc. | - | - | 1.27 | 1.27 | - | - |
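The figure of approximately 64 minutes for one-day pass users can be related to <Table 7> with a short calculation. The source does not state how the average was computed; the sketch below assumes an unweighted mean across the four one-day pass categories, which is one reading that reproduces the figure. The variable names are hypothetical.

```python
# Mean usage time (minutes) of the four one-day pass categories in Table 7.
one_day_usage_min = {
    ("One-hour", "Member"): 34.35,
    ("One-hour", "Non-Member"): 41.21,
    ("Two-hour", "Member"): 87.97,
    ("Two-hour", "Non-Member"): 92.74,
}

# Unweighted mean across categories (an assumed reading of the source).
simple_mean = sum(one_day_usage_min.values()) / len(one_day_usage_min)
print(f"unweighted mean usage time: {simple_mean:.1f} min")
```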
<Table 8>
The Rental Purposes of Shared Bicycle Users in 2019
Type (ratio of rentals by purpose of use, %) | Exercise | Leisure | Commuting | School | Shopping | Etc. |
---|---|---|---|---|---|---|
Total | 17.0 | 26.8 | 36.3 | 7.8 | 5.0 | 5.1 |
Regular pass | 16.2 | 19.1 | 44.1 | 10.7 | 4.7 | 5.2 |
One-day pass | 19.2 | 47.2 | 15.6 | 7.3 | 5.8 | 4.9 |
Another interesting finding is shown in <Table 9>. On weekends, the share of one-day pass users among all shared bicycle rentals increases significantly compared with weekdays. This is related to the high number of people who use shared bicycles for leisure on weekends(O'Brien et al., 2014), and supports the view that the alternative reflecting the next two-hour precipitation best captures the cognitive characteristics of shared bicycle users regarding precipitation. The prediction structure for hourly rentals and returns based on two-hour precipitation is illustrated in <Fig. 5>.
<Table 9>
Rental of Regular Pass Users and One-Day Pass Users on Weekdays and Weekends in 2019
Type | Regular pass: number of rentals | Regular pass: ratio (%) | One-day pass: number of rentals | One-day pass: ratio (%) |
---|---|---|---|---|
Weekday | 11,676,685 | 77.37 | 2,347,968 | 58.95 |
Weekend | 3,415,182 | 22.63 | 1,634,959 | 41.05 |
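<Table 9> reports how each pass type's rentals split between weekdays and weekends. The share of one-day pass rentals among all rentals in each period, which is the quantity discussed in the text, can be derived from the same counts. The sketch below is an illustrative calculation; the function and variable names are hypothetical.

```python
# Rental counts from Table 9 (2019).
rentals = {
    "Weekday": {"regular": 11_676_685, "one_day": 2_347_968},
    "Weekend": {"regular": 3_415_182, "one_day": 1_634_959},
}

def one_day_share_pct(counts):
    """One-day pass rentals as a percentage of all rentals in the period."""
    total = counts["regular"] + counts["one_day"]
    return counts["one_day"] / total * 100

shares = {period: one_day_share_pct(c) for period, c in rentals.items()}
for period, share in shares.items():
    print(f"{period}: one-day pass share = {share:.2f}%")
```

With these counts, the one-day pass share roughly doubles, from about 16.7% of rentals on weekdays to about 32.4% on weekends.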
Ⅵ. Conclusion
In this study, the most effective method for precipitation-based prediction of the hourly rentals and returns of shared bicycles by station was identified. The pattern of shared bicycle usage changes constantly when it rains, which makes demand difficult to predict and results in bicycle imbalance problems and inefficiencies in long-term operation. For this reason, it is necessary to understand and predict shared bicycle rentals and returns at the station level with an appropriate precipitation-reflecting model. Based on the Korea Meteorological Administration's precipitation recording and forecasting methods, 15 precipitation-reflecting alternatives were established, each representing the cognitive characteristics of shared bicycle users regarding precipitation in a different way. For each alternative, Random Forest and LSTM techniques were applied to construct hourly rental and return prediction models for individual rental stations. The RMSEs for each alternative, behavior (rental or return), and model were calculated and averaged by alternative. The alternative that reflected the sum of the precipitation of the current and next hour yielded the lowest mean RMSE, a 5.53% improvement over the alternative that did not reflect precipitation. One-day pass users primarily use shared bicycles for leisure, hobbies, and exercise, with an average usage time of approximately 64 minutes, which is more than an hour. This suggests that they are strongly affected by the likelihood of precipitation beyond the first hour. To summarize, one-day pass users are more influenced by precipitation than regular pass users, and they primarily consider the precipitation for the next two hours, during which most of their activities take place.
Based on the findings, several important implications were derived regarding the bicycle imbalance problem at stations. First, SBS operators should develop a system that incorporates real-time forecasts of future precipitation into their relocation strategies by linking it to the Korea Meteorological Administration database. By considering precipitation when predicting the hourly rentals and returns of shared bicycles at each rental station, prediction accuracy can be improved and the imbalance problem can be managed more efficiently. Second, the predicted demand at each station provides insight into how much people want to use shared bicycles on rainy days. This information could help SBS operators make better use of bicycles on rainy days at rental stations frequently visited by regular pass users. The analysis revealed that these rental stations are primarily used by individuals commuting to work or school. These stations could be made more attractive by providing a roof to keep the bicycles dry and a towel to wipe off water. In addition, bicycles at these stations could be made more user-friendly by attaching a device that holds the user's umbrella while riding. Third, operators could identify stations that do not require relocation during unfavorable weather conditions, based on demand predictions that account for precipitation. Because there is no need to supply redundant bicycles to rental stations whose rentals decrease significantly when it rains, operators could save resources and manpower.
The contribution of this study is that it provides insight into how shared bicycle users perceive precipitation when making rental decisions. SBS operators can apply the analysis results to predict the hourly usage of shared bicycle rental stations more accurately during rainy conditions. This can help address the bicycle imbalance problem that exists at many rental stations by relocating bicycles effectively to optimal locations. It would reduce wasted manpower and the operational costs of the SBS, and improve the service level experienced by shared bicycle users. Furthermore, the framework of this study can be applied to analyze the cognitive characteristics of other public transportation users in relation to precipitation, which would enable more accurate predictions of public transport usage when it rains.
This study has a limitation in that it relies solely on recorded historical precipitation data rather than the forecast data from the Korea Meteorological Administration. Therefore, there can be a slight discrepancy between the recorded precipitation used in this study and the precipitation forecast by the Korea Meteorological Administration. For example, it may not rain even though rain was forecast, and vice versa. In such cases, the forecast and the recorded precipitation at a specific timestamp differ from each other. The cognitive characteristics of users regarding precipitation would therefore be derived more clearly if forecast precipitation data at each timestamp were obtained and utilized. However, it is very difficult to utilize forecast precipitation by individual timestamp because the forecast changes every minute as the timestamp approaches. Lastly, more insights could be derived if temperature, another crucial weather factor for shared bicycle demand, were analyzed in conjunction with precipitation. The framework of this study can thus be further developed to explore methods of incorporating precipitation, together with various other factors, into demand prediction models for other public transportation and personal mobility services.