The Journal of The Korea Institute of Intelligent Transport Systems Vol.24 No.4 pp.1-20
DOI : https://doi.org/10.12815/kits.2025.24.4.1

Relationships between Shared Bicycle Demand and Precipitation

Eunryong Han*, Jung-yeol Hong**, Dongjoo Park***
*Dept. of Transportation Eng., Univ. of Seoul
**Dept. of Transportation Eng., Keimyung Univ.
***Dept. of Transportation Eng. & Urban Big Data Convergence, Univ. of Seoul
Corresponding author : Dongjoo Park, djpark@uos.ac.kr
18 July 2025 │ 28 July 2025 │ 13 August 2025

Abstract


Shared Bicycle Systems (SBS) are attracting attention as a sustainable transportation method that provides convenient mobility in many cities worldwide. However, operators face imbalances in bicycle inventory across various rental stations because they supply bicycles using uniform and simple methods without accurate demand forecasting. To address this issue, real-time bicycle rental and return demand at each station must be accurately estimated. However, predicting shared bicycle demand during rainy weather is challenging because the impact of precipitation on shared bicycle usage not only changes constantly but also primarily depends on users' decisions. Therefore, shared bicycle demand prediction models need to incorporate both precipitation information and individual users' decisions. This study proposes an optimal method to reflect the impact of precipitation, taking into account users' cognitive characteristics, in predicting the shared bicycle demand at stations. Random Forest and Long Short-Term Memory (LSTM) Ensemble methods were applied to build hourly shared bicycle rental and return prediction models for each precipitation reflection alternative. By comprehensively comparing the prediction accuracy of each alternative, the alternative with the best prediction performance was derived. The research results showed that prediction accuracy improved when next two-hour precipitation was reflected in the model. This indicates that shared bicycle users decide whether to rent based on the precipitation for the next two hours. These results are expected to help shared bicycle operators determine the appropriate number of bicycles to deploy at each station during rainy weather.



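Relationship between Shared Bicycle Demand and Precipitation

Eunryong Han*, Jung-yeol Hong**, Dongjoo Park***
*Lead author: Master's degree, Dept. of Transportation Engineering, University of Seoul
**Co-author: Assistant Professor, Dept. of Transportation Engineering, Keimyung University
***Corresponding author: Professor, Dept. of Transportation Engineering & Dept. of Urban Big Data Convergence, University of Seoul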

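Abstract (Korean version)


Shared bicycle systems (SBS) provide a convenient means of travel in many cities around the world and are attracting attention as a sustainable mode of transportation. However, because operators supply bicycles in a uniform and simple manner without accurate demand forecasting, many rental stations face an imbalance in bicycle inventory. To resolve this, real-time bicycle rental and return demand at each station must be estimated accurately. However, predicting shared bicycle demand in rainy weather is difficult because the effect of precipitation on shared bicycle use not only changes from moment to moment but also depends mainly on users' judgment. Therefore, shared bicycle demand prediction models need to include both precipitation information and individual users' judgment. The purpose of this study is to find the optimal way to reflect the effect of precipitation, taking users' cognitive characteristics into account, in predicting demand at shared bicycle stations. Random Forest and Long Short-Term Memory (LSTM) Ensemble methods were applied to build hourly shared bicycle rental and return prediction models for each precipitation-reflecting alternative. The prediction accuracy of the alternatives was compared comprehensively to derive the alternative with the best prediction performance. The results showed that prediction accuracy improved when the forecast two-hour precipitation was reflected in the model. This means that shared bicycle users decide whether to rent based on the precipitation over the next two hours. These results are expected to help shared bicycle operators determine the appropriate number of bicycles to deploy at each station in rainy weather.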



    Ⅰ. Introduction

    Bicycle sharing schemes are attracting attention as a sustainable alternative for personal mobility in many large cities worldwide, offering convenience and reducing travel time. Thus, cities worldwide are introducing shared bicycle services to promote a healthy society while solving traffic-related problems. Since these services were introduced, policymakers have gradually increased the supply of shared bicycles, and demand has increased in line with it. However, shared bicycle system (SBS) operators are facing considerable bicycle imbalance problems at rental stations due to inefficiencies in the approaches used to predict shared bicycle demand. Researchers have been investigating approaches for estimating real-time demand for each rental station to help relocate surplus bicycles from low-demand stations to those with higher demand using suitable techniques and considering environmental factors known to affect the use of shared bicycles.

    The literature has identified precipitation as a key factor to consider when estimating shared bicycle demand. However, it is challenging to predict shared bicycle demand in microscopic spatiotemporal ranges on a rainy day because precipitation, which significantly affects demand for shared bicycle use and is perceived differently by each user, fluctuates over time. In particular, the influence of precipitation on the use of shared bicycles varies depending on how users perceive it. For instance, shared bicycle users change their usage behavior over time, and the frequency of shared bicycle use fluctuates depending on weather factors(Sears et al., 2012). Other studies have also shown that precipitation negatively correlates with shared bicycle demand(Corcoran et al., 2014;Gebhart and Noland, 2014). Therefore, incorporating precipitation information and the perceptions of individual users into prediction models could help improve prediction accuracy. This can help to reduce the imbalance experienced at rental stations and lessen the inefficiencies in the long-term operation of the SBS.

    Several studies have reflected the importance of precipitation in shared bicycle demand prediction models. Some of them categorized days into rainy days, cloudy days, and snowy days, based on the amount of precipitation(Gebhart and Noland, 2014;Saneinejad et al., 2012;Chen et al., 2016;Hulot et al., 2018). However, this approach is limited because it does not provide an appropriate threshold for distinguishing between the categories. Other studies have predicted shared bicycle demand based on hourly precipitation(Kim, 2018;Sathishkumar and Cho, 2020;Gao and Lee, 2019) and daily precipitation information(Corcoran et al., 2014;Hyland et al., 2018;Sun et al., 2018). Various methods therefore exist for incorporating precipitation to predict shared bicycle demand. Despite the importance of accurately predicting shared bicycle demand and the significance of precipitation information in that process, no study has examined in detail how precipitation should be represented when modeling shared bicycle demand. Because the influence of precipitation on shared bicycle demand depends on how users perceive precipitation(Gebhart and Noland, 2014), it would be worthwhile to incorporate time-based and user-specific cognitive characteristics regarding precipitation into shared bicycle demand prediction models.

    The purpose of this research is to determine the best approach for incorporating precipitation so as to improve the accuracy of hourly shared bicycle rental and return predictions by station. To that end, we establish time-based precipitation-reflecting alternatives that consider users’ cognitive characteristics related to precipitation. We construct hourly shared bicycle rental and return prediction models for each alternative using the Random Forest and Long Short-Term Memory (LSTM) Ensemble methods. The alternative that predicts shared bicycle demand best in a comprehensive performance comparison is derived and defined as the optimal way to reflect precipitation. This study contributes to addressing bicycle imbalance problems across rental stations by capturing users' perceptions of precipitation in shared bicycle demand prediction models. In particular, this study enables a more accurate prediction of hourly rental and return demand for each rental station on rainy days. This could help SBS operators determine the proper number of bicycles needed at any rental station during rainy days.

    Ⅱ. Literature Review

    1. Summary

    This section reviews research on predicting the demand for shared bicycles. Studies focused on predicting the demand for shared bicycle use have emphasized the necessity of selecting an appropriate analysis unit. Most previous studies analyzing hourly rental and return demand for shared bicycles have suggested that it is more accurate to predict demand for each rental station than to predict demand by dividing the spatial area into clusters(Rudloff and Lackner, 2014;Lin et al., 2018). Because shared bicycle demand by rental station exhibits temporal and spatial correlation, these studies found that two rental stations not on the same road section can have similar usage demand patterns. Additionally, if rental stations are clustered solely by spatial factors, those with different usage patterns can be grouped together, which reduces the reliability of the prediction model. Some research has shown that the rental and return of shared bicycles are highly dependent on spatial and temporal effects(Faghih-Imani and Eluru, 2016). Therefore, we analyze the hourly rental and return for each rental station.

    Shared bicycle use has been found to be influenced by the spatial characteristics of rental stations, and studies exploring the spatial factors that affect shared bicycle demand have been actively conducted. In particular, shared bicycle demand was found to be affected by socioeconomic factors, including population and employee densities(El-Assi et al., 2017). Additionally, employment, population, bars, restaurants, and distance to a central location were significant in explaining the demand for shared bicycles(Guidon et al., 2020).

    In several studies on predicting shared bicycle demand, weather factors were selected as major influencing factors, in addition to spatial characteristics. Previous studies have attempted to predict the real-time demand for shared bicycles by reflecting weather factors such as temperature and precipitation. These studies found a negative correlation between precipitation and demand for shared bicycles(Eren and Uz, 2020;Saneinejad et al., 2012;Sathishkumar and Cho, 2020). The tendency is stronger at rental stations near subway stations(Gebhart and Noland, 2014). This is because the influence of precipitation on shared bicycle use varies depending on the type of shared bicycle user. Additionally, the influence of precipitation varies according to the location of the shared bicycle rental station (Kim, 2018). Interestingly, weather factors were found to have different influences during different time periods within a day. This finding suggests that it is necessary to consider the spatiotemporal characteristics of rental stations when taking weather factors into account. In this study, we considered only temperature and precipitation in the prediction model to exclude the interaction effects between precipitation and other influencing factors, such as public bicycle use.

    Shared bicycle travel behavior varies significantly depending on the purpose of use(Zhang et al., 2016). For example, tourists with a one-day pass tend to use bicycles to travel between attraction spots, whereas subscribers tend to use them for commuting between work, school, and home. Leisure users travel long distances over long periods, while commuters travel relatively short distances. It is therefore necessary to consider the travel purposes of shared bicycle users when investigating the effects of weather factors or spatial characteristics on the decision to rent shared bicycles.

    Numerous prediction methods for shared bicycle demand have been proposed. Accurately predicting demand in microscopic spatiotemporal ranges is crucial because it enables the identification of the appropriate scale of SBS and the development of a relocation strategy. Recently, as the scale of shared bicycle projects has expanded, the volume of microscopic spatiotemporal data has increased, and machine learning techniques that can effectively handle it are being frequently employed. In particular, ensemble techniques such as Random Forest have been found to predict very well(Pan et al., 2019). Deep learning techniques, such as Long Short-Term Memory (LSTM) or Back Propagation Neural Network (BPNN), which incorporate multiple layers, are also useful in predicting shared bicycle demand(Pan et al., 2019;Xu et al., 2018;Gao and Lee, 2019). In addition, support vector machines and Poisson regression analysis have been employed(Sachdeva and Sarvanan, 2017).

    2. Critique of the State-of-the-Art

    Previous studies have incorporated precipitation into models to predict hourly rentals and returns for each rental station. Since the effect of precipitation on shared bicycle use varies according to characteristics such as rental stations, user types, or the way users perceive precipitation, it is essential to accurately reflect precipitation. Despite the importance of precipitation in predicting demand for shared bicycles, many studies have reflected the precipitation data they collected “as is”. There was no study that accurately predicted real-time shared bicycle demand based on precipitation. Moreover, there was no study that captured how shared bicycle users' perception regarding precipitation affects their usage decisions, therefore requiring further studies.

    Reflecting precipitation is helpful for accurately predicting hourly rental and return for each rental station. It is also important to select methodologies that can account for exogenous factors, including precipitation, and prevent overfitting to the training set. In this study, we used Random Forest and an LSTM ensemble method that combines four predictors. Random Forest mitigates overfitting through bagging. The LSTM ensemble addresses the same issue by averaging four predictors, each of which includes two LSTM layers and two fully connected layers. Additionally, both methods take exogenous factors into account when predicting hourly rental and return for each rental station. Therefore, these two methods were employed to construct the predictive models in this study. The findings of this study can be utilized to operate the SBS more efficiently. Additionally, the results allow the cognitive characteristics of shared bicycle users regarding precipitation to be investigated, providing insight into how shared bicycle users perceive precipitation when deciding to rent bicycles.

    Ⅲ. Analysis Framework

    1. Shared Bicycle Imbalance Problem and the Study Area

    Seoul, the capital of South Korea, first introduced shared bicycles known as “Ttareungyi” in 2015 to reduce traffic congestion resulting from overdependence on private vehicles for short trips and to increase accessibility to public transit stations. However, the bicycle imbalance problem has become a major issue for SBS operators. The problem is most prevalent in five districts in Seoul: Jongno-gu, Jung-gu, Yongsan-gu, Seongbuk-gu, and Dongdaemun-gu. Some stations have an excess supply of bicycles, while others have few, despite high demand for shared bicycles. Furthermore, at the stations in these districts, the number of user relocation requests relative to the number of rentals was also the largest. For this reason, the abovementioned districts were selected as the spatial study scope for analysis.

    The five districts have an estimated population of 1.36 million people. The shared bicycle usage records in these districts, comprising approximately 2.3 million transactions, were obtained from the 263 shared bicycle rental stations within the spatial scope. The bicycle imbalance problem is most severe on weekdays. In particular, on weekdays, the imbalance between rental and return is most severe during commuting hours, from 8:00 a.m. to 10:00 a.m. and from 5:00 p.m. to 8:00 p.m. During this time window, there exist rental stations that have too many bicycles. In the case of Seongbuk-gu, there was a record of having 82 shared bicycles at a specific rental station during rush hours, while other stations had no available shared bicycles.

    The SBS operators have employed several approaches to address this issue, including identifying stations with excess bicycles and relocating them to stations with fewer bicycles using a truck. However, the algorithms used for relocation do not adequately consider precipitation, resulting in inefficiencies in system operation during rainy days. Research shows that shared bicycle use tends to decrease when it rains(Sears et al., 2012;Corcoran et al., 2014;Gebhart and Noland, 2014;Hyland et al., 2018;Kim, 2018;Sun et al., 2018;Sathishkumar and Cho, 2020). In the case of “Ttareungyi”, the average hourly rental per station is 0.26 when it rains, whereas it is 1.32 when it does not rain; that is, rentals in rainy hours are roughly one-fifth of those in dry hours. This phenomenon is more pronounced at rental stations near subway stations, but its variation over time is notable(Kim, 2018;Eren and Uz, 2020).

    2. Data Description

    Real shared bicycle rental and return data, spanning from January 1 to December 31, 2019, were provided by Seoul-si. It was necessary to determine an appropriate spatiotemporal analysis unit and build a dataset accordingly for precise analysis. It is reasonable to use rental stations as spatial units to predict the demand for shared bicycles. Meanwhile, long-term temporal units, such as days or weeks, show almost identical numbers of rentals and returns, making it difficult to clearly identify the influence of precipitation on the demand for shared bicycles. Conversely, when the temporal unit is too short, such as minutes, rentals and returns of shared bicycles hardly occur at some stations. Therefore, the hour was used as the temporal unit. The hourly rental and return amounts for each rental station in each district were aggregated from the original rental records. Hourly temperature and precipitation data were collected from the Korea Meteorological Administration to incorporate weather factors in predicting shared bicycle demand. The hourly temperature and precipitation data for the district to which each rental station belongs were combined with the hourly rental and return amount data for that station. Only the weather factors are considered to minimize the effect of interactions between other influential factors.
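    The aggregation and weather join described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the station identifiers, and the weather values are hypothetical and do not reflect the actual schema or contents of the Seoul data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw rental records: (station_id, district, rental timestamp).
rentals = [
    ("ST-101", "Jongno-gu", "2019-07-01 08:05"),
    ("ST-101", "Jongno-gu", "2019-07-01 08:40"),
    ("ST-101", "Jongno-gu", "2019-07-01 09:10"),
    ("ST-202", "Jung-gu", "2019-07-01 08:15"),
]

# Hypothetical hourly weather: (district, hour) -> (temperature in C, precipitation in mm/h).
weather = {
    ("Jongno-gu", "2019-07-01 08"): (24.1, 0.0),
    ("Jongno-gu", "2019-07-01 09"): (24.8, 1.5),
    ("Jung-gu", "2019-07-01 08"): (24.3, 0.0),
}


def hour_key(timestamp):
    """Truncate a 'YYYY-MM-DD HH:MM' timestamp to its hour, 'YYYY-MM-DD HH'."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M").strftime("%Y-%m-%d %H")


def build_hourly_rows(records, weather_by_district_hour):
    """Count rentals per (station, hour) and attach the weather of the station's district."""
    counts = Counter()
    district_of = {}
    for station, district, timestamp in records:
        counts[(station, hour_key(timestamp))] += 1
        district_of[station] = district
    rows = []
    for (station, hour), n_rentals in sorted(counts.items()):
        temp_c, precip_mm = weather_by_district_hour[(district_of[station], hour)]
        rows.append({"station": station, "hour": hour, "rentals": n_rentals,
                     "temp_c": temp_c, "precip_mm": precip_mm})
    return rows


hourly_rows = build_hourly_rows(rentals, weather)
```

    Note that this sketch only emits station-hours with at least one rental; a full dataset would also need rows with zero rentals, and returns would be aggregated in the same way.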

    Descriptive statistics are analyzed to identify the characteristics of each influential factor, including temporal and weather factors. Generally, the number of rentals exceeds the number of returns, because cases of loss, breakdown, repair, or relocation are encoded as “rentals”. For each rental station, the average number of rentals per hour was 1.27, and the average number of returns per hour was 1.24. The temporal factors revealed that shared bicycle demand patterns are significantly different between weekdays and weekends. The average number of rentals per hour by a rental station on weekdays was 1.34, and the average number of returns per hour on weekends was 1.1. Rainfall and snowfall are significant factors contributing to reduced demand for shared bicycles. The average number of rainy days in Seoul in 2019 was 81.45, accounting for 22% of the entire year. The descriptive statistics of the dependent and independent variables are presented in <Table 1>.

    <Table 1> Variable Description

    Category                Factor    Description                                                    Mean    Std     Type
    Dependent variable      -         Number of rentals per hour of rental station (bicycles/hour)  1.27    2.12    Continuous
    Dependent variable      -         Number of returns per hour of return station (bicycles/hour)  1.24    2.07    Continuous
    Independent variables   Spatial   District                                                       -       -       Categorical
    Independent variables   Spatial   Station                                                        -       -       Categorical
    Independent variables   Temporal  Month (m)                                                      6.53    3.45    Integer
    Independent variables   Temporal  Day (d)                                                        15.72   8.80    Integer
    Independent variables   Temporal  Time (h)                                                       12.50   6.92    Integer
    Independent variables   Temporal  Weekday or Weekend                                             -       -       Categorical
    Independent variables   Temporal  Day of week                                                    -       -       Categorical
    Independent variables   Weather   Temperature (℃)                                                13.60   10.50   Continuous
    Independent variables   Weather   Precipitation (mm/h)                                           0.11    0.91    Continuous

    Ⅳ. Methodology

    1. Workflow

    The workflow of this study comprises five steps. First, the base dataset was created by aggregating the hourly rental records of shared bicycles by individual rental stations. Second, precipitation-reflecting alternatives, which reflect users' cognitive characteristics regarding precipitation, were established. These alternatives use the two ways in which the Korea Meteorological Administration expresses precipitation: the numerical expression, and the categorical or ordinal expression. Third, for each alternative, the Random Forest and the LSTM Ensemble methods were applied to construct models that predict hourly rentals and returns for each rental station. The root mean squared error (RMSE) was calculated as a prediction performance measure for each alternative, behavior (rental / return), and model, and then averaged by alternative. Fourth, the alternative with the best prediction performance was identified and finalized as the most effective method for predicting shared bicycle demand based on precipitation information. Finally, the study concludes by examining whether the best precipitation-reflecting alternative is consistent with user behavior and shared bicycle usage.
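    For reference, the performance measure in the third step can be written out as below. The notation is introduced here for illustration and is not taken from the paper: a indexes the alternative, b the behavior (rental or return), m the model (Random Forest or LSTM Ensemble), and N is the number of station-hours in the test set. Equal weighting of the behavior and model combinations in the average is an assumption.

```latex
\mathrm{RMSE}_{a,b,m}
  = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{(b)} - \hat{y}_i^{(a,b,m)}\right)^{2}},
\qquad
\overline{\mathrm{RMSE}}_{a}
  = \frac{1}{|B|\,|M|}\sum_{b \in B}\sum_{m \in M}\mathrm{RMSE}_{a,b,m}
```

    Here y_i^{(b)} is the observed hourly count for behavior b and \hat{y}_i^{(a,b,m)} is the corresponding prediction; a lower averaged RMSE indicates a better alternative.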

    2. Random Forest

    The Decision Tree is a widely employed machine learning model for prediction. However, a single Decision Tree tends to overfit the training data, and this problem is effectively mitigated by combining multiple trees. Random Forest is an ensemble technique that trains multiple decision trees through a bagging process. Bagging is a method of aggregating the outputs of models trained on individual bootstrapped samples(Breiman, 1996).

    When each tree is trained, a random subset of the influential factors is selected for it. This prevents overfitting and results in appropriate consideration of all influential factors(Breiman, 2001;Liaw and Wiener, 2002). For regression, Random Forest then averages the values predicted by the individual Decision Trees. The Random Forest technique employed in this study to predict the shared bicycle demand of each rental station is illustrated in <Fig. 1>. Each rental station's hourly rentals and returns are considered output variables, while temporal and weather factors are considered input variables. Thus, this model predicts hourly rentals and returns for each rental station based on temporal factors, precipitation, and temperature at a point in time. K samples are generated through bootstrapping after dividing the total dataset into a training set and a test set. A Decision Tree is trained on each sample using randomly selected influential factors. Then, the future demand for shared bicycles is predicted through bagging.

    <Fig. 1> Random Forest Model for the Shared Bicycle Demand Prediction
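    The bootstrap, random factor selection, and averaging steps can be sketched in a few lines. This is a deliberately simplified illustration, not the model used in the study: each "tree" is a one-split stump on a single randomly chosen factor, whereas a real Random Forest grows deeper trees and samples factors at every split. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, targets) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, targets) if row[feature] > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The feature is constant in this sample: fall back to the sample mean.
        overall = mean(targets)
        return lambda row: overall
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def fit_bagged_stumps(rows, targets, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    features = list(rows[0].keys())
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.choice(features)                  # random factor selection
        trees.append(fit_stump([rows[i] for i in idx],
                               [targets[i] for i in idx], feature))
    return trees


def predict(trees, row):
    """Bagging for regression: average the predictions of all trees."""
    return mean(tree(row) for tree in trees)


# Toy station-hours (hypothetical values): temporal and weather inputs, rentals as output.
rows = [
    {"hour": 8, "temp_c": 22.0, "precip_mm": 0.0},
    {"hour": 9, "temp_c": 23.0, "precip_mm": 0.0},
    {"hour": 18, "temp_c": 25.0, "precip_mm": 0.0},
    {"hour": 8, "temp_c": 20.0, "precip_mm": 3.5},
    {"hour": 18, "temp_c": 21.0, "precip_mm": 6.0},
    {"hour": 13, "temp_c": 26.0, "precip_mm": 0.0},
]
targets = [4, 3, 5, 1, 0, 2]

trees = fit_bagged_stumps(rows, targets)
prediction = predict(trees, {"hour": 8, "temp_c": 21.0, "precip_mm": 4.0})
```

    Because every stump predicts a mean of observed targets, the averaged prediction always stays within the range of the training targets.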

    3. Long Short-Term Memory

    LSTM is a type of Recurrent Neural Network that retains information from both long-term and short-term contexts. The standard Recurrent Neural Network suffers from the vanishing gradient problem, whereby the gradient with respect to early time steps becomes very small as the number of time steps increases(Hochreiter, 1991). LSTM overcomes this because its layer includes a cell state, which stores information from previous time points, and input and output gates, which handle new information(Hochreiter and Schmidhuber, 1997). The LSTM layer can also discard part of the information accumulated up to the previous time point through internal mechanisms, including the “forget gate”(Gers et al., 2000). Therefore, the LSTM layer is far less affected by the vanishing gradient problem and achieves excellent performance, even for long-term data (Hochreiter and Schmidhuber, 1997).

    The LSTM technique employed to predict hourly rentals and returns for each shared bicycle rental station is illustrated in <Fig. 2>. The LSTM model captures both the endogenous and exogenous factors influencing shared bicycle demand, as the hourly rentals and returns for each rental station are influenced by usage at the previous timestamp and are also affected by weather conditions. The hourly rentals and returns for each rental station at a specific timestamp were selected as the output variable. Temporal and weather factors are included as input variables. The LSTM layers are designed as a unidirectional (one-way), many-to-one sequence model. The model is trained on the training set to predict a point in the future from the most recent 24 hours of data, including the present hour.

    <Fig. 2> LSTM Model for the Shared Bicycle Demand Prediction
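    The many-to-one setup can be illustrated by how training samples might be cut from an hourly series: each input is a 24-hour window ending at the present hour, and the target is a single future value. The sketch below is hypothetical; in particular, the one-hour prediction horizon, the `make_windows` helper, and the feature layout are assumptions, since the text only states that a future point is predicted from 24 hours of data.

```python
def make_windows(series, window=24, horizon=1):
    """Build (inputs, targets) pairs for a many-to-one sequence model.

    Each input is `window` consecutive hourly feature vectors ending at the
    present hour t; the target is the demand at hour t + horizon.
    """
    inputs, targets = [], []
    for t in range(window - 1, len(series) - horizon):
        inputs.append([row["features"] for row in series[t - window + 1 : t + 1]])
        targets.append(series[t + horizon]["demand"])
    return inputs, targets


# Toy 30-hour series for one station (hypothetical values).
# features = [hour of day, temperature in C, precipitation in mm/h]
series = [{"features": [float(h % 24), 20.0, 0.0], "demand": h % 5} for h in range(30)]

inputs, targets = make_windows(series)
```

    With 30 hours of data and a 24-hour window, this yields six samples; each sample is a 24-step sequence that a sequence model would reduce to one predicted value.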

    It is necessary to build a robust model to determine the superiority of prediction performance among various precipitation-reflecting alternatives. Therefore, we proposed the architecture illustrated in <Fig. 3>. It has an ensemble structure that combines four predictors. Each predictor contains two LSTM layers and two fully connected layers. As each predictor is trained on the training set and produces an output, the future demand is predicted by averaging the prediction values across all predictors.

    <Fig. 3> Architecture of LSTM Ensemble
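    The final averaging step of the ensemble is simple enough to show directly. In the sketch below the four predictors are stand-in functions, not trained networks; `make_stub_predictor` is a hypothetical helper that only mimics the interface of a trained predictor (a window in, one value out).

```python
from statistics import mean


def make_stub_predictor(bias):
    """Stand-in for a trained predictor: last observed demand plus a fixed bias."""
    return lambda window: window[-1] + bias


def ensemble_predict(predictors, window):
    """Average the outputs of independently trained predictors."""
    return mean(predictor(window) for predictor in predictors)


# Four predictors, mirroring the ensemble size described in the text.
predictors = [make_stub_predictor(bias) for bias in (-0.2, -0.1, 0.1, 0.2)]
forecast = ensemble_predict(predictors, window=[1.0, 2.0, 3.0])
```

    Averaging cancels the opposite-signed errors of the individual stand-ins here, which is the intuition behind using an ensemble to obtain a more stable comparison across alternatives.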

    4. Experimental Setup

    To find the best method of reflecting precipitation, one that improves the performance of the shared bicycle demand prediction model, it is necessary to set realistic precipitation-reflecting alternatives that consider how users perceive and react to precipitation. There are two considerations when setting up alternatives. First, we need to reflect precipitation over the next several hours, as shared bicycle users decide whether to rent by considering near-future precipitation. Because users are affected not only by current precipitation but also by the potential for future precipitation when they rent, it is essential to determine the time period to be referenced. Second, it is necessary to determine how to define precipitation intensity and establish a threshold, as shared bicycle users are more affected by precipitation intensity than by the amount of precipitation. Thus, the result depends on how precipitation is ‘expressed’: quantitatively on a numeric scale or qualitatively on an ordinal scale. To sum up, establishing alternatives with the proper reference time period and the proper way of expressing precipitation is key to accurately predicting shared bicycle demand.

    Most of Seoul’s shared bicycle users obtain precipitation information through the news, the internet, and smartphones. Therefore, the decision to rent shared bicycles is inevitably affected by the precipitation forecast method of the Korea Meteorological Administration. The Korea Meteorological Administration defines precipitation as the total amount of water in the form of rain, snow, and hail falling from the sky. Precipitation is typically aggregated in different units according to the purpose. First, daily precipitation (mm/d) is mainly used for recording. If the daily precipitation is 0.1 mm or more, the day is recorded as a rainy day. Second, six-hour precipitation (mm/6h) is used for forecasting. A day is divided into four time windows: dawn (12 AM - 6 AM), morning (6 AM - 12 PM), afternoon (12 PM - 6 PM), and night (6 PM - 12 AM). Third, the hourly precipitation (mm/h) is typically used for recording and forecasting.

    The Korea Meteorological Administration provides precipitation in both numerical and ordinal scales. The numerical scale refers to hourly precipitation (mm/h) and daily precipitation (mm/d). The ordinal scale defines precipitation intensity by classifying it into several ordinal categories. If the hourly precipitation is 1-3 mm, it is referred to as light rain. People can still be found outside in this condition because they do not mind getting their clothes wet. If the hourly precipitation is 3-15 mm, it is referred to as medium rain. The falling rain is visible to the naked eye. If the hourly precipitation is 15-30 mm, it is referred to as heavy rain. In this situation, an umbrella and a raincoat become useless. Lastly, if the hourly precipitation is 30 mm or more, it is referred to as intense rain that can be critical to people's lives.
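    The ordinal scale above can be expressed as a small lookup. The sketch makes two assumptions that the text does not state: values below 1 mm/h are treated as "not rainy", and a value exactly on a boundary (for example 3 mm/h) is assigned to the higher class. The names `THRESHOLDS`, `CLASSES`, and `intensity_level` are illustrative.

```python
import bisect

# Lower bounds (mm/h) of the light, medium, heavy, and intense classes.
THRESHOLDS = [1.0, 3.0, 15.0, 30.0]
CLASSES = ["not rainy", "light", "medium", "heavy", "intense"]


def intensity_level(precip_mm_per_hour):
    """Return the ordinal level 0..4 for an hourly precipitation amount.

    Assumption: intervals are half-open, [1, 3), [3, 15), [15, 30), [30, inf),
    and anything below 1 mm/h maps to level 0 ("not rainy").
    """
    return bisect.bisect_right(THRESHOLDS, precip_mm_per_hour)


def intensity_class(precip_mm_per_hour):
    """Return the class label for an hourly precipitation amount."""
    return CLASSES[intensity_level(precip_mm_per_hour)]
```

    For instance, `intensity_class(2.0)` gives "light" and `intensity_class(20.0)` gives "heavy"; the integer level from `intensity_level` is the kind of value an ordinal-format alternative would feed to a model.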

    <Table 2> and <Table 3> present the 15 precipitation-reflecting alternatives established in this study and the descriptive statistics of the precipitation variable for each alternative, respectively. Alternative 0 refers to a baseline scenario in which precipitation is not reflected in the shared bicycle demand prediction and was created to identify the superiority of precipitation-reflecting alternatives. Alternatives 1 and 2 reflect the current timestamp’s precipitation (mm/h) in numerical and ordinal formats, respectively. Alternatives 3 and 4 reflect daily precipitation (mm/d) in numerical and ordinal formats, respectively. Alternative 5 reflects daily precipitation in a categorical format, with dummy variables encoded as either rainy or non-rainy days based on a threshold of 0.1 mm. Alternatives 6 and 7 reflect the six-hour precipitation from the current timestamp in numerical and ordinal formats, respectively. Alternatives 8 and 9 reflect two-hour precipitation from the current timestamp, which is the sum of precipitation at the current timestamp and the next hour, in numerical and ordinal formats, respectively. Alternatives 10 and 11 reflect the three-hour precipitation from the current timestamp, which is the sum of the precipitation at that specific timestamp and in the next two hours, in numerical and ordinal formats, respectively. Alternatives 8, 9, 10, and 11 investigate whether the possibility of future precipitation influences the rental decisions of shared bicycle users, in addition to the current status. Alternative 12 reflects the precipitation at the current timestamp and in the next hour, where the statuses of the current and next timestamps are treated as separate numerical variables. Alternative 13 reflects the precipitation at the current timestamp and in the next two hours, where the statuses of the current and next timestamps are treated as separate numerical variables. Alternatives 12 and 13 indicate whether the current precipitation amount and the possibility of future precipitation affect users’ decisions differently. Alternatives 14 and 15 reflect the sum of precipitation at the current timestamp and the previous hour in numerical and ordinal formats, respectively. These investigate whether recent past precipitation influences the rental decisions of shared bicycle users, in addition to the current status. For each alternative, a real-time shared bicycle demand prediction model is constructed using the Random Forest and LSTM techniques, and the average RMSE is then calculated to compare the prediction performances and identify the most effective way to incorporate precipitation information into the model.
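    To make the numerical alternatives concrete, the sketch below derives several of them from an hourly precipitation series. It is an illustration under assumptions: hours outside the series count as 0 mm, the dictionary keys are hypothetical names, and the "next" variable of Alternative 13 is taken to be the sum over the following two hours (the text does not say whether those hours are summed or kept separate).

```python
def precipitation_features(precip, t):
    """Build numerical precipitation features for hour index t.

    precip: list of hourly precipitation amounts (mm/h), one entry per hour.
    """
    def at(i):
        # Hours outside the series are treated as dry (assumption).
        return precip[i] if 0 <= i < len(precip) else 0.0

    def window_sum(start, stop):
        # Sum of precip[start:stop], clipped to the bounds of the series.
        return sum(precip[max(start, 0):min(stop, len(precip))])

    return {
        "alt1_current": at(t),                       # current hour only
        "alt8_next_2h": window_sum(t, t + 2),        # current + next hour
        "alt10_next_3h": window_sum(t, t + 3),       # current + next two hours
        "alt12_current": at(t),                      # current and next hour
        "alt12_next": at(t + 1),                     #   as separate variables
        "alt13_current": at(t),                      # current and next two hours
        "alt13_next": window_sum(t + 1, t + 3),      #   (summed: assumption)
        "alt14_prev_2h": window_sum(t - 1, t + 1),   # previous + current hour
    }


hourly_precip = [0.0, 0.5, 2.0, 4.0, 0.0, 1.0]
features = precipitation_features(hourly_precip, t=2)
```

    The ordinal alternatives would apply an intensity classification to the corresponding sums instead of using the raw amounts.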

    <Table 2>

    Analytical Alternatives Categorized by Precipitation Reflection

    Alt. Description of how precipitation was reflected Expression Note
    0 Not Considered -
    1 Numerical-format One-hour Precipitation Quantitative
    2 Ordinal-format One-hour Precipitation Qualitative
    3 Numerical-format Daily Precipitation Quantitative
    4 Ordinal-format Daily Precipitation Qualitative
    5 Whether Daily Precipitation Exists (threshold as 0.1 mm/day) Qualitative Rainy / Not rainy
    6 Numerical-format Next Six-hour Precipitation Quantitative
    7 Ordinal-format Next Six-hour Precipitation Qualitative
    8 Numerical-format Next Two-hour Precipitation Quantitative
    9 Ordinal-format Next Two-hour Precipitation Qualitative
    10 Numerical-format Next Three-hour Precipitation Quantitative
    11 Ordinal-format Next Three-hour Precipitation Qualitative
    12 Numerical-format Next One-hour Precipitation as Separate Variables Quantitative Current / Future
    13 Numerical-format Next Three-hour Precipitation as Separate Variables Quantitative Current / Future
    14 Numerical-format Previous Two-hour Precipitation Quantitative
    15 Ordinal-format Previous Two-hour Precipitation Qualitative
    Note: The time intervals of 'Next' and 'Previous' contain current timestamps.
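
The window definitions in <Table 2> can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the zero-padding at the ends of the series, and the ordinal class edges (`edges`) are assumptions made purely for illustration, since the intensity thresholds are not restated in this section.

```python
from typing import List, Sequence


def window_sum(hourly_mm: Sequence[float], first: int, last: int) -> List[float]:
    """Sum hourly precipitation over offsets first..last (inclusive) around each hour.

    Offset 0 is the current hour, +1 the next hour, -1 the previous hour.
    Hours outside the series count as 0 mm (an assumption for this sketch).
    """
    n = len(hourly_mm)
    return [
        sum(hourly_mm[t + k] for k in range(first, last + 1) if 0 <= t + k < n)
        for t in range(n)
    ]


def to_ordinal(mm: float, edges: Sequence[float] = (0.1, 3.0, 15.0, 30.0)) -> int:
    """Map an amount to class 0..4 (Not rainy, Light, Medium, Heavy, Intense).

    The class edges are hypothetical placeholders, not the study's thresholds.
    """
    return sum(1 for edge in edges if mm >= edge)


hourly = [0.0, 1.0, 2.0, 0.0]          # toy hourly precipitation (mm/h)

alt1 = list(hourly)                     # Alt. 1: current hour only
alt8 = window_sum(hourly, 0, 1)         # Alt. 8: current + next hour
alt10 = window_sum(hourly, 0, 2)        # Alt. 10: current + next two hours
alt14 = window_sum(hourly, -1, 0)       # Alt. 14: previous + current hour
alt9 = [to_ordinal(v) for v in alt8]    # Alt. 9: ordinal version of Alt. 8
# Alt. 12: current and next-hour amounts kept as two separate variables
alt12 = list(zip(hourly, window_sum(hourly, 1, 1)))
```

Expressing every alternative through one windowing helper makes the only difference between alternatives explicit: the offsets of the window and whether the result is kept numerical, binned, or split into separate variables.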
    <Table 3>

    Descriptive Statistics of Precipitation Values

    Note: For Alternatives 12 and 13, the variables for the current and next timestamps are denoted as 'Current' and 'Next', respectively.

    Variable Type Mean Std. Not rainy (%) Light (%) Medium (%) Heavy (%) Intense (%)
    1 Numerical 0.105 0.921 - - - - -
    2 Ordinal - - 95.39 3.80 0.72 0.06 0.01
    3 Numerical 2.534 8.885 - - - - -
    4 Ordinal - - 77.69 22.12 0.18 0 0
    5 Categorical - - 77.69 (Not rainy) 22.30 (Rainy)
    6 Numerical 0.634 3.557 - - - - -
    7 Ordinal - - 89.38 9.80 0.81 0 0
    8 Numerical 0.211 1.353 - - - - -
    9 Ordinal - - 92.27 6.88 0.81 0.02 0
    10 Numerical 0.319 1.911 - - - - -
    11 Ordinal - - 91.12 8.09 0.77 0.01 0
    12(Current) Numerical 0.105 0.912 - - - - -
    12(Next) Numerical 0.105 0.912 - - - - -
    13(Current) Numerical 0.105 0.912 - - - - -
    13(Next) Numerical 0.211 1.353 - - - - -
    14 Numerical 0.211 1.579 - - - - -
    15 Ordinal - - 92.27 6.90 0.79 0.02 0

    V. Results and Discussions

    Random Forest and LSTM models were constructed for each of the sixteen alternatives (the baseline and the 15 precipitation-reflecting alternatives). The dataset contains information on district, rental station, month, day, hour, day of the week, weekday or weekend, the number of rentals and returns, temperature, and precipitation. The original dataset was divided into training and test sets at a ratio of 7:3. For each alternative, an optimal demand prediction model was constructed through hyperparameter tuning on the training set, and its performance was evaluated using the mean RMSE on the test set. The RMSE of each precipitation-reflecting alternative was compared with that of the alternative that did not consider precipitation, and the percentage improvement was calculated.
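
The evaluation quantities used in this section (RMSE, the mean of the rental and return RMSEs, and the percentage improvement over Alternative 0) can be written out as a minimal Python sketch. The function names are illustrative assumptions, and the improvement is taken to be the relative reduction in mean RMSE with respect to the baseline, which is how the text describes it.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted hourly counts."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse(rental_rmse: float, return_rmse: float) -> float:
    """Average of the rental-model and return-model RMSEs for one alternative."""
    return (rental_rmse + return_rmse) / 2


def improvement_pct(baseline: float, candidate: float) -> float:
    """Relative reduction (%) of the candidate's RMSE versus the baseline's."""
    return (baseline - candidate) / baseline * 100


# Toy example: three observed hourly rentals against three predictions.
example_rmse = rmse([3, 5, 2], [2, 5, 4])

# Rounded Random Forest values for Alternative 8 and Alternative 0 from <Table 4>.
alt8_mean = mean_rmse(1.489, 1.383)
alt8_gain = improvement_pct(1.554, alt8_mean)
```

With the rounded table values, `alt8_gain` comes out at about 7.6%; small deviations from the reported percentages are presumably due to rounding of the published RMSEs.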

    The performance of the Random Forest models for each alternative is presented in <Table 4>. The mean RMSEs range from 1.436 to 1.554, indicating low prediction errors. Alternative 8 has the lowest mean RMSE, 1.436, and therefore the best predictive performance. This is 7.61% lower than that of Alternative 0, which does not consider precipitation in its modeling structure. This indicates that representing the precipitation of the current and next timestamps with a single variable is the most effective way to predict shared bicycle demand at the station level when Random Forest is employed.

    <Table 4>

    The Random Forest Analysis Results

    Alt. Rental RMSE Return RMSE Mean RMSE Rank Improvement
    0 1.637 1.470 1.554 16 -
    1 1.500 1.411 1.455 9 6.34%
    2 1.500 1.408 1.454 8 6.45%
    3 1.539 1.451 1.495 12 3.81%
    4 1.588 1.456 1.522 15 2.07%
    5 1.563 1.451 1.507 13 3.02%
    6 1.494 1.378 1.436 2 7.60%
    7 1.654 1.379 1.517 14 2.41%
    8 1.489 1.383 1.436 1 7.61%
    9 1.489 1.387 1.438 3 7.45%
    10 1.494 1.389 1.441 5 7.27%
    11 1.490 1.387 1.439 4 7.43%
    12 1.521 1.368 1.444 6 7.08%
    13 1.519 1.374 1.447 7 6.09%
    14 1.534 1.421 1.477 10 4.94%
    15 1.532 1.425 1.479 11 4.86%

    The performance of the LSTM models for each alternative is presented in <Table 5>. The mean RMSEs range from 1.569 to 1.627, indicating low prediction errors. The mean RMSEs of Alternative 8 and Alternative 10 are 1.569 and 1.570, respectively, which are nearly identical and the lowest among the alternatives. These are 3.56% and 3.50% lower, respectively, than that of Alternative 0, which does not consider precipitation in the modeling structure. This indicates that representing the precipitation of the current and next timestamps with a single variable is the most effective way to predict shared bicycle demand at the station level when LSTM is employed.

    <Table 5>

    The LSTM Analysis Results

    Alt. Rental RMSE Return RMSE Mean RMSE Rank Improvement
    0 1.623 1.631 1.627 15 0%
    1 1.658 1.558 1.608 11 1.16%
    2 1.630 1.567 1.598 8 1.75%
    3 1.636 1.585 1.610 12 1.01%
    4 1.685 1.522 1.604 9 1.42%
    5 1.633 1.543 1.588 6 2.38%
    6 1.597 1.566 1.581 4 2.80%
    7 1.676 1.564 1.620 14 0.42%
    8 1.633 1.507 1.569 1 3.56%
    9 1.642 1.527 1.585 5 2.59%
    10 1.609 1.531 1.570 2 3.50%
    11 1.609 1.539 1.574 3 3.26%
    12 1.636 1.618 1.627 16 -0.02%
    13 1.646 1.546 1.596 7 1.89%
    14 1.662 1.559 1.611 13 1.01%
    15 1.641 1.567 1.604 10 1.42%

    To identify the best precipitation-reflecting alternative, the mean RMSEs of each alternative were averaged across the two models. The results are shown in <Table 6> and <Fig. 4>. Alternative 8, which considers the next two-hour precipitation in numerical format, showed the highest predictive performance, with a mean RMSE of 1.503. This value is 5.53% lower than the mean RMSE of Alternative 0, which did not reflect precipitation. The alternatives that consider the next three-hour precipitation (Alternatives 10 and 11) follow closely, ranking second and third.

    <Table 6>

    Mean RMSE by Models and Alternatives

    Alt. Random Forest mean RMSE LSTM mean RMSE Overall mean RMSE Rank Improvement
    0 1.554 1.627 1.591 16 -
    1 1.455 1.608 1.532 8 3.71%
    2 1.454 1.598 1.526 7 4.06%
    3 1.495 1.610 1.553 13 2.39%
    4 1.522 1.604 1.563 14 1.73%
    5 1.507 1.588 1.548 12 2.70%
    6 1.436 1.581 1.509 4 5.16%
    7 1.517 1.620 1.568 15 1.38%
    8 1.436 1.569 1.503 1 5.53%
    9 1.438 1.585 1.511 5 4.97%
    10 1.441 1.570 1.506 2 6.29%
    11 1.439 1.574 1.506 3 5.28%
    12 1.444 1.627 1.536 9 3.46%
    13 1.447 1.596 1.521 6 4.34%
    14 1.477 1.611 1.544 11 2.92%
    15 1.479 1.604 1.541 10 3.08%
    <Fig. 4>

    Final Results
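
The cross-model averaging and ranking behind <Table 6> can be reproduced from the mean RMSEs reported in <Table 4> and <Table 5>. The following Python sketch uses those rounded published values; the variable names are illustrative, and the simple unweighted mean of the two models is assumed, as described in the text.

```python
# Mean RMSE per alternative, copied from <Table 4> (Random Forest) and <Table 5> (LSTM).
rf = {0: 1.554, 1: 1.455, 2: 1.454, 3: 1.495, 4: 1.522, 5: 1.507,
      6: 1.436, 7: 1.517, 8: 1.436, 9: 1.438, 10: 1.441, 11: 1.439,
      12: 1.444, 13: 1.447, 14: 1.477, 15: 1.479}
lstm = {0: 1.627, 1: 1.608, 2: 1.598, 3: 1.610, 4: 1.604, 5: 1.588,
        6: 1.581, 7: 1.620, 8: 1.569, 9: 1.585, 10: 1.570, 11: 1.574,
        12: 1.627, 13: 1.596, 14: 1.611, 15: 1.604}

# Unweighted average of the two models for each alternative.
overall = {alt: (rf[alt] + lstm[alt]) / 2 for alt in rf}

# Alternatives ordered from lowest (best) to highest (worst) overall mean RMSE.
ranking = sorted(overall, key=overall.get)

# Percentage improvement of each alternative over the no-precipitation baseline.
baseline = overall[0]
improvement = {alt: (baseline - value) / baseline * 100
               for alt, value in overall.items()}

for rank, alt in enumerate(ranking, start=1):
    print(f"rank {rank:2d}: Alt. {alt:2d}  overall={overall[alt]:.4f}  "
          f"improvement={improvement[alt]:.2f}%")
```

Run on the rounded inputs, the ordering starts with Alternatives 8, 10, and 11 and ends with Alternative 0, in line with the ranks listed in <Table 6>.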

    Overall, the alternatives that consider daily precipitation are among the lowest performers, as Alternatives 3, 4, and 5 rank 13th, 14th, and 12th, respectively, among the sixteen alternatives. Moreover, the performance of the alternatives that consider observed past precipitation is generally low, as Alternatives 14 and 15 rank 11th and 10th, respectively. These results indicate that a prediction model for hourly rentals and returns that considers previous or long-term precipitation performs relatively poorly compared with one that considers short-term future precipitation. Consequently, the results suggest that considering the precipitation over the next two or three hours improves the prediction accuracy of hourly shared bicycle demand on rainy days.

    It is also necessary to check whether the best way of reflecting precipitation is consistent with real-world conditions, such as the psychology and behavior of shared bicycle users. This matters because travel time, travel distance, and the choice of departure and arrival stations vary with the purpose of shared bicycle use, and these decisions are ultimately influenced by the amount of precipitation.

    As shown in <Table 7>, which describes the characteristics of shared bicycle users in 2019, the average usage time and distance of one-day pass users were higher than those of regular pass users, whereas their share of rentals was lower. As shown in <Table 8>, which describes the trip purposes of shared bicycle users in 2019, regular pass users primarily used shared bicycles for commuting to workplaces and schools, whereas one-day pass users mainly used them for leisure and hobbies. It can be inferred that regular pass users primarily rent bicycles for short-distance travel to access nearby trunk transportation, while one-day pass users primarily rent bicycles for a relatively long time. Taken together, regular pass users would be relatively less affected by the possibility of precipitation than one-day pass users, as most of them would still ride in light rain. Considering that the daily usage time limit for rented shared bicycles in Seoul is 2 hours and that the average usage time of one-day pass users is approximately 64 minutes, one-day pass users are likely to be affected most by the precipitation expected over the next one to two hours.

    <Table 7>

    Daily Characteristics of Shared Bicycle Users in 2019

    Pass type Time limit Membership Ratio of rental amount (%) Mean of rental amount Mean usage time (min) Mean travel distance (km)
    Regular pass One-hour Member 54.23 1.49 20.24 4.38
    Regular pass Two-hour Member 24.98 1.48 43.2 7.36
    One-day pass One-hour Member 13.72 1.22 34.35 5.94
    One-day pass One-hour Non-Member 1.50 1.15 41.21 6.17
    One-day pass Two-hour Member 3.85 1.18 87.97 11.77
    One-day pass Two-hour Non-Member 0.46 1.12 92.74 11.41
    Etc. - - 1.27 1.27 - -
    <Table 8>

    The Rental Purposes of Shared Bicycle Users in 2019

    Type Ratio of rental according to the purpose of use (%)
    Exercise Leisure Commuting School Shopping Etc.
    Total 17.0 26.8 36.3 7.8 5.0 5.1
    Regular pass 16.2 19.1 44.1 10.7 4.7 5.2
    One-day pass 19.2 47.2 15.6 7.3 5.8 4.9

    Another interesting finding is shown in <Table 9>. On weekends, the proportion of one-day pass users among all shared bicycle rentals increases markedly compared with weekdays. This is related to the high number of people who use shared bicycles for leisure on weekends (O'Brien et al., 2014), and it supports the conclusion that the alternative reflecting the next two-hour precipitation best captures the cognitive characteristics of shared bicycle users regarding precipitation. The prediction structure of hourly rentals and returns based on two-hour precipitation is illustrated in <Fig. 5>.

    <Table 9>

    Rental of Regular Pass Users and One-Day Pass Users on Weekdays and Weekends in 2019

    Type Regular pass One-day pass
    Number of rentals Ratio (%) Number of rentals Ratio (%)
    Weekday 11,676,685 77.37 2,347,968 58.95
    Weekend 3,415,182 22.63 1,634,959 41.05
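
The ratios in <Table 9> are computed within each pass type (weekday versus weekend), so the share of one-day pass users within each day type has to be derived from the rental counts. A small Python sketch of that arithmetic, using only the counts reported in the table (the dictionary layout and function name are illustrative assumptions):

```python
# Number of rentals in 2019 by day type and pass type, from <Table 9>.
rentals = {
    "weekday": {"regular": 11_676_685, "one_day": 2_347_968},
    "weekend": {"regular": 3_415_182, "one_day": 1_634_959},
}


def one_day_share(day_type: str) -> float:
    """Share (%) of one-day pass rentals among all rentals on the given day type."""
    counts = rentals[day_type]
    total = counts["regular"] + counts["one_day"]
    return counts["one_day"] / total * 100


weekday_share = one_day_share("weekday")
weekend_share = one_day_share("weekend")
print(f"weekday: {weekday_share:.2f}%  weekend: {weekend_share:.2f}%")
```

On these counts, one-day pass rentals make up roughly 17% of weekday rentals and roughly 32% of weekend rentals, which is the increase referred to in the text.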
    <Fig. 5>

    Learning Structure Reflecting Two-Hour Precipitation of the Model

    Ⅵ. Conclusion

    In this study, the most effective method for precipitation-based prediction of the hourly rentals and returns of shared bicycles by station was identified. The pattern of shared bicycle usage changes constantly when it rains. This has made it challenging to predict shared bicycle demand, resulting in bicycle imbalance problems and inefficiencies in long-term operation. For this reason, it is necessary to understand and predict shared bicycle rentals and returns at the station level with an appropriate precipitation-reflecting model. Based on the Korea Meteorological Administration's precipitation recording and forecasting methods, 15 precipitation-reflecting alternatives were established, each representing the cognitive characteristics of shared bicycle users regarding precipitation in a different way. For each alternative, Random Forest and LSTM techniques were applied to construct hourly rental and return prediction models for individual rental stations. The RMSEs were calculated for each alternative, behavior (rental or return), and model, and then averaged by alternative. The alternative that reflected the sum of the precipitation in the current and next hour yielded the lowest mean RMSE, which was 5.53% lower than that of the alternative that did not reflect precipitation. One-day pass users primarily use shared bicycles for leisure, hobbies, and exercise, with an average usage time of approximately 64 minutes, which is more than an hour. This suggests that they are strongly affected by the probability of precipitation beyond the current hour. To summarize, one-day pass users are more influenced by precipitation than regular pass users, and they primarily consider the precipitation over the next two hours, during which most of their activities take place.

    Based on the findings, several important implications were derived regarding the bicycle imbalance problem at stations. First, SBS operators should develop a system that incorporates real-time forecasts of future precipitation into their relocation strategies by linking it to the Korea Meteorological Administration database. By considering precipitation when predicting the hourly rentals and returns of shared bicycles at each rental station, prediction accuracy can be improved and the imbalance problem can be managed more efficiently. Second, the predicted demand at each station provides insight into how much people want to use shared bicycles on rainy days. This information could help SBS operators make better use of bicycles on rainy days at rental stations frequently visited by regular pass users. The analysis revealed that these rental stations are primarily used by individuals commuting to work or school. These stations could be made more attractive by providing a roof to keep the bikes dry and a towel to wipe off the water. In addition, bicycles at these stations could be made more user-friendly by attaching a device that holds the user's umbrella while riding. Third, operators could identify stations that do not require relocation during unfavorable weather conditions, based on the predicted shared bicycle demand that accounts for precipitation. As there is no need to supply redundant bicycles to rental stations whose rentals decrease significantly when it rains, operators could save resources and manpower.

    The contribution of this study is that it provides insight into how shared bicycle users perceive precipitation when making rental decisions. SBS operators can apply the analysis results to more accurately predict the hourly usage of shared bicycle rental stations during rainy conditions. This can be utilized to solve the bicycle imbalance problem that exists in many rental stations, effectively relocating bicycles to optimal locations. This would reduce the waste of manpower and operational costs of SBS, and improve the service level experienced by shared bicycle users. Furthermore, the framework of this study can be applied to analyze the cognitive characteristics of other public transportation users in relation to precipitation. This will enable more accurate predictions of public transport usage when it rains.

    This study has a limitation in that it relies solely on historical precipitation data rather than on the forecast data from the Korea Meteorological Administration. Therefore, there can be a discrepancy between the recorded precipitation used in this study and the precipitation forecast by the Korea Meteorological Administration. For example, it may not rain despite a forecast of rain, and vice versa. In such cases, the forecast future precipitation and the recorded actual precipitation at a specific timestamp differ from each other. The cognitive characteristics of users regarding precipitation would therefore be derived more clearly if data on the forecast future precipitation at each timestamp were obtained and utilized. However, it is very difficult to utilize the forecast precipitation for individual timestamps because the forecast changes every minute as the timestamp approaches. Lastly, more insights could be derived if temperature, another crucial weather factor for shared bicycle demand, were analyzed in conjunction with precipitation. The framework of this study can thus be further developed to explore methods for incorporating precipitation, together with various other factors, into demand prediction models for other public transportation and personal mobility services.

    ACKNOWLEDGEMENTS

    This study was funded by the 2023 research grant for sabbatical faculty from the University of Seoul.

    Reference

    1. Breiman, L. (1996), “Bagging predictors”, Machine Learning, vol. 24, no. 2, pp.123-140.
    2. Breiman, L. (2001), “Random forests”, Machine Learning, vol. 45, no. 1, pp.5-32.
    3. Chen, L., Zhang, D., Wang, L., Yang, D., Ma, X., Li, S., Wu, Z., Pan, G., Nguyen, T. M. T. and Jakubowicz, J. (2016), “Dynamic cluster-based over-demand prediction in bike sharing systems”, Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '16), pp.841-852.
    4. Corcoran, J., Li, T., Rohde, D., Charles-Edwards, E. and Mateo-Babiano, D. (2014), “Spatio-temporal patterns of a Public Bicycle Sharing Program: The effect of weather and calendar events”, Journal of Transport Geography, vol. 41, pp.292-305.
    5. El-Assi, W., Mahmoud, M. S. and Habib, K. N. (2017), “Effects of built environment and weather on bike sharing demand: A station level analysis of commercial bike sharing in Toronto”, Transportation, vol. 44, no. 3, pp.589-613.
    6. Eren, E. and Uz, V. E. (2020), “A review on bike-sharing: The factors affecting bike-sharing demand”, Sustainable Cities and Society, vol. 54, p.101882.
    7. Faghih-Imani, A. and Eluru, N. (2016), “Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system”, Journal of Transport Geography, vol. 54, pp.218-227.
    8. Gao, X. and Lee, G. M. (2019), “Moment-based rental prediction for bicycle-sharing transportation systems using a hybrid genetic algorithm and machine learning”, Computers & Industrial Engineering, vol. 128, pp.60-69.
    9. Gebhart, K. and Noland, R. B. (2014), “The impact of weather conditions on bikeshare trips in Washington, DC”, Transportation, vol. 41, no. 6, pp.1205-1225.
    10. Gers, F. A., Schmidhuber, J. and Cummins, F. (2000), “Learning to forget: Continual prediction with LSTM”, Neural Computation, vol. 12, no. 10, pp.2451-2471.
    11. Guidon, S., Reck, D. J. and Axhausen, K. (2020), “Expanding a(n) (electric) bicycle-sharing system to a new city: Prediction of demand with spatial regression and random forests”, Journal of Transport Geography, vol. 84, p.102692.
    12. Hochreiter, S. and Schmidhuber, J. (1996), “LSTM can solve hard long time lag problems”, Advances in Neural Information Processing Systems, pp.473-479.
    13. Hochreiter, S. and Schmidhuber, J. (1997), “Long short-term memory”, Neural Computation, vol. 9, no. 8, pp.1735-1780.
    14. Hochreiter, S. (1991), Untersuchungen zu dynamischen neuronalen Netzen, Diploma, Technische Universität München.
    15. Hulot, P., Aloise, D. and Jena, S. D. (2018), “Towards station-level demand prediction for effective rebalancing in bike-sharing systems”, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.378-386.
    16. Hyland, M., Hong, Z., De Farias Pinto, H. K. R. and Chen, Y. (2018), “Hybrid cluster-regression approach to model bikeshare station usage”, Transportation Research Part A: Policy and Practice, vol. 115, pp.71-89.
    17. Kim, K. (2018), “Investigation on the effects of weather and calendar events on bike-sharing according to the trip patterns of bike rentals of stations”, Journal of Transport Geography, vol. 66, pp.309-320.
    18. Liaw, A. and Wiener, M. (2002), “Classification and regression by randomForest”, R News, vol. 2, no. 3, pp.18-22.
    19. Lin, L., He, Z. and Peeta, S. (2018), “Predicting station-level hourly demand in a large-scale bike-sharing network: A graph convolutional neural network approach”, Transportation Research Part C: Emerging Technologies, vol. 97, pp.258-276.
    20. O'Brien, O., Cheshire, J. and Batty, M. (2014), “Mining bicycle sharing data for generating insights into sustainable transport systems”, Journal of Transport Geography, vol. 34, pp.262-273.
    21. Pan, Y., Zheng, R. C., Zhang, J. and Yao, X. (2019), “Predicting bike sharing demand using recurrent neural networks”, Procedia Computer Science, vol. 147, pp.562-566.
    22. Patil, A., Musale, K. and Rao, B. P. (2015), “Bike share demand prediction using RandomForests”, International Journal of Innovative Science, Engineering & Technology, vol. 2, no. 4, pp.1218-1223.
    23. Rudloff, C. and Lackner, B. (2014), “Modeling demand for bikesharing systems: Neighboring stations as source for demand and reason for structural breaks”, Transportation Research Record, vol. 2430, no. 1, pp.1-11.
    24. Sachdeva, P. and Sarvanan, K. N. (2017), “Prediction of Bike Sharing Demand”, Oriental Journal of Computer Science and Technology, vol. 10, no. 1, pp.219-226.
    25. Saneinejad, S., Roorda, M. J. and Kennedy, C. (2012), “Modelling the impact of weather conditions on active transportation travel behaviour”, Transportation Research Part D: Transport and Environment, vol. 17, no. 2, pp.129-137.
    26. Sathishkumar, V. E. and Cho, Y. (2020), “A rule-based model for Seoul Bike sharing demand prediction using weather data”, European Journal of Remote Sensing, vol. 53, no. sup 1, pp.166-183.
    27. Sears, J., Flynn, B. S., Aultman-Hall, L. and Dana, G. S. (2012), “To bike or not to bike: Seasonal factors for bicycle commuting”, Transportation Research Record, vol. 2314, no. 1, pp.105-111.
    28. Sun, F., Chen, P. and Jiao, J. (2018), “Promoting public bike-sharing: A lesson from the unsuccessful Pronto system”, Transportation Research Part D: Transport and Environment, vol. 63, pp.533-547.
    29. Xu, X., Ye, Z., Li, J. and Xu, M. (2018), “Understanding the Usage Patterns of Bicycle-Sharing Systems to Predict Users' Demand: A Case Study in Wenzhou, China”, Computational Intelligence and Neuroscience, vol. 2018, no. 3, pp.1-21.
    30. Zhang, J., Pan, X., Li, M. and Yu, P. S. (2016), “Bicycle-sharing system analysis and trip prediction”, 2016 17th IEEE International Conference on Mobile Data Management (MDM), vol. 1, pp.174-179.
