A deep reinforcement learning model for resilient road network recovery under earthquake or flooding hazards
Journal of Infrastructure Preservation and Resilience volume 4, Article number: 8 (2023)
Abstract
As the backbone and the ‘blood vessels’ of modern cities, road networks provide critical support for community activities and economic growth, and their role has become even more crucial with rapid urbanization. Road network service is threatened by the increasing frequency of high-consequence natural hazards such as earthquakes, floods, and hurricanes. Identifying resilient restoration sequences is essential to mitigate the disruption of such important infrastructure networks. This paper investigates a novel decision-support model to optimize the post-disaster road network repair sequence. The model, named the GCNDRL model, integrates deep reinforcement learning (DRL) with the graph convolutional neural network (GCN), two emerging artificial intelligence (AI) techniques, to achieve efficient recovery of road network service. The model is applied to two community road networks in the US that are subjected to different types of hazards, i.e., earthquakes and flooding. The repair sequence produced by the GCNDRL model is compared with those of two commonly used methods, i.e., the genetic algorithm and prioritization based on graph importance with betweenness centrality. The results show that the decision sequence from the GCNDRL model consistently achieves better road network restoration performance than the conventional methods. The AI-based decision model also features high computational efficiency, since the GCNDRL model can be trained before the hazard. With a pre-trained GCNDRL model, a close-to-optimal decision-making process can be made available rapidly for different types of new hazards, which is advantageous for responding efficiently when hazards happen. This study demonstrates the promise of a new AI-based decision-support model to improve the resilience of road networks by enabling efficient post-hazard recovery.
Introduction
Traffic networks are critical infrastructure systems for community activities and economic growth. However, they are vulnerable to natural hazards such as earthquakes, floods, and storm surges. Rapid restoration of the traffic network is crucial for the post-hazard recovery of community life, since the traffic network’s performance directly influences the restoration of other community activities. Given a large number of damaged roads and limited recovery resources after a hazard, determining a fast and efficient repair sequence for traffic network service restoration is crucial yet challenging for decision makers.
The term ‘resilience’ is widely used in engineering to evaluate a system’s ability to withstand interruptions and recover. Rose [1] and Zhang et al. [2] gave a detailed definition of the resilience of traffic networks. Based on this definition, Fig. 1 is a common illustration of the time-dependent system performance curve before and after a hazard happens [3,4,5,6]. Before the hazard, the system experiences routine deterioration and maintenance, which largely determines its ability to withstand the hazard. Once the hazard happens, a significant performance drop can be observed. Sequential recovery is conducted after the hazard, labeled the ‘recovery phase’ in Fig. 1. Previous studies have widely used this performance trajectory to quantify system resilience. The area under the time-dependent performance trajectory is a widely accepted resilience index, which can be defined as the accumulated system performance during the recovery phase (Eq. 1). Following the models of Rose [1] and Zhang et al. [2], there have been many other developments in resilience frameworks. For example, Renschler et al. proposed the PEOPLES resilience framework, which considers more dimensions of community resilience by taking two integrals over time and space [7]. Recently, Sharma, Tabandeh, and Gardoni proposed multiple mathematical formulations that borrow concepts from probability theory and can be used for various lifecycle trajectories with different trends and time spans [8]. A resilience quantification for interdependent infrastructure was also introduced [9]. A more detailed review of resilience is given by Koliou et al. [10].
\[RI={\int }_{{t}_{0}}^{{t}_{r}}p\left(t\right)dt \qquad (1)\]
where RI is the resilience index, \({t}_{0}\) is the recovery start time, \({t}_{r}\) is the recovery finish time. \(p\left(t\right)\) is the system performance which is dependent on time t.
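As a minimal numerical sketch of the resilience index, the snippet below reads Eq. 1 as the area under \(p\left(t\right)\) between \({t}_{0}\) and \({t}_{r}\). It assumes, for illustration only, that performance is piecewise constant and changes only when a repair is completed; the function name and the sample values are hypothetical.

```python
def resilience_index(times, performance):
    """Area under a piecewise-constant performance curve.

    times: increasing time stamps t_0 < t_1 < ... < t_r (e.g. in days).
    performance: value of p held on each interval [times[k], times[k+1]),
                 so len(performance) == len(times) - 1.
    """
    if len(performance) != len(times) - 1:
        raise ValueError("need exactly one performance value per time interval")
    return sum(p * (t_end - t_start)
               for p, t_start, t_end in zip(performance, times[:-1], times[1:]))


# Hypothetical recovery with repairs finishing on days 2, 3 and 10.
ri = resilience_index([0, 2, 3, 10], [0.4, 0.6, 0.9])
```

A sequence that restores high-performance roads earlier shifts more area to the left of the curve and therefore scores a larger index over the same time span.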
Based on research on road network performance and resilience quantification [11], finding the optimal decisions in road network emergency management is critical for achieving an efficient and therefore resilient recovery process. Existing approaches fall mainly into three categories, i.e., component ranking-based methods, matheuristic optimization methods, and machine learning-based methods. In component ranking-based methods, the recovery sequence is determined by the importance value assigned to each failed component. For example, Tang et al. [12] repaired traffic monitoring sensors using ‘betweenness centrality’ to rank importance. In another example, Aydin et al. [13] used proximate resources, road hierarchy, and required time to rank each road segment. Although component ranking-based methods feature high computing efficiency, they are not resilience-oriented and cannot consider multiple types of information at the same time. Multiple factors, such as the locations of hospitals, shelters, or schools, should be considered in the resilience framework to promote social justice and equity. Hence, a component ranking-based method cannot be used when multiple factors are considered in the system performance evaluation. Matheuristic optimization methods, on the other hand, overcome this limitation by using global optimization algorithms such as the genetic algorithm (GA) or Monte Carlo simulation. For example, Zhang et al. successfully used the GA for road-bridge network recovery after a seismic hazard [14]. Mixed integer programming is another widely used method. Sharma, Tabandeh, and Gardoni proposed a multiscale optimization approach that can consider multi-infrastructure interdependence [9].
Although these methods can consider the influence of multiple factors on system resilience as long as the equations are well designed, they need massive sampling and significant computational time. Another challenge is that these methods solve the problem for a specific, known damage situation, which cannot be obtained before the hazard happens, and they cannot utilize post-hazard real-world data. These two limitations make global optimization algorithms unsuitable for fast-response decisions after hazards.
In recent years, machine learning-based decision-making techniques have been emerging. For example, Zou and Chen used a deep ensemble assisted active learning approach to schedule transportation network recovery with consideration of multiclass users’ travel behavior [15]. Nozhati used a dynamic programming method to find a near-optimal solution [16]. The deep reinforcement learning algorithm is seen as the most promising method. Although many studies have demonstrated its ability to tackle optimization problems with high-dimensional decision and state spaces [17,18,19,20], few have used it for emergency management, such as decisions in the recovery process [21]. Additionally, most studies applied deep reinforcement learning to a given situation, and the computation time is long due to the high computational complexity. There is an urgent need to shorten the computation time to achieve fast and smooth decisions in disaster management [22].
To overcome the trade-off between the performance and the computing efficiency of the above-mentioned road recovery methods, deep reinforcement learning (DRL) is utilized in this study. However, directly applying DRL with an artificial neural network to a traffic network is challenging due to its special graph structure. Hence, a graph convolutional neural network (GCN) based DRL method, named the GCNDRL model, is proposed to determine the optimal restoration sequence of traffic networks. The benefits of the proposed decision-making framework include: 1) it can be customized with multiple factors such as the location of emergency stations, road damage levels, and different repair times; 2) it utilizes the road network graph structure in the computing process, which does not require manual network embedding; 3) it is a stepwise decision-making method, so the real-world damage situation can be used as a new input to the framework even if the provided repair sequence is not strictly followed; and 4) it can provide a pre-trained model, which can be used for a fast response after a new hazard happens. The organization of this study is summarized below. Section 2 describes the system performance metric for the road network. It is noted that the road system performance method used in this study is based on previous studies, but a customized road system performance model can easily be incorporated. Section 3 describes the novel GCNDRL decision-support model, the framework for model training, and the decision-making process based on the GCNDRL model. Sections 4 and 5 illustrate the application of the proposed method in two case studies of road networks. Finally, Section 6 discusses the factors affecting the proposed model and summarizes the major conclusions.
System performance metric
As illustrated in Fig. 1 and Eq. (1), any resilience-informed decision-making requires a quantitative metric of system performance. The system performance metric (p) proposed by Zhang and Wang [23] is used in this study. However, it is noted that any time-dependent performance metric can be considered in the proposed framework. The metric developed by Zhang and Wang [23] is briefly described in this section. The system performance of the road network is quantified by the weighted sum of the intersections’ average number of reliable independent pathways, i.e., it combines the weight of each intersection with the average number of reliable independent paths through that intersection. The weight of each intersection is determined by its location. The average number of reliable independent pathways is determined by the independent paths, traffic flows, and the reliability of each road segment (R). The road reliability can be used to indicate the road’s damage condition. It should be noted that there are two main differences between the applied metric and the original reference. Firstly, a 1 km threshold is used to compute the weight of the intersection. Secondly, the traffic volume of each road is ignored. Traffic flow is an important component in resilience quantification [15, 24, 25]; however, it is ignored here for simplicity and because of the lack of post-hazard traffic data. The proposed framework is still applicable, considering that the influence of traffic flow can be represented through the average number of reliable independent pathways of each node.
The weight of each intersection is determined by its distance to the nearest emergency response facility (Eq. 2). The original criterion is modified in this study to avoid an excessively large weight when the shortest distance is much smaller than 1 km (i.e., Eq. (3)).
where
\({w}_{i}\) is the intersection’s weight; \({{\varvec{D}}}_{{\varvec{i}}}\) is the set of distances from intersection i to the predefined emergency response facilities; \({\Omega }_{i}\) is the reciprocal of the distance between node i and its nearest emergency response facility. When the distance is less than 1 kilometer or the intersection itself is an emergency response facility, \({\Omega }_{i}\) equals 1.
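The thresholded reciprocal described above can be sketched as follows. Only \({\Omega }_{i}\) is computed, because the exact mapping from \({\Omega }_{i}\) to \({w}_{i}\) (Eq. 2) is not reproduced in this text; the function name and the kilometer unit of the inputs are assumptions made for illustration.

```python
def omega(distances_km):
    """Omega_i from the distances (in km) between intersection i and each facility."""
    nearest = min(distances_km)
    # Within 1 km of a facility (or at a facility, distance 0) the value is capped at 1,
    # which avoids very large reciprocals for very short distances.
    return 1.0 if nearest < 1.0 else 1.0 / nearest
```

For example, an intersection 2 km from its nearest facility gets a value of 0.5, while any intersection closer than 1 km gets exactly 1.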
At any given time t, the average number of reliable independent pathways of intersection i is determined by Eq. 4.
where \({R}_{k}\left(i,j\right)\) is the reliability of the k^{th} independent path and \({v}_{k}\left(i,j\right)\) is the weight of the k^{th} independent path.
The intersection’s average number of reliable independent pathways is determined from the reliability and weight of the independent pathways between any origin-destination (OD) pair. Mathematically, for any independent pathway between intersections i and j, the pathway reliability \({R}_{k}\left(i,j\right)\) can be determined by Eq. 5.
The weight of the k^{th} independent path through the intersection can be determined by Eq. 6:
where \({R}_{k}\left(i,j\right)\) is the reliability of the k^{th} independent path; l is a road segment that belongs to the independent path and \({R}_{l}\) is its reliability after the hazard; \({v}_{k}\left(i,j\right)\) is the weight of the k^{th} independent path; \({K}_{\left(i,j\right)}\) is the number of all independent paths between nodes i and j; \({L}_{\mathrm{max}\left(i,j\right)}\) is the maximum path length and \({L}_{{p}_{k\left(i,j\right)}}\) is the length of the k^{th} path.
With the weight of each intersection and the average number of reliable independent paths through each intersection determined, the simplified system performance metric is given by Eq. 7
where \(p\left(t\right)\) is the performance of the road network at time t, \({r}_{i}\) is the average number of reliable independent pathways of node i, and \({w}_{i}\) is the importance weight of node i.
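The two ends of this calculation can be sketched in a few lines. Two assumptions are made here because the equations themselves are not reproduced in this text: the path reliability (Eq. 5) is taken as the product of the segment reliabilities \({R}_{l}\) along the path, i.e., a series system, and the system performance (Eq. 7) is taken as the plain weighted sum \(\sum_{i}{w}_{i}{r}_{i}\) without normalization. The path weighting of Eq. 6 is not sketched, and the function names are hypothetical.

```python
from math import prod


def path_reliability(segment_reliabilities):
    """Reliability of one independent path, assuming its segments act in series."""
    return prod(segment_reliabilities)


def system_performance(weights, avg_reliable_paths):
    """Assumed form of the metric: sum over intersections of w_i * r_i."""
    if len(weights) != len(avg_reliable_paths):
        raise ValueError("need one weight per intersection")
    return sum(w * r for w, r in zip(weights, avg_reliable_paths))
```

Under the series assumption a single badly damaged segment dominates the path reliability, which is why repairing it can raise the performance of many intersections at once.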
GCNDRL decision-making framework for resilient road network restoration
The proposed decision-making framework, shown in Fig. 2, contains three main components: the proposed GCNDRL model, the training process, and the decision-making process. The GCNDRL model combines a graph convolutional neural network (GCN) with an artificial neural network (ANN). It is used to embed the current state of the road network and to output the ranking of the available decisions. The training process follows the conventional DRL training framework, which is a trial-and-error process in which the parameters of the GCNDRL model are trained. The decision-making process is used to determine the repair sequence. The architecture of the GCNDRL model, the training process, and the decision-making process are described in detail in the following sections.
GCNDRL model architecture
DRL is an advanced ML technique that integrates features of reinforcement learning (RL) and deep learning. RL refers to methods that solve learning problems through trial-and-error search [26], while deep learning allows the DRL agent to make decisions from large unstructured input data without manual intervention. The objective of DRL is to train a ‘deep Q function’ that can estimate the reward of each action. Decisions can then be made simply by selecting the action with the largest reward. Moreover, it is well known that for a global optimization problem, local optimization often leads to a suboptimal result because future influence is not considered. To overcome this limitation, the reward value of each action in DRL considers both the instant reward and the future reward. Mathematically, the future reward value of each action at a specified state can be determined by Eq. 8 [27].
where \({Q}^{*}\left(s,a\right)\) is the reward value of action a when the traffic network state is s; \({\mathbb{E}}[\bullet ]\) denotes the mathematical expectation; R denotes the instant reward, i.e. the system performance instant improvement after taking action a; \({{arg \, max} \ Q^* {(s^{\prime},a^{\prime})}}\) denotes the optimal future reward where \(s^{\prime}\) is the state of the traffic network in the next step and \(a^{\prime}\) is the corresponding optimal action; \(\gamma\) is the return discount factor where 1 denotes only considering future reward and 0 denotes only considering the instant reward of the action.
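The bootstrapped value used for one sampled transition can be sketched as below. The sketch assumes the standard Q-learning form \(R+\gamma \,{\mathrm{max}}_{a^{\prime}}{Q}^{*}\left(s^{\prime},a^{\prime}\right)\), since Eq. 8 itself is not reproduced in this text; the function name, argument names, and the `done` flag are illustrative.

```python
def q_target(instant_reward, next_q_values, gamma=0.5, done=False):
    """Sampled target R + gamma * max_a' Q(s', a') for a single transition.

    next_q_values: estimated values of every action still available in the next state.
    done: True when no further repair action remains, so there is no future term.
    """
    if done or not next_q_values:
        return instant_reward
    return instant_reward + gamma * max(next_q_values)
```

With `gamma=0` the target reduces to the instant reward alone, which matches the description of the discount factor above.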
Conventional DRL often uses an artificial neural network (ANN) as its ‘deep Q function’ to estimate the reward value of each action under a given system state. The action with the highest reward is then selected to achieve the globally optimal result (see Section 3.2 for the detailed training process). Figure 3 illustrates the traffic network state considered in this study, which is represented as a graph \(G=(V, E)\). The graph is built from the traffic network structure, and the attribute of each node is its average number of reliable independent pathways (Eq. 4). Because the influence of restoring a road segment propagates along the edges of the network rather than spreading through Euclidean space, a machine learning algorithm that captures a similar pattern is preferred. Hence the graph convolutional neural network (GCN) is applied as the key component of the proposed GCNDRL model.
The graph neural network is a state-of-the-art neural network that can operate directly on graph-structured data. It has been applied in multiple domains with promising results, such as traffic networks, knowledge graphs, and recommendation systems [28]. Previous studies mainly used it for graph classification and prediction tasks, as reviewed by Zhou et al. [28]. Inspired by the conventional convolutional neural network (CNN), the GCN convolves the node features along the connected edges instead of within a Euclidean space. Hence, the convolution process is very similar to the way the influence of repairing a road segment propagates: both travel along the edges and affect the network nodes.
The detailed architecture of the proposed GCNDRL model is illustrated in Fig. 2 and described here. The GCNDRL model consists of two blocks (blue areas in Fig. 2), i.e., the GCN block and the artificial neural network block. The GCN block transforms the node attribute from 1 dimension (number of independent pathways) to 128 dimensions, which means the output of the GCN block is a graph whose nodes have 128-dimensional features. Two GCN layers with sufficient neurons are used to ensure that the node and network structure information can be sufficiently extracted during the convolution process. This process is similar to the normal CNN convolution except that only the data of neighboring nodes are convolved [28]. The graph convolution process is mathematically expressed in Eq. 9.
where \({H}^{l}\) is the output of the \({l}^{th}\) GCN layer, with \({H}^{0}=X\) when \(l=0\). X is the feature matrix of the graph with dimension \(N\times D\), where N is the number of nodes and D is the number of features of each node. \(\widetilde{A}=A+I\), where A describes the graph structure (an adjacency matrix is used in this study) and I is the identity matrix of the same size as A. \(\widetilde{D}\) is the diagonal node degree matrix of \(\widetilde{A}\). \(\sigma \left(\bullet \right)\) denotes the activation function; ReLU is used in this study. \({W}^{l}\) is the weight matrix of the \({l}^{th}\) layer.
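A single graph convolution can be sketched in plain Python as below. It assumes Eq. 9 takes the widely used symmetric-normalized form \(\sigma \left({\widetilde{D}}^{-1/2}\widetilde{A}{\widetilde{D}}^{-1/2}{H}^{l}{W}^{l}\right)\), which is consistent with the symbols defined above but is not reproduced in this text. Nested lists stand in for matrices, and the function names are hypothetical; this is not the DGL implementation used in the study.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    n = len(adj)
    # A~ = A + I adds a self-loop so each node keeps its own feature.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # D~^-1/2 comes from the row sums (node degrees) of A~.
    inv_sqrt_deg = [sum(row) ** -0.5 for row in a_tilde]
    a_hat = [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
             for i in range(n)]
    aggregated = matmul(a_hat, features)    # mix each node with its neighbours
    projected = matmul(aggregated, weight)  # linear map to the output width
    return [[max(0.0, x) for x in row] for row in projected]  # ReLU
```

Stacking two such layers lets information from nodes two hops away reach each node, which mirrors how a repaired segment affects intersections beyond its own endpoints.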
To project the traffic network state into the action space, the convolved values of all nodes are averaged and fed into an ANN model with two layers. The last layer (output layer) contains the same number of neurons as the size of the action space. Hence the final output values correspond to the reward values of each action. Although not shown in the flow chart, a ReLU activation function is used between each layer and before the output layer to enhance the nonlinear capability of the neural network. The output of the ANN is the total reward (including instant and long-term rewards) corresponding to each action, from which the optimal action that leads to the highest reward can be selected.
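The readout described above, averaging the node embeddings and passing the result through a two-layer network, can be sketched as follows. Weight shapes, argument names, and the function name are illustrative assumptions; the study's actual layer sizes beyond the 128-dimensional GCN output are not specified here.

```python
def q_values(node_embeddings, w1, b1, w2, b2):
    """Mean-pool node embeddings, then a two-layer network giving one value per action.

    w1: (embedding_dim x hidden_dim) and w2: (hidden_dim x n_actions), as nested lists.
    """
    n = len(node_embeddings)
    pooled = [sum(col) / n for col in zip(*node_embeddings)]  # graph-level vector
    hidden = [max(0.0, sum(x * w for x, w in zip(pooled, col)) + b)  # ReLU layer
              for col, b in zip(zip(*w1), b1)]
    return [sum(h * w for h, w in zip(hidden, col)) + b  # one output per repair action
            for col, b in zip(zip(*w2), b2)]
```

Because the pooled vector has a fixed length regardless of how many nodes contribute to it, the same head can score actions for any damage state of the same network.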
GCNDRL model training process
The parameters (weights and biases) of the neurons in the GCNDRL model are initialized with random values before the training process. The commonly used DRL training framework is adopted to train the GCNDRL model. The unique feature of the GCNDRL model is that the ‘deep Q function’ is based on the proposed GCNANN model described in Section 3.1. The GCNDRL framework can be trained as an AI agent that picks the actions giving the highest reward. When the reward is set to correspond to system resilience, this leads to a sequence of decisions that produces fast system recovery under a hazard situation. The detailed process of applying the GCNDRL model to the post-hazard recovery of a traffic network is described as follows.

(1) The traffic network is initialized with a random damage scenario. Two parameters are also predefined at this stage, i.e., the reward discount factor \(\gamma\) in Eq. 8 and the total number of training episodes m, which determines the total number of trials of the GCNDRL model. A larger number of training episodes may provide better decisions but also takes longer computing time.
(2) The training process of the GCNDRL model then starts. The state of the traffic network, represented by the network structure and each node’s average number of reliable independent pathways, is fed into the GCNDRL model. The output of the GCNDRL model is the estimated future reward of all repair decisions, i.e., the future reward of repairing each road segment. In the beginning, the parameters of the GCNDRL model are randomly initialized, hence the future rewards of each decision are also random values.
(3) After projecting the traffic network state into the decision-reward space, the estimated best decision is determined by the epsilon-greedy policy [29]. The road to repair is selected either by random sampling or by the GCNDRL model. During training, the probability that the decision is made by the GCNDRL model gradually increases to 100%.
(4) After the decision is selected, the traffic network is updated by reopening this road segment. Based on the decision and the updated traffic network state (by Eqs. 2,3,4,5,6), two paths are followed to determine this action’s ‘instant reward’ and ‘future reward’, as shown in Fig. 2. The ‘instant reward’ is determined by the traffic network performance before and after taking the decision (orange path). The ‘future reward’ is the largest ‘future reward’ value obtained after feeding the updated traffic network state into the GCNDRL model.
(5) With the ‘instant reward’ and the ‘future reward’ of the next state determined, the real ‘future reward’ at the current state can be determined by Eq. 8. This value is used to train the GCNDRL model, i.e., to tune the parameters (weights and biases) of each neuron.
(6) During the training process, a list of repaired road segments is recorded. At each time step, the future rewards of the decisions in this list are set to 0.
(7) Repeating the process (2) to (5) until all the road segments are recovered is defined as 1 episode.
(8) Once the training process is finished for one recovery round, the value of the resilience index can be calculated based on Eq. 1.
(9) Regenerate a random damage scenario and feed it to step (1).
(10) Repeat the process from (1) to (7) m times, where m is the predefined number of training episodes.
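The action selection in steps (3) and (6) can be sketched as follows. Excluding already repaired segments from the candidate set is a simplification swapped in here for step (6), which instead sets their future reward to 0; the function and parameter names are illustrative, and the schedule that lowers `epsilon` during training is not shown.

```python
import random


def select_segment(q_values, repaired, epsilon, rng=random):
    """Epsilon-greedy choice among road segments that are still damaged.

    q_values: estimated reward per segment index, as output by the model.
    repaired: set of segment indices that have already been reopened.
    epsilon: probability of picking a random segment instead of the model's choice.
    """
    candidates = [a for a in range(len(q_values)) if a not in repaired]
    if not candidates:
        raise ValueError("all segments are already repaired")
    if rng.random() < epsilon:
        return rng.choice(candidates)                   # explore: random sampling
    return max(candidates, key=lambda a: q_values[a])   # exploit: model's best action
```

Starting with `epsilon` close to 1 and decaying it toward 0 reproduces the behavior in step (3), where decisions made by the model gradually replace random ones.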
As can be seen, the action reward values estimated by the GCNDRL model start from random values. As training continues, the reward values used to train the GCNDRL model move closer to the true values, hence the model is expected to make progressively better decisions. After the number of training episodes reaches the predefined number m, the pre-trained GCNDRL model can be saved for future decision-making. Although the proposed framework is intended for training a decision-making model that can handle any damage scenario, it can also be used to obtain the optimal decisions for a specific damage scenario. To do so, step (9) is replaced by reusing the same initial damage situation in step (1). Reasonably, training a decision-making model for a specific damage scenario needs fewer training episodes than training one for arbitrary damage scenarios.
Several hyperparameters are involved in the proposed GCNDRL model, including the number of graph convolutional layers, the number of neurons in each layer, the learning rate, the optimization function, and the reward discount rate (γ in Eq. 8). In this study, two graph convolutional layers (each containing 128 neurons) are used to ensure the model has enough nonlinear capability for network embedding. A larger number may increase the model’s nonlinear capability but will also increase the computation time. A smaller number may reduce the model’s decision-making ability in a high-dimensional state-action space. Additionally, a learning rate of 0.005 is used, because a larger number of training episodes with a smaller learning rate is preferred to achieve a smoother training process. The ‘Adam’ optimizer is selected. The reward discount value, γ, is set to 0.5 after multiple experiments. It was observed that simply setting it to 0 (only considering instant reward) or 1 (only considering future reward) cannot achieve the highest resilience index value in the end. Although these parameters are selected based on multiple experiments, other parameter search methods, such as grid search and Bayesian optimization [30], can be considered in engineering applications. Moreover, the ‘experience replay’ technique [31] is used in this study to achieve a smooth and stable training result. The key idea behind experience replay is to train the deep Q agent with a random subset of multiple past trials and the corresponding Q values rather than with only the single most recent action. The agent (GCN-based DRL) is implemented with the Python Deep Graph Library [32] and the PyTorch library [33].
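The experience replay idea can be illustrated with a small fixed-capacity buffer. This is a generic sketch rather than the study's implementation; the class name, the transition layout, and the capacity and batch-size parameters are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random for training."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the correlation between
        # consecutive repair steps of the same episode.
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))

    def __len__(self):
        return len(self.memory)
```

Each training update then draws a batch from the buffer instead of using only the most recent repair action, which is what smooths the training curve.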
GCNDRL for decision making
After building the GCNDRL model as described in Section 3.1 and conducting a comprehensive training process (Section 3.2), a pre-trained GCNDRL model is available. Since the GCNDRL model is trained with random damage situations, it can be applied to any new damage scenario to obtain a repair sequence that gives fast system recovery. With the trained GCNDRL model, at each step the action with the maximum reward value is selected among the possible actions. By repeatedly updating the traffic network state after each action, the final repair sequence is determined step by step.
Case study I: Road network recovery sequences post-earthquake
The application of the proposed GCNDRL decision-making framework is first illustrated using a part of the road network of Pomona, California. This city is about 60 kilometers away from Reseda, Los Angeles, the epicenter of the 1994 Northridge earthquake. The selected road network contains 93 junctions connected by 136 road segments, as shown in Fig. 4. The city road network is abstracted using the Python library OSMnx [34]. The training process and performance of the proposed GCNDRL model are first illustrated by improving the repair decisions for a specific road network damage situation. Then a universal GCNDRL model is trained with randomly generated damage situations. Two other methods for road network repair decisions, i.e., the genetic algorithm method and the centrality-based repair prioritization method, are compared with it on the same damage situations. The following assumptions are made for recovery from the earthquake hazard. Similar strategies have been widely used in previous studies [35, 36].
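The sequential decision process described above amounts to rolling out the greedy policy of a trained model. In the sketch below, `q_function` and `apply_repair` are hypothetical callables standing in for the trained GCNDRL model and the network update; their signatures are assumptions made for illustration.

```python
def repair_sequence(damaged, q_function, apply_repair, state):
    """Roll out the greedy policy until no damaged segment remains.

    damaged: iterable of damaged segment identifiers.
    q_function(state): returns a mapping from segment to estimated reward.
    apply_repair(state, segment): returns the network state after reopening the segment.
    """
    remaining = set(damaged)
    sequence = []
    while remaining:
        q = q_function(state)
        # Sorting makes tie-breaking deterministic; only unrepaired segments compete.
        best = max(sorted(remaining), key=lambda seg: q[seg])
        sequence.append(best)
        remaining.remove(best)
        state = apply_repair(state, best)  # re-evaluate from the updated network
    return sequence
```

Because the state is re-evaluated after every repair, the rollout can be restarted from the observed network condition if the field crews deviate from the suggested order.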

1) The road segments are intact before the earthquake, with a reliability of 1.0 assigned. The reliability of a road segment is reduced based on the extent of damage.
2) Due to constraints such as budget, manpower, and other resources, we assume only one road segment is repaired at each time step. However, it should be noted that multiple repair teams may be available in practice; they could also follow this repair sequence.
3) After the repair is completed, the reliability of the repaired road segment is restored to 1.0.
4) The repair time (in days) for each damaged road segment depends on its reliability. In this study, the repair time from FEMA is adopted [37]. For road segments whose reliability is below 0.2, the repair time is assumed to be 7 days. For road segments whose reliability is above 0.8, the repair time is assumed to be 1 day. For the others, the repair time is assumed to be 2 days. It should be noted that this repair time can be modified when more information is available.
5) The road network performance index is computed and recorded after each damaged road segment is repaired.
6) The road network restoration process continues until all the damaged road segments are fixed and their reliability values are restored to 1.0.
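The repair-time rule in assumption 4) maps directly to a small lookup. The function name is hypothetical; the thresholds and durations are those stated above, with the boundary values 0.2 and 0.8 falling into the 2-day class.

```python
def repair_time_days(reliability):
    """Repair duration in days as a function of post-hazard segment reliability."""
    if reliability < 0.2:
        return 7  # heavily damaged segment
    if reliability > 0.8:
        return 1  # lightly damaged segment
    return 2      # everything in between, including exactly 0.2 and 0.8
```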
Initial damage situation under earthquake hazard
Seismic fragility curves are used for road system resilience assessment [38]. The fragility curve developed in HAZUS [39] is utilized to estimate road reliability after the earthquake. According to HAZUS, the probability of a road section exceeding a given damage state can be modeled as a cumulative lognormal distribution function, as shown in Eq. 10. The reliability can then be determined by Eq. 11.
where \({P}_{f}\left(S\right)\) is the failure probability at a given intensity measure value S; \(\Phi\) is the standard normal cumulative distribution function; \(\beta\) is the standard deviation and \(\mu\) is the median parameter of the seismic intensity measure.
In this study, the considered road segments are mainly urban roads with two traffic lanes, and the ‘moderate damage level’ is considered. According to Argyroudis [40], the post-earthquake permanent ground deformation (PGD) is a widely used intensity measure for pavement damage assessment, and the parameters \(\beta\) and \(\mu\) are set as 0.7 and 0.30 meters, respectively. The corresponding fragility curve is shown in Fig. 5.
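The fragility calculation can be sketched as below. It assumes Eq. 10 is the lognormal form \({P}_{f}\left(S\right)=\Phi \left(\mathrm{ln}\left(S/\mu \right)/\beta \right)\) and Eq. 11 is \(R=1-{P}_{f}\left(S\right)\), neither of which is reproduced in this text; the function names are hypothetical, and the default parameters are the values quoted above.

```python
from math import erf, log, sqrt


def exceedance_probability(pgd_m, median_m=0.30, beta=0.7):
    """Lognormal fragility: probability of exceeding the damage state at a given PGD (m)."""
    if pgd_m <= 0.0:
        return 0.0  # no ground deformation, no damage
    z = log(pgd_m / median_m) / beta
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF evaluated at z


def road_reliability(pgd_m, median_m=0.30, beta=0.7):
    """Segment reliability taken as the complement of the exceedance probability."""
    return 1.0 - exceedance_probability(pgd_m, median_m, beta)
```

At a PGD equal to the median (0.30 m) the exceedance probability is one half, so the sketched reliability is 0.5, and it decreases monotonically as the PGD grows.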
GCNDRL training for specific damage situation
The seismic fragility curve in Fig. 5 was applied to generate the postearthquake damage conditions for the road network. To demonstrate the universality and robustness of the proposed decisionmaking framework, a random PGD value selected from a uniform distribution (0 to 1.2) is assigned for each road segment. Figure 6 shows a specific damage situation from one random sampling. The reliability of road segments varies from 0.1 to 0.95. In practical application, estimation of the reliability of each road segment can be improved with the known PGD values or onsite inspection by use of Eqs. 10 and 11. Besides, the ‘emergency response facilities’ for posthazard recovery are annotated by the red points. There are five emergency response facilities.
The GCNDRL model is firstly trained with the proposed training framework illustrated in Fig. 2 for this specific initial damage situation. The total number of training, m, is set as 500. The reward discount factor \(\gamma\) is set as 0.5, which means the model would equally consider the influence of instant reward and future reward. Moreover, the instant reward is defined as the improvement of the road network system performance by each repair action. Mathematically, the instant reward function is calculated by Eq. 12. The future reward is estimated by the GCNDRL model, which will gradually converge with the training process.
where \({R}_{t}\) is the instant reward of the action taken in time t, \(p\) is the system performance as stated in Eq. 7. \({T}_{t}\) is the repair time in assumption (4).
Figure 7 shows the training process of the proposed GCNDRL training framework. The total training time is about 27 hours on a desktop with an Intel i7 CPU and an Nvidia 2070 GPU. The system resilience index of each recovery round is recorded (as shown in Fig. 2) and plotted. For better visualization, the resilience indexes recorded during training are smoothed with the Savitzky-Golay filter [41]. The resilience index of individual trials is small and fluctuates strongly at the initial training stage. The smoothed curve shows a steady increase of the resilience index as training continues: the final resilience index of the repair sequence increases from around 150 to over 300 after 1,500 training episodes. The gradually increasing curve demonstrates that the GCNDRL model finds increasingly better repair sequences during training.
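For readers unfamiliar with the Savitzky-Golay filter, the following sketch applies its simplest common variant, a 5-point window with a quadratic fit, whose convolution weights are (-3, 12, 17, 12, -3)/35. The window length and polynomial order are assumptions chosen for illustration, since the settings used for Fig. 7 are not stated, and the function name is hypothetical.

```python
def savgol_5pt_quadratic(values):
    """Smooth a sequence with the 5-point quadratic Savitzky-Golay kernel.

    The two points at each end are left unsmoothed for simplicity.
    """
    coeffs = (-3.0, 12.0, 17.0, 12.0, -3.0)
    smoothed = list(values)
    for k in range(2, len(values) - 2):
        window = values[k - 2:k + 3]
        smoothed[k] = sum(c * v for c, v in zip(coeffs, window)) / 35.0
    return smoothed

if __name__ == "__main__":
    # Hypothetical noisy resilience indexes from consecutive training episodes.
    noisy = [150.0, 170.0, 140.0, 190.0, 180.0, 230.0, 210.0, 260.0]
    print(savgol_5pt_quadratic(noisy))
```

Because each window is fitted with a quadratic polynomial, the filter suppresses episode-to-episode fluctuation while leaving a smooth (up to quadratic) trend unchanged.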
Determine repairing sequence for a specific damage situation
To evaluate the performance of the GCNDRL-based decision framework, two other decision-making strategies are used as baselines: a repair strategy based on the genetic algorithm and a repair strategy based on ranking by betweenness centrality [12]. These two strategies are chosen because they are the most common and convenient ways of determining the network recovery sequence. They are briefly described below. It should be noted that many other repair decision strategies have been proposed in previous studies.
Repair strategy based on genetic algorithm
The genetic algorithm is a well-developed method for global optimization. A conventional genetic algorithm for combinatorial optimization problems is used here, with the ‘OX’ (order) crossover method described by Moscato [42]. A total of 7,500 trials are used to obtain the final solution, i.e., a population size of 10 evolved over 750 generations. Hence, the total number of trials of the genetic algorithm is 5 times that of the GCNDRL model.
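The order (OX) crossover can be sketched as follows for repair sequences encoded as permutations of road segment identifiers. This is a minimal illustration of the classic operator, not the implementation used in this study; the function signature and the optional `cut` argument are hypothetical and exist only to make the example reproducible.

```python
import random

def ox_crossover(parent1, parent2, cut=None, rng=random):
    """Order crossover (OX) of two permutations of the same items.

    The slice parent1[i..j] is copied to the child unchanged; the remaining
    positions are filled with the missing items in the order they appear in
    parent2, starting after the second cut point and wrapping around.
    """
    n = len(parent1)
    i, j = cut if cut is not None else sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = parent1[i:j + 1]
    kept = set(parent1[i:j + 1])
    # Items of parent2 read cyclically from position j + 1, skipping copied ones.
    fill = [parent2[(j + 1 + k) % n] for k in range(n)
            if parent2[(j + 1 + k) % n] not in kept]
    # Empty positions of the child, also visited cyclically from position j + 1.
    positions = [(j + 1 + k) % n for k in range(n - (j - i + 1))]
    for pos, item in zip(positions, fill):
        child[pos] = item
    return child

if __name__ == "__main__":
    seq_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    seq_b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
    print(ox_crossover(seq_a, seq_b, cut=(3, 6)))
```

The child is always a valid repair sequence (every damaged segment appears exactly once), which is why order-preserving operators of this kind are preferred over plain one-point crossover for sequencing problems.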
Repair strategy based on betweenness centrality
The betweenness centrality [43] is used to set the repair priority of the damaged road segments. The betweenness centrality of a road segment indicates how many of the shortest paths between all pairs of nodes pass through it. The higher the betweenness centrality of a road segment, the more important it is for network connectivity. Mathematically, the betweenness centrality of a network edge can be expressed by Eq. 13 [44].
\[{c}_{B}(e)=\sum_{i,j\in V}\frac{\varepsilon (i,j|e)}{\varepsilon (i,j)} \qquad (13)\]
where \({c}_{B}(e)\) is the betweenness centrality of road segment e, \(V\) is the set of all nodes, \(\varepsilon (i,j)\) is the number of shortest paths between nodes i and j, and \(\varepsilon (i,j|e)\) is the number of those paths that pass through road segment e.
To efficiently obtain the betweenness centrality value of each road segment, Brandes' algorithm is adopted [45]. The final betweenness centrality map of the road segments for this case study is shown in Fig. 8.
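The following sketch shows how edge betweenness centrality can be computed with a Brandes-style accumulation and then turned into a repair ranking. It assumes an unweighted, undirected network stored as an adjacency dictionary, which is a simplification: a real road network would normally use travel length or time as edge weights. All names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Unnormalized edge betweenness for an undirected, unweighted graph.

    adj maps each node to an iterable of its neighbors; every node must be a key.
    Returns a dict keyed by frozenset({u, v}).
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u] if u != v}
    for source in adj:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        dist = {source: 0}
        sigma = {v: 0 for v in adj}
        sigma[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest nodes to the source.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered node pair was counted once from each of its two endpoints.
    return {edge: value / 2.0 for edge, value in score.items()}

if __name__ == "__main__":
    network = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"], "d": ["b", "c"]}
    ranking = sorted(edge_betweenness(network).items(), key=lambda kv: -kv[1])
    for edge, value in ranking:
        print(sorted(edge), value)
```

Sorting the damaged segments by decreasing score gives the betweenness-prioritized repair order used as a baseline.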
The final performance of the three repair decision strategies (i.e., GCNDRL, genetic algorithm, and betweenness centrality) is compared in two major aspects, i.e., the efficiency of road network performance restoration and the computational efficiency. The road network restoration efficiency is measured by the final resilience index (RI) value and the time required to achieve certain levels of system performance. The time-dependent road network system performance curves obtained with the repair decisions from these three strategies are shown in Fig. 9. The recovery processes determined by the genetic-algorithm-based repair strategy are indicated by the shaded area, whose upper and lower boundaries correspond to the best and worst repair sequences obtained, respectively.
A higher resilience index, i.e., a system recovery curve with a larger area under the curve, indicates higher resilience and therefore a better repair strategy. Among the three methods compared, the repair sequence produced by the proposed GCNDRL model significantly outperforms the other two. The repair sequence prioritized by betweenness centrality only slightly underperforms the best repair solution found by the genetic algorithm, which is a global optimization method. Figure 9 also marks the time required under the different strategies for the system to recover 80% of its original performance. The best repair sequence found by the genetic algorithm takes about 580 days to achieve 80% recovery, while the worst takes as long as 630 days. The repair sequence based on betweenness centrality needs about 600 days to recover 80% of system performance, which lies between the best and worst genetic algorithm solutions. The repair sequence produced by the GCNDRL model needs only about 548 days to achieve 80% system performance, which significantly outperforms the other two decision strategies. The fast recovery ensures higher system resilience.
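The two comparison metrics can be computed from a recorded recovery curve as sketched below. The sketch assumes that the resilience index is the area under the system performance curve, as the discussion above suggests, and that performance varies linearly between recorded points; the formal definition is the one given in Section 2. The function names and the sample curve are hypothetical.

```python
def area_under_curve(times, performance):
    """Trapezoidal area under a recovery curve sampled at the given times."""
    return sum(
        0.5 * (performance[k] + performance[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )

def time_to_reach(times, performance, target):
    """First time at which the piecewise-linear curve reaches the target level.

    Returns None if the target is never reached.
    """
    if performance[0] >= target:
        return times[0]
    for k in range(len(times) - 1):
        p0, p1 = performance[k], performance[k + 1]
        if p0 < target <= p1:
            fraction = (target - p0) / (p1 - p0)
            return times[k] + fraction * (times[k + 1] - times[k])
    return None

if __name__ == "__main__":
    days = [0.0, 100.0, 200.0]        # hypothetical observation times (days)
    perf = [0.2, 0.6, 1.0]            # hypothetical normalized system performance
    print(area_under_curve(days, perf))
    print(time_to_reach(days, perf, 0.8))
```

Two strategies that end at the same final performance can still differ strongly in both metrics, because a sequence that restores critical segments early accumulates more area and crosses the 80% level sooner.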
Performance of pretrained GCNDRL model in providing repair decisions on new damage situations
The previous comparison showed that the GCNDRL decision support framework achieves faster system performance recovery than alternative approaches such as the genetic algorithm and betweenness-centrality-prioritized repair. However, its computational efficiency is relatively low because of the model training process. For the betweenness-centrality-based method, the number of iterations needed to determine the repair sequence is \(1\times M\). The number of iterations required for GCNDRL model training and for the genetic algorithm is \(N\times M\), where \(N\) is the number of trials and \(M\) is the number of damaged roads. Consequently, the computational time increases significantly. This is a common criticism of similar global optimization methods. The time-consuming training process could limit the ability of the model to respond quickly when a hazard happens. However, unlike the genetic algorithm, the proposed framework allows a universal GCNDRL model to be trained before the hazard happens via the procedures illustrated in Fig. 2. With a pretrained GCNDRL model, a close-to-optimal road network repair sequence can be obtained quickly for any new hazard damage situation without additional training. This strategy significantly reduces the computational time needed to deploy the GCNDRL model for emergency response.
Analyses are conducted to illustrate the performance of the pretrained GCNDRL model in identifying a resilient repair sequence under new damage situations. A GCNDRL model is first trained with data from different initial road damage situations (Fig. 2). The initialization is conducted by repeatedly assigning a randomly generated PGD to each road segment and then computing its reliability from the fragility curve (Fig. 5). The total number of training steps is increased to 10,000 because of the significantly larger state space. The major computational effort goes into network performance evaluation and neural network training. Correspondingly, the total training time required is around 12 days on a desktop computer without GPU acceleration (this time may vary with the configuration of the computer). The pretrained GCNDRL model is then applied to four new damage scenarios, shown in Fig. 10.
The parameters of the pretrained GCNDRL model are saved and then loaded to handle the new damage scenarios. The repair sequence is determined simply by applying the pretrained GCNDRL model to the initial damage conditions of the road network, without further training. The other two repair decision-making methods are also used to obtain repair sequences. The corresponding system recovery trajectories for each damage scenario under the different decision strategies are compared in Fig. 11. The recovery process based on the pretrained GCNDRL model outperforms the other two decision-making methods significantly in all of these damage scenarios. The area under the recovery curve of the GCNDRL model is much larger, which indicates a more resilient recovery process. The GCNDRL model also takes less time to achieve 80% recovery of the road network performance than the other two methods.
Figure 12 summarizes the final system resilience index obtained with the different repair decision models and the corresponding computational time. With the pretrained model, the repair sequence produced by the GCNDRL model achieves the highest system resilience index at a low computational time. The repair sequence based on betweenness prioritization uses the least computational time, but its performance in system recovery is also the worst. The genetic algorithm took around 14 hours to finish computing for each damage scenario.
Case study II: Rapid decisions for flood hazard
The road network recovery after a flood hazard [46] is analyzed to further assess the capability of the GCNDRL model to determine the repair sequence under a different hazard. The road network used in this case study is part of University Heights, Cleveland, Ohio, USA. The road network contains 95 intersections connected by 141 road segments, as shown in Fig. 13.
The impacts of flooding on road network simulation
The capacity of the road sections is compromised by the flood. To evaluate its impact on road operation, the road flooding conditions are required. Different flood diffusion models, such as the HEC-RAS, ISIS, and MIKE models, have been proposed to predict the flood conditions at different locations [47]. In this study, the Susceptible-Impacted-Susceptible (SIS) network diffusion model is used to generate flooding scenarios along different road sections [48]. Two primary parameters govern the flood diffusion process in the SIS model, i.e., the average transition probability (\(\alpha\)) and the recovery probability (\(\gamma\)). The parameter \(\alpha\) describes the probability that a node falls into the ‘flooded’ class if one of its neighbors is flooded. The parameter \(\gamma\) describes the probability that a node recovers from ‘flooded’ to ‘normal’. The same parameter values as proposed by Abdulla et al. [48], \(\alpha =0.02\) and \(\gamma =0.013\), are used to analyze the impacts of the flood on road sections. It should be noted that for cities with localized flood monitoring data, these two parameters can be further calibrated using the data-analysis methods discussed in the original paper by Abdulla et al. [48]. Regarding the impact of flood inundation on a road section, the status of each road segment is either ‘completely shut down’ or ‘completely open’ to traffic. Hence, the reliability of each road segment during the flood is set as either 0 or 1. This differs from the continuous reliability values assigned to road sections after earthquakes based on their extent of damage.
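A minimal sketch of an SIS-type flood diffusion on a road graph is given below. It is illustrative only: it assumes discrete time steps, independent transmission from each flooded neighbor with probability \(\alpha\), and that a road segment is closed (reliability 0) when either of its end nodes is flooded. These modeling details and all function names are assumptions, not the procedure of [48].

```python
import random

def sis_step(adj, flooded, alpha=0.02, gamma=0.013, rng=random):
    """Advance the flood state by one time step and return the new flooded set."""
    next_flooded = set()
    for node in adj:
        if node in flooded:
            # A flooded node returns to 'normal' with probability gamma.
            if rng.random() >= gamma:
                next_flooded.add(node)
        else:
            # Each flooded neighbor independently spreads the flood with probability alpha.
            k = sum(1 for nb in adj[node] if nb in flooded)
            if k and rng.random() < 1.0 - (1.0 - alpha) ** k:
                next_flooded.add(node)
    return next_flooded

def simulate_flood(adj, seeds, steps, alpha=0.02, gamma=0.013, rng=random):
    """Run the diffusion for a number of steps from the initially flooded nodes."""
    flooded = set(seeds)
    for _ in range(steps):
        flooded = sis_step(adj, flooded, alpha, gamma, rng)
    return flooded

def segment_reliability(edges, flooded):
    """Binary reliability: 0 if either end node of the segment is flooded, else 1."""
    return {(u, v): 0 if (u in flooded or v in flooded) else 1 for u, v in edges}

if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    roads = [("a", "b"), ("b", "c"), ("c", "d")]
    state = simulate_flood(graph, {"a"}, steps=50, rng=random.Random(1))
    print(sorted(state), segment_reliability(roads, state))
```

Running the simulation from different random seeds and initial nodes yields different sets of closed segments, which is how varied flooding scenarios can be generated for training or testing.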
The effects of flood on road network performance and resilience index
The performance and resilience of the road network under flood hazards are measured using the same quantification methods described in Section 2. Since the reliability of each road can only take the binary values 0 or 1, the node performance can be simplified to Eq. 14 and the system performance to Eq. 15.
where n is the number of nodes in the network, \(K(i,j)\) is the number of independent paths between node i and node j, and \({r}_{i}(t)\) is the average number of reliable independent pathways of node i at time step t.
Training a universal GCNDRL model for road network recovery decisions
Since the damage state of each road is either 0 or 1, the initial situation can be modeled by assuming that all the road segments are ‘damaged’. Hence, the training process can approximately cover any new damage situation under real-world conditions. The number of training episodes is set as 4,000. The state of the road network is represented by the node performance value \({r}_{i}\), as shown in Eq. 14, and the road network structure. The same reward function (Eq. 12) and reward discount value \(\gamma\) (0.5) are used in this case study.
The training process of the GCNDRL model is shown in Fig. 14. As mentioned, one episode corresponds to one round of the complete road network recovery process. The computation is performed on a Windows desktop with 16 GB RAM, an Intel Core i7 processor, and an Nvidia 2060 GPU. The total training process took 43 hours and 32 minutes. The mean resilience of the first 300 episodes is only 36.25, while that of the last 300 episodes is around 44. The variance of the resilience during learning is relatively large because of the enormous action and state spaces, since all road segments are assumed to be ‘damaged’ initially. The internal weights and biases of the pretrained GCNDRL model are saved for subsequent analyses.
The performance of pretrained GCNDRL under new flooding situations
Four new flooding situations along the road network are simulated by using the flood diffusion model with flooding initialized randomly. The resulting flood-inundated road sections are shown in Fig. 15. The number of submerged road segments varies between 12 and 34 across the flood scenarios.
Three decision-making methods are used to obtain the recovery sequence, i.e., to prioritize where the pump should be deployed to reopen the road sections to traffic.

1) The pretrained GCNDRL model that loads the ‘training experience’ to solve new flooding situations, named the universal GCNDRL model.

2) The GCNDRL model that is trained from scratch for each specific damage scenario, named the flood-specific GCNDRL model.

3) The betweenness-centrality-based prioritization method. This method is used as a comparison benchmark since a strong relationship exists between edge betweenness centrality and network connectivity. A previous study has also demonstrated the superiority of centrality-based recovery when applied to a planar network [49].
The road network resilience index values obtained with the best road section recovery sequences from these three decision strategies are shown in Fig. 16a. The results show that all three decision methods (i.e., the universal GCNDRL model, the flood-specific GCNDRL model, and the betweenness-centrality-prioritized model) achieve a similar system resilience index. This is reasonable because the betweenness centrality is highly correlated with the graph connectivity of the road network and no other operational parameters are considered in this case study. Therefore, the recovery sequence based on betweenness centrality ranking approximates the optimal solution. Among the three methods, the flood-specific GCNDRL model slightly outperforms the other two under the first and third flood situations. However, the flood-specific GCNDRL method requires 2 to 3 hours to train the model, which is much longer than the time needed to use the universal GCNDRL model. By contrast, the pretrained universal GCNDRL model and the graph-theory method based on betweenness centrality take only around 9 seconds to obtain the final solution and lead to recovery sequences with high system resilience index values. The result demonstrates that a pretrained GCNDRL model can be deployed for emergency decisions and rapid response after a hazard.
Conclusions
In this article, a novel GCNDRL model is developed to determine the optimal recovery sequence of road networks subjected to different types of natural hazards. The proposed decision-making framework allows the GCNDRL model to be trained before the hazard happens by letting the model freely explore different damage scenarios. The performance of the decision support model is evaluated by applying it to the post-hazard recovery of two testbed road networks subjected to earthquake and flood, respectively. The results from both case studies demonstrate that the GCNDRL model can be trained using randomly generated damage scenarios before the hazard happens and can be applied immediately to determine the optimal road network recovery sequence for rapid, resilient post-hazard response. The model possesses several unique features. First, as a resilience-informed global optimization method, it achieves a better decision sequence than ranking repair sequences by betweenness centrality, especially when multiple factors are considered in the network performance evaluation. It also shows higher computing efficiency than the genetic algorithm. Second, the graph-reading ability of the proposed method allows the road network structure to be used directly without any manual embedding. Lastly, the decision-making time can be reduced significantly by using a pretrained machine learning model, which can be trained on a supercomputer before a hazard happens.
The GCNDRL model provides a novel decision-support tool to assist emergency management decision-makers. While the model is demonstrated on a small road network, it can be readily extended to larger networks by using more advanced computer configurations such as multiple graphics processing units (GPUs).
Availability of data and materials
The data and code are available upon reasonable request.
References
Rose A (2003) Defining and measuring economic resilience to earthquakes. Res Prog Accomplishments 2004:41–54
Zhang X, Miller-Hooks E, Denny K (2015) Assessing the role of network topology in transportation network resilience. J Transp Geography 46:35–45
Holling CS (1973) Resilience and stability of ecological systems. Ann Rev Ecol Syst 4(1):1–23
Leichenko R (2011) Climate change and urban resilience. Curr Opin Environ Sustainability 3(3):164–168
Sterbenz JP, Hutchison D, Çetinkaya EK, Jabbar A, Rohrer JP, Schöller M, Smith P (2010) Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines. Comput Netw 54(8):1245–65
Pang Y, Wang X (2021a) Cloud-IDA-MSA conversion of fragility curves for efficient and high-fidelity resilience assessment. J Struct Eng 147(5):04021049. https://doi.org/10.1061/(ASCE)ST.1943-541X.0002998
Renschler CS, Frazier AE, Arendt LA, Cimellaro GP, Reinhorn AM, Bruneau M (2010) Developing the “PEOPLES” resilience framework for defining and measuring disaster resilience at the community scale. In: 9th US and 10th Canadian Conference on Earthquake Engineering, Toronto, Canada
Sharma N, Tabandeh A, Gardoni P (2018) Resilience analysis: a mathematical formulation to model resilience of engineering systems. Sustainable Resilient Infrastructure 3(2):49–67
Sharma N, Tabandeh A, Gardoni P (2020) Regional resilience analysis: A multiscale approach to optimize the resilience of interdependent infrastructure. Comput-Aided Civil Infrastructure Eng 35(12):1315–1330
Koliou M, van de Lindt JW, McAllister TP, Ellingwood BR, Dillard M, Cutler H (2020) State of the research in community resilience: Progress and challenges. Sustainable Resilient Infrastructure 5(3):131–51
Ip WH, Wang D (2011) Resilience and friability of transportation networks: evaluation, analysis and optimization. IEEE Syst J 5(2):189–98
Tang J, Wan L, Nochta T, Schooling J, Yang T (2020) Exploring resilient observability in traffic-monitoring sensor networks: a study of spatial-temporal vehicle patterns. ISPRS Int J Geo-Inform 9(4):247
Aydin NY, Duzgun HS, Heinimann HR, Wenzel F, Gnyawali KR (2018) Framework for improving the resilience and recovery of transportation networks under geohazard risks. Int J Dis Risk Reduc 31:832–43
Zhang W, Wang N, Nicholson C (2017) Resilience-based post-disaster recovery strategies for road-bridge networks. Structure Infrastructure Eng 13(11):1404–1413
Zou Q, Chen S (2021) Resilience-based recovery scheduling of transportation network in mixed traffic environment: a deep-ensemble-assisted active learning approach. Reliability Eng Syst Safety 215:107800
Nozhati S, Sarkale Y, Ellingwood B, Chong EK, Mahmoud H (2019) Near-optimal planning using approximate dynamic programming to enhance post-hazard community resilience management. Reliability Eng Syst Safety 181:116–26
Luong NC, Hoang DT, Gong S, Niyato D, Wang P, Liang YC, Kim DI (2019) Applications of deep reinforcement learning in communications and networking: A survey. IEEE Commun Surv Tutorials 21(4):3133–74
Polydoros AS, Nalpantidis L (2017) Survey of model-based reinforcement learning: Applications on robotics. J Intelligent Robotic Syst 86(2):153–73
Mahmud M, Kaiser MS, Hussain A, Vassanelli S (2018) Applications of deep learning and reinforcement learning to biological data. IEEE Trans Neural Netw Learn Syst 29(6):2063–2079
Andriotis CP, Papakonstantinou KG (2019) Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliability Eng Syst Safety 191:106483
Sun W, Bocchini P, Davison BD (2020) Applications of artificial intelligence for disaster management. Natural Hazards 103:2631–2689
Arslan M, Roxin AM, Cruz C, Ginhac D (2017) A review on applications of big data for disaster management. In: Proc. 13th Int. Conf. SignalImage Technol. Internet Based Syst. SITIS 2018. pp. 370–375. https://doi.org/10.1109/SITIS.2017.67
Zhang W, Wang N (2016) Resilience-based risk mitigation for road networks. Structural Safety 62:57–65
Li S, Ma Z, Teo KL (2020) A new model for road network repair after natural disasters: Integrating logistics support scheduling with repair crew scheduling and routing activities. Comput Industrial Eng 145:106506
Hackl J, Adey BT, Lethanh N (2018) Determination of near-optimal restoration programs for transportation networks following natural hazard events using simulated annealing. Comput-Aided Civil Infrastructure Eng 33(8):618–37
Yu RF, He Y (2019) Reinforcement learning and deep reinforcement learning. In: Deep Reinforcement Learning for Wireless Networks. Springer, Cham, pp 15–19
Watkins CJCH, Dayan P (1992) Q-learning. Machine Learning 8(3):279–92
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434
Sutton RS (1996) Generalization in reinforcement learning: Successful examples using sparse coarse coding. Adv Neural Inform Process Syst 1038–1044
Bergstra J, Komer B, Eliasmith C, Yamins D, Cox DD (2015) Hyperopt: a python library for model selection and hyperparameter optimization. Computat Sci Discov 8(1):014008
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–33
Wang M, Yu L, Zheng D, Gan Q, Gai Y, Ye Z, Li M, Zhou J, Huang Q, Ma C (2019) Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, et al (2019) PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems 32. Curran Associates, Inc., Red Hook, pp 8024–8035
Boeing G (2017) OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Comput Environ Urban Syst 65:126–139
Almoghathawi Y, Barker K, Albert LA (2019) Resilience-driven restoration model for interdependent infrastructure networks. Reliability Eng Syst Safety 185:12–23
Liu W, Song Z, Ouyang M, Li J (2020) Recovery-based seismic resilience enhancement strategies of water distribution networks. Reliability Eng Syst Safety 203:107088
FEMA (2020) HAZUS earthquake model, technical manual. Federal Emergency Management Agency (FEMA), Washington, DC
Pang Y, Wang X (2021) Cloud-IDA-MSA conversion of fragility curves for efficient and high-fidelity resilience assessment. J Struct Eng 147(5):04021049. https://doi.org/10.1061/(ASCE)ST.1943-541X.0002998
HAZUS-MH (2004) User’s manual and technical manuals. Report prepared for the Federal Emergency Management Agency. National Institute of Building Sciences, Federal Emergency Management Agency (FEMA), Washington, DC
Argyroudis S, Kaynia AM (2014) Fragility functions of highway and railway infrastructure. In: SYNER-G: Typology definition and fragility functions for physical elements at seismic risk. Springer, Dordrecht, pp 299–326
Savitzky A, Golay MJE (1964) Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry 36(8):1627–1639
Moscato P (1989) On genetic crossover operators for relative order preservation. C3P Rep 778:825
Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40:35–41
Brandes U (2008) On variants of shortest-path betweenness centrality and their generic computation. Soc Netw 30(2):136–145
Brandes U (2001) A faster algorithm for betweenness centrality. J Mathematical Sociol 25(2):163–177
Trenberth KE (2011) Changes in precipitation with climate change. Climate Res 47(1–2):123–138
Nkwunonwo UC, Whitworth M, Baily B (2020) A review of the current status of flood modelling for urban flood risk management in the developing countries. Scientific African 7:e00269
Abdulla B, Kiaghadi A, Rifai HS, Birgisson B (2020) Characterization of vulnerability of road networks to fluvial flooding using SIS network diffusion model. J Infrastructure Preserv Resilience 1(1):1–3
Bhatia U, Sela L, Ganguly AR (2020) Hybrid method of recovery: combining topology and optimization for transportation systems. J Infrastructure Syst 26(3):04020024
Funding
This research is partially supported by the US National Science Foundation. Grant No. 1638320.
Author information
Contributions
Xiong (Bill) Yu: envision the research, guide research activities; Xudong Fan: conduct analyses; Xijin Zhang: provide assistance; Xiaowei Wang: assist with proofreading. The authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
No human subjects or animals were involved in the study.
Competing interests
N/A
Consent for publication
The authors consent to the publication of this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fan, X., Zhang, X., Wang, X. et al. A deep reinforcement learning model for resilient road network recovery under earthquake or flooding hazards. J Infrastruct Preserv Resil 4, 8 (2023). https://doi.org/10.1186/s43065-023-00072-x
DOI: https://doi.org/10.1186/s43065-023-00072-x
Keywords
 Road network
 Deep reinforcement learning
 Graph convolutional neural network
 Infrastructure resilience
 Decision support
 Recovery