Machine learning-assisted optimal schedule of underground water pipe inspection
Journal of Infrastructure Preservation and Resilience volume 4, Article number: 20 (2023)
Abstract
There are over 2.2 million miles of underground water pipes serving cities in the United States. Many are in poor condition and are deteriorating rapidly. Failures of these pipes can cause enormous financial losses to customers and communities. Inspection provides crucial information for pipe condition assessment and maintenance planning; it is, however, very expensive for underground pipes because of accessibility issues. Water agencies therefore commonly face two challenges: 1) deciding whether it is worthwhile to schedule expensive water pipe inspections under financial constraints, and 2) if so, deciding how to optimize the inspection schedule to maximize its value. This study leverages a physical model and data-based machine learning (ML) models for underground water pipe failure prediction to shed light on these two questions. Analyses are first conducted to assess the value of water pipe inspection. Results from a physical-based failure model and Monte Carlo simulations indicate that inspecting the pipe's condition, i.e., assessing the pipe's corrosion depth, narrows the uncertainty of water pipe failure prediction by 51%. To find the optimal inspection schedule, an artificial neural network (ANN) model trained with historical inspection data is evaluated for its performance in forecasting the future pipe failure probability. The results show that pipe failure prediction can be biased when only a limited number of inspection rounds is available. Incorporating more rounds of inspection, however, allows the pipe failure conditions to be predicted over the pipe's life cycle. From this, an optimal inspection plan can be proposed that achieves the maximum benefit of inspection in terms of uncertainty reduction.
A few salient results from the analyses include: 1) the optimal inspection schedule does not necessarily have equal time intervals, and 2) by setting a goal for uncertainty reduction, an optimal inspection schedule can be obtained, in which an ML model that is continuously retrained with inspection data reliably predicts water pipe failure conditions over the pipe's life cycle. While this study focuses on underground pipe inspection, the general observations and methodology are also applicable to optimizing the inspection of other types of infrastructure.
Introduction
Over 2.2 million miles of water pipes are buried under U.S. cities, delivering water reliably to millions of people. However, many of them are in poor condition and are deteriorating rapidly [5]. The failure of these pipes could cause enormous financial losses to local businesses and communities. There are more than 700 water main breaks every day in Canada and the USA [25], which result in huge economic and social losses. In 2009, the American Society of Civil Engineers issued a USA Infrastructure Report Card and gave a D to drinking water and wastewater infrastructure, which improved to a C in 2021 [23]. As stated by the American Water Works Association (AWWA), we stand today at the dawn of a new era, the replacement era, for water utilities. These replacement costs, combined with projected expansion, will exceed $1 trillion over the next couple of decades [7].
The failure of water pipes can incur enormous losses, including water contamination, water shortage, and financial losses with associated societal or environmental impacts. At present, only 47% of water utilities use a pipe failure model for their water pipe replacement strategies [8]. Part of the reason is the lack of sufficient historical data for developing a more accurate pipe failure prediction model [40]. In addition, the influence of a limited number of inspection records on the maintenance plan is still unclear. It is therefore to the benefit of society to understand the value of inspections and to ensure that the inspection schedule maximizes that value. This study demonstrates that the value of inspection lies in reducing the uncertainty in pipe condition assessment, and that an ML model provides a way to incorporate inspection data to optimize the inspection period.
The existing prediction models for water pipe conditions (i.e., failure probability) are classified into three categories: physics-based models, statistical models, and ML-based models [10, 49]. Physical models consider the physical mechanisms that contribute to water pipe failure. Three physical aspects are often considered by a physical model: the material properties and structural design of the pipe, the internal and external loads on the pipe, and material deterioration, including corrosion, as affected by the environment and service time [31]. When the residual structural capacity cannot support the internal and external loads, the water pipe fails. The book by Young and Trott [52] explains the mechanical behavior of buried water pipes well. Ahammed and Melchers [3] used the Spangler-Watkins in-plane pipe-soil interaction to estimate the failure probability of steel pipes, and also applied the first-order second-moment (FOSM) method to predict the failure probability. Pandey [28] and Valor et al. [46] presented a method in which the failure probability is obtained by Monte Carlo simulation.
Statistical models typically analyze data from historical records and try to find the trend by curve fitting with a mathematical equation. Yamijala et al. [50] compared four types of statistical models for the pipe failure probability at different ages. The models considered include time-linear ordinary least squares regression and time-exponential ordinary least squares regression. The logistic Gaussian linear model is believed to perform better in the regression and prediction of water pipe reliability [50]. Kleiner and Rajani also summarized a large body of work and historical data [22]. Statistical models are found to be an economically viable approach for smaller distribution water mains. Both statistical and physical models need to be validated and improved with more data, and both have their own drawbacks. For example, a statistical model tries to fit the historical data with a limited set of mathematical equations, whereas a physical model requires a deep understanding of the mechanisms of water pipe failure.
Machine learning (ML) is an emerging method for the prediction of structural failures [45, 48], and has also been applied in geology [39] and to underground structures [20]. For example, Ren et al. [33] predicted the corrosion rate using a back-propagation neural network. Peng et al. [29] developed a model for predicting the failure rate of oil and gas pipelines with a fuzzy neural network. Tabesh et al. [42] applied ANN, ANFIS, and nonlinear regression methods to assess the pipe failure rate of water distribution networks, and found that the ANN is the most robust method. Sadiq et al. [35] used an ANN model to predict the water pipe condition where the relationships among variables are unknown. Sawhney and Mund [37] added that the ANN is useful for representing problems whose solutions are not clearly identified. Rajani and Kleiner [31] applied the ANN model to a water distribution network. Thomas [43] used an ANN model in multi-criteria decision making and prediction problems. An AHP model was also developed to identify the key factors that influence the failure of water pipelines, after which an ANN was used to predict the failure [13, 14]. Fan et al. [17] also considered five different ML algorithms for pipe failure prediction.
Although different methods have been developed to assist maintenance decisions, a rich historical dataset is required for model calibration or training. However, such datasets, which include pipe conditions at different ages and in different environments, are difficult to obtain, especially for small utilities that are just beginning to record their assets. It is therefore important that the decision-making process can be calibrated with accumulated inspection data. However, few studies have considered the value of inspections to ML-based prediction models or how to optimize inspection intervals.
This paper aims to quantify the benefits of conducting pipe inspections and of using ML to support the development of optimal inspection intervals. Monte Carlo simulations were conducted with a widely accepted physics-based model to generate data about future pipe conditions. An inspection at a given service time was assumed to capture the pipe conditions. An ML model, an artificial neural network (ANN), is trained with inspection data and evaluated for its capability to forecast the future water pipe failure probability. The results unveil a few interesting findings. Firstly, conducting inspections can significantly reduce the uncertainty range of the pipe performance forecast. Secondly, inspections conducted at equal time intervals do not all bring equal value. For example, inspections at the early stage and the final stage of the pipe service life may add only limited value to the performance forecast. Given a goal for uncertainty control in pipe condition assessment, an optimal inspection plan can be determined by continuously training the ANN model with additional pipe inspection data.
Methodology
Using a simulation-based method is a common approach to generating a sufficiently large dataset. In this study, we used a widely accepted physical-based water pipe failure model to generate the pipe samples. The pipe's service life is assumed to be 100 years. Figure 1 shows the main flowchart of this study. The physical water pipe failure model is used to generate samples for the ML model's training process (from year 1 to year i). The ML model is then trained and used as a failure prediction model. The failure probabilities of the pipe from year \(i+1\) to year 100 are computed using the randomly sampled physical factors. Finally, the prediction results of the ML model are compared with those of the physical model to evaluate the value of inspections.
Applied pipe failure physical model
In the current US design standard for cast-iron pipe [6], the pipe is assumed to be a rigid body that carries all the internal and external loads. Schlick [38] conducted an experiment which showed that the probability of failure of a grey cast-iron pipe can be calculated from a parabolic relationship between the internal pressure and the external loads. Underground water pipes usually support several loads, as studied by Rajani and Kleiner [31]. The failure types of pipes are grouped into three major categories by O'Day et al. [27]: 1) circumferential breaks, caused by longitudinal stresses; 2) longitudinal breaks, caused by transverse stresses (or hoop stress); and 3) split bell, caused by transverse stresses on the pipe joint.
A schematic of these loads is shown in Fig. 2. In this study, two types of external stresses are analyzed, i.e., hoop stress and axial stress. The formulations used to calculate the external stresses are summarized below.
Water pipe external stress analysis
In this study, a total of four types of stresses are considered, i.e., the stress of internal fluid pressure, the stress of soil pressure, the stress of frost load, and the stress of traffic load. The computation of these stresses is firstly introduced below.
Stress by the internal fluid pressure [30]
where p is the internal pipe pressure, D is the nominal pipe diameter, t is the pipe wall thickness
Stress by soil pressure [41]
where \({K}_{m}\) is the bending moment coefficient, \(\gamma\) is the unit weight of soil, \({B}_{d}\) is the width of ditch, \({K}_{d}\) is the deflection coefficient, \({E}_{p}\) is the pipe material elastic modulus, \({C}_{d}\) is the calculation coefficient
Stress by frost load [32]
where \({f}_{frost}\) is the frost load multiplier and \({\sigma }_{s}\) is the stress by soil pressure.
Stress by traffic loads [2]
where \({I}_{c}\) is the impact factor, \(A\) is the effective length of pipe, \(F\) is the wheel load traffic, \({C}_{t}\) is the surface load coefficient
Therefore, the total hoop or circumferential stress is calculated by assuming the stresses by these different loads are superimposed,
The total axial stress is calculated by considering the stress due to temperature gradient.
where \({\sigma }_{F}^{\prime}=\frac{p}{2}\times \left(\frac{D}{t}-1\right){\nu }_{p}\) is the axial stress due to internal fluid pressure, and \({\sigma }_{T}={E}_{p}{\alpha }_{p}\Delta T\) is the stress caused by the temperature difference [30].
The axial stress due to deflection of the pipe was not considered. It is noted that when the support structure fails, the stress due to the bending moment can be significant and can even cause the failure of the water pipe.
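As a minimal sketch of the two axial stress components defined above, the following Python function evaluates the pressure term and the thermal term and adds them. The simple sum, the reading of \({\nu }_{p}\) as Poisson's ratio and \({\alpha }_{p}\) as the thermal expansion coefficient, and all numerical values are assumptions made for illustration; they are not taken from Table 1.

```python
def axial_stress(p, D, t, nu_p, E_p, alpha_p, delta_T):
    """Axial stress from internal pressure and temperature change.

    Units must be consistent (here: MPa, mm, 1/degC, degC).
    """
    # Pressure term: (p / 2) * (D / t - 1) * nu_p
    sigma_f_axial = (p / 2.0) * (D / t - 1.0) * nu_p
    # Thermal term: E_p * alpha_p * delta_T
    sigma_t = E_p * alpha_p * delta_T
    # Assumption: the two terms are simply superimposed.
    return sigma_f_axial + sigma_t


# Hypothetical inputs, chosen only to exercise the function.
sigma_x = axial_stress(p=0.45, D=200.0, t=10.0, nu_p=0.21,
                       E_p=165000.0, alpha_p=11e-6, delta_T=10.0)
print(round(sigma_x, 3))
```

With these illustrative inputs the thermal term dominates the pressure term, which shows why the temperature gradient is carried explicitly in the axial stress.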
Water pipe residual yield strength
The resistance of a pipe to failure is closely related to the effective structural thickness of the pipe wall. For this purpose, a surface corrosion model is used to describe the reduction of the pipe wall thickness due to corrosion. The wall thickness of a water pipe continues to decrease over time due to corrosion. The corrosion rate has been studied extensively. For example, Doyle et al. [15, 16, 19, 21, 24, 26] surveyed the condition of buried water pipes. The results indicate that the condition of a water pipe is closely related to the soil characteristics.
In this paper, an empirical two-phase corrosion model was used to estimate the corrosion depth [36].
where \(d\) is the depth of corrosion (mm), \(a\) is the final pitting rate constant, \(b\) is the pitting depth scaling constant, \(c\) is corrosion rate inhibition factor.
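The behavior of a two-phase corrosion model can be sketched in a few lines of Python. The sketch assumes the commonly used form \(d=aT+b\left(1-{e}^{-cT}\right)\), with \(T\) the pipe age in years; this form matches the parameter definitions above but is an assumption here, and the parameter values are hypothetical rather than those of Table 1.

```python
import math


def corrosion_depth(T, a, b, c):
    """Corrosion depth (mm) at age T (years), assumed two-phase form.

    a: final pitting rate constant (mm/year)
    b: pitting depth scaling constant (mm)
    c: corrosion rate inhibition factor (1/year)
    """
    return a * T + b * (1.0 - math.exp(-c * T))


# Hypothetical parameters for illustration only.
a, b, c = 0.02, 5.0, 0.1
for year in (1, 10, 20, 50, 100):
    print(year, round(corrosion_depth(year, a, b, c), 2))
```

Under this form the exponential term dominates early on and saturates at \(b\), after which the depth grows at the nearly constant rate \(a\).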
Figure 3 shows an example of the corrosion depth predicted by Eq. (7), using the parameters listed in Table 1. The corrosion of a metal pipe is due to the establishment of anodic and cathodic areas [12, 34]. The anodic area is initially established by the local environment, for example at a crack in the iron oxide layer. The cathode is then established somewhere near the pit. Subsequently, anions such as \({\mathrm{OH}}^{-}\) and \({\mathrm{Cl}}^{-}\) move from the anode to the cathode. With the movement of these anions, a layer of ferrous hydroxide \([{\text{Fe}}({\text{OH}}{)}_{3}]\) is generated, and later an intermediate layer of magnetite \(\left[{\text{F}}{\text{e}}_{3}{\mathrm{O}}_{4}\right]\) forms. This layer of magnetite stops the anions from moving from the anodic area to the cathodic area. The corrosion rate is therefore high at the early stage and then decreases over time [36]. The development of corrosion described by the two-phase model (Eq. (7), Fig. 3) is thus consistent with the physicochemical process associated with field corrosion.
The residual yield strength of a water pipe can be determined by the empirical relationship proposed by Rajani et al. [30], Eq. (8):
where \(\beta ={a}_{1}{\left(\frac{d}{{t}_{res}}\right)}^{{b}_{1}}\); \(\alpha\) and \(S\) are constants used in the fracture toughness equations; \(\beta\) is the geometric factor for a double-edge notched tensile specimen; \({a}_{n}\) is the lateral dimension of the pit; \({K}_{q}\) is the provisional fracture toughness; \({a}_{1}\) and \({b}_{1}\) are constants for determining the geometric factor \(\beta\); and \(d\) is the depth of the corrosion pit, which can be estimated by Eq. (7).
Water pipe failure criteria
Failure of a water pipe occurs when either its hoop stress or its axial stress exceeds its residual yield strength [36]. By introducing the concept of a factor of safety (FOS), the water pipe failure criterion can also be written as Eq. (9). When the FOS is larger than 1, the pipe is assumed to be safe; otherwise, the pipe is assumed to fail.
where \({\sigma }_{\theta }\), \({\sigma }_{X}\), and \({\sigma }_{Y}\) are determined by Eqs. (5), (6) and (8), respectively.
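The failure criterion described above can be sketched as follows. The sketch assumes that the FOS is the residual yield strength divided by the larger of the hoop and axial stresses, which is one way to express "either stress exceeds the yield strength"; the exact form of Eq. (9) is not reproduced here, and the function names are illustrative.

```python
def factor_of_safety(sigma_y, sigma_theta, sigma_x):
    """FOS as residual yield strength over the governing (larger) stress."""
    return sigma_y / max(sigma_theta, sigma_x)


def has_failed(sigma_y, sigma_theta, sigma_x):
    """The pipe is assumed safe only when FOS is larger than 1."""
    return factor_of_safety(sigma_y, sigma_theta, sigma_x) <= 1.0


# Hypothetical stresses in MPa.
print(factor_of_safety(100.0, 40.0, 50.0))  # 2.0
print(has_failed(100.0, 120.0, 50.0))       # True
```

Taking the maximum of the two stresses means a single FOS value captures both failure modes, so the later Monte Carlo step only needs to count samples with FOS not larger than 1.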
Water pipe failure probability
There are significant uncertainties in the parameters used to determine the hoop stress, axial stress, and residual yield strength. The FOS and the consequent probability of water pipe failure are therefore stochastic in nature. Such problems are typically analyzed by methods such as Monte Carlo simulation [28], Mean-Value First-Order Second-Moment, Advanced First-Order Second-Moment, first-order reliability methods [9, 11], Rosenblueth's Point Estimation, or Harr's Point Estimation [44].
Monte Carlo simulation is an effective method for modeling stochastic processes and is used in this study. The variables required to determine the failure conditions of a water pipe are assumed to follow specific statistical distributions. The distributions of these variables are listed in Table 1, modified from Sadiq et al. [36].
Figure 4 shows the flowchart for using Monte Carlo simulation to determine the distribution of the FOS and the failure probability of a single pipe. In each loop, the values of the considered variables are randomly generated using the probability distribution parameters noted in Table 1. Based on the randomly generated values, the FOS is computed by Eq. 9. After iterating N times, the number of failures, n, is recorded and the failure probability of the pipe is computed by Eq. 10.
where n is the number of failure times in the iteration, and N is the total number of iterations.
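The Monte Carlo loop described above can be sketched with NumPy. The sketch draws the yield strength and the two stresses directly from hypothetical distributions instead of sampling the Table 1 variables and evaluating Eqs. (1) to (8), so the distributions and their parameters are assumptions; only the counting step (failures n over iterations N) follows the text.

```python
import numpy as np


def failure_probability(n_iter=1000, seed=0):
    """Estimate the failure probability as n / N from sampled FOS values."""
    rng = np.random.default_rng(seed)
    # Hypothetical distributions (MPa), stand-ins for the physical model.
    sigma_y = rng.normal(200.0, 30.0, n_iter)
    sigma_theta = rng.lognormal(mean=np.log(120.0), sigma=0.3, size=n_iter)
    sigma_x = rng.lognormal(mean=np.log(60.0), sigma=0.3, size=n_iter)
    # Assumed FOS form: yield strength over the governing stress.
    fos = sigma_y / np.maximum(sigma_theta, sigma_x)
    n_fail = int(np.sum(fos <= 1.0))  # failure when FOS is not larger than 1
    return n_fail / n_iter


p_fail = failure_probability()
print(p_fail)
```

Fixing the seed makes the estimate reproducible; increasing `n_iter` reduces the sampling noise of the estimate.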
Artificial Neural Network (ANN)
The Artificial Neural Network (ANN) is a widely used ML model. Its architecture consists of interconnected neurons in the input layer, hidden layers, and output layer, and determines its overall performance [1]. Increasing the number of neurons and hidden layers can improve the ability of the ANN model to describe nonlinear relationships. It also, however, increases the computational demand and can lead to overfitting. A conceptual architecture of a neural network is shown in Fig. 5.
The input layer consists of i neurons, corresponding to the number of input features. The hidden layers provide the capability to model complex nonlinear relationships, which are fine-tuned with the training data. The output layer consists of one neuron, which is used to classify the output as leaking or not leaking.
The hidden layers consist of fully connected neurons; the output of each neuron is written as Eq. 11.
where \({y}_{k}\) is the output of each neuron in the hidden layer and \({x}_{r,k}\) is the output of the previous layer; for the first layer of the neural network, \({x}_{r,k}\) is the sample data. \({\omega }_{r,k}\) is the weight of that neuron and b is the bias of that neuron, both of which are learned from the training datasets by the backpropagation algorithm. \(f(\cdot)\) is the activation function used to introduce nonlinearity during the propagation. In this study, the 'ReLU' function is used as the activation function of the hidden layers [4].
The output of the last hidden layer is then passed to the neuron in the output layer, whose action is written as below.
where \({y}_{k}\) is the output of the last hidden layer, and \({y}_{z}\) is the output of the output layer. \(\omega\) and \(b\) are the weight and bias as described before. \(g(\cdot)\) is the sigmoid transfer function defined as Eq.Â 13
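The forward pass described above (weighted sum plus bias, ReLU in the hidden layers, sigmoid at the output) can be sketched in NumPy. The layer sizes and the random weights below are assumptions chosen for illustration; in the study the weights are learned by backpropagation.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, hidden_layers, output_layer):
    """Propagate inputs x of shape (n_samples, n_features) through the network."""
    y = x
    for weights, bias in hidden_layers:
        y = relu(y @ weights + bias)       # hidden neuron: f(sum(w * x) + b)
    w_out, b_out = output_layer
    return sigmoid(y @ w_out + b_out)      # output neuron: g(sum(w * y) + b)


# Illustrative network: 21 inputs, two hidden layers of 8 neurons, 1 output.
rng = np.random.default_rng(0)
hidden = [(rng.normal(size=(21, 8)), np.zeros(8)),
          (rng.normal(size=(8, 8)), np.zeros(8))]
output = (rng.normal(size=(8, 1)), np.zeros(1))
probs = forward(rng.normal(size=(5, 21)), hidden, output)
print(probs.shape)  # (5, 1)
```

Because the sigmoid maps any real number into the interval from 0 to 1, the single output can be read as a class score and thresholded to obtain a binary label.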
The ANN model in this article is built and trained with TensorFlow in a Python environment. Through the training process, it learns the relationship between the inputs and the output in order to classify the observed data into leaking and non-leaking situations. More detailed mathematical information about ANNs can be found in [18].
Results and discussion
A pipe with an effective length of 6 m, an internal diameter of 20 cm, and an original wall thickness of 1 cm is considered as the testbed in this study. The Monte Carlo simulations are repeated 1,000 times for each year. The Monte Carlo simulation therefore provides 1,000 random FOS values for each year, from which the failure probability of each year and the evolution of the failure conditions (mean and standard deviation) with service time can be calculated. In this section, the value of inspection is studied first. The inspection data samples are assumed based on the accuracy of inspection tools and are generated by Monte Carlo simulation. After that, an ANN model is evaluated for its capability to predict the pipe's failure probability over time using the inspection data. The prediction result of the ANN model is used to demonstrate the importance of optimal maintenance by comparing it with the theoretical ground truth.
Illustration of the value of pipe inspection
Corrosion inspection is one of the main inspection items in pipe maintenance procedures. Multiple inspection methods, such as magnetic flux leakage (MFL), circumferential MFL, triaxial MFL, and ultrasonics, have been used for pipeline deterioration inspection [47]. To quantify the value of pipe inspections, we assume the corrosion depth is inspected at a specific year, T0. The inspected value is then used as the initial value for the Monte Carlo simulation to determine the pipe thickness in the following years (Eq. 14). In other words, we assume the corrosion depth at year T0 is a value determined by inspection, which is used to predict the pipe conditions during the subsequent years via the Monte Carlo simulations.
where \({d}_{T}\) is the pipe corrosion depth at year T (> T0), \({d}_{tT0}\) is the inspected corrosion depth at T0, and the other parameters, a, T, b, and c, are the same as in Eq. 7.
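One way to condition the corrosion forecast on an inspected depth is to add the model's predicted growth between T0 and T to the inspected value. The sketch below does this under the assumed two-phase form \(d=aT+b\left(1-{e}^{-cT}\right)\); it is a plausible reading of the procedure, not a reproduction of Eq. 14, and the parameter values are hypothetical.

```python
import math


def two_phase_depth(T, a, b, c):
    """Assumed two-phase corrosion depth at age T."""
    return a * T + b * (1.0 - math.exp(-c * T))


def depth_after_inspection(T, T0, d_T0, a, b, c):
    """Corrosion depth at year T >= T0, anchored to the inspected depth d_T0."""
    if T < T0:
        raise ValueError("T must not be earlier than the inspection year T0")
    # Model growth between T0 and T, added to the measured depth.
    growth = a * (T - T0) + b * (math.exp(-c * T0) - math.exp(-c * T))
    return d_T0 + growth


# Hypothetical parameters; 5 mm inspected at year 20 as in the example case.
a, b, c = 0.02, 5.0, 0.1
print(round(depth_after_inspection(40, 20, 5.0, a, b, c), 2))
```

If the inspected depth happens to equal the model value at T0, this expression reduces to the unconditioned model, so the inspection only shifts the curve when the measurement differs from the prior expectation.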
The distribution of the FOS in each year, with no inspection or with an inspection at year 20, is computed by Monte Carlo simulations. For each year, the mean value, 10% quantile, and 90% quantile are computed and recorded. Figure 6 shows the computed distribution of the FOS over 100 years. The solid lines are the prediction results assuming the pipe is not inspected. As can be seen in Fig. 6, the overall FOS values decrease as the service life increases. After the 70th year of service, the mean value of the FOS is around 1, which implies that there is about a 50% probability that the pipe will fail. Assuming an average acceptable FOS of 1.5, the water pipe reaches this threshold at around 37 years.
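The yearly summary statistics and the threshold year can be computed from the simulated FOS samples as sketched below. The array layout (one row per year, one column per Monte Carlo sample) and the function names are assumptions for illustration.

```python
import numpy as np


def summarize_fos(fos_samples):
    """Mean, 10% and 90% quantiles per year; rows are years, columns samples."""
    return {
        "mean": fos_samples.mean(axis=1),
        "q10": np.quantile(fos_samples, 0.10, axis=1),
        "q90": np.quantile(fos_samples, 0.90, axis=1),
    }


def first_year_below(mean_fos, threshold=1.5):
    """First service year (1-indexed) whose mean FOS drops below threshold."""
    below = np.nonzero(mean_fos < threshold)[0]
    return int(below[0]) + 1 if below.size else None


# Tiny hand-made example: three years, three samples each.
fos = np.array([[2.0, 2.0, 2.0],
                [1.6, 1.6, 1.6],
                [1.4, 1.4, 1.4]])
stats = summarize_fos(fos)
print(first_year_below(stats["mean"]))  # 3
```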
To assess the effects of inspection, it is assumed that, due to a more corrosive underground environment, the pipe's corrosion depth determined by inspection at the 20th year is 5 mm, which is slightly higher than the average value of 4.02 mm. The FOS of the pipe for the subsequent years, calculated by the procedure illustrated in Fig. 4 after incorporating the inspection data, is shown in Fig. 6 by the dashed lines. Both the mean value and the range of the FOS after considering the inspection are shown in the figure.
The results in Fig. 6 indicate that, if an average FOS of 1.5 is used as the acceptable threshold for pipe replacement, the water pipe reaches the threshold FOS at around 30 years. Incorporating the inspection data therefore forecasts unacceptable pipeline failure about 8 years earlier than without inspection. From a practical perspective, this information helps agencies to implement preventative maintenance, such as corrosion protection measures or replacement of the pipe sections before they fail. The final decision also depends on the financial constraints and societal impacts of such actions.
The immediate value of the inspection is studied by comparing the FOS distributions at the 21st year, which is the year after the inspection. Both forecast FOS distributions follow lognormal distributions. As shown in Fig. 7, compared with the case without inspection, the possible FOS values predicted with inspection are more concentrated and their variation is reduced.
Figure 8 compares the standard deviation of the FOS over time after inspection. The results show that inspection helps to narrow the uncertainty range of the pipe condition forecast, i.e., of the FOS. For example, at the end of the 40th year, the variance of the FOS without inspection is 0.967, whereas with inspection data the prediction variance of the FOS is 0.475. This clearly shows that inspection data significantly reduce the variation or uncertainty in the FOS.
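As a quick check of the size of this reduction, the relative change between the two reported values at the 40th year works out as

\[
\frac{0.967-0.475}{0.967}=\frac{0.492}{0.967}\approx 0.51,
\]

that is, the spread of the forecast FOS is roughly halved once the inspection result is incorporated.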
Overall, the results indicate that inspection collects pipe condition data and provides a more accurate prediction of the pipe's future conditions. Figs. 6, 7 and 8 show that the value of inspection is to reduce the range of uncertainty in forecasting the future conditions of the pipe. The variation in the model forecast is reduced (by as much as 50.8% in this case) after incorporating inspection data. The reduced variation in the model forecast helps to reduce the uncertainty in the decision-making process. The value of an inspection gradually decreases over time, as indicated by the fact that the uncertainty range of the forecast FOS continues to increase over time. At a certain time, another round of inspection is needed to reduce the uncertainty in the pipe condition assessment.
It is also noted that the inspection program analyzed here only considers the depth of corrosion, which is related to the capacity of the pipe to resist failure. Other parameters related to the service load, such as internal water pressure, frost effects, and traffic loads, are not included. This information, if available, can be included to further reduce the uncertainty in the forecast of water pipe failure.
ML model with inspection data for reliable forecast of future pipe conditions
The previous discussion shows that inspection brings value by reducing the uncertainty in underground water pipe condition assessment. The natural next question is how to effectively incorporate inspection data into pipe failure probability prediction. We explore data-driven ML models for this purpose. The advantage of an ML model is that it can extract features from a complex dataset without requiring a predefined physics-based model. Its data-driven nature means that data are required for ML model training and validation. An Artificial Neural Network (ANN) is used in this study because of its simplicity.
A few assumptions are made in this study to evaluate the ability of the ANN model to predict the pipe failure probability at different ages. Firstly, we assume the inspection datasets are obtained from a large water system. Specifically, we assume that there are 1,000 pipes with the same initial physical properties installed in the water system, and that the inspection process provides values of all the variables according to the distributions listed in Table 1. Secondly, we assume the physics-based model introduced in the "Illustration of the value of pipe inspection" section reflects the pipe failure mechanism accurately. The corresponding status of each pipe can therefore be determined by the Monte Carlo simulation via the physics-based model. Under the first assumption, the inspection captures the condition data of the 1,000 pipes, which provides enough data samples for ANN training and validation. The second assumption allows the average pipe conditions predicted by the physics-based model to be used as the ground truth for evaluating the ML model performance.
To train the ANN model, we first generate a pseudo inspection dataset that contains the variable values and pipe status based on these assumptions. An ANN model with an optimal structure is obtained after hyperparameter optimization. The ANN model contains 1 input layer, 1 output layer, and 6 hidden layers. The 'ReLU' activation function is selected for the input layer and the hidden layers. The 'Sigmoid' function is used as the activation function in the output layer. The input layer contains 21 neurons, corresponding to all the pipe variables listed in Table 1. The output layer contains 1 neuron, which classifies the condition of the pipe as either 'failure' or 'functional' based on its FOS. The hyperparameters of the ANN model are determined by a trial-and-error process. It should be noted that techniques such as grid search and Bayesian optimization can also be used to determine these hyperparameters [51]. We split the generated pseudo inspection data into training and testing datasets with a ratio of 7:3. The ANN model's accuracy is evaluated by comparing its predictions on the testing set with the pipe status in the generated dataset. The trained ML model achieves an accuracy of 99.1%. This high accuracy is obtained because the ML model can learn the pattern of the predefined physical model well.
The process of predicting the pipe failure probability assumes that the variables (except pipe thickness) continue to follow in the future the same distributions as the inspection records, which are also the predefined distributions in Table 1. Therefore, for each year with no inspection, 1,000 sets of random variables are generated, except for the pipe's current thickness. The pipe's thickness is replaced with the pipe's remaining thickness based on the results of the nearest previous inspection. The generated datasets are then fed into the trained ANN model, which predicts whether the pipe has failed or is functional based on the inputs. The failure probability of the pipe is computed by dividing the number of predicted failures by the total number of samples (1,000 samples), as shown in Eq. 10.
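The prediction procedure above can be sketched as follows. A trained ANN is replaced here by a hypothetical stand-in classifier, and the two-feature sampler is a toy substitute for the 21 variables of Table 1; both are assumptions made so that the sketch runs on its own.

```python
import numpy as np


def predicted_failure_probability(classify, sample_inputs, thickness_index,
                                  remaining_thickness, n_samples=1000, seed=0):
    """Fraction of sampled input sets that the classifier labels as failure."""
    rng = np.random.default_rng(seed)
    X = sample_inputs(rng, n_samples)            # (n_samples, n_features)
    X[:, thickness_index] = remaining_thickness  # value from latest inspection
    labels = classify(X)                         # 1 = failure, 0 = functional
    return float(np.mean(labels))


def toy_sampler(rng, n):
    """Hypothetical inputs: column 0 is a load factor, column 1 a thickness (mm)."""
    load = rng.normal(1.0, 0.2, n)
    thickness = rng.normal(10.0, 1.0, n)
    return np.column_stack([load, thickness])


def toy_classifier(X):
    """Stand-in for the trained ANN: failure when load demand exceeds thickness."""
    return (X[:, 0] * 6.0 > X[:, 1]).astype(int)


p_thick = predicted_failure_probability(toy_classifier, toy_sampler, 1, 10.0)
p_thin = predicted_failure_probability(toy_classifier, toy_sampler, 1, 5.0)
print(p_thick, p_thin)
```

Overwriting only the thickness column mirrors the text: every other variable keeps its assumed distribution, while the inspected quantity is treated as known.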
The ground truth curve is obtained directly from the Monte Carlo simulation with the physics-based model. It is used as the baseline for verifying the accuracy of the results predicted by the ANN. The baseline is shown as 'original data' in Fig. 9. To evaluate the value of inspection, we first assume that inspections are conducted regularly at 10-year intervals. For example, if inspections are conducted over a 40-year period, pipe condition data are assumed to be collected at the 10th, 20th, 30th and 40th years and are used for ANN model training. The trained ANN model is used to predict the pipe failure probability for the subsequent years. Similar analyses are conducted for ANN models trained with inspection data collected at 10-year intervals over 50, 60, or 70 years of pipe service; the results are compared with the ground truth (Fig. 9).
Figure 9 compares the pipe conditions (i.e., failure probability) predicted by ML models trained with different numbers of years of inspection data. The comparison with the baseline ground truth curve shows that the more training data are obtained from inspection, the closer the ANN forecast is to the ground truth. For example, the ANN model trained with 40 years of data from 4 rounds of inspection at 10-year intervals predicts the pipe failure probability within 4% of the ground truth values over the subsequent 20 years (40 to 60 years). The deviation from the ground truth curve increases in the years after that. If the next round of inspection data (i.e., the inspection at the 50th year) is available, the further trained ANN model predicts within 2% of the ground truth values for the next 20 years. Similar trends are observed as more inspections are incorporated. Interestingly, in this case, the ANN model trained with 7 rounds of inspection data (10th, 20th, 30th, 40th, 50th, 60th, and 70th years) predicts a failure probability that nearly overlaps with the ground truth curve. That is, 7 rounds of inspection data (up to the 70th year) allow the pipe failure probability to be predicted accurately over the life span of 100 years.
These observations imply that an ML model trained with pipe inspection data can provide a reliable forecast of the failure probability over a certain number of years. The reliability of the ML model prediction increases further as more inspection data are incorporated. The value of inspection is therefore to extend the range of years over which the pipe condition can be predicted reliably. It is also noted that inspection data beyond a certain number of years of service (70 years in this case) do not bring added value. This might be attributed to the fact that the corrosion of the water pipe has reached a steady deterioration rate by that stage. There should therefore be an optimal inspection strategy, in terms of inspection scheduling, that maximizes the value of the inspection data over the service life of the water pipe. This is discussed further below.
Optimal inspection schedule based on ML model
The previous analyses indicate that an ML model trained with inspection data can predict future pipe conditions. Its prediction error, however, can grow the further the forecast extends into the future. An optimal inspection schedule is therefore needed to maintain acceptable reliability in pipe condition prediction over the whole service life of the water pipe. The term 'optimal' in this case refers to the inspection schedule that uses the minimum number of inspections to constrain the uncertainty in future condition prediction, based on prior inspection data, to within ±5% of the true value. The idea of an optimal inspection interval derived from the ML model is illustrated in this section. The studied pipe is assumed to have the same design parameters as before and to be used for 100 years. An ANN model is used to predict the pipe's failure probability over its remaining life. The failure probability obtained by Monte Carlo simulation with the physics-based model is used as the baseline (true value).
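One way to state this criterion compactly is sketched below. The notation is introduced here for illustration and is not the authors': P(t) is the baseline failure probability, \hat{P}_k(t) is the ANN forecast trained on the initial condition and the first k inspections, t_k is the year of the k-th inspection, \varepsilon = 0.05 is the tolerance (read here as an absolute bound on failure probability, which is an assumption), and T = 100 years is the service life.

```latex
\[
\begin{aligned}
\min_{n,\; t_1 < \dots < t_n} \quad & n \\
\text{s.t.} \quad & \bigl|\hat{P}_k(t) - P(t)\bigr| \le \varepsilon
  \quad \forall\, t \in [t_k,\, t_{k+1}], \quad k = 1, \dots, n, \\
& t_{n+1} = T .
\end{aligned}
\]
```

In words: each trained model only has to remain within tolerance until the next inspection retrains it, and the last model has to remain within tolerance until the end of the service life.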
Figure 10 shows the pipe's ground truth curve and the prediction results of the ANN model. Since the failure probability is low for the first 20 years, we assume inspection is not necessary until the 20th year. The first inspection is therefore arranged at the 20th year, as shown in Fig. 10. After the inspection, the initial condition data (1st year) and the inspection data at the 20th year are used to train the first ANN model. Figure 10 shows that the first trained ANN quickly deviates from the ground truth curve in the years after year 20. Assuming the prediction bias should be kept within a 5% error range of the ground truth, the next inspection should be arranged at the 25th year. With these data (data from the 1st year and inspection data from both the 20th and 25th years), the prediction of the new ANN model stays within 5% for the next 15 years. The third inspection can therefore be made at the 40th year. With the data from this inspection, the forecast accuracy is ensured for the next 20 years. The fourth inspection can then be done at the 60th year to cover another 20 years. Finally, the fifth inspection can be done at the 80th year, which allows the ML model to cover the remaining service life of the water pipe up to 100 years. Overall, 5 rounds of inspection are sufficient to provide a reliable forecast (within 5% error) over the 100-year life cycle of the water pipe.
The results shown in Fig. 10 indicate that the optimal inspection schedule is not evenly spaced; that is, inspection data collected at different years are not of equal value. A few observations are summarized below.
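The procedure described above amounts to a greedy rule: inspect, retrain, and schedule the next inspection for the year in which the retrained forecast would leave the tolerance band. The sketch below illustrates that rule under stated assumptions. `greedy_schedule` and `horizon_fn` are hypothetical names, and `horizon_fn` stands in for "retrain the ANN on these inspections and find the last year it stays within 5%". The lookup table simply encodes the horizons reported in the paragraph above; no model is trained here.

```python
from typing import Callable, List, Tuple


def greedy_schedule(
    horizon_fn: Callable[[Tuple[int, ...]], int],
    first_inspection: int,
    service_life: int,
) -> List[int]:
    """Schedule each inspection at the year the current forecast expires.

    `horizon_fn(inspections)` is assumed to return the last year for which
    a model trained on those inspections stays within the error tolerance.
    """
    schedule = [first_inspection]
    while True:
        horizon = horizon_fn(tuple(schedule))
        if horizon >= service_life:
            return schedule
        if horizon <= schedule[-1]:
            raise ValueError("forecast is not reliable beyond the latest inspection")
        schedule.append(horizon)


# Reliable horizons as reported in the text for each set of inspections.
HORIZONS = {
    (20,): 25,
    (20, 25): 40,
    (20, 25, 40): 60,
    (20, 25, 40, 60): 80,
    (20, 25, 40, 60, 80): 100,
}

schedule = greedy_schedule(HORIZONS.__getitem__, first_inspection=20, service_life=100)
print(schedule)  # [20, 25, 40, 60, 80]
```

In practice `horizon_fn` would wrap model retraining and a comparison against the Monte Carlo baseline. Whether this greedy rule always yields the minimum number of inspections depends on how the horizon behaves as data are added, which the sketch does not address.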

a) First, no inspection is needed during the first 20 years because of the low probability of failure. Inspection data from this period do not bring much added value to the ML model.
b) Second, more frequent inspection is needed between 20 and 40 years because the pipe's failure probability changes rapidly. In other words, inspection data from this period bring higher value to ANN model training for future pipe condition prediction.
c) Third, regular inspection at approximately fixed time intervals can be used when the pipe is 40 to 80 years old. On the one hand, the ANN model has by then received enough data to capture the trend of the pipe failure probability. On the other hand, the corrosion rate of the pipe is becoming steady.
d) Finally, inspection brings no observable value at the final stage of the pipe's life (80 to 100 years), since the ANN model is able to accurately predict the pipe failure probability to the end of the service life. This is possibly because the corrosion rate of the pipe has become stable at this stage.
Overall, the results indicate that inspection can efficiently narrow down the uncertainty in pipe failure, as shown in Fig. 6. Additional rounds of inspection also reduce the prediction error of ML-based failure prediction models. The results further indicate that inspections at different times do not contribute equally to the model prediction. For example, the optimal inspection schedule reduces the number of inspections over the 100-year service life from 10 (fixed-time inspection at 10-year intervals) to 5, as illustrated in Figs. 9 and 10, respectively.
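The comparison between the two schedules is simple arithmetic, sketched below. The optimal inspection years are those reported in the text. Placing the ten fixed-interval inspections at years 10, 20, ..., 100 is an assumption made here to reach the stated count of 10.

```python
# Fixed schedule: one inspection every 10 years (assumed to fall at years 10..100).
fixed = list(range(10, 101, 10))

# Optimal schedule as reported in the text.
optimal = [20, 25, 40, 60, 80]

# Gaps between consecutive inspections in the optimal schedule.
gaps = [later - earlier for earlier, later in zip(optimal, optimal[1:])]

# Fraction of inspections saved relative to the fixed schedule.
saving = 1 - len(optimal) / len(fixed)

print(len(fixed), len(optimal), gaps, f"{saving:.0%}")  # 10 5 [5, 15, 20, 20] 50%
```

The gaps of 5, 15, 20, and 20 years show the uneven spacing, and the inspection count is halved.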
Conclusion
The deterioration of water distribution pipes requires a proactive plan for maintenance, retrofit, and renewal. Inspection plays an important role in supporting these decisions. The complex stochastic nature of infrastructure deterioration presents a major challenge in forecasting its performance. Machine learning (ML) can potentially provide an important tool for uncovering the value of inspection data. Analyses are conducted in this paper using underground water pipe as the testbed. The results show that the value of inspection lies in reducing the uncertainty in the forecast of pipe condition or factor of safety. The analyses also show that inspections at different times do not bring equal value, i.e., the optimal inspection schedule is not necessarily evenly spaced in time. An optimal inspection schedule can be designed based on a preset acceptable reliability level of the ML model (ANN in this case) for future pipe condition forecasting. An ML model trained with data collected under the optimal inspection schedule can provide a cost-effective and reliable forecast of pipe failure probability throughout the service period.
This study illustrates the value and impact of inspection data in the development of optimal water pipe maintenance strategies. Because few pipe failure data samples are available, a physics-based model is used to generate the data needed for machine learning model training. It should be noted that the physics-based model may not fully represent what happens in the real world. Real-world pipe failure records should be incorporated when using the framework proposed in this study. Nevertheless, the proposed framework of combining inspection data with an ML model to optimize the inspection schedule remains applicable.
Availability of data and materials
Data is available upon request from the corresponding author.
Funding
The research is partially supported by the US National Science Foundation, grant No. 1638320.
Author information
Contributions
Xiong Yu: envisioned the overarching goal of the study, guided the study and analyses, and proofread the manuscript. Xudong Fan: implemented the research plan and drafted the manuscript.
Ethics declarations
Ethics approval and consent to participate
N/A. No human or animal studies are involved.
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fan, X., Yu, X. Machine learning-assisted optimal schedule of underground water pipe inspection. J Infrastruct Preserv Resil 4, 20 (2023). https://doi.org/10.1186/s43065-023-00086-5
DOI: https://doi.org/10.1186/s43065-023-00086-5