
Machine learning-assisted optimal schedule of underground water pipe inspection


There are over 2.2 million miles of underground water pipes serving cities in the United States. Many are in poor condition and are deteriorating rapidly. Failures of these pipes can cause enormous financial losses to customers and communities. Inspection provides crucial information for pipe condition assessment and maintenance planning; it is, however, very expensive for underground pipes due to accessibility issues. Water agencies therefore commonly face the challenge of 1) deciding whether it is worthwhile to schedule expensive water pipe inspections under financial constraints, and 2) if so, optimizing the inspection schedule to maximize its value. This study leverages a physics-based model and data-driven machine learning (ML) models for underground water pipe failure prediction to shed light on these two questions for decision making. Analyses are first conducted to assess the value of water pipe inspection. Results from a physics-based failure model and Monte Carlo simulations indicate that inspecting the pipe’s condition, i.e., assessing the pipe’s corrosion depth, narrows the uncertainty of water pipe failure prediction by 51%. To find the optimal inspection schedule, an artificial neural network (ANN) model trained with historical inspection data is evaluated for its performance in forecasting the future pipe failure probability. The results show that a biased pipe failure prediction can occur when only a limited number of inspection rounds is available. Incorporating more rounds of inspection, however, makes it possible to predict the pipe failure condition over its life cycle. From this, an optimal inspection plan can be proposed that achieves the maximum benefit of inspection in terms of uncertainty reduction.
A few salient results from the analyses are: 1) the optimal inspection schedule does not necessarily have equal time intervals, and 2) by setting a goal for uncertainty reduction, an optimal inspection schedule can be obtained, in which an ML model that is continuously retrained with inspection data can reliably predict water pipe failure conditions over the life cycle of the pipe. While this study focuses on underground pipe inspection, the general observations and methodology are also applicable to optimizing the inspection of other types of infrastructure.


Over 2.2 million miles of water pipes are buried under U.S. cities, delivering water reliably to millions of people. However, many of them are in poor condition and are deteriorating rapidly [5]. The failure of these pipes can cause enormous financial losses to local businesses and communities. There are more than 700 water main breaks every day in Canada and the USA [25], which result in huge economic and social losses. In 2009, the American Society of Civil Engineers issued a USA Infrastructure Report Card and gave a D- to drinking water and wastewater infrastructure, which improved to C- in 2021 [23]. As stated by the American Water Works Association (AWWA), we stand today at the dawn of a new era, the replacement era, for water utilities. These replacement costs, combined with projected expansion, will exceed $1 trillion over the next couple of decades [7].

The failure of water pipes can incur enormous losses, including water contamination, water shortage, and financial losses with associated societal or environmental impacts. At present, only 47% of water utilities have used a pipe failure model in their water pipe replacement strategies [8]. Part of the reason is the lack of sufficient historical data for developing a more accurate pipe failure prediction model [40]. In addition, the influence of a limited number of inspection records on the maintenance plan is still unclear. It is therefore to the benefit of society to understand the value of inspections and to ensure that the inspection schedule maximizes that value. This study demonstrates that the value of inspection lies in reducing the uncertainty in pipe condition assessment, and that an ML model provides a way to incorporate inspection data to optimize the inspection interval.

Existing prediction models for water pipe condition (i.e., failure probability) are classified into three categories: physics-based models, statistical models, and ML-based models [10, 49]. Physical models consider the physical mechanisms that contribute to water pipe failure. Three physical aspects are typically considered: the material properties and structural design of the pipe, the internal and external loads on the pipe, and material deterioration, including corrosion, as affected by the environment and service time [31]. When the residual structural capacity can no longer support the internal and external loads, the water pipe fails. The book by Young and Trott [52] provides a good explanation of the mechanical behavior of buried water pipes. Ahammed and Melchers [3] used the Spangler-Watkins in-plane pipe-soil interaction model to estimate the failure probability of steel pipes, and applied the first-order second-moment (FOSM) method to predict the failure probability. Pandey [28] and Valor et al. [46] presented a method in which the failure probability is obtained by Monte Carlo simulation.

Statistical models typically analyze data from historical records and try to find a trend by fitting a mathematical equation to the data. Yamijala et al. [50] compared four types of statistical models for the pipe failure probability at different ages. The existing statistical models include time-linear ordinary least squares regression and time-exponential ordinary least squares regression. The logistic generalized linear model is believed to perform better in the regression and prediction of water pipe reliability [50]. Kleiner and Rajani also summarized a large body of work and historical data [22]. Statistical models have been found to be an economically viable approach for smaller distribution water mains. Both statistical and physical models need to be validated and improved with more data, and each has its own drawbacks. For example, a statistical model fits the historical data with a limited set of mathematical equations, whereas a physical model requires a deep understanding of the mechanisms of water pipe failure.

Machine learning (ML) is an emerging method for predicting structural failures [45, 48] and has also been applied in geology [39] and to underground structures [20]. For example, Ren et al. [33] predicted the corrosion rate using a back-propagation neural network. Peng et al. [29] developed a fuzzy neural network model for predicting the failure rate of oil and gas pipelines. Tabesh et al. [42] applied ANN, ANFIS and nonlinear regression methods to assess the pipe failure rate of water distribution networks, and found the ANN to be the most robust method. Sadiq et al. [35] used an ANN model to predict water pipe condition where the relationships among variables are unknown. Sawhney and Mund [37] added that the ANN is useful for representing problems where solutions are not clearly identified. Rajani and Kleiner [31] applied an ANN model to a water distribution network. Thomas [43] used an ANN model in multi-criteria decision making and prediction problems. An AHP model has also been developed to identify the key factors that influence the failure of water pipelines, with an ANN then used to predict the failure [13, 14]. Fan et al. [17] considered five different ML algorithms for pipe failure prediction.

Although different methods have been developed to assist maintenance decisions, a rich historical dataset is required for model calibration or training. However, datasets that include pipe condition at different ages and in different environments are difficult to obtain, especially for small utilities that are just beginning to record their assets. It is therefore important that the decision-making process can be calibrated with accumulated inspection data. Few studies, however, have considered the value of inspections to ML-based prediction models or how to optimize inspection intervals.

This paper aims to quantify the benefits of conducting pipe inspection and to show how ML can support the development of optimal inspection intervals. Monte Carlo simulations were conducted with a widely accepted physics-based model to generate data about future pipe conditions. Inspection at a given service time was assumed to capture the pipe condition. An ML model, an Artificial Neural Network, is trained with inspection data and evaluated for its capability to forecast the future water pipe failure probability. The results reveal a few interesting findings. First, conducting inspection can significantly reduce the uncertainty range of the pipe performance forecast. Second, inspections conducted at equal time intervals do not all bring equal value. For example, inspections at the early stage and the final stage of the pipe service life may add only limited value to the performance forecast. Given a goal for uncertainty control in pipe condition assessment, an optimal inspection plan can be determined through continuous training of the ANN model with additional pipe inspection data.


Simulation is a common way to generate a sufficiently large dataset. In this study, we used a widely accepted physics-based water pipe failure model to generate the pipe samples. The service life of the pipe is assumed to be 100 years. Figure 1 shows the overall flowchart of this study. The physical failure model is used to generate samples for training the ML model (from year 1 to year i). The ML model is then trained and used as a failure prediction model. The failure probabilities of the pipe from year \(i+1\) to year 100 are computed using randomly sampled physical factors. Finally, the predictions of the ML model are compared with those of the physical model to evaluate the value of inspections.

Fig. 1 Overall flowchart of this study

Applied pipe failure physical model

In the current US design standard for cast-iron pipe [6], the pipe is assumed to be a rigid body that carries all the internal and external loads. Schlick [38] conducted an experiment which showed that the probability of failure of a grey cast-iron pipe can be calculated from a parabolic relationship between the internal pressure and the external loads. Underground water pipes usually support several loads, as studied by Rajani and Kleiner [31]. O’Day et al. [27] grouped pipe failures into three major categories: 1) circumferential breaks, caused by longitudinal stresses; 2) longitudinal breaks, caused by transverse stresses (or hoop stress); and 3) split bells, caused by transverse stresses on the pipe joint.

A schematic of these loads is shown in Fig. 2. In this study, two types of stresses are analyzed, i.e., hoop stress and axial stress. The formulations used to calculate them are summarized below.

Fig. 2 Different types of loads on a pipe and the corresponding failure modes (Rajani and Kleiner [31])

Water pipe external stress analysis

In this study, a total of four types of stresses are considered, i.e., the stress of internal fluid pressure, the stress of soil pressure, the stress of frost load, and the stress of traffic load. The computation of these stresses is firstly introduced below.

Stress by the internal fluid pressure [30]

$$\begin{array}{c}{\sigma }_{F}=\frac{pD}{2t}\end{array}$$

where p is the internal pipe pressure, D is the nominal pipe diameter, t is the pipe wall thickness

Stress by soil pressure [41]

$$\begin{array}{c}{\sigma }_{S}=\frac{3{K}_{m}\gamma {B}_{d}^{2}{C}_{d}{E}_{p}tD}{{E}_{p}{t}^{3}+3{K}_{d}p{D}^{3}}\end{array}$$

where \({K}_{m}\) is the bending moment coefficient, \(\gamma\) is the unit weight of soil, \({B}_{d}\) is the width of ditch, \({K}_{d}\) is the deflection coefficient, \({E}_{p}\) is the pipe material elastic modulus, \({C}_{d}\) is the calculation coefficient

Stress by frost load [32]

$$\begin{array}{c}{\sigma }_{L}={f}_{frost}{\sigma }_{S}\end{array}$$

where \({f}_{frost}\) is the frost load multiplier and \({\sigma }_{S}\) is the stress due to soil pressure.

Stress by traffic loads [2]

$$\begin{array}{c}{\sigma }_{V}=\frac{3{K}_{m}{I}_{c}{C}_{t}F{E}_{p}tD}{A\left({E}_{p}{t}^{3}+3{K}_{d}p{D}^{3}\right)}\end{array}$$

where \({I}_{c}\) is the impact factor, \(A\) is the effective length of pipe, \(F\) is the wheel load of traffic, \({C}_{t}\) is the surface load coefficient

Therefore, the total hoop or circumferential stress is calculated by assuming the stresses by these different loads are superimposed,

$$\begin{array}{c}{\sigma }_{\theta }={\sigma }_{F}+{\sigma }_{S}+{\sigma }_{L}+{\sigma }_{V}\end{array}$$

The total axial stress is calculated by considering the stress due to temperature gradient.

$$\begin{array}{c}{\sigma }_{X}={\sigma }_{T}+\left({\sigma }_{F}^{\prime}+{\sigma }_{S}+{\sigma }_{L}+{\sigma }_{V}\right){\nu }_{p}\end{array}$$

where \({\sigma }_{F}^{\prime}=\frac{p}{2}\times \left(\frac{D}{t}-1\right){\nu }_{p}\) is the stress due to internal fluid pressure, \({\sigma }_{T}=-{E}_{p}{\alpha }_{p}\Delta T\) is the stress caused by temperature difference [30].

The axial stress due to deflection of the pipe was not considered. Note, however, that if the supporting structure fails, the stress due to the bending moment can be significant and may even cause the water pipe to fail.
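The stress formulas above can be collected into a single function. The following is a minimal Python sketch that simply transcribes the expressions as written in this section; the function name, argument names, and the numerical values in the example call are illustrative assumptions, not values from Table 1, and consistent units are assumed throughout.

```python
def hoop_and_axial_stress(p, D, t, K_m, gamma, B_d, C_d, E_p, K_d,
                          f_frost, I_c, C_t, F, A, alpha_p, dT, nu_p):
    """Return (hoop stress, axial stress) from the formulas in this section."""
    denom = E_p * t**3 + 3.0 * K_d * p * D**3            # shared denominator
    sigma_F = p * D / (2.0 * t)                           # internal fluid pressure
    sigma_S = 3.0 * K_m * gamma * B_d**2 * C_d * E_p * t * D / denom  # soil pressure
    sigma_L = f_frost * sigma_S                           # frost load
    sigma_V = 3.0 * K_m * I_c * C_t * F * E_p * t * D / (A * denom)   # traffic load
    sigma_theta = sigma_F + sigma_S + sigma_L + sigma_V   # total hoop stress
    sigma_F_axial = p / 2.0 * (D / t - 1.0) * nu_p        # pressure term in axial direction
    sigma_T = -E_p * alpha_p * dT                         # thermal stress
    sigma_X = sigma_T + (sigma_F_axial + sigma_S + sigma_L + sigma_V) * nu_p
    return sigma_theta, sigma_X


# Example call with placeholder inputs (hypothetical values, for illustration only).
hoop, axial = hoop_and_axial_stress(
    p=0.5, D=200.0, t=10.0, K_m=0.2, gamma=2e-5, B_d=600.0, C_d=1.3,
    E_p=2e5, K_d=0.1, f_frost=0.5, I_c=1.5, C_t=0.1, F=4e4, A=6000.0,
    alpha_p=1e-5, dT=-10.0, nu_p=0.25)
print(hoop, axial)
```

Keeping the shared denominator in one variable makes it easier to see that the soil and traffic terms differ only in their numerators and in the effective length \(A\).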

Water pipe residual yield strength

The resistance of a pipe to failure is closely related to the effective structural thickness of the pipe wall. For this purpose, a surface corrosion model is used to describe the reduction of pipe wall thickness due to corrosion. The wall thickness of a water pipe decreases continuously over time due to corrosion. The corrosion rate has been studied extensively. For example, Doyle et al. [15, 16, 19, 21, 24, 26] surveyed the condition of buried water pipes. The results indicate that the condition of a water pipe is closely related to soil characteristics.

In this paper, an empirical two-phase corrosion model, Eq. (7), is used to estimate the corrosion depth [36].

$$\begin{array}{c}d=aT+b\left(1-{e}^{-cT}\right)\end{array}$$

where \(d\) is the depth of corrosion (mm), \(T\) is the pipe age (years), \(a\) is the final pitting rate constant, \(b\) is the pitting depth scaling constant, \(c\) is the corrosion rate inhibition factor.

Figure 3 shows an example of the corrosion depth predicted by Eq. (7) using the parameters listed in Table 1. The corrosion of a metal pipe is due to the establishment of anodic and cathodic areas [12, 34]. The anodic area is initially established by the local environment, such as a crack in the iron oxide layer. The cathode is then established somewhere near the pit. Subsequently, anions such as \({\mathrm{OH}}^{-}\) and \({\mathrm{Cl}}^{-}\) move from the anode to the cathode. With the movement of these anions, a layer of ferrous hydroxide \([{\text{Fe}}({\text{OH}}{)}_{3}]\) is generated, and later an intermediate layer of magnetite \(\left[{\text{Fe}}_{3}{\mathrm{O}}_{4}\right]\) forms. This layer of magnetite stops the anions from moving from the anodic area to the cathodic area. The corrosion rate is therefore high at the early stage and then decreases over time [36]. The development of corrosion described by the two-phase model (Eq. (7), Fig. 3) is thus consistent with the physicochemical process associated with field corrosion.
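The decreasing corrosion rate described above can be illustrated numerically. The sketch below assumes the two-phase form \(d=aT+b(1-{e}^{-cT})\), which matches the parameter definitions given for the model; the function names and the parameter values are placeholders chosen for illustration and are not the Table 1 values.

```python
import math


def corrosion_depth(T, a, b, c):
    """Two-phase corrosion depth (mm) at pipe age T (years): a*T + b*(1 - exp(-c*T))."""
    return a * T + b * (1.0 - math.exp(-c * T))


def corrosion_rate(T, a, b, c):
    """Time derivative of the depth: a + b*c*exp(-c*T). It decays toward a."""
    return a + b * c * math.exp(-c * T)


# Placeholder parameters (hypothetical, for illustration only).
a, b, c = 0.01, 5.0, 0.1
for year in (0, 10, 50, 100):
    depth = corrosion_depth(year, a, b, c)
    rate = corrosion_rate(year, a, b, c)
    print(f"year {year:3d}: depth = {depth:.3f} mm, rate = {rate:.4f} mm/year")
```

In this form the rate starts at \(a+bc\) and approaches the final pitting rate \(a\) as the exponential term dies out, which is the early-fast, later-slow behavior attributed to the protective magnetite layer.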

Fig. 3 Two-phase corrosion model

Table 1 Summary of the probability distribution of input variables for Monte Carlo simulation

The residual yield strength of a water pipe can be determined by the empirical relationship proposed by Rajani et al. [30], Eq. (8):

$$\begin{array}{c}{\sigma }_{Y}=\frac{\alpha {K}_{q}}{\beta {\left(\frac{d}{{t}_{res}\sqrt{{a}_{n}}}\right)}^{S}}\end{array}$$

where \(\beta ={a}_{1}{\left(\frac{d}{{t}_{res}}\right)}^{{b}_{1}}\); \(\alpha\) and \(S\) are constants used in fracture toughness equations; \(\beta\) is the geometric factor for a double-edge notched tensile specimen; \({a}_{n}\) is the lateral dimension of the pit; \({K}_{q}\) is the provisional fracture toughness; \({a}_{1}\) and \({b}_{1}\) are constants for determining the geometric factor \(\beta\); and \(d\) is the depth of the corrosion pit, which can be estimated by Eq. (7).

Water pipe failure criteria

Failure of a water pipe occurs when either its hoop stress or its axial stress exceeds its residual yield strength [36]. By introducing the concept of factor of safety (FOS), the water pipe failure criterion can also be written as Eq. (9). When FOS is larger than 1, the pipe is assumed to be safe; otherwise, the pipe is assumed to fail.

$$\begin{array}{c}FOS=\mathit{min}\left(\frac{{\sigma }_{Y}}{{\sigma }_{X}},\frac{{\sigma }_{Y}}{{\sigma }_{\theta }}\right)\end{array}$$

where \({\sigma }_{\theta }\), \({\sigma }_{X}\) and \({\sigma }_{Y}\) are determined by Eqs. (5), (6) and (8), respectively.
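The failure criterion reduces to a few lines of code. The following sketch assumes both stresses are positive (tensile) so that the ratios are meaningful; the function names are illustrative.

```python
def factor_of_safety(sigma_Y, sigma_X, sigma_theta):
    """FOS = min(sigma_Y / sigma_X, sigma_Y / sigma_theta); positive stresses assumed."""
    return min(sigma_Y / sigma_X, sigma_Y / sigma_theta)


def has_failed(sigma_Y, sigma_X, sigma_theta):
    """The pipe is treated as safe only when FOS is strictly larger than 1."""
    return factor_of_safety(sigma_Y, sigma_X, sigma_theta) <= 1.0
```

Because the minimum of the two ratios is taken, the larger of the two stresses always governs the result.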

Water pipe failure probability

There are significant uncertainties with the parameters used to determine the hoop stress, axial stress and residual yield strength. The FOS and consequent probability of water pipe failure are of stochastic nature. Such problems are typically analyzed by methods such as the Monte Carlo simulation [28], Mean-Value First Order Second Moment, Advanced First Order Second Moment, First order reliability methods [9, 11], Rosenblueth’s Points Estimation, or Harr’s Point Estimation [44].

Monte Carlo simulation is an effective method for modeling stochastic processes and is used in this study. The variables required to determine the failure condition of a water pipe are assumed to follow specific statistical distributions. The distributions of these variables are listed in Table 1, modified from Sadiq et al. [36].

Figure 4 shows the flowchart of the Monte Carlo simulation used to determine the distribution of FOS and the failure probability of a single pipe. In each loop, the values of the variables are randomly generated using the probability distribution parameters given in Table 1. Based on the randomly generated values, the FOS is computed by Eq. 9. After N iterations, the number of failures, n, is recorded and the failure probability of the pipe, \({P}_{f}\), is computed by Eq. 10.

$$\begin{array}{c}{P}_{f}=\frac{n}{N}\end{array}$$

where n is the number of iterations in which the pipe fails, and N is the total number of iterations.
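The Monte Carlo loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sample_fos` stands for any routine that draws one random set of inputs and returns the resulting FOS, and `toy_fos` is a hypothetical stand-in (a lognormal strength divided by a lognormal stress) used only so that the example runs without the Table 1 distributions.

```python
import math
import random


def failure_probability(sample_fos, n_iter=1000, seed=0):
    """Estimate the failure probability as n / N, counting draws with FOS <= 1."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n_iter) if sample_fos(rng) <= 1.0)
    return failures / n_iter


def toy_fos(rng):
    """Hypothetical stand-in for the physics-based FOS calculation."""
    strength = rng.lognormvariate(math.log(200.0), 0.3)
    stress = rng.lognormvariate(math.log(120.0), 0.3)
    return strength / stress


print(failure_probability(toy_fos, n_iter=1000))
```

Fixing the seed makes the estimate reproducible; increasing `n_iter` reduces the sampling noise of the estimate at the cost of more model evaluations.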

Fig. 4 Flowchart of Monte Carlo simulation with physics-based model to determine pipe failure probability

Artificial Neural Network (ANN)

The Artificial Neural Network (ANN) is a widely used ML model. Its architecture consists of interconnected neurons in the input layer, hidden layers and output layer, and this architecture determines its overall performance [1]. Increasing the number of neurons and hidden layers can improve the ability of an ANN model to describe nonlinear relationships. It also, however, increases the computational demand and can lead to overfitting. A conceptual architecture of a neural network is shown in Fig. 5.

Fig. 5 Schematic of ANN architecture

The input layer consists of i neurons, corresponding to the number of input features. The hidden layers provide the capability to model complex nonlinear relationships and are fine-tuned with the training data. The output layer consists of one neuron, which is used to classify the output as leaking or not leaking.

The hidden layers consist of fully connected neurons. The output of each neuron is given by Eq. 11.

$$\begin{array}{c}{y}_{k}=f\left(\sum\limits_{r=1}^{I}{x}_{r,k}{\omega }_{r,k}+b\right)\end{array}$$

where \({y}_{k}\) is the output of each neuron in the hidden layer and \({x}_{r,k}\) is the output of the previous layer (for the first layer of the network, \({x}_{r,k}\) is the sample data). \({\omega }_{r,k}\) is the weight of that neuron and b is its bias, both of which are learned from the training dataset by the back-propagation algorithm. \(f(\cdot)\) is the activation function, which introduces nonlinearity during propagation. In this study, the ‘ReLU’ function is used as the activation function of the hidden layers [4].

The output of the last hidden layer is then passed to the neuron in the output layer, whose action is written as follows.

$$\begin{array}{c}{y}_{z}=g\left({y}_{k}\omega +b\right)\end{array}$$

where \({y}_{k}\) is the output of the last hidden layer, and \({y}_{z}\) is the output of the output layer. \(\omega\) and \(b\) are the weight and bias as described before. \(g(\cdot)\) is the sigmoid transfer function, defined in Eq. 13.

$$\begin{array}{c}g\left(x\right)=\frac{1}{1+{e}^{-x}}\end{array}$$

The ANN model in this article is built and trained with TensorFlow in a Python environment. Through the training process it learns the relationship between the inputs and the output, so that it can classify the observed data into leaking and non-leaking situations. More detailed mathematical information about ANNs can be found in [18].
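To make the layer equations concrete, the following is a small forward pass written in plain Python rather than TensorFlow. It is only a sketch of the computation in a fully connected layer with ReLU activation followed by a sigmoid output neuron; the weights, biases and inputs are made-up numbers, and the output neuron is assumed to sum over all neurons of the last hidden layer.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: weights[k][r] links input r to neuron k."""
    return [activation(sum(x * w for x, w in zip(inputs, w_k)) + b_k)
            for w_k, b_k in zip(weights, biases)]


# Made-up numbers: two inputs, one hidden layer of two neurons, one output neuron.
hidden = dense([1.0, 2.0], [[0.5, 0.25], [-1.0, 1.0]], [0.0, -2.0], relu)
output = dense(hidden, [[1.0, -1.0]], [-1.0], sigmoid)
print(hidden, output)
```

The sigmoid output lies between 0 and 1, so a threshold (commonly 0.5) turns it into one of the two classes.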

Results and discussion

A pipe with 6 m effective length, 20 cm internal diameter, and 1 cm original wall thickness is considered as the testbed in this study. The Monte Carlo simulations are repeated 1,000 times for each year. The Monte Carlo simulation therefore provides 1,000 random FOS values for each year, from which the failure probability of each year and the evolution of the failure condition (mean and standard deviation) with service time can be calculated. In this section, the value of inspection is studied first. The inspection data samples are assumed based on the accuracy of the inspection tools and are generated by Monte Carlo simulation. An ANN model is then evaluated for its capability to predict the failure probability of the pipe over time using the inspection data. The predictions of the ANN model are compared with the theoretical ground truth to demonstrate the importance of an optimal maintenance schedule.

Illustration of the value of pipe inspection

Corrosion inspection is one of the most common inspection items in pipe maintenance procedures. Multiple inspection methods, such as magnetic flux leakage (MFL), circumferential MFL, tri-axial MFL, and ultrasonics, have been used for pipeline deterioration inspection [47]. To quantify the value of pipe inspections, we assume the corrosion depth is inspected at a specific year, T0. The inspected value is then used as the initial value for the Monte Carlo simulation to determine the pipe thickness in the following years (Eq. 14). In other words, we assume that inspection makes the corrosion depth at year T0 a known value, which is then used to predict the pipe condition during the subsequent years via Monte Carlo simulation.


where \({d}_{T}\) is the pipe corrosion depth at year T (> T0), \({d}_{tT0}\) is the inspected corrosion depth at T0, and the other parameters, a, T, b, and c, are the same as in Eq. 7.

The distribution of the FOS at each year, with no inspection or with inspection at Year 20, is computed by Monte Carlo simulation. For each year, the mean value, 10% quantile, and 90% quantile are computed and recorded. Figure 6 shows the computed distribution of FOS over 100 years. The solid lines are the prediction results assuming the pipe is not inspected. As can be seen in Fig. 6, the overall FOS values decrease as the service life increases. After the 70th year of service, the mean value of FOS is around 1, which implies that there is about a 50% probability that the pipe would fail. Assuming an average acceptable FOS of 1.5, the water pipe reaches the threshold at around 37 years.
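The per-year summary statistics can be computed from the simulated FOS samples with the Python standard library. In the sketch below, the lognormal samples are a hypothetical stand-in for the 1,000 Monte Carlo FOS values of one year, and `summarize_fos` is an illustrative helper name.

```python
import random
import statistics


def summarize_fos(samples):
    """Return (mean, 10% quantile, 90% quantile) of the FOS samples of one year."""
    deciles = statistics.quantiles(samples, n=10)  # nine cut points: 10%, ..., 90%
    return statistics.fmean(samples), deciles[0], deciles[-1]


# Hypothetical FOS samples for a single year (stand-in for the Monte Carlo output).
rng = random.Random(0)
fos_samples = [rng.lognormvariate(0.4, 0.3) for _ in range(1000)]
mean, q10, q90 = summarize_fos(fos_samples)
print(f"mean = {mean:.3f}, 10% quantile = {q10:.3f}, 90% quantile = {q90:.3f}")
```

Repeating this for every year gives the three curves plotted for each scenario, and the gap between the two quantile curves is a direct measure of the forecast uncertainty.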

Fig. 6 Development of pipe failure probability (indicated by the mean and range of the factor of safety, FOS) over time without inspection (solid lines) or with inspection at Year 20 (dashed lines)

To assess the effects of inspection, it is assumed that, due to a more corrosive underground environment, the corrosion depth determined by inspection at the 20th year is 5 mm, which is slightly higher than the average value of 4.02 mm. The FOS of the pipe for the subsequent years, calculated by the procedure illustrated in Fig. 4 after incorporating the inspection data, is shown in Fig. 6 by the dashed lines. Both the mean value and the range of FOS after considering the inspection are shown in the figure.

The results in Fig. 6 indicate that, if an average FOS of 1.5 is used as the acceptable threshold for pipe replacement, the water pipe reaches the threshold at around 30 years. Incorporating the inspection data therefore forecasts an unacceptable pipe condition 8 years earlier than without inspection. From a practical perspective, this information helps agencies to implement preventive maintenance, such as corrosion protection measures or replacement of pipe sections, before the pipe fails. The final decision also depends on the financial constraints and societal impacts of such actions.

The immediate value of the inspection is studied by comparing the FOS distributions at the 21st year, i.e., the year after the inspection. Both forecast FOS distributions follow lognormal distributions. As shown in Fig. 7, the FOS values predicted with inspection are more concentrated than those predicted without inspection, and the variation is reduced.

Fig. 7 Distribution of predicted pipe failure condition (i.e., FOS) at the 21st year without inspection or with inspection at Year 20

Figure 8 compares the standard deviation of FOS over time with and without inspection. The results show that inspection helps to narrow the uncertainty range of the pipe condition forecast, i.e., the FOS. For example, at the end of the 40th year, the variance of the FOS is 0.967 without inspection, whereas with inspection data the prediction variance of the FOS is 0.475. This clearly shows that inspection data significantly reduce the variation, or uncertainty, in the FOS.
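The size of this reduction follows directly from the two values quoted for the 40th year. As a quick check, assuming the reduction is expressed relative to the no-inspection value:

```python
def relative_reduction(without_inspection, with_inspection):
    """Fractional reduction of a spread measure relative to the no-inspection case."""
    return (without_inspection - with_inspection) / without_inspection


reduction = relative_reduction(0.967, 0.475)
print(f"{reduction:.3f}")
```

The result is roughly one half, in line with the reduction of about 51% reported for this case.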

Fig. 8 The variations of predicted pipe failure probability FOS without or with inspection (at Year 20)

Overall, the results indicate that inspection collects pipe condition data and provides a more accurate prediction of future pipe conditions. As shown in Figs. 6, 7 and 8, the value of inspection is to reduce the range of uncertainty in forecasting the future condition of the pipe. The variation in the model forecast is reduced (by as much as 50.8% in this case) after incorporating inspection data. The reduced variation in the model forecast helps to reduce the uncertainty in the decision-making process. The value of an inspection gradually decreases over time, as indicated by the fact that the range of forecast FOS uncertainty continues to increase over time. At a certain point, another round of inspection is needed to reduce the uncertainty in the pipe condition assessment.

It is also noted that the inspection program analyzed here considers only the depth of corrosion, which is related to the capacity of the pipe to resist failure. Other parameters related to service load, such as internal water pressure, frost effects, and traffic loads, are not included. This information, if available, can be included to further reduce the uncertainty in the forecast of water pipe failure.

ML model with inspection data for reliable forecast of future pipe conditions

The previous discussion shows that inspection brings value by reducing the uncertainty in underground water pipe condition assessment. The natural next question is how to incorporate inspection data effectively into pipe failure probability prediction. We explore data-driven ML models for this purpose. The advantage of an ML model is that it can extract features from a complex dataset without requiring a predefined physics-based model. Its data-driven nature means that data are required for model training and validation. An Artificial Neural Network (ANN) is used in this study due to its simplicity.

A few assumptions are made in this study to evaluate the ability of the ANN model to predict the pipe failure probability at different ages. First, we assume the inspection datasets are obtained from a large water system. Specifically, we assume that 1,000 pipes with the same initial physical properties are installed in the water system, and that the inspection process provides values of all the variables according to the distributions listed in Table 1. Second, we assume the physics-based model introduced in the Applied pipe failure physical model section reflects the pipe failure mechanism accurately, so that the corresponding pipe status can be determined by Monte Carlo simulation with the physics-based model. Under the first assumption, inspection captures condition data for the 1,000 pipes, which provides enough data samples for ANN training and validation. The second assumption makes it possible to use the average pipe conditions predicted by the physics-based model as the ground truth for evaluating the ML model performance.

To train the ANN model, we first generate a pseudo inspection dataset that contains the variable values and pipe status based on these assumptions. An ANN model with an optimal structure is obtained after hyperparameter optimization. The ANN model contains 1 input layer, 1 output layer, and 6 hidden layers. The ‘ReLU’ activation function is selected for the input layer and the hidden layers, and the ‘Sigmoid’ function is used as the activation function in the output layer. The input layer contains 21 neurons, corresponding to all the pipe variables listed in Table 1. The output layer contains 1 neuron, which classifies the condition of the pipe as either ‘failure’ or ‘functional’ based on its FOS. The hyperparameters of the ANN model are determined by a trial-and-error process. It should be noted that techniques such as grid search and Bayesian optimization can also be used to determine these hyperparameters [51]. We split the generated pseudo inspection data into training and testing datasets with a ratio of 7:3. The accuracy of the ANN model is evaluated by comparing its predictions on the testing set with the pipe status in the generated dataset. The trained ML model achieves an accuracy of 99.1%. This high accuracy is obtained because the ML model can learn the pattern of the predefined physical model well.
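The 7:3 split and the accuracy measure can be sketched independently of the network itself. The helper names below are illustrative, and the use of a shuffled split with a fixed seed is an assumption; the text does not state how the records were assigned to the two sets.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle the records and split them into 70% training and 30% testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted status matches the recorded status."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


train, test = split_7_3(range(1000))
print(len(train), len(test))
```

With 1,000 pipe records this yields 700 training and 300 testing samples; accuracy is then the share of the 300 testing pipes whose predicted status agrees with the status given by the physics-based model.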

The process of predicting the pipe failure probability assumes that the variables (except pipe thickness) continue in the future to follow the same distributions as the inspection records, which are also the predefined distributions in Table 1. Therefore, for each year with no inspection, 1,000 sets of random variables are generated, except for the current thickness of the pipe. The thickness is replaced with the remaining thickness based on the results of the nearest previous inspection. The generated datasets are then fed into the trained ANN model, which predicts whether the pipe is in a failure or functional state based on the inputs. The failure probability of the pipe is computed by dividing the number of failures by the total number of samples (1,000 samples), as shown in Eq. 10.
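The prediction procedure for a year without inspection can be sketched as below. The trained ANN is represented by a generic `classifier` callable, and `toy_sampler` and `toy_classifier` are hypothetical stand-ins (two variables and a simple threshold rule) used only to make the example runnable; they are not the model or the variables of this study.

```python
import random


def predict_failure_probability(classifier, sample_variables, remaining_thickness,
                                n_samples=1000, seed=0):
    """Share of sampled variable sets that the classifier labels as 'failure'."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        features = sample_variables(rng)              # random draw of all variables
        features["thickness"] = remaining_thickness   # overwrite with inspection-based value
        if classifier(features) == "failure":
            failures += 1
    return failures / n_samples


def toy_sampler(rng):
    """Hypothetical stand-in for sampling the Table 1 variables."""
    return {"pressure": rng.uniform(0.5, 1.0), "thickness": rng.uniform(5.0, 10.0)}


def toy_classifier(features):
    """Hypothetical stand-in for the trained ANN: a simple threshold rule."""
    return "failure" if features["pressure"] / features["thickness"] > 0.15 else "functional"


print(predict_failure_probability(toy_classifier, toy_sampler, remaining_thickness=5.0))
```

The key step is the overwrite of the thickness feature: all other inputs stay random, while the thickness carries the information gained from the most recent inspection.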

The ground truth curve is obtained directly from the Monte Carlo simulation with the physics-based model. It is used as the baseline for verifying the accuracy of the results predicted by the ANN. The baseline is shown as ‘original data’ in Fig. 9. To evaluate the value of inspection, we first assume that inspection is conducted regularly at a 10-year interval. For example, if inspection is conducted over a 40-year period, pipe condition data are assumed to be collected at the 10th, 20th, 30th and 40th years and are used for ANN model training. The trained ANN model is then used to predict the pipe failure probability for the subsequent years. Similar analyses are conducted for ANN models trained with inspection data collected at a 10-year interval over 50, 60, or 70 years of pipe service; the results are compared with the ground truth (Fig. 9).

Fig. 9

Predicted pipe failure probability from ANN models trained with different numbers of years of data collected at a fixed 10-year inspection interval, with the results of the physics-based model as the ground truth

Figure 9 compares the pipe conditions (i.e., failure probability) predicted by ANN models trained with different years of inspection data. The comparison with the ground truth curve shows that the more training data are obtained from inspection, the closer the ANN forecast is to the ground truth. For example, the ANN model trained with 40 years of data from 4 rounds of inspection at 10-year intervals predicts the pipe failure probability within 4% of the ground truth values for the subsequent 20 years (40 to 60 years). The deviation from the ground truth curve increases in the years after that. If the next round of inspection data (i.e., inspection at the 50th year) is available, the further trained ANN model predicts within 2% of the ground truth values for the next 20 years. Similar trends are observed as more inspections are incorporated. Interestingly, for this case, the ANN model trained with 7 rounds of inspection data (10th, 20th, 30th, 40th, 50th, 60th, and 70th years) predicts a failure probability curve that nearly overlaps the ground truth curve. That is, 7 rounds of inspection data (up to the 70th year) allow the pipe failure probability to be predicted accurately over the life span of 100 years.
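The comparison described above amounts to finding how far past the last inspection the ANN forecast stays within a given tolerance of the ground truth. A minimal sketch is given below. It assumes that the percentage deviation is measured as an absolute difference in failure probability; the function name and the toy curves are hypothetical and are not read from Fig. 9.

```python
def reliable_horizon(predicted, truth, start_year, tolerance):
    """Return the last consecutive year after start_year whose prediction
    stays within `tolerance` (absolute probability) of the ground truth."""
    last_ok = start_year
    for year in sorted(y for y in truth if y > start_year):
        if year not in predicted or abs(predicted[year] - truth[year]) > tolerance:
            break
        last_ok = year
    return last_ok


# Made-up curves for illustration only (year -> failure probability).
toy_truth = {50: 0.40, 60: 0.50, 70: 0.58, 80: 0.64}
toy_prediction = {50: 0.42, 60: 0.53, 70: 0.65, 80: 0.75}

horizon = reliable_horizon(toy_prediction, toy_truth, start_year=40, tolerance=0.04)
print(f"Forecast stays within tolerance until year {horizon}")
```

With these toy values the forecast is within 0.04 of the baseline at years 50 and 60 but not at year 70, so the function returns 60.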

The observations imply that an ML model trained with pipe inspection data can provide a reliable forecast of the failure probability over a certain number of years, and that the reliability of the prediction increases further as more inspection data are incorporated. Therefore, the value of inspection lies in extending the range of years over which the pipe condition can be reliably predicted. It is also noted that inspection data beyond a certain service age (70 years in this case) do not bring added value. This might be attributed to the fact that the corrosion of the water pipe has reached a steady deterioration rate at that stage. Therefore, there should be an optimal inspection strategy, in terms of inspection scheduling, that maximizes the value of the inspection data over the service life of the water pipe. This is discussed further in the following section.

Optimal inspection schedule based on ML model

The previous analyses indicate that an ML model trained with inspection data can predict future pipe conditions. Its prediction error, however, can grow the further into the future it forecasts. An optimal inspection schedule should therefore provide acceptable reliability in pipe condition prediction over the whole service life of the water pipe. The term ‘optimal’ in this case refers to the inspection schedule that uses the minimal number of inspections to constrain the error of the future condition prediction, based on prior inspection data, to within ± 5% of the true value. The idea of an optimal inspection interval derived from the ML model is illustrated in this section. The studied pipe is assumed to have the same design parameters and a service life of 100 years. An ANN model is used to predict the pipe’s failure probability over its remaining life. The failure probability from the Monte Carlo simulation with the physics-based model is used as the baseline (true value).

Figure 10 shows the pipe’s ground truth curve and the prediction results of the ANN model. Since the failure probability is low for the first 20 years, we assume that inspection is not necessary until the 20th year. Therefore, the first inspection is arranged at the 20th year, as shown in Fig. 10. After the inspection, the initial condition data (1st year) and the inspection data at the 20th year are used to train the first ANN model. Figure 10 shows that the prediction of this first ANN model quickly deviates from the ground truth curve in the years after year 20. Assuming the prediction error should be kept within 5% of the ground truth, the next inspection is arranged at the 25th year. With three sets of data (data from the 1st year and inspection data from the 20th and 25th years), the prediction error of the new ANN model stays within 5% for the next 15 years. Therefore, the third inspection can be made at the 40th year. With the data from this inspection, the forecast accuracy is ensured for the next 20 years. The fourth inspection can then be done at the 60th year to cover another 20 years. Finally, the fifth inspection can be done at the 80th year, which allows the ML model to cover the remaining service life of the water pipe up to 100 years. Overall, 5 rounds of inspection are sufficient to provide a reliable forecast (within 5% error) over the 100-year life cycle of the water pipe.
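The scheduling logic described above can be read as a greedy loop: after each inspection, find how many years the retrained model stays within the error bound, and place the next inspection at the end of that window. The sketch below is an illustration under assumptions, not the authors' implementation. The function names are hypothetical, and `reported_reliable_years` simply replays the reliable windows quoted in the text (5, 15, and then 20 years), whereas in the actual analysis each window would come from retraining the ANN and comparing it with the Monte Carlo baseline.

```python
def plan_inspections(first_inspection, service_life, reliable_years):
    """Greedy schedule: inspect again when the current forecast stops being reliable.

    reliable_years(schedule) returns how many years after the latest inspection
    the forecast stays within the error bound (hypothetical callback).
    """
    schedule = [first_inspection]
    while True:
        window = reliable_years(schedule)
        if window <= 0:
            raise ValueError("the reliable window must be positive")
        covered_until = schedule[-1] + window
        if covered_until >= service_life:
            return schedule
        schedule.append(covered_until)


def reported_reliable_years(schedule):
    """Reliable windows quoted in the text, indexed by the number of inspections so far."""
    windows = {1: 5, 2: 15}
    return windows.get(len(schedule), 20)


schedule = plan_inspections(first_inspection=20, service_life=100,
                            reliable_years=reported_reliable_years)
print(schedule)  # [20, 25, 40, 60, 80]
```

Tracing the loop gives inspections at years 20, 25, 40, 60, and 80, which matches the five rounds described in the text.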

Fig. 10

Illustration of pipe condition forecast (within ± 5% error) with optimal inspection schedule covering the 100-year life of water pipe

The results shown in Fig. 10 indicate that the optimal inspection schedule is not evenly spaced; that is, inspection data collected at different years are not of equal value. A few observations are summarized below.

  a) First, no inspection is needed during the first 20 years because of the low probability of failure. Inspection data from this period do not bring much added value to the ML model.

  b) Second, more frequent inspection is needed between 20 and 40 years because the failure probability of the water pipe changes rapidly. In other words, inspection data from this period bring higher value to ANN model training for future pipe condition prediction.

  c) Third, regular inspection at approximately fixed time intervals can be used when the pipe is between 40 and 80 years old. On the one hand, the ANN model has received enough data to capture the changing trend of the pipe failure probability. On the other hand, the corrosion rate of the pipe is becoming steady.

  d) Finally, inspection brings no observed added value at the final stage of the pipe’s service life (80–100 years), because the ANN model is already able to accurately predict the pipe failure probability to the end of the service life. This is possibly because the corrosion rate of the pipe has become stable at this stage.

Overall, the results indicate that inspection can efficiently narrow down the uncertainty of pipe failure prediction, as shown in Fig. 6. More rounds of inspection also reduce the prediction error of the ML-based failure prediction models. The results further indicate that inspections at different times do not contribute equally to the model prediction. For example, the optimal inspection schedule reduces the number of inspections over the 100-year service life from 10 (for fixed-interval inspection every 10 years) to 5, as illustrated in Figs. 9 and 10, respectively.


The deterioration of water distribution pipes requires a proactive plan for maintenance, retrofit, and renewal. Inspection plays an important role in supporting these decisions. The complex stochastic nature of infrastructure deterioration makes it a major challenge to forecast infrastructure performance. Machine learning (ML) can potentially provide an important tool to uncover the value of inspection data. The analyses in this paper use underground water pipes as the testbed. The results show that the value of inspection lies in reducing the uncertainty in the forecast of the pipe condition or its factor of safety. The analyses also show that inspections at different times do not bring equal value, i.e., the optimal inspection schedule does not necessarily have equal time intervals. An optimal inspection schedule can be designed based on a pre-set acceptable reliability level of the ML model (ANN in this case) for future pipe condition forecast. An ML model trained with data collected under the optimal inspection schedule can provide a cost-effective and reliable forecast of the pipe failure probability throughout the service period.

This study illustrates the value and impact of inspection data on the development of optimal water pipe maintenance strategies. Because only limited pipe failure data samples are available, a physics-based model is used to generate the data needed for ML model training. It should be noted that the physics-based model might not sufficiently represent what happens in the real world, so real-world pipe failure records should be incorporated when applying the framework proposed in this study. Nevertheless, the proposed framework of combining inspection data and an ML model to optimize the inspection schedule remains applicable.

Availability of data and materials

Data is available upon request from the corresponding author.


  1. Abokifa AA, Haddad K, Lo C, Biswas P (2018) Real-time identification of cyber-physical attacks on water distribution systems via machine learning-based anomaly detection techniques. J Water Resour Plan Manag 145(1):04018089

  2. Ahammed M, Melchers R (1994) Reliability of underground pipelines subject to corrosion. J Transp Eng 120(6):989–1002

  3. Ahammed M, Melchers R (1997) Probabilistic analysis of underground pipelines subject to combined stresses and corrosion. Eng Struct 19(12):988–994

  4. Apicella A, Donnarumma F, Isgrò F, Prevete R (2021) A survey on modern trainable activation functions. Neural Netw 138:14–32

  5. ASCE (2021) 2021 Infrastructure report card

  6. American Water Works Association (1967) USA standard for thickness design of cast-iron pipe. Standard H1-67. New York

  7. American Water Works Association (2001) Dawn of the replacement era: reinvesting in drinking water infrastructure: an analysis of twenty utilities’ needs for repair and replacement of drinking water infrastructure. American Water Works Association, New York

  8. Campanella K, Andreasen C, Diba A, Himmelberger H, Leighton J, Santini J, Vause K (2016) 2015 establishing the level of progress in utility asset management survey results. Proc Water Environ Fed 1:462–490

  9. Davis P, Burn S, Moglia M, Gould S (2007) A physical probabilistic model to predict failure rates in buried PVC pipelines. Reliab Eng Syst Saf 92(9):1258–1266

  10. Dawood T, Elwakil E, Novoa HM, Gárate Delgado JF (2020) Water pipe failure prediction and risk models: state-of-the-art review. Can J Civ Eng 47(10):1117–1127

  11. De Silva D, Moglia M, Davis P, Burn S (2002) Condition assessment and probabilistic analysis to estimate failure rates in buried pipelines. In: Proceedings of ASTT 5th Conference

  12. Denison I, Darnielle R (1939) Observations on the behavior of steel corroding under cathodic control in soils. Trans Electrochem Soc 76(1):199–214

  13. Dey PK (2003) Analytic hierarchy process analyzes risk of operating cross-country petroleum pipelines in India. Nat Hazard Rev 4(4):213–221

  14. Dey PK (2004) Decision support system for inspection and maintenance: a case study of oil pipelines. IEEE Trans Eng Manage 51(1):47–56

  15. Doyle G, Seica MV, Grabinsky MW (2003) The role of soil in the external corrosion of cast iron water mains in Toronto, Canada. Can Geotech J 40(2):225–236

  16. Ewing S (1932) Rough correlation between corrosiveness and resistivity for alkali soils. Oil Gas J 30:29

  17. Fan X, Wang X, Zhang X, Yu X (2022) Machine learning based water pipe failure prediction: the effects of engineering, geology, climate and socio-economic factors. Reliab Eng Syst Saf 219:108185

  18. Jain AK, Mao J, Mohiuddin KM (1996) Artificial neural networks: A tutorial. Computer 29(3):31–44

  19. Kajiyama F, Koyama Y (1997) Statistical analyses of field corrosion data for ductile cast iron pipes buried in sandy marine sediments. Corrosion 53(2):156–162

  20. Kamran M, Ullah B, Ahmad M, Sabri MMS (2022) Application of KNN-based isometric mapping and fuzzy c-means algorithm to predict short-term rockburst risk in deep underground projects

  21. Katano Y, Miyata K, Shimizu H, Isogai T (2003) Predictive model for pit growth on underground pipes. Corrosion 59(2):155–161

  22. Kleiner Y, Rajani B (2001) Comprehensive review of structural deterioration of water mains: statistical models. Urban Water 3(3):131–150

  23. Lehman M (2022) The American Society of Civil Engineers’ report card on America’s infrastructure. In: Women in infrastructure. Springer, p 5–21

  24. Logan K, Koenig E (1939) A comparison of methods for estimation of the corrosivity of soils. Oil Gas J 38:130

  25. Najjaran H, Sadiq R, Rajani B (2004) Modeling pipe deterioration using soil properties-an application of fuzzy logic expert system. In: Pipeline engineering and construction: what’s on the horizon? p 1–10

  26. Nicholas D, Ferguson P (2005) Accurate prediction of cast iron watermain performance using linear polarisation resistance (LPR) methodology. In: NZWWA 2005 Conference, Auckland, NZ

  27. O’Day DK, Weiss R, Chiavari S, Blair D, Clark RM, American Water Works Association (1988) Water main evaluation for rehabilitation/replacement. EPA

  28. Pandey MD (1998) Probabilistic models for condition assessment of oil and gas pipelines. NDT E Int 31(5):349–358

  29. Peng X-Y, Zhang P, Chen L-Q (2009) Long-distance oil/gas pipeline failure rate prediction based on fuzzy neural network model. In: Computer Science and Information Engineering, 2009 WRI World Congress on, IEEE

  30. Rajani B (2000) Investigation of grey cast iron water mains to develop a methodology for estimating service life. American Water Works Association

  31. Rajani B, Kleiner Y (2001) Comprehensive review of structural deterioration of water mains: physically based models. Urban Water 3(3):151–164

  32. Rajani B, Makar J (2000) A methodology to estimate remaining service life of grey cast iron water mains. Can J Civ Eng 27(6):1259–1272

  33. Ren C-Y, Qiao W, Tian X (2012) Natural gas pipeline corrosion rate prediction model based on BP neural network. In: Fuzzy engineering and operations research. Springer, p 449–455

  34. Rossum JR (1969) Prediction of pitting rates in ferrous metals from soil parameters. J Am Water Works Assoc 61(6):305–310

  35. Sadiq R, Kleiner Y, Rajani B (2004) Fuzzy cognitive maps for decision support to maintain water quality in ageing water mains. In: 4th international conference on decision-making in urban and civil engineering, Porto, Portugal

  36. Sadiq R, Rajani B, Kleiner Y (2004) Probabilistic risk analysis of corrosion associated failures in cast iron water mains. Reliab Eng Syst Saf 86(1):1–10

  37. Sawhney A, Mund A (2002) Adaptive probabilistic neural network-based crane type selection system. J Constr Eng Manag 128(3):265–273

  38. Schlick W (1940) Supporting strength of cast iron pipe for gas and water services. Bulletin 146

  39. Shahani NM, Kamran M, Zheng X, Liu C (2022) Predictive modeling of drilling rate index using machine learning approaches: LSTM, simple RNN, and RFA. Pet Sci Technol 40(5):534–555

  40. Snider B, McBean EA (2020) Improving urban water security through pipe-break prediction models: machine learning or survival analysis. J Environ Eng 146(3):04019129

  41. Spangler MG, Handy RL (1973) Soil engineering

  42. Tabesh M, Soltani J, Farmani R, Savic D (2009) Assessing pipe failure rate and mechanical reliability of water distribution networks using data-driven modeling. J Hydroinf 11(1):1–17

  43. Thomas L (2000) Fundamentals of decision making and priority theory. RWS Publications, Pittsburgh, p 21

  44. Tsai CW, Franceschini S (2005) Evaluation of probabilistic point estimate methods in uncertainty analysis for environmental engineering applications. J Environ Eng 131(3):387–395

  45. Ullah B, Kamran M, Rui Y (2022) Predictive modeling of short-term rockburst for the stability of subsurface structures using machine learning approaches: T-SNE, K-Means clustering and XGBoost. Mathematics 10(3):449

  46. Valor A, Caleyo F, Hallen JM, Velázquez JC (2013) Reliability assessment of buried pipelines based on different corrosion rate models. Corros Sci 66:78–87

  47. Vanaei H, Eslami A, Egbewande A (2017) A review on pipeline corrosion, in-line inspection (ILI), and corrosion growth rate models. Int J Press Vessels Pip 149:43–54

  48. Wang X, Mazumder RK, Salarieh B, Salman AM, Shafieezadeh A, Li Y (2022) Machine learning for risk and resilience assessment in structural engineering: progress and future trends. J Struct Eng 148(8):03122003

  49. Wilson D, Filion Y, Moore I (2017) State-of-the-art review of water pipe failure prediction models and applicability to large-diameter mains. Urban Water J 14(2):173–184

  50. Yamijala S, Guikema SD, Brumbelow K (2009) Statistical models for the analysis of water distribution system pipe break data. Reliab Eng Syst Saf 94(2):282–293

  51. Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316

  52. Young OC, Trott J (2014) Buried rigid pipes. CRC Press


The research is partially supported by US National Science Foundation, grant No. 1638320.

Author information

Authors and Affiliations



Xiong Yu: envisioned the overarching goal of the study, guided the study and analyses, and proofread the manuscript. Xudong Fan: implemented the research plan and drafted the manuscript.

Corresponding author

Correspondence to Xiong Yu.

Ethics declarations

Ethics approval and consent to participate

N/A. No human or animal studies are involved.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Fan, X., Yu, X. Machine learning-assisted optimal schedule of underground water pipe inspection. J Infrastruct Preserv Resil 4, 20 (2023).
