Machine learning-assisted optimal schedule of underground water pipe inspection

There are over 2.2 million miles of underground water pipes serving cities in the United States. Many are in poor condition and are deteriorating rapidly. Failures of these pipes can cause enormous financial losses to customers and communities. Inspection provides crucial information for pipe condition assessment and maintenance planning; it is, however, very expensive for underground pipes due to accessibility issues. Water agencies therefore commonly face the challenge of 1) deciding whether it is worthwhile to schedule expensive water pipe inspections under financial constraints, and 2) if so, optimizing the inspection schedule to maximize its value. This study leverages a physical model and data-based machine learning (ML) models for underground water pipe failure prediction to shed light on these two questions. Analyses are first conducted to assess the value of water pipe inspection. Results from a physics-based failure model and Monte Carlo simulations indicate that inspecting the pipe's condition, i.e., assessing the pipe's corrosion depth, can narrow the uncertainty of water pipe failure prediction by 51%. For optimal inspection scheduling, an artificial neural network (ANN) model trained with historical inspection data is evaluated for its performance in forecasting the future pipe failure probability. The results show that a biased pipe failure prediction can occur under limited rounds of inspection. However, incorporating more rounds of inspection allows the model to predict the pipe failure conditions over its life cycle. From this, an optimal inspection plan can be proposed to achieve the maximum benefit of inspection in uncertainty reduction.
A few salient results from the analyses include: 1) the optimal inspection schedule is not necessarily evenly spaced in time; and 2) by setting a goal for uncertainty reduction, an optimal inspection schedule can be obtained, in which an ML model, augmented by continuous training with inspection data, reliably predicts water pipe failure conditions over the pipe's life cycle. While this study focuses on underground pipe inspection, the general observations and methodology are also applicable to optimizing the inspection of other types of infrastructure.


Introduction
Over 2.2 million miles of water pipes are buried under U.S. cities, delivering water reliably to millions of people. However, many of them are in poor condition and are deteriorating rapidly [5]. The failure of these pipes can cause enormous financial losses to local businesses and communities. There are more than 700 water main breaks every day in Canada and the USA [25], which result in huge economic and social losses. In 2009, the American Society of Civil Engineers issued a USA Infrastructure Report Card and gave a D- to drinking water and wastewater infrastructure, which improved to C- in 2021 [23]. As stated by the American Water Works Association (AWWA), we stand today at the dawn of a new era, the replacement era, for water utilities. These replacement costs, combined with projected expansion, will cost more than $1 trillion over the next couple of decades [7].
Fan and Yu, J Infrastruct Preserv Resil (2023) 4:20
The failure of water pipes can incur enormous losses, including water contamination, water shortage, and financial losses with associated societal or environmental impacts. At present, only 47% of water utilities have used a pipe failure model in their water pipe replacement strategies [8]. Part of the reason is the lack of sufficient historical data for developing a more accurate pipe failure prediction model [40]. In addition, the influence of a limited number of inspection records on the maintenance plan is still unclear. It is therefore to the benefit of society to understand the value of inspections and to ensure that the inspection schedule maximizes that value. This study demonstrates that the value of inspection lies in reducing the uncertainty in pipe condition assessment, and that an ML model provides a way to incorporate inspection data to optimize the inspection interval.
The existing prediction models for water pipe conditions (i.e., failure probability) are classified into three categories: physics-based models, statistical models, and ML-based models [10,49]. Physical models consider the physical mechanisms that contribute to water pipe failure. Three physical aspects are often considered by a physical model: the material properties and structural design of the pipe, the internal and external loads on the pipe, and material deterioration, including corrosion, as affected by the environment and service time [31]. When the residual structural capacity can no longer support the internal and external loads, the water pipe fails. The book by Young and Trott [52] provides a good explanation of the mechanical behavior of buried water pipes. Ahammed and Melchers [3] used the Spangler-Watkins in-plane pipe-soil interaction model to estimate the failure probability of steel pipes, and also used the first-order second-moment (FOSM) method to predict the failure probability. Pandey [28] and Valor et al. [46] presented a method in which the failure probability is obtained by Monte Carlo simulation.
Statistical models typically analyze data from historical records and try to find the trend by fitting a curve with a mathematical equation. Yamijala et al. [50] compared four types of statistical models for the pipe failure probability at different ages. The existing statistical models include time-linear ordinary least squares regression and time-exponential ordinary least squares regression. The logistic Gaussian linear model is believed to be better suited to the regression and prediction of water pipe reliability [50]. Kleiner and Rajani also summarized a large body of work and historical data [22]. Statistical models are found to be an economically viable approach for smaller distribution water mains. Both statistical and physical models need to be validated and improved with more data.
Both approaches have drawbacks. For example, a statistical model tries to fit the historical data with a limited set of mathematical equations, whereas a physical model requires a deep understanding of the mechanisms of water pipe failure.
Machine learning (ML) is an emerging method for the prediction of structural failures [45,48] and for applications in geology [39] and underground structures [20]. For example, Ren et al. [33] predicted the corrosion rate using a back-propagation neural network. Peng et al. [29] developed a model for predicting the failure rate of oil and gas pipelines with a fuzzy neural network. Tabesh et al. [42] applied artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and nonlinear regression methods to assess the pipe failure rate of water distribution networks, and found the ANN to be the most robust method. Sadiq et al. [35] used an ANN model to predict the water pipe condition where the relationships among variables are unknown. Sawhney and Mund [37] added that the ANN is useful for representing problems whose solutions are not clearly identified. Rajani and Kleiner [31] applied an ANN model to a water distribution network. Thomas [43] used an ANN model in multi-criteria decision-making and prediction problems. An analytic hierarchy process (AHP) model was also developed to identify the key factors that influence the failure of water pipelines, and an ANN was then used to predict the failure [13,14]. Fan et al. [17] also considered five different ML algorithms for pipe failure prediction.
Although different methods have been developed to assist maintenance decisions, a rich historical dataset is required for model calibration or training. However, such datasets, which include pipe conditions at different ages and in different environments, are difficult to obtain, especially for small utilities that are just beginning to record their assets. It is therefore important that the decision-making process can be calibrated with accumulated inspection data. However, few studies have considered the value of inspections to ML-based prediction models or how to optimize inspection intervals.
This paper aims to quantify the benefits of conducting pipe inspections and of ML support in developing optimal inspection intervals. Monte Carlo simulations were conducted with a widely accepted physics-based model to generate data about future pipe conditions. Inspection at a given service time was assumed to capture the pipe conditions. An ML model, an artificial neural network, is trained with inspection data and evaluated for its capability to forecast the future water pipe failure probability. The results unveil a few interesting findings. First, conducting inspections can significantly reduce the uncertainty range of the pipe performance forecast. Second, inspections conducted at equal time intervals do not all bring equal value. For example, inspections at the early and final stages of the pipe service life may add only limited value to the performance forecast. Given a goal for uncertainty control in pipe condition assessment, an optimal inspection plan can be determined through continuous training of the ANN model with additional pipe inspection data.

Methodology
A simulation-based method is a common approach for generating a sufficiently large dataset. In this study, we used a widely accepted physics-based water pipe failure model to generate the pipe samples. The pipe's service life is assumed to be 100 years. Figure 1 shows the main flowchart of this study. The physical model of water pipe failure is used to generate samples for training the ML model (from year 1 to year i).
After that, the ML model is trained and used as a failure prediction model. The failure probabilities of the pipe from year i + 1 to year 100 are computed using the randomly sampled physical factors. Finally, the prediction results of the ML model are compared with those of the physical model to evaluate the value of inspections.

Applied pipe failure physical model
In the current US design standard for cast-iron pipe [6], the pipe is assumed to be a rigid body that carries all the internal and external loads. Schlick [38] conducted an experiment which showed that the probability of failure of a grey cast-iron pipe can be calculated from a parabolic relationship between the internal pressure and the external loads. Underground water pipes usually support several loads, as studied by Rajani and Kleiner [31]. O'Day et al. [27] group pipe failures into three major categories: 1) circumferential breaks, caused by longitudinal stresses; 2) longitudinal breaks, caused by transverse stresses (or hoop stress); and 3) split bell, caused by transverse stresses on the pipe joint.
A schematic of these loads is shown in Fig. 2. In this study, two types of external stresses are analyzed, i.e., hoop stress and axial stress. The formulations used to calculate these stresses are summarized below.

Water pipe external stress analysis
In this study, a total of four types of stresses are considered, i.e., the stress of internal fluid pressure, the stress of soil pressure, the stress of frost load, and the stress of traffic load. The computation of these stresses is introduced below.
Fig. 1 Overall flowchart of this study
Fig. 2 Different types of loads on pipe and the corresponding failure modes (Rajani and Kleiner [31])
Stress by the internal fluid pressure [30] (Eq. 1), where $p$ is the internal pipe pressure, $D$ is the nominal pipe diameter, and $t$ is the pipe wall thickness.
Stress by soil pressure [41] (Eq. 2), where $K_m$ is the bending moment coefficient, $\gamma$ is the unit weight of soil, $B_d$ is the width of the ditch, $K_d$ is the deflection coefficient, $E_p$ is the elastic modulus of the pipe material, and $C_d$ is the calculation coefficient.
Stress by frost load [32] (Eq. 3), where $f_{frost}$ is the frost load multiplier and $\sigma_s$ is the stress by soil pressure.
Stress by traffic loads [2] (Eq. 4), where $I_c$ is the impact factor, $A$ is the effective length of pipe, $F$ is the wheel load of traffic, and $C_t$ is the surface load coefficient.
The total hoop (circumferential) stress is then calculated by assuming that the stresses from these different loads are superimposed (Eq. 5). The total axial stress is calculated by additionally considering the stress due to the temperature gradient (Eq. 6), where $\sigma_{F'} = \frac{p}{2}\left(\frac{D}{t} - 1\right)\nu_p$ is the axial stress due to internal fluid pressure and $\sigma_T = -E_p \alpha_p \Delta T$ is the stress caused by the temperature difference [30].
The axial stress due to deflection of the pipe was not considered. It is noted that when the supporting structure fails, the stress due to the bending moment can be significant and can even cause failure of the water pipe.
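As a minimal sketch of the pressure- and temperature-related terms, the snippet below evaluates the two expressions quoted in the text, $\sigma_{F'} = \frac{p}{2}(\frac{D}{t} - 1)\nu_p$ and $\sigma_T = -E_p \alpha_p \Delta T$. The hoop term $pD/(2t)$ is the standard thin-wall formula and is an assumption here, since the corresponding equation is not reproduced above. All function names and numerical values are hypothetical and are not taken from Table 1.

```python
def hoop_stress_internal(p, D, t):
    """Thin-wall hoop stress from internal pressure (assumed form p*D/(2*t))."""
    return p * D / (2.0 * t)


def axial_stress_internal(p, D, t, nu_p):
    """Axial stress from internal pressure, as quoted in the text: (p/2)*(D/t - 1)*nu_p."""
    return 0.5 * p * (D / t - 1.0) * nu_p


def axial_stress_thermal(E_p, alpha_p, delta_T):
    """Axial stress from a temperature difference, as quoted in the text: -E_p*alpha_p*delta_T."""
    return -E_p * alpha_p * delta_T


# Hypothetical inputs (MPa, mm, 1/degC, degC); illustrative only.
p, D, t = 0.45, 200.0, 10.0
nu_p, E_p, alpha_p, delta_T = 0.21, 2.0e5, 1.1e-5, -10.0

sigma_F = hoop_stress_internal(p, D, t)
sigma_F_axial = axial_stress_internal(p, D, t, nu_p)
sigma_T = axial_stress_thermal(E_p, alpha_p, delta_T)
print(sigma_F, sigma_F_axial, sigma_T)
```

A negative temperature difference (cooling) produces a tensile thermal stress under this sign convention, which is why cold periods can add to the axial demand.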

Water pipe residual yield strength
The resistance to pipe failure is closely related to the effective structural thickness of the pipe wall. For this purpose, a surface corrosion model is used to describe the reduction of pipe wall thickness due to corrosion. The wall thickness of a water pipe continues to decrease over time due to corrosion. The corrosion rate has been studied extensively. For example, Doyle et al. [15,16,19,21,24,26] surveyed the condition of buried water pipes. The results indicate that the condition of a water pipe is closely related to soil characteristics.
In this paper, an empirical two-phase corrosion model is used to estimate the corrosion depth [36]:
$d = aT + b\left(1 - e^{-cT}\right)$ (7)
where $d$ is the depth of corrosion (mm), $T$ is the service time (years), $a$ is the final pitting rate constant, $b$ is the pitting depth scaling constant, and $c$ is the corrosion rate inhibition factor. Figure 3 shows an example of the corrosion depth predicted by Eq. (7) using the parameters listed in Table 1. The corrosion of metal pipe is due to the establishment of anodic and cathodic areas [12,34]. The anodic area is initially established by the local environment, such as a crack in the iron oxide layer. The cathode is then established somewhere near the pit. Subsequently, anions such as OH⁻ and Cl⁻ move from the anode to the cathode. With the movement of these anions, however, a layer of ferrous hydroxide [Fe(OH)₃] is generated, and later an intermediate layer of magnetite [Fe₃O₄] forms. This layer of magnetite stops the anions from moving from the anodic area to the cathodic area. As a result, the corrosion rate is high at the early stage and then decreases over time [36]. The development of corrosion described by the two-phase model (Eq. (7), Fig. 3) is therefore consistent with the physicochemical process associated with field corrosion.
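The fast-then-slow behavior described above can be illustrated with a short sketch. It assumes the commonly cited two-phase form $d = aT + b(1 - e^{-cT})$ for the corrosion model; the parameter values below are hypothetical placeholders chosen only for illustration and are not the Table 1 values.

```python
import math


def corrosion_depth(T, a, b, c):
    """Two-phase corrosion depth (mm) at age T (years), assumed form a*T + b*(1 - exp(-c*T))."""
    return a * T + b * (1.0 - math.exp(-c * T))


# Hypothetical parameters: final pitting rate a (mm/yr), scaling constant b (mm),
# inhibition factor c (1/yr). Illustrative only.
a, b, c = 0.009, 6.27, 0.14

for year in (0, 5, 10, 20, 50, 100):
    print(year, round(corrosion_depth(year, a, b, c), 3))

# Yearly growth early in life versus late in life.
early_rate = corrosion_depth(1, a, b, c) - corrosion_depth(0, a, b, c)
late_rate = corrosion_depth(51, a, b, c) - corrosion_depth(50, a, b, c)
print(early_rate > late_rate)
```

With these placeholder values the exponential term saturates after a few decades, so the late-life growth is governed almost entirely by the linear term $aT$.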
The residual yield strength of a water pipe can be determined by the empirical relationship proposed by Rajani et al. [30], Eq. (8), where $\alpha$ and $S$ are constants used in the fracture toughness equations; $\beta$ is the geometric factor for a double-edge notched tensile specimen; $a_n$ is the lateral dimension of the pit; $K_q$ is the provisional fracture toughness; $a_1$ and $b_1$ are constants for determining the geometric factor $\beta$; and $d$ is the depth of the corrosion pit, which can be estimated by Eq. (7).

Water pipe failure criteria
Failure of a water pipe occurs when either its hoop stress or its axial stress exceeds its residual yield strength [36]. By introducing the concept of factor of safety (FOS), the water pipe failure criterion can also be written as Eq. (9), where $\sigma_\theta$, $\sigma_X$, and $\sigma_Y$ are determined by Eqs. (5), (6), and (8), respectively. When the FOS is larger than 1, the pipe is assumed to be safe; otherwise, the pipe is assumed to fail.
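One way to encode this criterion is sketched below. It assumes the FOS is the ratio of the residual yield strength to the larger of the two stresses, which is consistent with the verbal criterion (failure when either stress exceeds the strength) but is not necessarily the exact form of Eq. (9). The function names are hypothetical.

```python
def factor_of_safety(sigma_y, sigma_theta, sigma_x):
    """Assumed FOS: residual yield strength over the governing (larger) stress."""
    governing = max(sigma_theta, sigma_x)
    if governing <= 0.0:
        # No tensile demand: treat the pipe as safe by convention in this sketch.
        return float("inf")
    return sigma_y / governing


def has_failed(sigma_y, sigma_theta, sigma_x):
    """The pipe is safe only when FOS > 1; otherwise it is assumed to fail."""
    return factor_of_safety(sigma_y, sigma_theta, sigma_x) <= 1.0


# Illustrative stresses in MPa (hypothetical values).
print(factor_of_safety(150.0, 100.0, 60.0))
print(has_failed(150.0, 100.0, 60.0))
print(has_failed(90.0, 100.0, 60.0))
```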

Water pipe failure probability
There are significant uncertainties in the parameters used to determine the hoop stress, axial stress, and residual yield strength. The FOS and the consequent probability of failure are therefore uncertain as well [44]. Monte Carlo simulation is an effective method for modeling such stochastic processes and is used in this study. The variables required to determine the failure conditions of the water pipe are assumed to follow specific statistical distributions. The distributions of these variables are listed in Table 1, modified from Sadiq et al. [36].
Figure 4 shows the flowchart of the Monte Carlo simulation used to determine the distribution of the FOS and the failure probability of a single pipe. In each loop, the values of the considered variables are randomly generated using the probability distribution parameters noted in Table 1.
Based on the randomly generated values, the FOS is computed by Eq. 9. After iterating N times, the number of failures, n, is recorded and the failure probability of the pipe is computed by Eq. 10:
$P_f = \frac{n}{N}$ (10)
where $n$ is the number of failures recorded in the iterations and $N$ is the total number of iterations.
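The counting procedure can be sketched as follows. The sampler `toy_fos` is a hypothetical stand-in for the physics-based model: it draws a lognormal strength and a lognormal stress with made-up parameters, whereas the study samples the Table 1 variables and evaluates Eq. 9.

```python
import math
import random


def failure_probability(sample_fos, n_iter=1000, seed=0):
    """Estimate the failure probability as n / N, counting draws with FOS <= 1."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n_iter) if sample_fos(rng) <= 1.0)
    return failures / n_iter


def toy_fos(rng):
    """Hypothetical FOS sampler: lognormal strength over lognormal stress (MPa)."""
    strength = rng.lognormvariate(math.log(200.0), 0.2)
    stress = rng.lognormvariate(math.log(150.0), 0.3)
    return strength / stress


print(failure_probability(toy_fos, n_iter=1000, seed=42))
```

Fixing the seed makes the estimate reproducible; increasing `n_iter` reduces the sampling noise of the estimate at the cost of more model evaluations.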

Artificial Neural Network (ANN)
The artificial neural network (ANN) is a widely used ML model. Its architecture consists of interconnected neurons in the input layer, hidden layers, and output layer, and this architecture determines its overall performance [1]. Increasing the number of neurons and hidden layers can improve the ability of the ANN model to describe nonlinear relationships. It also, however, increases the computational demand and can lead to overfitting. A conceptual architecture of a neural network is shown in Fig. 5. The input layer consists of i neurons, corresponding to the number of input features. The hidden layers provide the capability to model complex nonlinear relationships, which are fine-tuned with the training data. The output layer consists of one neuron, which is used to classify the output as leaking or not leaking.
The hidden layers consist of fully connected neurons; the output of each neuron is written as Eq. 11:
$y_k = f\left(\sum_r \omega_{r,k} x_{r,k} + b\right)$ (11)
where $y_k$ is the output of each neuron in the hidden layer and $x_{r,k}$ is the output of the previous layer; for the first layer of the neural network, $x_{r,k}$ is the sample data. $\omega_{r,k}$ is the weight of that neuron and $b$ is its bias, both of which are trained with the training datasets by the back-propagation algorithm. $f(\cdot)$ is the activation function, which introduces nonlinearity during the propagation. In this study, the 'ReLU' function is used as the activation function of the hidden layers [4].
The output of the last hidden layer is then passed to the neuron in the output layer, whose action is written as Eq. 12:
$y_z = g\left(\sum_k \omega_k y_k + b\right)$ (12)
where $y_k$ is the output of the last hidden layer and $y_z$ is the output of the output layer. $\omega_k$ and $b$ are the weight and bias as described before. $g(\cdot)$ is the sigmoid transfer function, defined as Eq. 13:
$g(x) = \frac{1}{1 + e^{-x}}$ (13)
The ANN model in this article is built and trained with TensorFlow in a Python environment. It learns the relationship between the output and the input through a training process in order to classify the observed data into leaking and non-leaking situations. More detailed mathematical information about ANNs can be found in [18].
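To make the layer computations concrete, the following dependency-free sketch runs a forward pass with ReLU hidden layers and a sigmoid output neuron. It is not the TensorFlow model used in the study; the weights and layer sizes are hypothetical and chosen so the result is easy to check by hand.

```python
import math


def relu(v):
    """ReLU activation for hidden neurons."""
    return max(0.0, v)


def sigmoid(v):
    """Numerically stable sigmoid for the output neuron."""
    if v >= 0.0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dense(inputs, weights, biases, activation):
    """Fully connected layer: weights[k][r] links input r to neuron k."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_layers, output_layer):
    """Propagate x through the ReLU hidden layers, then the single sigmoid output."""
    h = x
    for weights, biases in hidden_layers:
        h = dense(h, weights, biases, relu)
    out_weights, out_biases = output_layer
    return dense(h, out_weights, out_biases, sigmoid)[0]


# Hypothetical two-feature network with one hidden layer of two neurons.
hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
output = ([[1.0, 1.0]], [-1.0])
score = forward([1.0, -2.0], hidden, output)
print(score, "failure" if score >= 0.5 else "functional")
```

The 0.5 decision threshold on the sigmoid output is a common convention and is an assumption here; the text does not state the threshold used.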

Results and discussion
A pipe with a 6 m effective length, a 20 cm internal diameter, and a 1 cm original wall thickness is considered as the testbed in this study. The Monte Carlo simulations are repeated 1,000 times for each year. The Monte Carlo simulation therefore provides 1,000 random FOS values for each year, from which the failure probability in each year and the evolution of the failure conditions (mean and standard deviation) with service time can be calculated. In this section, the value of inspection is studied first. The inspection data samples are assumed based on the accuracy of inspection tools and are generated by Monte Carlo simulation. After that, an ANN model is evaluated for its capability to predict the pipe's failure probability over time using the inspection data. The prediction results of the ANN model are used to demonstrate the importance of optimal maintenance by comparing them with the theoretical ground truth.

Illustration of the value of pipe inspection
Corrosion inspection is one of the main inspection items in pipe maintenance procedures. Multiple inspection methods, such as magnetic flux leakage (MFL), circumferential MFL, tri-axial MFL, and ultrasonics, have been used for pipeline deterioration inspection [47]. To quantify the value of pipe inspections, we assume the corrosion depth is inspected at a specific year, T0. The inspected value is then used as the initial value in the Monte Carlo simulation to determine the pipe thickness in the following years (Eq. 14). In other words, we assume that inspection fixes the corrosion depth at year T0 to a known value, which is then used to predict the pipe conditions in subsequent years via the Monte Carlo simulations.
where $d_T$ is the pipe corrosion depth at year $T$ ($T > T_0$), $d_{T_0}$ is the inspected corrosion depth at $T_0$, and the other parameters, $a$, $T$, $b$, and $c$, are the same as in Eq. 7.
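A sketch of how an inspected depth can re-anchor the forecast is given below. It assumes that growth after $T_0$ follows the increment of the two-phase form $aT + b(1 - e^{-cT})$, i.e. $d_T = d_{T_0} + a(T - T_0) + b(e^{-cT_0} - e^{-cT})$; this is one plausible reading and may differ from the exact expression of Eq. 14. The parameter values are hypothetical.

```python
import math


def model_depth(T, a, b, c):
    """Assumed two-phase corrosion depth (mm) at age T without inspection."""
    return a * T + b * (1.0 - math.exp(-c * T))


def depth_after_inspection(T, T0, d_T0, a, b, c):
    """Depth at year T >= T0, restarting from the inspected depth d_T0 at year T0."""
    if T < T0:
        raise ValueError("T must not precede the inspection year T0")
    growth = a * (T - T0) + b * (math.exp(-c * T0) - math.exp(-c * T))
    return d_T0 + growth


# Hypothetical parameters and an inspected depth of 5 mm at year 20.
a, b, c = 0.009, 4.5, 0.1
for year in (20, 30, 40, 60):
    print(year, round(depth_after_inspection(year, 20, 5.0, a, b, c), 3))
```

If the inspected depth happens to equal the model value at $T_0$, this update reproduces the uninspected curve exactly; a deeper inspected pit shifts the whole subsequent curve upward by the same offset.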
The distribution of the FOS in each year, with no inspection or with inspection at year 20, is computed by Monte Carlo simulations. For each year, the mean value, 10% quantile, and 90% quantile are computed and recorded. Figure 6 shows the computed distribution of FOS over 100 years. The solid lines are the prediction results assuming the pipe is not inspected. As can be seen in Fig. 6, the overall FOS values decrease as the service life increases. After the 70th year of service, the mean value of FOS is around 1, which implies that there is about a 50% probability that the pipe would fail. Assuming an average acceptable FOS of 1.5, the water pipe reaches this threshold at around 37 years.
To assess the effects of inspection, it is assumed that, due to a more corrosive underground environment, the pipe's corrosion depth determined by inspection at the 20th year is 5 mm, which is slightly higher than the average value of 4.02 mm. The FOS of the pipe for the subsequent years, calculated by the procedure (as illustrated in Fig. 3) after incorporating the inspection data, is shown in Fig. 6 by the dashed lines. Both the mean value and the range of FOS after considering the inspection are shown in the figure. The results in Fig. 6 indicate that, using an average FOS of 1.5 as the acceptable threshold for pipe replacement, the water pipe reaches the threshold FOS at around 30 years. The result indicates that incorporating inspection data would forecast unacceptable pipeline failure 8 years earlier than without inspection. From a practical perspective, this information helps agencies implement preventative maintenance, such as corrosion protection measures or replacing pipe sections before they fail. The final decision also depends on the financial constraints and societal impacts of such actions.
The immediate value of the inspection is studied by comparing the FOS distributions at the 21st year, the year after the inspection. Both forecast FOS distributions follow lognormal distributions. As shown in Fig. 7, compared with the case without inspection, the possible FOS values predicted with inspection are more concentrated and their variation is reduced.
Figure 8 compares the standard deviation of FOS over time with and without inspection. The results show that inspection helps to narrow the uncertainty range of the pipe condition forecast, or FOS. For example, at the end of the 40th year, the variance of the FOS without inspection is 0.967, whereas with inspection data the prediction variance in FOS is 0.475. This clearly shows that inspection reduces the forecast uncertainty. Overall, the results indicate that inspection collects pipe condition data and provides a more accurate prediction of the pipe's future conditions. As shown in Figs. 6, 7 and 8, the value of inspection is to reduce the range of uncertainty in forecasting the future conditions of the pipe. The variation in the model forecast is reduced (by as much as 50.8% in this case) after incorporating inspection data. The reduced variation in the model forecast helps reduce the uncertainty in the decision-making process. The value of an inspection gradually decreases over time, as indicated by the continued growth of the forecast FOS uncertainty range. At a certain point, another round of inspection is needed to reduce the uncertainty in the pipe condition assessment.
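The size of the reduction follows directly from the two values quoted for the 40th year. The short check below computes the relative reduction from 0.967 to 0.475, which comes out at roughly 0.51 and is in line with the approximately 51% reduction reported; the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional reduction of a spread measure after incorporating inspection data."""
    return (before - after) / before


# Values quoted in the text for the end of the 40th year.
reduction = relative_reduction(0.967, 0.475)
print(f"{reduction:.3f}")
```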
It is also noted that the inspection program analyzed here considered only the depth of corrosion, which is related to the capacity of the pipe to resist failure. Other parameters related to service load, such as internal water pressure, frost effects, and traffic loads, are not included. This information, if available, can be included to further reduce the uncertainty in the forecast of water pipe failure.

ML model with inspection data for reliable forecast of future pipe conditions
The previous discussion shows that inspection brings value by reducing the uncertainty in underground water pipe condition assessment. The natural next question is how to effectively incorporate inspection data into pipe failure probability prediction. We explore data-driven ML models for this purpose. The advantage of an ML model is that it can extract features from a complex dataset without requiring a predefined physics-based model.
The data-driven nature of ML requires data for model training and validation. An artificial neural network (ANN) is used in this study due to its simplicity. A few assumptions are made to evaluate the ability of the ANN model to predict the pipe failure probability at different ages. First, we assume the inspection datasets are obtained from a large water system. Specifically, we assume that 1,000 pipes with the same initial physical properties are installed in the water system, and that the inspection process can provide values of all the variables according to the distributions listed in Table 1. Second, we assume the physics-based model introduced in the 'Illustration of the value of pipe inspection' section reflects the pipe failure mechanism accurately, so that the status of the corresponding pipes can be determined by Monte Carlo simulation with the physics-based model. Under the first assumption, inspection captures the condition data of the 1,000 pipes, which provides enough data samples for ANN training and validation. The second assumption allows the average pipe conditions predicted by the physics-based model to be used as the ground truth for evaluating the ML model performance.
To train the ANN model, we first generate a pseudo inspection dataset that contains the variable values and pipe status based on these assumptions. An ANN model with an optimal structure is obtained after hyperparameter optimization. The ANN model contains 1 input layer, 1 output layer, and 6 hidden layers. The 'ReLU' activation function is selected for the input layer and the hidden layers. The 'Sigmoid' function is used as the activation function in the output layer. The input layer contains 21 neurons, corresponding to all the pipe variables listed in Table 1. The output layer contains 1 neuron, which classifies the condition of the pipe as either 'failure' or 'functional' based on its FOS. The hyperparameters of the ANN model are determined by a trial-and-error process. It should be noted that techniques such as grid search and Bayesian optimization can also be used to determine these hyperparameters [51]. We split the generated pseudo inspection data into training and testing datasets with a ratio of 7:3. The ANN model's accuracy is evaluated by comparing its predictions on the testing set with the pipe status in the generated dataset. The trained ML model achieves an accuracy of 99.1%. This high accuracy arises because the ML model can learn the pattern of the predefined physical model well.
The process of predicting the pipe failure probability assumes that the variables (except pipe thickness) will continue to follow the same distributions as the inspection records, which are also the predefined distributions in Table 1. Therefore, for each year with no inspection, 1,000 sets of random variables are generated, excluding the pipe's current thickness. The pipe's thickness is instead set to the remaining thickness based on the results of the nearest previous inspection. The generated datasets are then fed into the trained ANN model, which predicts whether the pipe is in a failure or functional state based on the inputs. The failure probability of the pipe is computed by dividing the number of failures by the total number of samples (1,000 samples), as shown in Eq. 10.
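The prediction loop described above can be sketched as follows. Both `sample_variables` and `toy_classifier` are hypothetical stand-ins: the first plays the role of sampling the Table 1 distributions and the second plays the role of the trained ANN. Only the structure of the loop (sample everything, overwrite the thickness, classify, count) follows the text.

```python
import random


def predicted_failure_probability(classifier, sample_variables,
                                  remaining_thickness, n_samples=1000, seed=0):
    """Share of sampled pipes classified as failed, with the thickness fixed by inspection."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        features = sample_variables(rng)              # random draw of the other variables
        features["thickness"] = remaining_thickness   # value from the latest inspection
        if classifier(features):
            failures += 1
    return failures / n_samples


def sample_variables(rng):
    """Hypothetical sampler with a single load variable (MPa)."""
    return {"pressure": rng.gauss(0.45, 0.1), "thickness": None}


def toy_classifier(features):
    """Hypothetical stand-in for the trained ANN: fails when pressure/thickness is high."""
    return features["pressure"] / features["thickness"] > 0.08


for thickness_mm in (10.0, 6.0, 4.0):
    pf = predicted_failure_probability(toy_classifier, sample_variables, thickness_mm)
    print(thickness_mm, pf)
```

Passing the classifier as a callable keeps the loop independent of the ML library, so the same structure works with any trained model that returns a failure flag.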
The ground truth curve is obtained directly from the Monte Carlo simulation with the physics-based model, and it is used as the baseline for verifying the accuracy of the results predicted by the ANN. The baseline is shown as 'original data' in Fig. 9. To evaluate the value of inspection, we first assume the inspection is conducted regularly at 10-year intervals. For example, if inspections are conducted over a 40-year period, pipe condition data are assumed to be collected at the 10th, 20th, 30th, and 40th years and are used for ANN model training. The trained ANN model is used to predict the pipe failure probability for the subsequent years. Similar analyses are conducted for ANN models trained with inspection data collected at 10-year intervals over 50, 60, or 70 years of pipe service; the results are compared with the ground truth (Fig. 9).
Figure 9 compares the pipe conditions (i.e., failure probability) predicted by ML models trained with different years of inspection data. The comparison with the baseline ground truth curve shows that the more training data obtained from inspection, the closer the ANN forecast is to the ground truth. For example, the ANN model trained with 40 years of data from 4 rounds of inspections at 10-year intervals can predict the pipe failure probability within 4% of the ground truth values over the subsequent 20 years (40 to 60 years). The deviation from the ground truth curve increases in the years after that. If the next round of inspection data (i.e., inspection at the 50th year) is available, the further trained ANN model can predict within 2% of the ground truth values for the next 20 years. Similar trends are observed when more inspections are incorporated. Interestingly, in this case, the ANN model trained with 7 rounds of inspection data (10th, 20th, 30th, 40th, 50th, 60th, and 70th years) predicts a failure probability that nearly overlaps the ground truth curve. That is, 7 rounds of inspection data (up to the 70th year) allow the pipe failure probability to be predicted accurately over the life span of 100 years.
These observations imply that an ML model trained with pipe inspection data can provide a reliable forecast of the failure probability over a certain number of years. The reliability of the ML model prediction is further increased by incorporating more inspection data. The value of inspection is therefore to extend the range of years over which the pipe condition can be predicted reliably. It is also noted that inspection data beyond a certain number of years of service (70 years in this case) do not bring added value. This might be attributed to the fact that the corrosion of the water pipe has reached a steady deterioration rate by that stage. There should therefore be an optimal inspection strategy, in terms of inspection scheduling, that brings the maximum value of inspection data over the service life of the water pipe. This is discussed further below.

Optimal inspection schedule based on ML model
The previous analyses indicate that an ML model trained with inspection data can predict future pipe conditions. Its prediction error, however, can grow over the following years. An optimal inspection schedule should therefore provide acceptable reliability in pipe condition prediction over the whole service life of the water pipe. The term 'optimal' here refers to the inspection schedule that uses the minimal number of inspections to constrain the uncertainty in future condition prediction to within ±5% of the true value based on prior inspection data. The idea of deriving an optimal inspection interval from the ML model is illustrated in this section. The studied pipe is assumed to have the same design parameters and to be used for 100 years. An ANN model is used to predict the pipe's failure probability over its remaining life. The failure probability obtained by Monte Carlo simulation with the physics-based model is used as the baseline (true value).
Figure 10 shows the pipe's ground truth curve and the prediction results of the ANN model. Since the failure probability is low for the first 20 years, we assume that inspection is not necessary until the 20th year. The first inspection is therefore arranged at the 20th year, as shown in Fig. 10. After the inspection, the initial condition data (1st year) and the inspection data at the 20th year are used to train the first ANN model. Figure 10 shows that the first trained ANN quickly deviates from the ground truth curve in the years after year 20. To keep the prediction bias within a 5% error range of the ground truth, the next inspection needs to be arranged at the 25th year. With these data (from the 1st year and from the inspections at the 20th and 25th years), the prediction accuracy of the new ANN model stays within 5% for the next 15 years. The third inspection can therefore be made at the 40th year. With the data from this inspection, the forecast accuracy is ensured for the next 20 years. The fourth inspection can then be done at the 60th year to cover another 20 years. Finally, the fifth inspection can be done at the 80th year, which allows the ML model to cover the remaining service life of the water pipe up to year 100. Overall, 5 rounds of inspection are sufficient to cover the 100-year service life.
The results shown in Fig. 10 indicate that the optimal inspection schedule is not evenly spaced; that is, the value of inspection data collected in different years is not equal. A few observations are summarized below. Overall, the results indicate that inspection can efficiently narrow down the pipe failure uncertainty, as shown in Fig. 6. Using more rounds of inspection can also reduce the prediction error of the ML-based failure prediction models. The results also indicate that inspections do not contribute equally to the model prediction. For example, the optimal inspection schedule reduces the number of inspections from 10 (for fixed-interval inspection every 10 years) to 5 over the 100-year service life, as illustrated in Figs. 9 and 10, respectively.
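The scheduling logic walked through above can be sketched as a greedy loop: forecast from the data in hand, find the first year in which the forecast leaves the tolerance band, inspect in that year, and repeat. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this study. The S-shaped `ground_truth` curve is hypothetical (the study uses Monte Carlo results), the ANN is replaced by a straight-line extrapolation through the two most recent data points, and the 5% tolerance is treated as an absolute bound on failure probability.

```python
import numpy as np

HORIZON = 100     # service life in years (from the text)
TOLERANCE = 0.05  # assumed absolute error bound on failure probability


def ground_truth(year):
    """Hypothetical S-shaped baseline curve, standing in for Monte Carlo results."""
    return 1.0 / (1.0 + np.exp(-(year - 50.0) / 10.0))


def forecast(known_years, year):
    """Stand-in for the ANN: linear extrapolation from the two latest data points."""
    t0, t1 = known_years[-2], known_years[-1]
    p0, p1 = ground_truth(t0), ground_truth(t1)
    slope = (p1 - p0) / (t1 - t0)
    return float(np.clip(p1 + slope * (year - t1), 0.0, 1.0))


def greedy_schedule(first_inspection=20):
    """Return inspection years chosen so the forecast stays within TOLERANCE."""
    known = [1, first_inspection]  # initial condition data plus first inspection
    while True:
        last = known[-1]
        next_year = None
        # Scan forward for the first year in which the forecast leaves the band.
        for year in range(last + 1, HORIZON + 1):
            if abs(forecast(known, year) - ground_truth(year)) > TOLERANCE:
                next_year = year
                break
        if next_year is None:
            # Forecast stays within tolerance to the end of service life.
            return known[1:]
        known.append(next_year)  # inspect that year and update the data set


if __name__ == "__main__":
    print(greedy_schedule())
```

With a crude linear forecaster the loop will generally call for more inspections than the five reported for the ANN, but the same qualitative behavior appears: the intervals between inspections are not equal, because the forecast degrades fastest where the curve bends most.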

Conclusion
The deterioration of water distribution pipes requires a proactive plan for maintenance, retrofit and renewal. Inspection plays an important role in supporting these decisions. The complex stochastic nature of infrastructure deterioration presents a major challenge in forecasting its performance. Machine learning (ML) can potentially provide an important tool to uncover the value of inspection data. The analyses in this paper use an underground water pipe as the testbed. The results show that the value of inspection lies in reducing the uncertainty in the forecast of pipe conditions or its factor of safety. The analyses also show that inspections at different times do not bring equal value, i.e., the optimal inspection schedule is not necessarily evenly spaced in time. An optimal inspection schedule can be designed based on a pre-set acceptable reliability level of the ML model (ANN in this case) for future pipe condition forecast. An ML model trained with data collected under the optimal inspection schedule can provide a cost-effective and reliable forecast of pipe failure probability throughout the service period.
This study illustrates the value and impacts of inspection data on the development of optimal water pipe

Fig. 10 Illustration of pipe condition forecast (within ± 5% error) with optimal inspection schedule covering the 100-year life of water pipe

Fig. 4 Flowchart of Monte Carlo simulation with physics-based model to determine pipe failure probability

Fig. 6 Development of pipe failure probability (indicated by the mean and range of factor of safety (FOS)) over time without inspection (solid lines) or with inspection at Year 20

Fig. 7 Distribution of predicted pipe failure condition (i.e., FOS) at the 21st year without inspection or with inspection at Year 20

Fig. 9 Results of predicted pipe failure probability with ANN model trained with different years of data collected at a fixed 10-year inspection interval, with results of the physics-based model as the ground truth

a) First, no inspection is needed during the first 20 years because of the low probability of failure. Inspection data in this period do not bring much added value for the ML model.
b) Second, more frequent inspection is needed between 20 and 40 years due to the rapid change of the water pipe's failure probability. In other words, the inspection data bring higher value in ANN model training for future pipe condition prediction.
c) Third, regular inspection at approximately fixed time intervals can be used when the pipe is 40 to 80 years old. This is because, on the one hand, the ANN model has received enough data to capture the trend of the pipe failure probability and, on the other hand, the corrosion rate of the pipe is becoming steady.
d) Finally, inspection brings no observed value at the final stage of the water pipe's life (80 to 100 years), as the ANN model is able to accurately predict the pipe failure probability to the end of service life. This is possibly because the corrosion rate of the pipe has become stable at this stage.

Table 1 Summary of the probability distribution of input variables for Monte Carlo simulation [9,11]

pipe failure are of stochastic nature. Such problems are typically analyzed by methods such as the Monte Carlo simulation [28], Mean-Value First Order Second Moment, Advanced First Order Second Moment, First Order Reliability Methods [9,11], Rosenblueth's Point Estimation, or Harr's Point Estimation