Modeling retroreflectivity degradation of pavement markings across the US with advanced machine learning algorithms
Journal of Infrastructure Preservation and Resilience volume 5, Article number: 3 (2024)
Abstract
Retroreflectivity is the primary metric that controls the visibility of pavement markings during nighttime and in adverse weather conditions. Maintaining the minimum level of retroreflectivity specified by the Federal Highway Administration (FHWA) is crucial to ensure safety for motorists. The key objective of this study was to develop robust retroreflectivity prediction models that transportation agencies can use to reliably predict the retroreflectivity of their pavement markings from the initially measured retroreflectivity and other key project conditions. A total of 49,632 transverse skip retroreflectivity measurements of seven types of marking materials were retrieved from the eight most recent test decks covered under the National Transportation Product Evaluation Program (NTPEP). Decision Tree (DT) and Artificial Neural Network (ANN) algorithms were considered for developing performance prediction models to estimate retroreflectivity at different prediction horizons for up to three years. The models were trained with a randomly selected 80% of the data points and tested with the remaining 20%. Sequential ANN models exhibited better performance with the testing data than the sequential DT models. The training and testing R^{2} ranges of the sequential ANN models were 0.76 to 0.96 and 0.55 to 0.94, respectively, which were significantly higher than the R^{2} range (0.14 to 0.75) of the regression models proposed in past studies. Initial or predicted retroreflectivity, snowfall, and traffic were found to be the most important inputs to model predictions.
Introduction
Pavement markings are retroreflective longitudinal and transverse lines that are installed on pavement surfaces to delineate their profile [1, 2]. The most commonly used marking materials in the U.S. are either nondurable (waterborne paint) or durable (thermoplastic, epoxy, polyurea, methyl methacrylate [MMA], and tape) [3]. Markings are a critical component of overall traffic signalization because they define boundaries between moving vehicles, and a well-maintained marking system enhances safety for motorists during daytime, nighttime, and in poor visibility conditions [4].
As per the Federal Highway Administration (FHWA), over 50% of all traffic fatalities occur at night while the majority of travel happens during the daytime [5]. The visibility of pavement markings is primarily dependent on their retroreflectivity [6]. Retroreflectivity is the property of markings that describes the phenomenon in which light originating from vehicle headlights illuminates the visible pavement marking surface and a substantial portion of it returns to the eye of the motorists [7]. Retroreflectivity, measured by the coefficient of retroreflected luminance (R_{L}) in millicandelas per square meter per lux (mcd/m^{2}/lux), is provided through a partial embedment of transparent glass beads inside the markings [6]. An adequate level of R_{L} can reduce 50% of crashes during nighttime and 28% of fatalities in dark, rainy, or snowy weather conditions [8].
R_{L} degrades over time depending on the type of marking material, glass bead properties, climatic conditions, traffic loading, road surface type, and snowplow frequency [9]. Recently, FHWA published a new rule that established national standards for minimum R_{L} levels for longitudinal marking lines on all roads depending on speed limits [10]. This final rule requires all state agencies to implement a method before 2027 to maintain the specified minimum R_{L} level and consider retroreflectivity in future restriping activities, highlighting the importance of regular retroreflectivity monitoring for the marking lines. However, instead of continuously monitoring and restriping when R_{L} drops below the minimum threshold, many transportation agencies restripe markings as per a fixed schedule or visual inspection [11]. These restriping strategies are not optimum as markings are often restriped before or after the end of their service life, resulting in misspending available funds or compromising the safety of the motorists, respectively. As such, modeling retroreflectivity degradation is critical to estimate the service life of the markings and to accordingly plan for future restriping activities.
Objective and scope
The objective of this study was to develop machine learning models that can be used by U.S. transportation agencies to predict, with superior accuracy, the retroreflectivity of pavement markings, over a period of 3 years, based on the initial measured retroreflectivity and other key project conditions. Two different machine learning algorithms, Decision Tree and Artificial Neural Network algorithms, were utilized to develop performance prediction models for seven types of pavement marking materials (waterborne paint, thermoplastic, preformed thermoplastic, permanent polymeric tape, epoxy, polyurea, and MMA) located in three different U.S. climate zones (Southeast, Northeast, and Upper Midwest). The proposed models are expected to provide a scientific basis for transportation agencies in predicting the service life of pavement markings based on local conditions and only one initial retroreflectivity measurement, thereby eliminating the need for the costly monitoring of the retroreflectivity of marking products.
Background
This section first provides a brief background on the NTPEP, the source of the data used in this study. It then presents the results of previous studies that modeled the retroreflectivity degradation of pavement markings, which serve as a baseline for comparison with the results of this study. Finally, it summarizes the shortcomings in the current state of knowledge and how the models developed in this study address them.
Overview of the National Transportation Product Evaluation Program (NTPEP)
The American Association of State Highway and Transportation Officials (AASHTO) implements a consensus-based work plan every year through the National Transportation Product Evaluation Program (NTPEP) to evaluate the field performance of a variety of pavement marking products [12]. NTPEP selects test decks from across the U.S., representing various traffic and geographical conditions, and installs marking products on them in the transverse direction [13]. In a typical test deck, four marking lines are installed side-by-side on an asphalt or concrete surface from the inner side of the “edge” line to the far side of the “skip” line [12] (see Fig. 1). For each line, retroreflectivity measurements are collected from both the “skip” area (within the first 9 in. from the “skip” line, known as transverse-skip retroreflectivity, R_{S}) and the “wheel” area (9 in. on both sides of the left wheel path, known as transverse-wheel retroreflectivity, R_{W}) with a handheld retroreflectometer [11]. These readings are collected at 12 different time intervals (after 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, and 36 months) for up to 3 years. Accelerated degradation of retroreflectivity occurs in the “wheel” area due to continuous friction with vehicle tires and does not mimic the service life of an actual longitudinal marking stripe [11]. The traffic condition in the “skip” area is more representative of the skip-line stripes in the field [14]; therefore, only R_{S} measurements were considered in this study for modeling.
Previous retroreflectivity degradation models
A literature review was conducted on past studies that proposed retroreflectivity degradation models to estimate the service life of pavement markings, as summarized in Table 1. Previous studies mostly adopted the parametric approach, more specifically, regression models, to predict future retroreflectivity [9, 15, 16, 17, 18, 19, 20, 21]. Regression models make stringent assumptions about the shape of the mapping function, which often differs significantly from the true shape of the relationship between the inputs and output [22]. The datasets adopted in the parametric approach typically demonstrate high dimensionality (i.e., data have many inputs) and high multicollinearity (i.e., high correlations between input variables) [23]. This high dimensionality and high multicollinearity prevented the input variables of the past regression models from being independent of each other, violating one of the fundamental assumptions of the parametric approach. As such, an unpredictable variance was imposed on the models, weakening their statistical power. High variability in retroreflectivity data made it challenging for the regression models to estimate the service life of the markings with a high level of statistical confidence, even with the collection of more data [24]. Due to these limitations, past regression models [9, 15, 16, 17, 18, 19, 20, 21] predicted retroreflectivity with relatively low accuracy (coefficient of determination (R^{2}) ranging from 0.14 to 0.75).
Because of the poor reliability of the past regression models, retroreflectivity degradation models were developed using supervised machine learning algorithms [11, 25, 26]. Machine learning algorithms construct models based on a “nonparametric” approach without making explicit assumptions about the shape of the mapping function. These models can efficiently capture complex patterns from the dataset and, in general, achieve higher prediction accuracy than the regression models [22]. Karwa and Donnell [25] proposed an Artificial Neural Network model to estimate the service life of thermoplastic markings without considering climatic conditions (i.e., rainfall and/or snowplow activities) as input variables. Mousa et al. [11] predicted retroreflectivity for up to three years with reasonable accuracy (R^{2} = 0.83 – 0.98) using a Categorical Boosting model, but its applicability was limited to waterborne paint markings. More recently, Idris et al. [26] developed Genetic Algorithm models (R^{2} = 0.64 – 0.93) to predict retroreflectivity. However, these models were developed only for thermoplastic markings and predicted retroreflectivity for only up to one year.
Advancements based on previous research
Based on Table 1, it can be observed that most previous retroreflectivity prediction models had limited scope, as they were site-specific and marking material-specific. These models were developed considering only a few input variables (i.e., time, traffic, line lateral location, and/or initial retroreflectivity). To address the shortcomings of the past studies, retroreflectivity degradation models were developed in this study using two different machine learning algorithms, Decision Tree and Artificial Neural Network. These models were constructed considering all the significant input variables affecting retroreflectivity. Additionally, the scope of the proposed models has been further extended to seven types of commonly used marking materials along with the traffic and climatic conditions of different geographical regions across the U.S.
It is worth mentioning that other common machine learning algorithms besides Decision Tree and Artificial Neural Network, including Support Vector Machine, LightGBM, and K-Nearest Neighbors, were initially implemented in this study to develop retroreflectivity degradation models. Decision Tree and Artificial Neural Network models yielded the highest prediction accuracy among those, and therefore, only the results of the Decision Tree and Artificial Neural Network models are presented in this study.
Data collection
In this study, the measured R_{S} and other relevant variables were retrieved from NTPEP’s online data repository and assembled into a dataset. This dataset included R_{S} readings from the following eight recent NTPEP test decks distributed over three different U.S. climate zones (Southeast, Northeast, and Upper Midwest):

Minnesota: 2010 and 2013

Pennsylvania: 2011 and 2014

Florida: 2012, 2015, and 2019

Wisconsin: 2017
A total of 517 marking products were considered in this study. Each product was installed as eight transverse lines, four lines on an asphalt surface and four lines on a concrete surface, resulting in a total of 4,136 marking lines (517 products × 2 pavement surfaces × 4 marking lines = 4,136 lines). For each line of these 4,136 lines, R_{S} measurements were collected at 12 time intervals (after 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, and 36 months) resulting in a total of 49,632 R_{S} values (517 products × 8 lines per product × 12 R_{S} per line = 49,632). The descriptions of retrieved variables for each marking are presented in Table 2.
Exploratory data analysis
Descriptive statistics
The general descriptive statistics (i.e., minimum and maximum values, interquartile range, mean, median, and outliers) of the numerical variables were calculated. These descriptive statistics are presented in Fig. 2 as box-and-whisker plots. It is worth mentioning that the outliers in MR_{S}, as shown in Fig. 2d, represented the true variability of R_{S} measurements. Therefore, these outliers were not removed, in order to maintain the original statistical distribution of the study dataset, which is a common practice for handling outliers [27].
Correlation analysis
A correlation analysis was conducted to determine the degree of association between all the collected variables. Pearson’s R [28], the eta (η) coefficient [29], and Cramer’s V [30] were utilized to evaluate the association between numeric-numeric, numeric-categorical, and categorical-categorical variable pairs, respectively. Pearson’s R ranges from −1.0 (a perfect, decreasing, linear association) to 1.0 (a perfect, increasing, linear association), while the values of the η coefficient and Cramer’s V lie between 0 (no association) and 1 (a perfect association). The developed correlation matrix is presented in Fig. 3. Among the numeric-categorical variable pairs, (TH, T) had the highest η coefficient of 0.9. (E, TR) showed the highest Pearson’s R of 0.7 among the numeric-numeric variable pairs. (M, T) exhibited the highest Cramer’s V of 0.6 among the categorical-categorical variable pairs. These high values of the η coefficient, Pearson’s R, and Cramer’s V indicate high multicollinearity in the compiled dataset.
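The three association measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names and the toy data are hypothetical, and Cramer’s V is computed without any bias correction.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson's R for two equal-length numeric sequences (range -1 to 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def eta_coefficient(categories, values):
    """Eta coefficient: sqrt(between-group sum of squares / total sum of squares)."""
    grand_mean = sum(values) / len(values)
    groups = {}
    for c, v in zip(categories, values):
        groups.setdefault(c, []).append(v)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return math.sqrt(ss_between / ss_total)


def cramers_v(a, b):
    """Cramer's V from the chi-squared statistic of the contingency table."""
    n = len(a)
    joint, row, col = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical toy data, chosen so each measure hits an extreme value.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
eta_perfect = eta_coefficient(["a", "a", "b", "b"], [1, 1, 3, 3])
v_perfect = cramers_v(["x", "x", "y", "y"], ["p", "p", "q", "q"])
v_none = cramers_v(["x", "x", "y", "y"], ["p", "q", "p", "q"])
```

In the toy data, the first three pairs are perfectly associated and the last pair is independent, so the measures sit at the ends of their ranges.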
Data preprocessing
Neither DT nor ANN algorithms can process categorical data [31]. Therefore, as a data preprocessing step, the categorical variables were converted into numerical form using the Label Encoding technique [32]. This technique assigned every unique category a number between 0 and (X − 1), where X represented the total number of unique categories for that categorical variable.
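As a minimal sketch of label encoding, the snippet below maps each unique category to an integer from 0 to X − 1. The function name and the sample values are hypothetical, and numbering the categories in sorted order is an assumption (it mirrors the behavior of scikit-learn's `LabelEncoder`, which may or may not be the tool used in the study).

```python
def label_encode(values):
    """Map each unique category to an integer in 0..X-1, X = number of categories."""
    classes = sorted(set(values))  # assumption: categories numbered in sorted order
    mapping = {category: code for code, category in enumerate(classes)}
    return [mapping[v] for v in values], mapping


# Hypothetical sample of a categorical variable (pavement surface type).
surface = ["asphalt", "concrete", "asphalt", "concrete", "asphalt"]
codes, mapping = label_encode(surface)
```

Here the two categories receive the codes 0 and 1, so every encoded value lies between 0 and X − 1 = 1.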
Model development
This section provides technical details of the machine learning algorithms implemented in this study, followed by an overview of the model development process.
Machine learning algorithms
Two machine learning algorithms were utilized in this study for model development, as described in the following subsections.

a) Decision Tree (DT) is a machine learning algorithm that builds models in the form of tree-like structures consisting of a root node, internal nodes, and leaves that are connected through branches [33]. A simple DT model consisting of two predictors (X_{1} and X_{2}) is illustrated in Fig. 4a. DT uses a series of splitting rules (X_{1} ≤ s_{1}, X_{2} ≤ s_{2}, …, X_{2} ≤ s_{4}) to divide the training observations into various regions of the input space using the recursive binary splitting technique (see Fig. 4b) [22]. This process is applied iteratively until the objective function is optimized and the leaves (R_{1}, R_{2}, …, R_{5}) are established [34]. The mean response value (C_{1}, C_{2}, …, C_{5}) of the observations falling under each leaf is used as the final prediction [35]. The most important hyperparameters that control model architecture and require tuning during the training process of a DT model are maximum depth (D), minimum samples split (S), minimum samples leaf (L), and maximum features (F) [36]. The general mathematical form of a DT model and the objective function are presented in Eqs. 1 and 2, respectively.
where,
q(x) = splitting rule
m = number of input variables
t = total number of leaves
\({C}_{q\left(x\right)}\) = mean response of a leaf
where,
\(Obj\left(T\right)\) = objective function
\({\sum }_{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}\) = loss function
\(\alpha (T)\)= regularization term
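A general form consistent with the symbol definitions listed above can be written as follows. It is a sketch assembled from those definitions and the standard regression-tree formulation (prediction equal to the mean response of the leaf selected by the splitting rule; objective equal to the loss function plus the regularization term), and the exact notation of Eqs. 1 and 2 may differ.

\[
{\widehat{y}}_{i} = {C}_{q\left({x}_{i}\right)}, \qquad q:{\mathbb{R}}^{m}\to \left\{1, 2, \dots , t\right\}
\]

\[
Obj\left(T\right) = {\sum }_{i=1}^{N}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2} + \alpha \left(T\right)
\]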

b) Artificial Neural Network (ANN) is another machine learning algorithm that builds an artificial neural network with a multilayer perceptron (MLP) architecture, consisting of an input layer, a maximum of two hidden layers, and an output layer [37, 38]. A simple ANN model with one hidden layer is illustrated in Fig. 5. The primary processing elements in each layer are neurons (N_{1}, N_{2}, …, N_{n}) that are interconnected by certain weights. The network is built through a two-stage optimization process, “forward pass” and “backpropagation” [39]. In the forward pass, the inputs (X_{1}, X_{2}, …, X_{j}) are multiplied by the associated weights (W_{1,1}, W_{2,1}, …, W_{hh,k}), summed, and added to a bias term \(\left({a}_{H}\right)\) to produce a linear output. The linear output is passed through an activation function \(\left({\varnothing }_{H}\right)\) (e.g., sigmoid, hyperbolic tangent, or Rectified Linear Unit (ReLU)) to obtain a nonlinear output [22]. The output from a neuron in the hidden layer acts as an input to the neuron in the output layer (K). In backpropagation, the errors made in the forward pass are distributed from the output layer to the input layer through the weights using an optimizer (e.g., Adam or Stochastic Gradient Descent (SGD)), and as a result, the weights are updated [40]. Forward pass and backpropagation are performed iteratively until the objective function is optimized [22]. The most important hyperparameters requiring tuning for an ANN model include number of hidden layers (H), number of neurons (N), batch size (B), activation function (A), optimizer (O), and learning rate (Lr) [36]. The mathematical formulations of an ANN model and objective function are presented in Eqs. 3 and 4, respectively.
where,
\({\widehat{y}}_{n}\)= model prediction
\({w}_{jh}, {w}_{hk}\) = weights between input and the hidden layers and between hidden and output layers
\({x}_{ij}\) = inputs
\({a}_{h}, {a}_{k}\) = bias terms in the hidden layer neuron and output neuron, respectively
N = number of neurons in the hidden layer
\({\varnothing }_{h}, {\varnothing }_{0}\) = activation functions in the hidden layer neuron and output neuron, respectively.
where,
\(Obj\left(\theta \right)\) = objective function
\(\frac{1}{M}{\sum }_{m =1}^{M}{\left({y}_{n}-{\widehat{y}}_{n}\right)}^{2}\) = loss function
\(\Omega \alpha (\theta )\) = regularization term.
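To make the forward pass concrete, the sketch below evaluates a single-hidden-layer network of the form \(\widehat{y} = {\varnothing }_{0}\left({a}_{k} + {\sum }_{h=1}^{N}{w}_{hk}\,{\varnothing }_{h}\left({a}_{h} + {\sum }_{j}{w}_{jh}{x}_{j}\right)\right)\), which is the standard MLP expression written with the symbols defined above. The weights, biases, and inputs are hypothetical, and using ReLU in the hidden layer with an identity output activation is an assumption made for illustration only.

```python
def relu(z):
    """Rectified Linear Unit activation."""
    return max(0.0, z)


def identity(z):
    """Identity activation (assumed here for a regression output)."""
    return z


def forward_pass(x, w_jh, a_h, w_hk, a_k, act_hidden=relu, act_out=identity):
    """One forward pass through a network with a single hidden layer.

    x    : list of inputs x_j
    w_jh : w_jh[h][j] is the weight from input j to hidden neuron h
    a_h  : bias of each hidden neuron
    w_hk : weight from each hidden neuron to the output neuron
    a_k  : bias of the output neuron
    """
    hidden = [
        act_hidden(bias + sum(w * xj for w, xj in zip(weights, x)))
        for weights, bias in zip(w_jh, a_h)
    ]
    return act_out(a_k + sum(w * h for w, h in zip(w_hk, hidden)))


# Hypothetical weights and inputs: two inputs, two hidden neurons, one output.
y_hat = forward_pass(
    x=[1.0, 2.0],
    w_jh=[[0.5, -1.0], [1.0, 1.0]],
    a_h=[0.0, -1.0],
    w_hk=[2.0, 0.5],
    a_k=0.1,
)
```

With these numbers the first hidden neuron is switched off by ReLU (its linear output is −1.5), the second outputs 2.0, and the network output is 0.1 + 0.5 × 2.0 = 1.1. Backpropagation would then adjust the weights to reduce the loss; that step is omitted here.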
Overview of model development process
In this research, two model development strategies (Strategy A and Strategy B) were utilized, and their results were compared to identify the best strategy for constructing R_{S} prediction models. In Strategy A, two different integral models were developed, one for DT and one for ANN, to predict R_{S} after 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, and 36 months (PR_{S1}, PR_{S2}, PR_{S3}, PR_{S11}, and so on) using S, T, C, M, TH, b, B, E, TR, and SN as inputs and MR_{S} as the target variable. As per this strategy, the general formulation for the DT and ANN models is presented in Eq. 5.
where,
\({PR}_{S}\) = Predicted R_{S}
\(S, T, C, M, TH, b, B\) = Time independent inputs
\(TR, SN\) = Time dependent inputs
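Given the inputs listed for Strategy A, the integral model can be written in the following general form, where \(f\) denotes the trained DT or ANN model and \(E\) is the elapsed time. This is a sketch based on the stated inputs and target; the exact notation of Eq. 5 may differ.

\[
{PR}_{S} = f\left(S, T, C, M, TH, b, B, E, TR, SN\right)
\]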
On the other hand, in Strategy B, sequential R_{S} prediction models were developed for both DT and ANN, giving two separate sets of 11 models (DT-A through DT-K and ANN-A through ANN-K). The schematic of the model inputs and outputs for the DT models is illustrated in Fig. 6, as an example. As shown in Fig. 6, Model DT-A utilizes the initially measured R_{S} (MR_{S0}) and other key input variables at the time of installation (S, T, C, M, TH, b, B, TR_{0}, and SN_{0}) to predict R_{S} after E = 1 month. For Model DT-B, the output from Model DT-A (PR_{S1}) was combined with the other inputs at E = 1 month to predict R_{S} after E = 2 months. This process was repeated for the remaining models. Based on the framework illustrated in Fig. 6, the general formulation for the DT and ANN models is presented in Eq. 6.
where,
\({E}_{i}\) = Elapsed times (\({E}_{0}\), \({E}_{1}\), …, \({E}_{36}\))
\({X}_{{E}_{i-1}}\) = \({MR}_{S0}\), for \({E}_{i}\) = \({E}_{0}\), or \({PR}_{S{E}_{i-1}}\), for \({E}_{i}\) > \({E}_{0}\)
\({PR}_{S{E}_{i}}\) = Predicted R_{S} for month \({E}_{i}\)
\(S, T, C, M, TH, b, B\) = Time independent inputs
\({TR}_{S{E}_{i-1}},{SN}_{S{E}_{i-1}}\) = Time dependent inputs at previous elapsed time (\({E}_{i-1}\))
\({MR}_{S0}\) = Initially measured R_{S}
\({PR}_{S{E}_{i-1}}\) = Predicted R_{S} at previous elapsed time (\({E}_{i-1}\))
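The chaining logic of Strategy B can be sketched as a simple loop in which the first model receives the initially measured R_{S} and every later model receives the previous model's prediction. The function name, the call signature of each model, and the toy stand-in model are hypothetical; only the list of elapsed times comes from the text.

```python
# Elapsed times (months) at which R_S is reported, taken from the text.
ELAPSED_MONTHS = [0, 1, 2, 3, 11, 12, 15, 21, 24, 27, 33, 36]


def predict_sequentially(models, mr_s0, static_inputs, traffic, snowfall):
    """Chain 11 models so that each prediction feeds the next model.

    models            : callables f(x_prev, static_inputs, tr, sn) -> predicted R_S
    mr_s0             : initially measured R_S (input to the first model only)
    static_inputs     : time-independent inputs (S, T, C, M, TH, b, B)
    traffic, snowfall : time-dependent inputs at each previous elapsed time
    """
    predictions = {}
    x_prev = mr_s0
    for i, model in enumerate(models):
        x_prev = model(x_prev, static_inputs, traffic[i], snowfall[i])
        predictions[ELAPSED_MONTHS[i + 1]] = x_prev
    return predictions


# Hypothetical stand-in for a trained model: a fixed 10% loss per step.
def toy_model(x_prev, static_inputs, tr, sn):
    return 0.9 * x_prev


toy_predictions = predict_sequentially(
    models=[toy_model] * 11,
    mr_s0=100.0,
    static_inputs={},
    traffic=[0.0] * 11,
    snowfall=[0.0] * 11,
)
```

Because each model consumes the previous prediction rather than a measurement, any error made early in the chain is carried forward, which is consistent with accuracy decreasing at longer prediction horizons.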
Model training
Training DT models
DT models, developed with Strategy A or Strategy B, were trained with the training dataset (a randomly selected 80% of the total data points), from which the models learned patterns. DT-based models are unaffected by multicollinearity [41, 42]. These models are also insensitive to the scale of the inputs, as the nodes are split based on a single input and are not affected by other inputs [43]. Therefore, inputs were not scaled for training DT models.
Training ANN models
ANN models, developed with Strategy A or Strategy B, were also trained with the training dataset. ANN uses the gradient descent technique to optimize the objective function, and scaling the input variables enables it to converge faster [44]. As such, Standard Scaler [45] was implemented to standardize the input variables by removing the mean and scaling to unit variance to ensure fast convergence. During training, each model was set to train for 1000 iterations. Early stopping was included in the training process to terminate training when the validation R^{2} did not improve for 100 consecutive iterations.
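The two training details described above, standardization and early stopping, can be sketched as follows. The function names and sample values are hypothetical; the scaler uses the population standard deviation, which is how scikit-learn's `StandardScaler` computes it, and the early-stopping rule is a plain reading of "no improvement for a given number of consecutive iterations".

```python
import math


def fit_standard_scaler(column):
    """Return the mean and (population) standard deviation of a column."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return mean, std


def standardize(column, mean, std):
    """Remove the mean and scale to unit variance."""
    return [(v - mean) / std for v in column]


def stopping_iteration(val_scores, patience=100, max_iter=1000):
    """Iteration at which training stops, given per-iteration validation R^2.

    Training stops once the score has failed to improve for `patience`
    consecutive iterations, or when `max_iter` iterations have been run.
    """
    best, stale = -math.inf, 0
    for iteration, score in enumerate(val_scores[:max_iter], start=1):
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return iteration
    return min(len(val_scores), max_iter)


# Hypothetical input column and validation history.
mean, std = fit_standard_scaler([2.0, 4.0, 6.0, 8.0])
scaled = standardize([2.0, 4.0, 6.0, 8.0], mean, std)
stop_at = stopping_iteration([0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6], patience=3)
```

In practice the mean and standard deviation would be fitted on the training data and then reused to transform the validation and testing data, so that no information from unseen data leaks into training; that convention is an assumption here rather than a detail stated in the text.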
Hyperparameter tuning
For both DT and ANN models, model hyperparameters were tuned during the training (Strategy A and Strategy B). Maximum depth (D), minimum samples split (S), minimum samples leaf (L), and maximum features (F) hyperparameters were tuned to prevent overfitting of the DT models. On the other hand, number of hidden layers (H) and neurons (N) were tuned for the ANN models to control model complexity and prevent overfitting. Batch size (B), activation function (A), optimizer (O), and learning rate (Lr) were tuned to achieve improved predictive performance from the ANN models. Moreover, L2 regularization was utilized by tuning alpha (α) hyperparameter to mitigate the effect of high multicollinearity of the assembled dataset.
The tuning of the model hyperparameters for both Strategy A and Strategy B was achieved through the combined implementation of grid search and 10-fold cross-validation techniques [11]. Grid search evaluated all possible combinations of values within the defined hyperparameter space to identify the combination with maximum accuracy. Grid search was accompanied by 10-fold cross-validation, which segmented the training dataset into ten subsets. Training was performed with nine subsets, and validation was done with the remaining subset. This was repeated ten times by changing the validation subset. The average R^{2} value for the ten trials was used to evaluate the performance of the models. The developed hyperparameter spaces and optimum hyperparameter combinations for the models developed with Strategy A and Strategy B are presented in Tables 3, 4, 5 and 6.
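The combination of grid search and 10-fold cross-validation described above can be sketched with the standard library alone. Everything model-specific here is hypothetical: a one-parameter ridge slope through the origin stands in for the DT and ANN models, and `alpha` is the only hyperparameter in the grid. The structure (enumerate every combination, average the validation R^{2} over ten folds, keep the best) is the part being illustrated.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k validation folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge_slope(x, y, alpha):
    """Toy stand-in model: slope through the origin with an L2 penalty."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def grid_search_cv(x, y, grid, k=10):
    """Return the best hyperparameter combination and its mean validation R^2."""
    folds = k_fold_indices(len(x), k)
    names = list(grid)
    results = {}
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = []
        for fold in folds:
            held_out = set(fold)
            x_train = [x[i] for i in range(len(x)) if i not in held_out]
            y_train = [y[i] for i in range(len(x)) if i not in held_out]
            slope = fit_ridge_slope(x_train, y_train, **params)
            scores.append(
                r2_score([y[i] for i in fold], [slope * x[i] for i in fold])
            )
        results[combo] = sum(scores) / k
    best = max(results, key=results.get)
    return dict(zip(names, best)), results[best]


# Hypothetical noise-free data, y = 2x.
x_data = [float(v) for v in range(1, 51)]
y_data = [2.0 * v for v in x_data]
best_params, best_score = grid_search_cv(
    x_data, y_data, {"alpha": [0.0, 10.0, 1000.0]}
)
```

Because the toy data contain no noise, the unpenalized fit wins; with real, noisy, collinear data a nonzero penalty can give the higher cross-validated score.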
The training performance of the models developed with Strategy A and Strategy B was evaluated using the coefficient of determination (R^{2}), mean absolute percentage error (MAPE), and root mean square error (RMSE), as illustrated in Table 7 and Fig. 7, respectively. It can be observed from Table 7 that in Strategy A, the ANN model provided better training performance than the DT model, with higher R^{2} (0.82), lower MAPE (44.6%), and lower RMSE (106.9 mcd/m^{2}/lux).
On the other hand, in Strategy B, the training R^{2}, MAPE, and RMSE values were similar for both DT and ANN models up to a prediction horizon of 3 months (i.e., DT-A to DT-C and ANN-A to ANN-C) (see Fig. 7). Although, in general, a downward trend in R^{2} and an upward trend in MAPE were observed when the prediction horizon was further extended from 3 to 36 months (i.e., DT-D to DT-K or ANN-D to ANN-K), no trend was observed in RMSE. The decrease in overall training accuracy with an increase in prediction horizon in Strategy B was expected because the predicted values were used as inputs to make further predictions, inducing errors in model predictions. Utilizing Strategy B, the DT and ANN models fitted the training data with average R^{2}, MAPE, and RMSE of 0.89/27.1%/53.9 mcd/m^{2}/lux and 0.86/38.1%/61.6 mcd/m^{2}/lux, respectively, and therefore demonstrated better training performance than the models developed with Strategy A. The ranges of training R^{2}, MAPE, and RMSE for the DT models were 0.77 to 0.96, 9.7 to 57.6%, and 46.0 to 67.3 mcd/m^{2}/lux, respectively, while for the ANN models these ranges were 0.76 to 0.96, 10.1 to 62.3%, and 53.5 to 73.5 mcd/m^{2}/lux, respectively. Both DT and ANN models fitted the training data well utilizing Strategy B, with R^{2} values higher than those of the models listed in Table 1. Overall, the DT models demonstrated better training performance than the ANN models in Strategy B.
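For reference, the three performance metrics used in this evaluation (R^{2}, MAPE, and RMSE) follow their standard definitions and can be computed as in the sketch below. The measured and predicted values are hypothetical and serve only to exercise the functions.

```python
import math


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (mcd/m^2/lux here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical measured and predicted retroreflectivity values.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
scores = (r2(measured, predicted), mape(measured, predicted), rmse(measured, predicted))
```

MAPE weights each error relative to the measured value, so the same absolute error counts for more at low retroreflectivity, whereas RMSE is expressed in absolute units. This difference is one possible reason the two metrics can follow different trends over the prediction horizon.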
Model testing
The testing dataset (the remaining 20% of the total data points) was used to evaluate the performance of the developed models on unseen data. The testing performance of the models developed with Strategy A is presented in Fig. 8. From Fig. 8, it can be observed that, as in training, ANN provided better testing performance than DT, with higher R^{2} (0.76), lower MAPE (50.5%), and lower RMSE (121.6 mcd/m^{2}/lux).
The testing performance of the DT and ANN models developed with Strategy B is shown in Figs. 9 and 10, respectively. Results showed that, like the training accuracy, the testing accuracy of these models decreased as the prediction horizon increased beyond 3 months (Figs. 9 and 10). Using Strategy B, the DT and ANN models provided average R^{2}, MAPE, and RMSE of 0.76/41.9%/79.7 mcd/m^{2}/lux and 0.77/45.5%/74.8 mcd/m^{2}/lux, respectively, and therefore demonstrated better testing performance than the models developed with Strategy A. The ranges of testing R^{2}, MAPE, and RMSE for the DT models developed with Strategy B were 0.54 to 0.93, 12.6 to 95.1%, and 67.9 to 95.2 mcd/m^{2}/lux, respectively, while for the ANN models these ranges were 0.55 to 0.94, 10.6 to 80.7%, and 61.2 to 92.9 mcd/m^{2}/lux, respectively. These models provided close R_{S} estimates at different prediction horizons for a dataset that was not used during model training, thereby demonstrating their robustness. Most of the ANN models exhibited better testing performance than the corresponding DT models. The testing R^{2} values of these models were also reasonably higher than the R^{2} of most past regression models predicting retroreflectivity for up to 3 years. These high R^{2} values indicated that the proposed machine learning models developed with Strategy B can predict R_{S} with a higher level of accuracy than the traditional regression models.
It is worth mentioning that machine learning aims to build models that generalize well to unseen data, ensuring their effectiveness in real-world applications [46]. As such, testing performance is typically prioritized over training performance when selecting the best models [22]. With Strategy B, the DT models exhibited better training performance than the ANN models, while the ANN models demonstrated superior testing performance and, hence, a better generalization ability on unseen data than the DT models. Therefore, ANN was selected over DT as the better algorithm for the adopted dataset, and the ANN models developed with Strategy B were considered for further evaluation.
Feature importance study
The importance of each input for every ANN model was assessed using SHapley Additive exPlanations (SHAP) values [47]. The SHAP value for an input represents the contribution of that input to the difference between the actual and expected prediction, averaged over all possible permutations of inputs. An input with a higher SHAP value has a higher impact on model prediction. For every ANN model, SHAP values for each input were calculated by taking the weighted average of the marginal contributions of the inputs. The mean absolute SHAP values of the ANN-A inputs are presented in Fig. 11, as an example. The mean absolute SHAP values for the remaining models are reported in Table 8.
For ANN-A, the initially measured retroreflectivity (MR_{S} at E = 0) was the most important input (mean absolute SHAP value = 166.9) (Fig. 11). For the remaining models (Table 8), the predicted skip retroreflectivity (PR_{S}) had the highest impact on model prediction. Additionally, SN followed by TR were the most important inputs after MR_{S} or PR_{S} for all ANN models (Fig. 11 and Table 8). Most retroreflectivity degradation models in the literature did not consider these key inputs simultaneously, which might explain their relatively low accuracy.
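The idea behind these values can be illustrated with an exact Shapley computation on a toy model. The sketch below averages each input's marginal contribution over all orderings of the inputs, holding inputs that have not yet been "switched on" at a baseline value. The toy model, the instance, and the baseline are hypothetical, and the baseline substitution is a simplifying assumption; SHAP implementations use approximations of this enumeration for models such as neural networks.

```python
import itertools
import math


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every ordering of the inputs.

    Inputs not yet added to the coalition are held at their baseline value.
    """
    n = len(instance)
    phi = [0.0] * n
    for order in itertools.permutations(range(n)):
        current = list(baseline)
        previous_output = model(current)
        for j in order:
            current[j] = instance[j]
            new_output = model(current)
            phi[j] += new_output - previous_output  # marginal contribution of j
            previous_output = new_output
    return [p / math.factorial(n) for p in phi]


# Hypothetical toy model: one additive input and one pairwise interaction.
def toy_model(x):
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The additive input receives its full effect (2.0), while the interaction term (2.0 × 3.0 = 6.0) is split equally between the two inputs involved. The values sum to the difference between the prediction for the instance and the prediction at the baseline, which is the property that makes mean absolute values comparable across inputs.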
Illustrative implementation of the proposed models
Before using a pavement marking product in a project, a transportation agency might be interested in determining the expected service life of a specific marking product based on a specified minimum retroreflectivity threshold. As an example, the process of implementing the proposed models to estimate the service life of one of the marking products (included in testing dataset) is explained in the following steps with the aid of Tables 9 and 10.
Step 1: Initial retroreflectivity measurement
Firstly, initial R_{S} of the marking line right after field installation needs to be determined. This could be measured in the field at the project level or assumed based on previous similar projects. As shown in Table 9, this value was 329 mcd/m^{2}/lux.
Step 2: Collection of inputs over 3 years
In Step 2, all the other inputs in Tables 9 and 10 should be collected at the different elapsed times (i.e., E = 0, 1, 2, 3, 11, 12, 15, 21, 24, 27, and 33 months). S, T, C, M, TH, b, and B are time-independent variables and can be easily determined for a specific product. TR and SN vary with E and can be determined, predicted, or assumed based on historical data. It should be noted that, for accurate prediction, all the numerical inputs applied at different E should lie within the ranges illustrated in Fig. 2.
Step 3: Sequential use of the proposed models
In this step, the user assigns all the input data at E = 0 and employs ANN-A to calculate PR_{S} at month 1 (369 mcd/m^{2}/lux in Table 9). Afterward, the user combines all the input data at E = 1 with PR_{S} at month 1 (369 mcd/m^{2}/lux in Table 9) and employs ANN-B to calculate PR_{S} at month 2 (343 mcd/m^{2}/lux in Table 9). This process is repeated for all the remaining values of E and for all the remaining models in Tables 9 and 10 until PR_{S} at month 36 (96 mcd/m^{2}/lux in Table 10) is obtained. A comparison between the measured R_{S} (as collected from NTPEP) and the predicted R_{S} from the proposed models is illustrated in Fig. 12.
Step 4: Transverse to longitudinal retroreflectivity conversion
Based on the agency’s policy, the predicted transverse skip retroreflectivity (R_{S}) values at different E should be transformed into longitudinal retroreflectivity (R_{L}), which mimics the actual field conditions. It is widely accepted that R_{S} correlates well with R_{L} [14]. Furthermore, a recent study proposed simple models to perform the conversion between R_{S} and R_{L} [48].
Step 5: Service life estimation
Based on the R_{L} values obtained in Step 4 at different E, the service life of the marking line can be estimated as the time at which the marking product’s R_{L} drops below 100 mcd/m^{2}/lux. According to the model predictions in Fig. 12, the expected service life of this marking product is between 27 and 33 months.
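The service life estimate in this step amounts to finding the first prediction horizon at which R_{L} falls below the threshold. A minimal Python sketch follows; the function name is an assumption, and the R_{L} series is hypothetical rather than taken from Table 9, Table 10, or Fig. 12. Only the 100 mcd/m^{2}/lux threshold comes from the text.

```python
from typing import Optional, Sequence, Tuple


def service_life_bracket(
    months: Sequence[int],
    rl_values: Sequence[float],
    threshold: float = 100.0,  # mcd/m^2/lux, as used in Step 5
) -> Optional[Tuple[Optional[int], int]]:
    """Return (last month at or above threshold, first month below it).

    Returns None if R_L never drops below the threshold. The first element
    is None when even the earliest value is already below the threshold.
    """
    previous_month = None
    for month, rl in zip(months, rl_values):
        if rl < threshold:
            return (previous_month, month)
        previous_month = month
    return None


# Hypothetical R_L series, for illustration only.
example_months = [0, 12, 24, 27, 33, 36]
example_rl = [330.0, 250.0, 140.0, 110.0, 98.0, 95.0]
bracket = service_life_bracket(example_months, example_rl)
```

Returning a bracket rather than a single month reflects that the models predict only at discrete horizons, so the crossing time is known only to lie between two of them.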
Summary and conclusions
The objective of this study was to develop machine learning models that could be utilized by U.S. local and state agencies to reliably predict the retroreflectivity of pavement markings. To fulfill this objective, transverse skip retroreflectivity data and other key variables were retrieved from the NTPEP database. DT and ANN algorithms were considered to develop models for predicting R_{S} sequentially for up to 3 years using two different model development strategies. All the models were trained with 80% of the total data points and tested with the remaining 20%. The findings and conclusions of the study were as follows:

Correlation analysis of the collected variables indicated that some variables were highly correlated, which confirmed high multicollinearity in the study dataset.

Sequential retroreflectivity prediction models demonstrated higher accuracy than integral retroreflectivity prediction models both in training and testing.

Both the sequential DT and ANN models, with overall R^{2} ranging from 0.54 to 0.96 and from 0.55 to 0.96, respectively, predicted retroreflectivity at different prediction horizons more accurately than the regression models proposed in the literature.

The ANN models exhibited better testing performance than the DT models and were therefore selected as the preferred algorithm for the adopted dataset.

Feature importance analysis of the ANN model inputs revealed that initial or predicted retroreflectivity, followed by snowfall and traffic, were the most important inputs to model predictions.

The proposed models are expected to assist state agencies and transportation officials in determining the service life of pavement marking products and in planning future restriping activities accordingly.
Overall, nonparametric supervised machine learning algorithms appear to be a promising alternative to traditional parametric methods for modeling the retroreflectivity degradation of pavement markings. In the future, longitudinal retroreflectivity data should be collected from northern and western states. More sophisticated DT-based ensemble algorithms (e.g., XGBoost and LightGBM) and deep learning techniques (e.g., recurrent neural networks) should be employed to develop marking performance prediction models. Additionally, lightweight machine learning models that require less data should be explored to increase applicability where data availability is limited. Models compatible with multifunctional road condition detection vehicles, which are increasingly used for pavement surveys, should also be developed. The best-performing models might be used to develop a simple retroreflectivity prediction tool that can enhance the decision-making process of transportation agencies regarding future restriping of marking products.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Rasdorf WJ, Hummer JE, Zhang G, Sitzabee W (2009) Pavement marking performance analysis. Fin Rep, NCDOT, Raleigh
Mohamed MM (2019) Evaluation and modeling of pavement marking characteristics based on laboratory and field data. Dissertation, University of Idaho, Ann Arbor
Migletz J, Graham JL, Harwood DW, Bauer KM (2001) Service life of durable pavement markings. Transp Res Rec 1749(1):13–21
Fu H, Wilmot CG (2013) Evaluating alternative pavement marking materials. Public Works Manag Policy 18(3):279–297
Benz RJ, Pike AM, Kuchangi SP, Brackett Q (2009) Serviceable pavement marking retroreflectivity levels. Tech Rep FHWA/TX09/056561, TxDOT, Austin
Sitzabee WE, White ED, Dowling AW (2013) Degradation modeling of polyurea pavement markings. Public Works Manag Policy 18(2):185–199
Xu L, Chen Z, Li X, Xiao F (2021) Performance, environmental impact and cost analysis of marking materials in pavement engineering, the-state-of-art. J Clean Prod 294:126302
Hussein M, Sayed T, El-Basyouny K, de Leur P (2020) Investigating safety effects of wider longitudinal pavement markings. Accid Anal Prev 142:105527
Ozelim L, Turochy RE (2014) Modeling retroreflectivity performance of thermoplastic pavement markings in Alabama. J Transp Eng 140(6):05014001
FHWA (2022) National standards for traffic control devices; the manual on uniform traffic control devices for streets and highways; maintaining pavement marking retroreflectivity. Fed Regist 87(150):47921–47931
Mousa MR, Mousa SR, Hassan M, Carlson P, Elnaml IA (2021) Predicting the retroreflectivity degradation of waterborne paint pavement markings using advanced machine learning techniques. Transp Res Rec 2675(9):483–494
Wang S (2010) Comparative analysis of NTPEP pavement marking performance evaluation results. Dissertation, University of Akron, Akron
Thomas GB, Schloz C (2001) Durable, cost-effective pavement markings phase I: synthesis of current research. Fin Rep Project No. TR454, IOWA DOT, Ames
Zhang Y, Wu D (2010) Methodologies to predict service lives of pavement marking materials. J Transp Res Forum 45(3):5–18
Lee JT, Maleck TL, Taylor WC (1999) Pavement marking material evaluation study in Michigan. ITE J 69(7):44
Abboud N, Bowman BL (2002) Cost- and longevity-based scheduling of paint and thermoplastic striping. Transp Res Rec 1794(1):55–62
Hollingsworth JD (2012) Understanding the impact of bead type on paint and thermoplastic pavement markings. Dissertation, Air Force Institute of Technology, Ohio
Sitzabee WE, Hummer JE, Rasdorf W (2009) Pavement marking degradation modeling and analysis. J Infrastruct Syst 15(3):190–199
Sarasua WA, Clarke DB, Davis WJ (2003) Evaluation of interstate pavement marking retroreflectivity. Fin Rep No. FHWASC0301, SCDOT, Columbia
Robertson J, Sarasua W, Johnson J, Davis W (2013) A methodology for estimating and comparing the lifecycles of high-build and conventional waterborne pavement markings on primary and secondary roads in South Carolina. Public Works Manag Policy 18(4):360–378
Malyuta DA (2015) Analysis of factors affecting pavement markings and pavement marking retroreflectivity in Tennessee highways. Dissertation, University of Tennessee at Chattanooga, Chattanooga
James G, Witten D, Hastie T, Tibshirani R (2021) An introduction to statistical learning. Springer, New York
Umali J, Barrios E (2014) Nonparametric principal components regression. Commun Stat Comput 43(7):1797–1810
Kopf J (2004) Reflectivity of pavement markings: analysis of retroreflectivity curves. Res Rep WARD 592.1, WSDOT, Seattle
Karwa V, Donnell ET (2011) Predicting pavement marking retroreflectivity using artificial neural networks: exploratory analysis. J Transp Eng 137(2):91–103
Idris II, Mousa MR, Hassan M, Dhasmana H (2022) Predicting the retroreflectivity degradation of thermoplastic pavement markings with genetic algorithm. San Antonio
Aguinis H, Gottfredson RK, Joo H (2013) Best-practice recommendations for defining, identifying, and handling outliers. Organ Res Methods 16:270–301
Benesty J, Chen J, Huang Y (2008) On the importance of the Pearson correlation coefficient in noise reduction. IEEE Trans Audio Speech Lang Process 16(4):757–765
Uçar MK (2019) Eta correlation coefficient based feature selection algorithm for machine learning: e-score feature selection algorithm. J Intell Syst Theory Appl 2(1):7–12
Jorge I (2011) The influence of the e-tutor on the development of collaborative critical thinking in a student’s e-forum: association levels with Cramer’s V. In: Old Meets New Media Educ 61st Int Counc Educ Media XIII Int Symp Comput Educ Jt Conf, University of Lisbon, Portugal
Yadav D (2019) Categorical encoding using label-encoding and one-hot-encoder. https://towardsdatascience.com/categoricalencodingusinglabelencodingandonehotencoder911ef77fb5bd. Accessed 15 Jul 2022
Agajanian S, Oluyemi O, Verkhivker GM (2019) Integration of random forest classifiers and deep convolutional neural networks for classification and biomolecular modeling of cancer driver mutations. Front Mol Biosci 6:44
De Ville B (2013) Decision trees. Wiley Interdiscip Rev Comput Stat 5(6):448–455
Karballaeezadeh N, Mohammadzadeh SD, Moazemi D, Band SS, Mosavi A, Reuter U (2020) Smart structural health monitoring of flexible pavements using machine learning methods. Coatings 10(11):1100
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23
Yang L, Shami A (2020) On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415:295–316
Walczak S, Cerpa N (2019) Artificial neural networks. Encycl Phys Sci Technol 631–645
Mostafa B, El-Attar N, Abd-Elhafeez S, Awad W (2020) Machine and deep learning approaches in genome: review article. Alfarama J Basic Appl Sci 2(1):105–113
Jin M, Liao Q, Patil S, Abdulraheem A, Al-Shehri D, Glatz G (2022) Hyperparameter tuning of artificial neural networks for well production estimation considering the uncertainty in initialized parameters. ACS Omega 7(28):24145–24156
Sharma S, Sharma S, Athaiya A (2017) Activation functions in neural networks. Int J Eng Appl Sci Technol 4(12):310–316
Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39:261–283
Badr W (2019) Why feature correlation matters... a lot! https://towardsdatascience.com/whyfeaturecorrelationmattersalot847e8ba439c4. Accessed 3 Apr 2023
Thenraj R (2020) Do decision trees need feature scaling. https://towardsdatascience.com/dodecisiontreesneedfeaturescaling97809eaa60c6. Accessed 3 Apr 2023
Roy B (2020) All about feature scaling. https://towardsdatascience.com/allaboutfeaturescalingbcc0ad75cb35. Accessed 15 2022
Thara TDK, Prema PS, Xiong F (2019) Autodetection of epileptic seizure events using deep neural network with different feature scaling techniques. Pattern Recognit Lett 128:544–550
Bengio Y, Goodfellow I, Courville A (2016) Deep learning. MIT Press, Cambridge
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: 31st Conf. Neural Inf Process Syst (NIPS 2017), Neural Information Processing Systems Foundation, Inc. (NeurIPS), Long Beach
Pike AM, Songchitruksa P (2015) Predicting pavement marking service life with transverse test deck data. Transp Res Rec 2482(1):16–22
Acknowledgements
Not applicable.
Funding
This research was funded by the National Cooperative Highway Research Program (NCHRP) (Project Number: 20-30/IDEA 237).
Author information
Contributions
MM and III developed the study concept. III collected the data, performed the analysis, developed models, and interpreted the results. III, MM, and MH prepared the draft manuscript. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Idris, I.I., Mousa, M. & Hassan, M. Modeling retroreflectivity degradation of pavement markings across the US with advanced machine learning algorithms. J Infrastruct Preserv Resil 5, 3 (2024). https://doi.org/10.1186/s4306502400094z
DOI: https://doi.org/10.1186/s4306502400094z